This is a summary of the work done on initiatives by the Community Platform Engineering (CPE) Team at Red Hat. Each quarter, the CPE Team, together with CentOS and Fedora community representatives, chooses initiatives to work on in that quarter. The CPE Team is then split into multiple smaller sub-teams that work on the chosen initiatives, plus the day-to-day work that needs to be done.
Here is the list of sub-teams for this quarter:
- Infra & Releng
- CentOS Stream/Emerging RHEL
- Datanommer/Datagrepper
- DNF Counting
- Metrics for Apps on OpenShift
Infra & Releng
About
The purpose of this team is to take care of day-to-day business regarding CentOS and Fedora Infrastructure and Fedora release engineering work. It is responsible for the services running in Fedora and CentOS infrastructure and for preparing everything needed for the new Fedora release (mirrors, mass branching, new namespaces, etc.). This sub-team also investigates possible initiatives. This is done by the Advance Reconnaissance Crew (ARC), which is formed from Infra & Releng sub-team members based on the initiative being investigated.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- Mark O’Brien (Team Lead) (Fedora Operations, CentOS Operations) (mobrien)
- Michal Konecny (Agile Practitioner) (Developer) (zlopez)
- Kevin Fenzi (Fedora Operations) (nirik)
- Fabian Arrotin (CentOS Operations) (arrfab)
- Tomas Hrcka (Fedora Release Engineering) (humaton)
- Lenka Segura (Developer) (lenkaseg)
- Emma Kidney (Developer) (ekidney)
- Ben Capper (Developer) (bcapper)
What the sub-team did in Q3 2021
Fedora Infrastructure
In addition to the normal maintenance tasks (reboots, updates for security issues, creating groups/lists, fixing application issues) we worked on a number of items:
- Cleaned up Nagios checks to stop alerting on swap on hardware machines
- Moved the vast majority of our instances to use linux-system-roles/networking to configure networking via Ansible
- Got the broken openqa-p09-worker02 back up and working, with a lot of firmware upgrades and help from IBM techs
- Archived off ~35 TB of space from our NetApp to a Storinator
- Moved zodbot (our IRC bot) to Python 3 and pointed it at the new account system
- Upgraded the wiki to the latest stable version
- Fixed an issue with OSBS building 0ad, which needed a larger-than-default container
- Set up rooms etc. on the new hosted Fedora Matrix server
- Started on the EPEL9 setup, mirroring CentOS Stream 9 buildroot content, etc.
- Got vmhost-x86-copr04's motherboard replaced and the machine back in service
- Deployed the Kinoite website
CentOS Stream
- Prepared the new mirror network to accept CentOS Stream 9
- Modified koji/cbs.centos.org to allow building for CentOS Stream 9, including new tags
- Imported CentOS Stream 9 content
- Modified the SIG process to include/support Stream 9 and its changed requirements (directory layout, included sources and debuginfo vs. what we had before)
- Prepared the needed AWS infrastructure for EC2 testing and for replicating CentOS Stream 9 images across all regions (a sketch of that replication step follows below)
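The cross-region replication mentioned in the last item essentially comes down to copying the published AMI into every other region. Below is a minimal, hypothetical sketch of that step using boto3; the AMI ID, image name and region list are placeholders, not the values actually used by the CentOS infra.

```python
# Hypothetical sketch: replicate a CentOS Stream 9 AMI to additional AWS regions.
# The source AMI ID, source region, image name and target regions are placeholders.
import boto3

SOURCE_REGION = "us-east-1"
SOURCE_AMI = "ami-0123456789abcdef0"               # placeholder AMI ID
TARGET_REGIONS = ["eu-west-1", "ap-southeast-1"]   # placeholder region list

for region in TARGET_REGIONS:
    ec2 = boto3.client("ec2", region_name=region)
    result = ec2.copy_image(
        Name="CentOS Stream 9 x86_64",             # placeholder image name
        SourceImageId=SOURCE_AMI,
        SourceRegion=SOURCE_REGION,
    )
    print(f"{region}: started copy as {result['ImageId']}")
```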
CentOS common/public infrastructure
- Converted all deployed CentOS Linux 8 machines to CentOS Stream 8
- Relocated the armhfp community builders to other DC/hardware
- Started investigating a migration from Pagure 5.8 on CentOS 7 to Pagure 5.13 on CentOS Stream 8
- Created the https://docs.infra.centos.org documentation website, and are working in pairing mode to share infra knowledge within the team
- Collaborated with the Artwork SIG to prepare *.dev* variants of websites as a "playground" for testing Ansible role changes directly, with corresponding PRs then deployed to .stg. and finally to prod
- Business As Usual (BAU):
  - koji tag creation
  - hardware issues to fix/follow up on
CentOS CI infrastructure
- Updated OpenShift to the 4.8.x stable branch
- Moved/onboarded new tenants onto the CI infra
- Moved some workloads within the CI infra for better resiliency and backup plans
- Expanded the existing cloud.cico (OpenNebula) infra with new x86_64 hypervisors
- Reorganized the slow NFS storage box (out of warranty) with a RAID 10 layout to speed up/help with containers in OpenShift (for PersistentVolumes)
Fedora Release Engineering
While taking care of day-to-day business like nightly composes, package retirements and unretirements, new SCM requests and occasional koji issues, we worked on the new Fedora release:
- Mass rebuild of rpms and modules in Fedora Rawhide
- Branching of Fedora 35 from Rawhide
- Fedora Linux 35 Beta release
ARC
Investigated upgrading the frontend web UI for the CentOS mailing list. The investigation concluded that Mailman 3, Postorius and HyperKitty would need to be packaged for EPEL8, and that a new server would need to be deployed with the current CentOS mailing list migrated to it.
CentOS Stream/Emerging RHEL
About
This initiative works on CentOS Stream/Emerging RHEL to make this new distribution a reality. Its goal is to prepare the ecosystem for the new CentOS Stream.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- Brian Stinson (Team Lead) (bstinson)
- Adam Samalik (Agile Practitioner) (asamalik)
- Aoife Moloney (Product Owner) (amoloney)
- Carl George
- James Antill
- Johnny Hughes
- Mohan Boddu (mboddu)
- Merlin Mathesius
- Stephen Gallagher (sgallagh)
- Troy Dawson (tdawson)
- Petr Bokoc (pbokoc)
What the sub-team did in Q3 2021
One thing we tackled was enabling side tag builds for Fedora ELN. Initially, we wanted to implement proper side tags for ELN, but we eventually settled on a simpler approach where we tag the Rawhide builds in and then rebuild them in ELN. This ensures that we get all the packages built in ELN, with the Rawhide build as a backup should the ELN build fail. We can even use this as a health metric for ELN: how many ELN packages are actually ELN builds.
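As a rough illustration of that health metric, something like the sketch below could count how many of the latest builds in the ELN tag were actually rebuilt for ELN, using the koji Python bindings. The tag name "eln" and the ".eln" release check are my assumptions here, not necessarily how the metric is computed in practice.

```python
# Hedged sketch: count how many packages tagged into ELN are actual ELN rebuilds
# (vs. Rawhide builds tagged in as a fallback). The tag name and the ".eln"
# release convention are assumptions.
import koji

HUB = "https://koji.fedoraproject.org/kojihub"
TAG = "eln"  # assumed tag name

session = koji.ClientSession(HUB)
builds = session.listTagged(TAG, latest=True)

eln_builds = [b for b in builds if ".eln" in b["release"]]
print(f"{len(eln_builds)} of {len(builds)} latest builds in '{TAG}' are ELN rebuilds")
```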
For CentOS Stream 9, cloud images are now available in AWS. You can find them by searching for “centos stream 9” in AWS, and to make sure you get the latest one you can add the current month to the search (e.g. “202110” for October 2021).
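If you want to script that lookup, the same search can be done with boto3. A minimal sketch is below; the name pattern is an assumption based on the search described above, so verify the image owner before relying on the results.

```python
# Hedged sketch: list recent CentOS Stream 9 AMIs in one region.
# The name pattern is an assumption; verify the publisher/owner of each AMI
# before trusting the results, since this searches all public images.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_images(
    Filters=[{"Name": "name", "Values": ["CentOS Stream 9*202110*"]}],
)
for image in sorted(resp["Images"], key=lambda i: i["CreationDate"], reverse=True):
    print(image["CreationDate"], image["ImageId"], image["OwnerId"], image["Name"])
```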
Also, CentOS Stream 9 repositories are now available through mirrors using a metalink. Existing systems get this set up automatically with an update, as the centos-release package will include the metalink. This will take some load off the CentOS infra and potentially even make your updates faster.
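If you want to check whether your system has already picked this up, one way is to inspect the enabled repo configuration through the python3-dnf API (assuming it is installed); a small sketch is below.

```python
# Hedged sketch: show which enabled repos are configured with a metalink.
# Requires the python3-dnf bindings; repo variables (e.g. $basearch) may
# still appear unexpanded in the printed URLs.
import dnf

base = dnf.Base()
base.read_all_repos()
for repo in base.repos.iter_enabled():
    if repo.metalink:
        print(f"{repo.id}: metalink -> {repo.metalink}")
    else:
        print(f"{repo.id}: baseurl/mirrorlist -> {repo.baseurl or repo.mirrorlist}")
```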
Datanommer/Datagrepper
About
The goal of this initiative is to update and enhance the Datanommer and Datagrepper apps. Datanommer is the database that is used to store all of the Fedora messages sent in the Fedora Infrastructure. Datagrepper is an API with a web GUI that allows users to find messages stored in the Datanommer database. The current solution is slow and the database data structure is not optimal for storing the current amount of data. This is where this initiative comes into play.
Issue trackers
Application URLs
Members of sub-team for Q3 2021
- Aurelien Bompard (Team Lead) (abompard)
- Aoife Moloney (Product Owner) (amoloney)
- Ellen O’Carroll (Product Owner)
- Ryan Lerch (ryanlerch)
- Lenka Segura (lsegura)
- James Richardson (jrichardson)
- Stephen Coady (scoady)
What the sub-team did in Q3 2021
Datanommer and Datagrepper have been upgraded to use TimescaleDB, an open-source relational database for time-series data. TimescaleDB is a PostgreSQL extension that takes care of sharding the large amount of data that we have (and keep generating!), and maintains an SQL-compatible interface for applications.
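For context, the heart of a TimescaleDB setup is turning an ordinary table into a "hypertable" partitioned by time. The snippet below is a generic illustration of that step using psycopg2, not the actual datanommer migration; the table and column names are made up, and creating the extension assumes sufficient database privileges.

```python
# Generic TimescaleDB illustration (not the actual datanommer schema/migration).
# Table and column names are hypothetical; CREATE EXTENSION needs suitable privileges.
import psycopg2

conn = psycopg2.connect("dbname=messages_demo")  # hypothetical database
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            sent_at TIMESTAMPTZ NOT NULL,
            topic   TEXT NOT NULL,
            body    JSONB
        );
    """)
    # create_hypertable() splits the table into time-based chunks while keeping
    # the normal SQL interface for inserts and queries.
    cur.execute(
        "SELECT create_hypertable('messages', 'sent_at', if_not_exists => TRUE);"
    )
conn.close()
```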
Datagrepper and the Datanommer consumer are now running in OpenShift instead of dedicated VMs.
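From a consumer's point of view nothing changes: Datagrepper is still queried over plain HTTP. Here is a small example query against the public instance; the parameter names and response keys reflect my understanding of the current API, so treat them as an assumption and check the Datagrepper docs if they don't match.

```python
# Example query against the public Datagrepper API: messages from the last hour.
# Parameter names and response keys ("raw_messages", "total") are based on the
# current public API and may need adjusting.
import requests

URL = "https://apps.fedoraproject.org/datagrepper/raw"
params = {"delta": 3600, "rows_per_page": 5}

resp = requests.get(URL, params=params, timeout=30)
resp.raise_for_status()
data = resp.json()

print(f"{data['total']} messages in the last hour; first {len(data['raw_messages'])} topics:")
for msg in data["raw_messages"]:
    print(" ", msg["topic"])
```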
DNF Counting
About
DNF Counting is used to obtain data on how Fedora is consumed by users. The current implementation experiences timeouts and crashes when the data are obtained. This initiative is trying to make the retrieval of counting data more reliable and efficient.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- Nils Philippsen (Team Lead) (nils)
- Aoife Moloney (Product Owner) (amoloney)
- Ellen O’Carroll (Product Owner)
- Adam Saleh (asaleh)
- Patrik Polakovic
- With a special shout-out to Stephen Smoogen, who provided vital fixes even though he wasn't officially part of the initiative
What the sub-team did in Q3 2021
The scripts that create the statistics for https://data-analysis.fedoraproject.org/ were cleaned up and refactored, making them stable enough that they no longer require manual intervention.
The code at https://pagure.io/mirrors-countme/ now has tests running in CI and is packaged as an RPM to avoid further mishaps during package installation. The deployment scripts were cleaned up as well, alongside the actual deployment on the log01 machine: its hard-to-track manual interventions for last-minute bug fixes were replaced by Ansible scripts.
The cron jobs that run the batch jobs now only send notification emails on failure, and the overall health of the batch process can be checked on a simple dashboard at https://monitor-dashboard-web-monitor-dashboard.app.os.fedoraproject.org/
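If you want to play with the published counting data yourself, a starting point could look like the hedged sketch below. The CSV URL and the column names used for grouping are assumptions from memory rather than a documented schema, which is why the script prints the actual header first.

```python
# Hedged sketch: load the published countme totals and plot weekly hits per OS release.
# The CSV URL and the column names ("week_end", "os_name", "os_version", "hits")
# are assumptions; check the printed header before relying on them.
import matplotlib.pyplot as plt
import pandas as pd

URL = "https://data-analysis.fedoraproject.org/csv-reports/countme/totals.csv"  # assumed location

df = pd.read_csv(URL)
print(df.columns.tolist())  # confirm the real column names first

weekly = (
    df.groupby(["week_end", "os_name", "os_version"])["hits"]
    .sum()
    .unstack(["os_name", "os_version"])
)
weekly.plot(legend=False, title="countme hits per week")
plt.show()
```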
Metrics for Apps on OpenShift
About
The goal of this initiative is to deploy OpenShift 4 in the Fedora Infrastructure and start using Prometheus as a monitoring tool for apps deployed in OpenShift. This initiative should also define which metrics will be collected.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- David Kirwan (Team Lead) (dkirwan)
- Aoife Moloney (Product Owner) (amoloney)
- Ellen O’Carroll (Product Owner)
- Vipul Siddharth (siddharthvipul1)
- Akashdeep Dhar (t0xic0der)
What the sub-team did in Q3 2021
- Did infrastructure prep work to install Red Hat CoreOS on nodes for the OpenShift Container Platform (OCP)
- Deployed OCP 4.8 in staging and production
- Configured the cluster with OAuth, OpenShift Container Storage (OCS) and other important operators/configs needed to support Fedora workloads
- Automated the OCP deployment process with Ansible
- Deployed and configured the User Workload Monitoring stack (see the query sketch after this list)
- Investigated app migration from the older cluster to the new one
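With the User Workload Monitoring stack in place, application teams can query their own metrics over the standard Prometheus HTTP API exposed by the cluster's Thanos querier route. The sketch below is only illustrative: the route hostname, namespace, metric name and token are placeholders, and the service account needs the appropriate monitoring RBAC.

```python
# Hedged sketch: query an application metric through OpenShift user workload monitoring.
# The route hostname, token, namespace and metric name are placeholders; the
# service account used must have RBAC allowing monitoring queries for its namespace.
import requests

THANOS_ROUTE = "https://thanos-querier-openshift-monitoring.apps.example.com"  # placeholder
TOKEN = "sha256~REPLACE_ME"                                                    # placeholder token
QUERY = 'sum(rate(http_requests_total{namespace="my-app"}[5m]))'               # placeholder metric

resp = requests.get(
    f"{THANOS_ROUTE}/api/v1/query",
    params={"query": QUERY},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])
```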
Epilogue
If you got this far, thank you for reading. If you want to contact us, feel free to do so in the #redhat-cpe channel on libera.chat.
I’m looking at the first graph: https://data-analysis.fedoraproject.org/csv-reports/images/fedora-os-latest.png. “fed35” is reported as growing to 900k unique IPs and then dropping to 200k and staying there. What’s the story behind this?
No mention was made of the rpmautospec initiative. It seems stalled: it has been deployed in our infra, but functionality that is crucial to support more complicated packages is not being handled. Issues and pull requests go months without even a single comment from the project owners… Is reviving this on the agenda?
rpmautospec was not an initiative the CPE team worked on in Q3, which is why it's not mentioned in this report.
That makes sense, but, hmmm. We need a way to support these kinds of things on an ongoing basis. I agree that rpmautospec is crucial to the future; a PR workflow is basically impractical without it.
That’s not F35 — it’s “unknown release”, which just happens to be the same color. I’m not exactly sure what’s going on there, but it could be a lot of things, including possibly misfiled EPEL requests.
Note that that chart isn’t DNF Better Counting data. It’s the old IP/day method, and the graphs are of the raw data.
If you're interested in looking at that, I recommend running Overview - velociraptorizer - Pagure.io rather than the raw graphs (except when actually looking for data problems); it has (ugly but functional) smoothing for known (and usually explained) weirdnesses in the data.
I’m not sure we really should expose the raw charts in this way — they’re more confusing than helpful in my opinion.
The countme data, by the way, is at Index of /csv-reports/countme, in CSV and SQLite formats. This requires more processing to visualize; my scripts to do that are in progress. (I have a logic bug I really need to resolve before sharing…)
It would be nice to have the graphs online for the new countme stats… (Or was there an ipython notebook somewhere?)
Yeah, see Will Woods’ work here: GitHub - wgwoods/fedora-countme-data: Fedora "countme" data, plus docs & examples of how to analyze/graph it
Note that countme reports independently for each repo, so Will's approach is to look each week at the highest repo for a given pattern. My approach is to just look only at fedora-updates for most things. The difference isn't really significant, and I feel more like I know what's going on that way. My scripts to graph from the new data are at Tree - brontosaurusifier - Pagure.io, but as mentioned they're not really in a good state (see the TODO list). I hope to make some time to work on them. Definitely before DevConf.cz. Maybe before Christmas.
It’d be great to have a page with interactive graphs showing that data. There’s a lot of good stuff in those repos. Somebody with good knowledge of the python plotting frameworks could probably make something useful out of this quite quickly…
I am open to help from any volunteers. I have a pretty strong idea of what I want, but not a lot of time to implement it.