This is a summary of the work done on initiatives by the Community Platform Engineering (CPE) Team at Red Hat. Each quarter, the CPE Team, together with CentOS and Fedora community representatives, chooses initiatives to work on in that quarter. The CPE Team is then split into multiple smaller sub-teams that work on the chosen initiatives, plus the day-to-day work that needs to be done.
Here is the list of sub-teams for this quarter:
- Infra & Releng
- CentOS Stream/Emerging RHEL
- Datanommer/Datagrepper
- DNF Counting
- Metrics for Apps on OpenShift
Infra & Releng
About
The purpose of this team is to take care of day-to-day business regarding CentOS and Fedora Infrastructure and Fedora release engineering work. It is responsible for the services running in Fedora and CentOS infrastructure and for preparing everything needed for the new Fedora release (mirrors, mass branching, new namespaces, etc.). This sub-team also investigates possible initiatives. This is done by the Advance Reconnaissance Crew (ARC), which is formed from Infra & Releng sub-team members based on the initiative being investigated.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- Mark O’Brien (Team Lead) (Fedora Operations, CentOS Operations) (mobrien)
- Michal Konecny (Agile Practitioner) (Developer) (zlopez)
- Kevin Fenzi (Fedora Operations) (nirik)
- Fabian Arrotin (CentOS Operations) (arrfab)
- Tomas Hrcka (Fedora Release Engineering) (humaton)
- Lenka Segura (Developer) (lenkaseg)
- Emma Kidney (Developer) (ekidney)
- Ben Capper (Developer) (bcapper)
What the sub-team did in Q3 2021
Fedora Infrastructure
In addition to the normal maintenance tasks (reboots, updates for security issues, creating groups/lists, fixing application issues) we worked on a number of items:
- Cleaned up Nagios checks to stop alerting on swap on hardware machines
- Moved the vast majority of our instances to use linux-system-roles/networking to configure networking via Ansible
- Got the broken openqa-p09-worker02 back up and working, with a lot of firmware upgrades and help from IBM techs
- Archived off ~35 TB of space from our NetApp to a Storinator
- Moved zodbot (our IRC bot) to Python 3 and pointed it at the new account system
- Upgraded the wiki to the latest stable version
- Fixed an issue with OSBS building 0ad, which needed a larger-than-default container
- Set up rooms etc. on the new hosted Fedora Matrix server
- Started on the EPEL9 setup, mirroring CentOS Stream 9 buildroot content, etc.
- Got vmhost-x86-copr04's motherboard replaced and the machine back in service
- Deployed the Kinoite website
CentOS Stream
- Prepared the new mirror network to accept CentOS Stream 9
- Modified koji/cbs.centos.org to allow building for CentOS Stream 9, including new tags
- Imported CentOS Stream 9 content
- Modified the SIG process to include/support Stream 9 and its changed requirements (directory layout, included sources and debuginfo vs. what we had before)
- Prepared the needed AWS infrastructure for EC2 testing and for replicating CentOS Stream 9 images across all regions (a sketch of that replication step follows below)
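The cross-region replication mentioned in the last item essentially comes down to copying the published AMI into every other region. Below is a minimal, hypothetical sketch of that step using boto3; the AMI ID, image name and region list are placeholders, not the values actually used by the CentOS infra.

```python
# Hypothetical sketch: replicate a CentOS Stream 9 AMI to additional AWS regions.
# The source AMI ID, source region, image name and target regions are placeholders.
import boto3

SOURCE_REGION = "us-east-1"
SOURCE_AMI = "ami-0123456789abcdef0"               # placeholder AMI ID
TARGET_REGIONS = ["eu-west-1", "ap-southeast-1"]   # placeholder region list

for region in TARGET_REGIONS:
    ec2 = boto3.client("ec2", region_name=region)
    result = ec2.copy_image(
        Name="CentOS Stream 9 x86_64",             # placeholder image name
        SourceImageId=SOURCE_AMI,
        SourceRegion=SOURCE_REGION,
    )
    print(f"{region}: started copy as {result['ImageId']}")
```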
CentOS common/public infrastructure
- Converted all deployed CentOS Linux 8 machines to CentOS Stream 8
- Relocated the armhfp community builders to other DC/hardware
- Started investigating a migration from Pagure 5.8 on CentOS 7 to Pagure 5.13 on CentOS Stream 8
- Created the https://docs.infra.centos.org documentation website, and are working in pairing mode to share infra knowledge within the team
- Collaborated with the Artwork SIG to prepare *.dev* variants of websites as a "playground" for testing Ansible role changes directly, with corresponding PRs then deployed to .stg. and finally to prod
- Business As Usual (BAU):
  - koji tag creation
  - hardware issues to fix/follow up on
CentOS CI infrastructure
- Updated OpenShift to the 4.8.x stable branch
- Moved/onboarded new tenants onto the CI infra
- Moved some workloads within the CI infra for better resiliency and backup plans
- Expanded the existing cloud.cico (OpenNebula) infra with new x86_64 hypervisors
- Reorganized the slow NFS storage box (out of warranty) with a RAID 10 layout to speed up/help with containers in OpenShift (for PersistentVolumes)
Fedora Release Engineering
While taking care of day-to-day business like nightly composes, package retirements and unretirements, new SCM requests and occasional koji issues, we worked on the new Fedora release:
- Mass rebuild of rpms and modules in Fedora Rawhide
- Branching of Fedora 35 from Rawhide
- Fedora Linux 35 Beta release
ARC
Investigated upgrading the frontend web UI for the CentOS mailing list. The investigation concluded that Mailman 3, Postorius and HyperKitty would need to be packaged for EPEL8, and that a new server would need to be deployed with the current CentOS mailing list migrated to it.
CentOS Stream/Emerging RHEL
About
This initiative works on CentOS Stream/Emerging RHEL to make this new distribution a reality. Its goal is to prepare the ecosystem for the new CentOS Stream.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- Brian Stinson (Team Lead) (bstinson)
- Adam Samalik (Agile Practitioner) (asamalik)
- Aoife Moloney (Product Owner) (amoloney)
- Carl George
- James Antill
- Johnny Hughes
- Mohan Boddu (mboddu)
- Merlin Mathesius
- Stephen Gallagher (sgallagh)
- Troy Dawson (tdawson)
- Petr Bokoc (pbokoc)
What the sub-team did in Q3 2021
One thing we tackled was enabling side tag builds for Fedora ELN. Initially, we wanted to implement proper side tags for ELN, but we eventually settled on a simpler approach where we tag the Rawhide builds in and then rebuild them in ELN. This ensures that we get all the packages built in ELN, with the Rawhide build as a backup should the ELN build fail. We can even use this as a health metric for ELN: how many ELN packages are actually ELN builds.
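As a rough illustration of that health metric, something like the sketch below could count how many of the latest builds in the ELN tag were actually rebuilt for ELN, using the koji Python bindings. The tag name "eln" and the ".eln" release check are my assumptions here, not necessarily how the metric is computed in practice.

```python
# Hedged sketch: count how many packages tagged into ELN are actual ELN rebuilds
# (vs. Rawhide builds tagged in as a fallback). The tag name and the ".eln"
# release convention are assumptions.
import koji

HUB = "https://koji.fedoraproject.org/kojihub"
TAG = "eln"  # assumed tag name

session = koji.ClientSession(HUB)
builds = session.listTagged(TAG, latest=True)

eln_builds = [b for b in builds if ".eln" in b["release"]]
print(f"{len(eln_builds)} of {len(builds)} latest builds in '{TAG}' are ELN rebuilds")
```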
For CentOS Stream 9, cloud images are now available in AWS. You can find them by searching for “centos stream 9” in AWS, and to make sure you get the latest one you can add the current month to the search (e.g. “202110” for October 2021).
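If you want to script that lookup, the same search can be done with boto3. A minimal sketch is below; the name pattern is an assumption based on the search described above, so verify the image owner before relying on the results.

```python
# Hedged sketch: list recent CentOS Stream 9 AMIs in one region.
# The name pattern is an assumption; verify the publisher/owner of each AMI
# before trusting the results, since this searches all public images.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_images(
    Filters=[{"Name": "name", "Values": ["CentOS Stream 9*202110*"]}],
)
for image in sorted(resp["Images"], key=lambda i: i["CreationDate"], reverse=True):
    print(image["CreationDate"], image["ImageId"], image["OwnerId"], image["Name"])
```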
Also, CentOS Stream 9 repositories are now available through mirrors using a metalink. Existing systems get this set up automatically with an update, as the centos-release package will include the metalink. This will take some load off the CentOS infra and potentially even make your updates faster.
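If you want to check whether your system has already picked this up, one way is to inspect the enabled repo configuration through the python3-dnf API (assuming it is installed); a small sketch is below.

```python
# Hedged sketch: show which enabled repos are configured with a metalink.
# Requires the python3-dnf bindings; repo variables (e.g. $basearch) may
# still appear unexpanded in the printed URLs.
import dnf

base = dnf.Base()
base.read_all_repos()
for repo in base.repos.iter_enabled():
    if repo.metalink:
        print(f"{repo.id}: metalink -> {repo.metalink}")
    else:
        print(f"{repo.id}: baseurl/mirrorlist -> {repo.baseurl or repo.mirrorlist}")
```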
Datanommer/Datagrepper
About
The goal of this initiative is to update and enhance the Datanommer and Datagrepper apps. Datanommer is the database that is used to store all of the Fedora messages sent in the Fedora Infrastructure. Datagrepper is an API with a web GUI that allows users to find messages stored in the Datanommer database. The current solution is slow and the database data structure is not optimal for storing the current amount of data. This is where this initiative comes into play.
Issue trackers
Application URLs
Members of sub-team for Q3 2021
- Aurelien Bompard (Team Lead) (abompard)
- Aoife Moloney (Product Owner) (amoloney)
- Ellen O’Carroll (Product Owner)
- Ryan Lerch (ryanlerch)
- Lenka Segura (lsegura)
- James Richardson (jrichardson)
- Stephen Coady (scoady)
What the sub-team did in Q3 2021
Datanommer and Datagrepper have been upgraded to use TimescaleDB, an open-source relational database for time-series data. TimescaleDB is a PostgreSQL extension that takes care of sharding the large amount of data that we have (and keep generating!), and maintains an SQL-compatible interface for applications.
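For context, the heart of a TimescaleDB setup is turning an ordinary table into a "hypertable" partitioned by time. The snippet below is a generic illustration of that step using psycopg2, not the actual datanommer migration; the table and column names are made up, and creating the extension assumes sufficient database privileges.

```python
# Generic TimescaleDB illustration (not the actual datanommer schema/migration).
# Table and column names are hypothetical; CREATE EXTENSION needs suitable privileges.
import psycopg2

conn = psycopg2.connect("dbname=messages_demo")  # hypothetical database
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            sent_at TIMESTAMPTZ NOT NULL,
            topic   TEXT NOT NULL,
            body    JSONB
        );
    """)
    # create_hypertable() splits the table into time-based chunks while keeping
    # the normal SQL interface for inserts and queries.
    cur.execute(
        "SELECT create_hypertable('messages', 'sent_at', if_not_exists => TRUE);"
    )
conn.close()
```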
Datagrepper and the Datanommer consumer are now running in OpenShift instead of dedicated VMs.
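From a consumer's point of view nothing changes: Datagrepper is still queried over plain HTTP. Here is a small example query against the public instance; the parameter names and response keys reflect my understanding of the current API, so treat them as an assumption and check the Datagrepper docs if they don't match.

```python
# Example query against the public Datagrepper API: messages from the last hour.
# Parameter names and response keys ("raw_messages", "total") are based on the
# current public API and may need adjusting.
import requests

URL = "https://apps.fedoraproject.org/datagrepper/raw"
params = {"delta": 3600, "rows_per_page": 5}

resp = requests.get(URL, params=params, timeout=30)
resp.raise_for_status()
data = resp.json()

print(f"{data['total']} messages in the last hour; first {len(data['raw_messages'])} topics:")
for msg in data["raw_messages"]:
    print(" ", msg["topic"])
```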
DNF Counting
About
DNF Counting is used to obtain data on how Fedora is consumed by users. The current implementation experiences timeouts and crashes when the data are obtained. This initiative is trying to make the retrieval of counting data more reliable and efficient.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- Nils Philippsen (Team Lead) (nils)
- Aoife Moloney (Product Owner) (amoloney)
- Ellen O’Carroll (Product Owner)
- Adam Saleh (asaleh)
- Patrik Polakovic
- With a special shout-out to Stephen Smoogen, who provided vital fixes even though he wasn't officially part of the initiative
What the sub-team did in Q3 2021
The scripts that create the statistics for https://data-analysis.fedoraproject.org/ were cleaned up and refactored, making them stable enough that they no longer require manual intervention.
The code at https://pagure.io/mirrors-countme/ now has tests running in CI and is packaged as an RPM to avoid further mishaps during package installation. The deployment scripts were cleaned up as well, alongside the actual deployment on the log01 machine: its hard-to-track manual interventions for last-minute bug fixes were replaced by Ansible scripts.
The cron jobs that run the batch jobs now only send notification emails on failure, and the overall health of the batch process can be checked on a simple dashboard at https://monitor-dashboard-web-monitor-dashboard.app.os.fedoraproject.org/
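If you want to play with the published counting data yourself, a starting point could look like the hedged sketch below. The CSV URL and the column names used for grouping are assumptions from memory rather than a documented schema, which is why the script prints the actual header first.

```python
# Hedged sketch: load the published countme totals and plot weekly hits per OS release.
# The CSV URL and the column names ("week_end", "os_name", "os_version", "hits")
# are assumptions; check the printed header before relying on them.
import matplotlib.pyplot as plt
import pandas as pd

URL = "https://data-analysis.fedoraproject.org/csv-reports/countme/totals.csv"  # assumed location

df = pd.read_csv(URL)
print(df.columns.tolist())  # confirm the real column names first

weekly = (
    df.groupby(["week_end", "os_name", "os_version"])["hits"]
    .sum()
    .unstack(["os_name", "os_version"])
)
weekly.plot(legend=False, title="countme hits per week")
plt.show()
```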
Metrics for Apps on OpenShift
About
The goal of this initiative is to deploy OpenShift 4 in the Fedora Infrastructure and start using Prometheus as a monitoring tool for apps deployed in OpenShift. This initiative should also define which metrics will be collected.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- David Kirwan (Team Lead) (dkirwan)
- Aoife Moloney (Product Owner) (amoloney)
- Ellen O’Carroll (Product Owner)
- Vipul Siddharth (siddharthvipul1)
- Akashdeep Dhar (t0xic0der)
What the sub-team did in Q3 2021
- Did infrastructure prep work to install Red Hat CoreOS on nodes for the OpenShift Container Platform (OCP)
- Deployed OCP 4.8 in staging and production
- Configured the cluster with OAuth, OpenShift Container Storage (OCS) and other important operators/configs needed to support Fedora workloads
- Automated the OCP deployment process with Ansible
- Deployed and configured the User Workload Monitoring stack (see the query sketch after this list)
- Investigated app migration from the older cluster to the new one
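With the User Workload Monitoring stack in place, application teams can query their own metrics over the standard Prometheus HTTP API exposed by the cluster's Thanos querier route. The sketch below is only illustrative: the route hostname, namespace, metric name and token are placeholders, and the service account needs the appropriate monitoring RBAC.

```python
# Hedged sketch: query an application metric through OpenShift user workload monitoring.
# The route hostname, token, namespace and metric name are placeholders; the
# service account used must have RBAC allowing monitoring queries for its namespace.
import requests

THANOS_ROUTE = "https://thanos-querier-openshift-monitoring.apps.example.com"  # placeholder
TOKEN = "sha256~REPLACE_ME"                                                    # placeholder token
QUERY = 'sum(rate(http_requests_total{namespace="my-app"}[5m]))'               # placeholder metric

resp = requests.get(
    f"{THANOS_ROUTE}/api/v1/query",
    params={"query": QUERY},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])
```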
Epilogue
If you got this far, thank you for reading. If you want to contact us, feel free to do so in the #redhat-cpe channel on libera.chat.
I’m looking at the first graph: https://data-analysis.fedoraproject.org/csv-reports/images/fedora-os-latest.png. “fed35” is reported as growing to 900k unique IPs and then dropping to 200k and staying there. What’s the story behind this?
No mention was made of the rpmautospec initiative. It seems stalled: it has been deployed in our infra, but functionality that is crucial to support more complicated packages is not being handled. Issues and pull requests go months without even a single comment from the project owners… Is reviving this on the agenda?
rpmautospec was not an initiative the CPE team worked on in Q3, which is why it's not mentioned in this report.
That makes sense, but, hmmm. We need a way to support these kinds of things on an ongoing basis. I agree that rpmautospec is crucial to the future; a PR workflow is basically impractical without it.
That’s not F35 — it’s “unknown release”, which just happens to be the same color. I’m not exactly sure what’s going on there, but it could be a lot of things, including possibly misfiled EPEL requests.
Note that that chart isn’t DNF Better Counting data. It’s the old IP/day method, and the graphs are of the raw data.
If you're interested in looking at that, I recommend running Overview - velociraptorizer - Pagure.io rather than the raw graphs (except when actually looking for data problems); it has (ugly but functional) smoothing for known (and usually explained) weirdnesses in the data.
I’m not sure we really should expose the raw charts in this way — they’re more confusing than helpful in my opinion.
The countme data, by the way, is at Index of /csv-reports/countme, in CSV and SQLite formats. This requires more processing to visualize; my scripts to do that are in progress. (I have a logic bug I really need to resolve before sharing…)
It would be nice to have the graphs online for the new countme stats… (Or was there an ipython notebook somewhere?)
Yeah, see Will Woods’ work here: GitHub - wgwoods/fedora-countme-data: Fedora "countme" data, plus docs & examples of how to analyze/graph it
Note that countme reports independently for each repo, so Will's approach is to look each week at the highest repo for a given pattern. My approach is to just look only at fedora-updates for most things. The difference isn't really significant, and I feel more like I know what's going on that way. My scripts to graph from the new data are at Tree - brontosaurusifier - Pagure.io, but as mentioned they're not really in a good state (see the TODO list). I hope to make some time to work on them. Definitely before DevConf.cz. Maybe before Christmas.
It’d be great to have a page with interactive graphs showing that data. There’s a lot of good stuff in those repos. Somebody with good knowledge of the python plotting frameworks could probably make something useful out of this quite quickly…
I am open to help from any volunteers. I have a pretty strong idea of what I want, but not a lot of time to implement it.