About two years ago the Fedora Engineering team merged with the CentOS Engineering team to form what is now called the Community Platform Engineering (CPE) team. For the team members, the day-to-day work did not change much.
The members working on Fedora are still fully dedicated to the Fedora Project, and those working on CentOS are still fully dedicated to CentOS. On both projects, team members are involved in infrastructure, release engineering, and design. However, the merger brought the two infrastructures and teams closer together, allowing for more collaboration between them.
There are 20 people on this consolidated team. The breakdown looks like this:
- In Fedora:
  - 3 dedicated system administrators
  - 5 dedicated developers
  - 1 doing both development and system administration
  - 1 doing both release engineering and system administration
  - 1 person dedicated to Fedora CoreOS
  - 2 release engineers
  - 1 person dedicated to documentation
  - 1 designer
- In CentOS:
  - 1 system administrator
  - 2 doing both development and system administration
  - 1 dedicated to the build systems
- There is also one additional person working on projects internal to Red Hat
As you can see, the CPE team itself is composed of 19 people working on Fedora or CentOS, of whom only 7 are system administrators and 7 are dedicated developers. There are no dedicated database administrators and no dedicated network engineers, even though most of the tools use a database as a backend, need sophisticated network tools for clustering, or both.
Last year, this team was under the supervision of a single manager, Jim Perrin. But the team is too big for a single manager, so earlier this year it got an additional manager, Leigh Griffin.
Leigh is new to Fedora and CentOS, so he started by looking at which services and applications we are running. The outcome of this research was quite impressive:
This team of 19 people is maintaining 112 services!
It also maintains 590 physical machines (140 for Fedora, 450 for CentOS) and 516 virtual machines (486 for Fedora, 30 for CentOS).
As you can imagine, this means we are quite swamped and do not have many cycles to take on new things (technology stack, applications, onboarding…). In addition, developers are split across multiple applications. This creates a situation where they often work alone, with little cross-knowledge and many single points of failure. Finally, we have to acknowledge that the number of people required to properly maintain all of these services has grown much faster than our ability to grow the team.
So, in order for this team to improve, have fewer points of failure, increase reliability, be a better upstream for the applications we maintain, and maximize the value we bring to our communities, we need to change how we work.
This week (June 10th to June 14th), the CPE team is meeting face to face to discuss what we can change about the way we work, and how. In the next article we will share the outcome of these discussions.