The Fedora Project has instances in a number of datacenters and clouds all over the world, but a majority of instances are in a datacenter located in Virginia, USA. This datacenter space, along with the majority of the servers in it, was generously provided by our primary sponsor, Red Hat. We moved to our current space from another Red Hat datacenter back in 2020, and now it’s time to move again.
So why would we want to move? Well, there are a number of reasons:
- We have expanded to fill all the physical rack space available. This leaves no room to expand capabilities, such as adding RISC-V builders.
- We are hitting power limits. Several of our racks are close to the point where if one of the two power circuits went down, some machines would power off abruptly.
- Many of our machines were purchased during the previous datacenter move in 2020, and this new move provides another opportunity to invest in more power-efficient, faster, and denser hardware.
After a bunch of discussion and planning, we will be moving to a new datacenter near Raleigh, NC. This site will give us room to expand and has much more available power, allowing for higher densities.
The good news – most of the new hardware has already been purchased! We plan to install and set up the new hardware in the new datacenter, then logically switch to the new site with temporarily diminished capacity (mostly in staging). After that, we will ship the newer machines from the old datacenter to the new one, bringing everything back to greater than 100% capacity.
Our goal is to complete this move over the course of a few weeks and have everything back up with greater capacity than before, with as little impact on the project as possible.
We are looking at mid-May to do the switchover, after Fedora 42 has been released. Timing is still tentative, but we will provide more detailed information as the dates solidify. Our next key milestone is to use the Beta Go/No-Go as an indicator of whether this is the best time to execute the move. At the end of this transition, we expect everyone to see faster builds, faster tests, and room for further expansion in the future.
We should talk about this on the Fedora Podcast!
Is the power used generated from fossil fuels or renewable/green sources?
If we ignore how resources are allocated across the wider grid, and such things as paid-for ‘green on paper’ energy, it looks like it is largely natural gas and nuclear powered: North Carolina Electricity Generation Summary
The Harris Nuclear Plant is just outside of town, beside Harris Lake: Wake County, NC Electricity Generation Summary
I’d love to, but later when things are less busy.
Excellent question. I don’t know, but I will ask!
Can we consider - if it’s too late for this move, then for the next one - distributing the key infrastructure, especially the pieces that have proven to be capacity bottlenecks, across multiple datacenters?
e.g. during a move, keep the old DC, bring up new hardware in the new DC, then retool the old one; rinse and repeat, and over time you end up with one extra DC presence.
Reliability-wise, this will let us survive outages affecting only one DC (builds will spend more time queuing, and some would be lost, but availability will be better) - as is best practice in the cloud world.
cc @blc
It would be good to do this especially with our OpenShift applications. Having an OpenShift cluster in only one DC doesn’t really help us when there are DC problems. It’d be great to be able to support multi-DC geographical access, failover, etc.
And maybe having multiple DCs will make us think more about stuff like caching by design.
Well, it’s definitely out this time.
Vacating the existing DC so others can use that space/power was part of this migration. Additionally, many of our machines there are pretty old, so if we did keep them there they would need a refresh anyhow, and that’s a pretty expensive operation.
While it’s a nice idea, it’s not as easy as you might think. In particular, we have databases and builds that would have to be kept in sync across WAN links. While you can do that, of course, all the times the WAN link was down would add a LOT of pain. Do you have split brain? Do you take one down?
So … yeah, if this is seen as the goal, the components that are difficult to run distributed should probably be replaced over time
It’s really too bad CockroachDB is no longer open source - it would have helped with the database replication issues.
As for what to do in case of a partition - I suppose with 2 DCs you designate one as the primary; with 3, if two agree that another is down, then one of them takes over? It’s not as easy as flipping a switch, certainly (I didn’t mean to imply it is).
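Just to make the “two out of three agree” idea concrete, here is a minimal sketch of that rule. The DC names, the vote structure, and the tie-break are all hypothetical and not anything in Fedora’s infrastructure; a real setup would rely on an existing consensus layer (etcd, Raft-backed databases, etc.) rather than hand-rolled logic.

```python
# Minimal sketch of the "2 of 3 sites agree the primary is down" rule.
# All names here are hypothetical and purely for illustration.

PEERS = ("dc-a", "dc-b", "dc-c")  # three hypothetical datacenters
PRIMARY = "dc-a"

def should_promote(my_dc: str, primary_reachable_votes: dict[str, bool]) -> bool:
    """Return True if this non-primary DC should take over as primary.

    `primary_reachable_votes` maps each surviving DC to whether it can
    still reach the primary. Promotion requires a majority of the three
    sites (at least 2) to report the primary as unreachable, so a single
    flaky WAN link cannot cause a split brain on its own.
    """
    if my_dc == PRIMARY:
        return False
    down_votes = sum(
        1 for dc, reachable in primary_reachable_votes.items()
        if dc != PRIMARY and not reachable
    )
    return down_votes >= 2

# Example: both surviving sites have lost contact with the primary, so one
# of them (picked by a deterministic tie-break, e.g. name order) promotes.
votes = {"dc-b": False, "dc-c": False}
candidates = sorted(dc for dc in votes if should_promote(dc, votes))
print(candidates[0] if candidates else "no failover")  # -> "dc-b"
```

With only 2 DCs there is no majority to consult, which is why the comment above suggests simply designating one as the permanent primary in that case.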