Framework upgrade, Monday Sept. 23rd

Bubbles, FC Sunnyvale Developer 23 September 2019, 10:16

Update 16:40 GMT

Several times today we put a server running the new version into the rotation, waited a couple of minutes to collect data from various crashes, and then removed it again while we fixed the issues. This has been working quite well, and hopefully it has meant that most of you haven't noticed any errors.

We will leave the site running just the old version during the night, so that we won't have to worry about everything crashing, and then we'll continue where we left off tomorrow morning.

As I have mentioned before, during the past several months we have had a number of major tasks that we have unfortunately had to work on concurrently.

Last month's server relocation was one of those tasks, and over the next few days another one of them will fall into place as we upgrade the framework that the game runs on. We will be upgrading from Ruby 2.3.8 to Ruby 2.4.7, and from Ruby on Rails 3.2 to Ruby on Rails 4.2. Ruby is the programming language and Rails is the framework.

It is important to get this upgrade done for several reasons.

First off, our version of the framework is now so old that it no longer receives security updates. Addressing this is obviously of critical importance.

Second, especially in connection with the development of youth squads, we have found that the old framework has slowed us down and made several things harder to develop than they would have been on the newer version.

Third, whenever we need to roll out a change to the site, the old version of the framework takes more than 20 minutes to pre-compile all of our stylesheets and other assets, whereas the new version takes less than 1 minute.

However, the new version of the framework has a lot of "breaking changes": changes that stop code written for the old version from working. We have therefore had to make very extensive changes to our own code in order to run on the new version of the framework, and today's update is the result of several months of work.

We do not expect any significant downtime, but since it's a huge update that affects virtually every part of the game, some errors in the coming days are unavoidable.

What do we do to reduce errors?

Our first measure against errors is our suite of automatic tests. Automatic tests are small pieces of code that we have written over the years to test many different parts of the game's functionality. Each test sets up a scenario, runs a small portion of the code with this data, and then checks if the result is what we expected. With a single command, we can ask the system to run all the tests and give us an overview of which tests are failing.

It's impossible to test everything, but so far we have a collection of 1590 automatic tests, which check 5060 different things in the code. But even if all these tests are green, it doesn't mean that the code is free of bugs - it just means that out of all the things that can go wrong, there are at least 5060 things that won't ;)
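To illustrate what such a test can look like, here is a minimal sketch using Minitest, one of the common Ruby test libraries. The `league_points` method and its rule are made up for this example and are not taken from the game's code.

```ruby
require "minitest/autorun"

# Hypothetical rule used only for this illustration:
# 3 points for a win, 1 for a draw, 0 for a loss.
def league_points(wins:, draws:)
  wins * 3 + draws
end

class LeaguePointsTest < Minitest::Test
  def test_points_for_mixed_results
    # Set up a scenario: 2 wins and 1 draw.
    points = league_points(wins: 2, draws: 1)

    # Check that the result is what we expected.
    assert_equal 7, points
    assert_kind_of Integer, points
  end

  def test_points_without_any_results
    assert_equal 0, league_points(wins: 0, draws: 0)
  end
end
```

Running the file runs both tests. Minitest's summary line counts tests and individual checks (assertions) separately, which is presumably why the number of things checked is larger than the number of tests: here 2 tests make 3 checks.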

In our server setup we have 3 smaller web servers behind a load balancer, which distributes the load. Each time you click on a link or perform an action, the request is handled by the least burdened web server.
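The idea of picking the least burdened server can be sketched in a few lines of Ruby. This is only an illustration: the `Server` record, the server names and the request counts are invented, and a real load balancer measures load in its own way.

```ruby
# Hypothetical server record: a name and the number of requests
# it is currently handling.
Server = Struct.new(:name, :active_requests)

# Return the server that is currently handling the fewest requests.
def least_burdened(servers)
  servers.min_by(&:active_requests)
end

servers = [
  Server.new("web1", 12),
  Server.new("web2", 7),
  Server.new("web3", 9)
]

chosen = least_burdened(servers)
chosen.active_requests += 1 # the chosen server takes on the new request
puts chosen.name            # prints "web2"
```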

We will utilize this for today's update. A couple of weeks ago, we set up 3 new web servers on which we tested the new version. When we start the roll-out, we will begin by adding 1 of the new servers to the load balancer, so that it has 3 servers with the old version and 1 with the new version. That way, only about 25% of all incoming requests (1 server in 4) will be handled by the new version. This lets us introduce the new version more gently, and if you experience an error, it may disappear if you try again and your request is handled by one of the old servers instead.

Once it seems that no more errors are coming in, we can then add more of the new servers into the rotation and remove the old ones.
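The arithmetic behind this gradual roll-out can be sketched as follows. It assumes that requests are spread evenly across identical servers, which is a simplification. Only the first two stages come from the plan described above; the later stages are a hypothetical example of how the swap could continue.

```ruby
# Approximate share of requests handled by new-version servers,
# assuming requests are spread evenly across identical servers.
def new_version_share(old_servers, new_servers)
  total = old_servers + new_servers
  return 0.0 if total.zero?

  new_servers.to_f / total
end

# Roll-out stages as [old, new] server counts.
# Stages after [3, 1] are hypothetical.
stages = [[3, 0], [3, 1], [2, 2], [1, 3], [0, 3]]

stages.each do |old, new|
  percent = (new_version_share(old, new) * 100).round
  puts "#{old} old + #{new} new: about #{percent}% of requests on the new version"
end
```

Under the same assumption, a retry during the first stage has roughly a 75% chance of landing on one of the old servers.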

We will follow the same procedure with the simulator servers, which are 6 separate servers that share the load.