Phew, what a kerfuffle!

Bubbles, FC Sunnyvale Developer 5 October 2021, 13:41

I have mentioned it before: when the blog is quiet, it's because we're working on boring technical or administrative things that are not visible in the game as new features.

These are often tasks that are pushed on us by forces beyond our control, and which force us to drop everything and attend to them immediately. These are things that ruin my planning and make me unable to predict when the next new game feature will arrive.

Since the latter half of 2020 and throughout 2021, we've been hit by an almost endless number of these kinds of tasks, and it's been rather frustrating.

There are also things like optimizations, reporting and security which are necessary to ensure the game's future in the long term.

I don't usually write about these things because they are often boring or very technical. But it seems that we're now finally over the hump, and since there's been so little to write about for a long time, I'd like to tell you about some of the things we've been working on.

Finishing off youth squads

The beta test of youth squads ended late summer of last year, and we spent the following weeks on bug fixes and additions, such as a separate VIFA rank for the youth leagues.

Integration of error messages in Slack

Slack is a chat system that I and my freelance developer use for our daily communication. Because I wanted to keep on top of the small, rare bugs, and also be able to respond quickly to sudden disasters, we implemented the integration of many different types of error messages into our Slack chat.

Exceptions, failed background jobs, and congestion of the servers now give us direct messages in Slack and I get a notification on my phone so that I can investigate what is happening no matter where I am.

Many small, rarely occurring errors, of the type that typically only affect 1 user at a time, have been corrected because we're now notified of them immediately.

Fixed old, rare bugs in the tactics editor

Over the years, we have occasionally heard from managers who discovered that they had been playing without a full lineup for some time, and insisted that they were sure they had set a complete tactic.

But people's memory is notoriously unreliable, and if we have nothing to go on but an allegation of something that happened a week ago - without any details - then it is impossible to investigate. Sometimes people also try to get some kind of compensation for something they know is their own fault by claiming that there was a bug.

It was something we only rarely heard about and it was impossible to recreate, and therefore we could never confirm that there was an actual bug.

The breakthrough came when our admin, FC Razor, experienced the problem himself, noticed it right away and was able to tell me the exact action he had performed with that player.

This gave me a suspicion. After looking through the code, I formed a theory about what the problem might be, and after many attempts I was able to recreate it.

The problem turned out to occur when you dropped one player on top of another. In this case, the JavaScript application sent 2 requests to our servers:

  1. Remove Jensen from position X
  2. Insert Hansen in position X

However, because two requests sent over the Internet at the same time can take different routes, we cannot be 100% certain that they'll arrive in the order they were sent. In rare cases, the message sent last may arrive first. In addition, we have a load balancer which assigns incoming requests to one of our 3 web servers in order to distribute the load. Therefore, it can also happen that the 2 requests are executed by different web servers, and if the server receiving the first message has a slightly longer queue, then the message sent last could be executed first. Fractions of a second can make the difference.

This kind of thing is of course impossible to detect in development, because we each have our own copy of the game running on our own local machine, so the circumstances that lead to it aren't present.

So, in rare cases, it could happen that the message to put Hansen in position X was processed first, which the server would refuse, since Jensen was still in that position. A fraction of a second later, the message to remove Jensen would then arrive and be carried out - with the result that your lineup would be missing one man. At the same time, you wouldn't notice it, because the JavaScript application was never designed to handle such an error, so graphically it would look like nothing was wrong.

The solution was to rewrite parts of both the JavaScript application and the backend, so that an exchange of 2 players is sent as only 1 message: "Swap Jensen and Hansen". We found several other situations where something similar could occur because the application sent 2 messages, and made the same correction there.
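To illustrate why one message is safer than two, here is a minimal sketch in Python. It is not the actual Virtual Manager code (which is JavaScript and Ruby on Rails); the `Lineup` class, its method names and the extra player "Nielsen" are made up for the example. It replays the two old messages in the wrong order, and then the single swap message.

```python
class Lineup:
    """Toy server-side lineup (hypothetical): position -> player or None."""

    def __init__(self, positions):
        self.positions = dict(positions)

    def _position_of(self, player):
        for pos, occupant in self.positions.items():
            if occupant == player:
                return pos
        return None

    def remove(self, player):
        pos = self._position_of(player)
        if pos is None:
            return False
        self.positions[pos] = None
        return True

    def insert(self, player, pos):
        # Refused if somebody is still standing in that position.
        if self.positions.get(pos) is not None:
            return False
        self.positions[pos] = player
        return True

    def swap(self, incoming, outgoing):
        # One message, one step: there is no order left to get wrong.
        pos_out = self._position_of(outgoing)
        if pos_out is None:
            return False
        pos_in = self._position_of(incoming)  # None if incoming is on the bench
        self.positions[pos_out] = incoming
        if pos_in is not None:
            self.positions[pos_in] = outgoing
        return True


def filled(lineup):
    """Number of positions that currently have a player."""
    return sum(1 for p in lineup.positions.values() if p is not None)


# Old behaviour: two messages, and the one sent last arrives first.
old = Lineup({"X": "Jensen", "Y": "Nielsen"})
insert_ok = old.insert("Hansen", "X")  # refused, Jensen is still in X
remove_ok = old.remove("Jensen")       # carried out, X is now empty

# New behaviour: a single "Swap Jensen and Hansen" message.
new = Lineup({"X": "Jensen", "Y": "Nielsen"})
swap_ok = new.swap("Hansen", "Jensen")
```

In the first case the lineup ends up one man short, and in the second it stays complete no matter which server handles the request.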

We also added a fallback solution that simply reloads the tactics editor if the application receives an error message from the server that it doesn't know how to handle.

It took a lot of work, but at long last we fixed a mysterious, rare bug that had been haunting us for many years.

Cookie consent

In the autumn, new rules came from the advertising industry, which forced us to change our cookie consent and advertising system. We had to implement a new cookie selector, and it required different solutions depending on what choices were made and whether it was for Denmark or abroad.

The Danish ads disappeared!

Immediately after we implemented the above, our Danish ads suddenly disappeared.

Both I and our advertising provider initially thought it was because of the new cookie consent system, but it turned out to be a pure coincidence that it happened right after. The systems used in advertising sales are incredibly complex, and involve online advertising exchanges and auctions and many different parties. Therefore, it was very difficult to find out what happened.

Eventually we found out that a major player in the market had blacklisted us on their advertising platform, which our provider uses. It happened because of an old case regarding some inappropriate material someone had posted in a user forum almost 9 years ago.

Why that case suddenly reappeared and why they couldn't find our old appeal is still a mystery, but we had to make lots of changes to the control and validation of user-generated content, including:

  • acceptance of rules at signup
  • additions to the forum rules
  • acceptance of forum rules the first time you write in a forum
  • acceptance of forum rules when creating user forums
  • removal of ads from most pages that contain user-generated content
  • automatic cleaning of user forums
  • login required to view most user-generated content
  • disabling of user profiles, etc. of clubs that have been banned or have been inactive for a long time
  • general safeguards against the ability to produce and access user-generated content anonymously

I then had to submit a long account of all these initiatives, as well as those we took at the time 9 years ago, describe how we moderate the different kinds of user-generated content - and then wait for a decision.

Those months without Danish ads cost us dearly, as Danes make up about 80% of our users, but we finally got them back in December.

Christmas calendar

Just before December, I set aside some time for us to finally introduce a new fun Christmas feature - something that has been missing for many years.

The code for the different types of gifts behind the doors was implemented bit by bit at the start of the month, since not all of them could be ready by December 1st.

We also continuously implemented lots of fixes and improvements to the calendar and its UI / design.

Switch to new card payment system

At the beginning of the year, our payment provider informed us that they would soon discontinue their old payment system and associated API due to new requirements from the card providers, and that we therefore had to switch to their new system.

This required a complete rewrite of our entire card payment system, delaying us at least a month.

Mysterious performance issues

We also had to deal with a mysterious performance issue that had plagued the site for some time and always occurred on Sunday night after the season update (but not during).

The problem was mysterious because neither the web servers nor the database seemed to be particularly congested. But we started investigating and optimizing all the places we could find where the code could be improved.

We use the NewRelic service for live analysis of how the application is performing, and it gives us detailed insight into how the code runs on each page lookup. Based on this information, we set about optimizing all the places where it looked like there was something to be gained.

One of the methods we used was fragment caching. Fragment caching works by storing fragments of a page in RAM, thus saving the processing and database lookups required to generate that part.

For example, the league table is now cached, and the same HTML snippet is served from RAM in 5 different places on the site - it is just styled differently with CSS in each place where it appears. Each time the league positions change, the cache is refreshed.

The trick is to find the places where caching makes sense, and then make sure that the cache is always fresh. The latter gave us some problems for a while, because we had never used fragment caching before and there are many pitfalls that can cause you to show outdated data from the cache.
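The idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how the site actually does it: the `FragmentCache` class, the `version` number and the club names are invented for the example. The cache key contains a version that is bumped whenever the standings change, so an outdated fragment is simply never looked up again.

```python
class FragmentCache:
    """Toy in-memory fragment cache (hypothetical)."""

    def __init__(self):
        self._store = {}
        self.renders = 0  # how many times we actually had to render

    def fetch(self, key, render):
        # Render only on a cache miss; otherwise reuse the stored HTML.
        if key not in self._store:
            self._store[key] = render()
            self.renders += 1
        return self._store[key]


def render_league_table(standings):
    rows = "".join(
        f"<tr><td>{pos}</td><td>{club}</td></tr>"
        for pos, club in enumerate(standings, start=1)
    )
    return f"<table>{rows}</table>"


def league_table_html(cache, league_id, version, standings):
    # The version is part of the key: new standings -> new key -> fresh render.
    key = ("league_table", league_id, version)
    return cache.fetch(key, lambda: render_league_table(standings))


cache = FragmentCache()
first = league_table_html(cache, 1, 1, ["Club A", "Club B"])
second = league_table_html(cache, 1, 1, ["Club A", "Club B"])  # from cache
third = league_table_html(cache, 1, 2, ["Club B", "Club A"])   # positions changed
```

Note that this toy version never throws old entries away; a real cache also needs some form of eviction. The pitfall is forgetting to bump the version somewhere, which is exactly how you end up showing outdated data.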

We could see that our optimizations worked and response times became faster overall. But, even after all these optimizations, the problem on Sundays persisted, and was still a mystery.

Therefore, we set up an additional server, where we implemented a system that continuously retrieves and merges the log files from the 3 web servers, analyzes them, and shows us a live dashboard with an overview of a lot of different data points about where and how the site is being burdened.
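As a rough illustration of the merging step, here is a small Python sketch. The log line format (timestamp, path, status code) and the sample lines are invented for the example, and our real log format and dashboard are considerably more involved. It merges three logs that are each already in time order and counts which paths are hit the most.

```python
import heapq
from collections import Counter


def parse(line):
    """Split a (hypothetical) log line into timestamp, path and status."""
    timestamp, path, status = line.split()
    return timestamp, path, int(status)


def merged_entries(*server_logs):
    """Merge per-server logs, each already sorted by time, into one stream."""
    parsed = [map(parse, log) for log in server_logs]
    # Timestamps in the same ISO format sort correctly as plain strings.
    return heapq.merge(*parsed, key=lambda entry: entry[0])


def busiest_paths(entries, top=3):
    return Counter(path for _, path, _ in entries).most_common(top)


# Invented sample lines from three web servers.
web1 = ["2021-05-02T22:00:01 /cup/12 200", "2021-05-02T22:00:04 /cup/12 500"]
web2 = ["2021-05-02T22:00:02 /league/3 200", "2021-05-02T22:00:05 /cup/12 500"]
web3 = ["2021-05-02T22:00:03 /cup/12 500"]

entries = list(merged_entries(web1, web2, web3))
timestamps = [entry[0] for entry in entries]
top_paths = busiest_paths(entries)
```

Seeing all three servers as one stream is what makes a pattern stand out that looks harmless on each server separately.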

We then finally found out that the problem was due to a 3rd party site, which seemed to go crazy with a lot of erroneous requests on Sundays after the end of the season due to the way our cup pages would look until the new cup fixtures were posted Monday morning.

It's just fine when someone makes 3rd party sites that can collect different useful statistics, but of course it's unacceptable if they bog down our servers. Therefore, I temporarily blocked them in our firewall while we worked on a solution. Although the problem came from the outside, it was also less than optimal that the cup pages were vulnerable to this kind of load. Therefore, we implemented more performance optimizations, more caching, and a safeguard against that type of lookup burdening the servers that much.

All in all, implementing the performance improvements, fragment caching and the live dashboard delayed us by a few months.

However, it was not exactly a waste of time, because we gained useful experience with fragment caching, which I've long wanted to start utilizing, and the dashboard will definitely help us in the future. But, it did come at a very inconvenient time on top of all the other delays.

Hack attempt

In the spring, we were hit by a hack attempt, which lasted for several weeks.

We could see that this was the type where someone had acquired a list of stolen logins from other sites, which they then tried to log in to Virtual Manager with.

There are shady places on the web where you can buy lists of millions of logins, which have been acquired from various hacked sites around the world. Because many people use the same login on several different sites, you are at risk if your login ends up on one of these lists.

The attack came from a botnet - that is, thousands of infected machines from all over the world - which can be rented for a relatively small fee. Then it's a small matter to have them bombard a site with login attempts, where you just go down the list of stolen logins and hope that you are lucky enough to find one that works.

Because the attack came from so many different machines, it was difficult to block without disturbing legitimate users. The challenge was to continually tune our firewall rules to block or delay as many suspicious login attempts as possible.

We did not have much experience with this sort of thing, so it took some time and lots of constant changes to the rules before we were able to repel the attack in a way that didn't cause too much inconvenience to you guys.

Although the attack was not aimed at breaking our technical security, but simply taking advantage of people's careless handling of passwords, it still gave me kind of a fright. So after the worst was over, we spent a few weeks reviewing our technical security once again, further hardening ourselves against a future attack.

Although there are already several layers of security around our servers, we added another in the form of a middleware component which examines and possibly rejects requests before they even make it through to the Virtual Manager application itself. This middleware currently performs rate-limiting of logins, thus limiting how many times one can try to log in from the same IP address in a certain period. It also continuously retrieves an updated list of IP addresses belonging to data centers and blocks login attempts from there.
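The two checks can be sketched like this in Python. It is a simplified illustration and not our actual middleware: the `LoginGuard` class, the limits and the IP ranges (taken from the ranges reserved for documentation) are assumptions made for the example.

```python
import ipaddress
from collections import defaultdict, deque


class LoginGuard:
    """Toy login filter (hypothetical): rate limit per IP plus a network blocklist."""

    def __init__(self, max_attempts, window_seconds, blocked_networks):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self.blocked = [ipaddress.ip_network(n) for n in blocked_networks]
        self.attempts = defaultdict(deque)  # ip -> times of recent allowed attempts

    def allow(self, ip, now):
        addr = ipaddress.ip_address(ip)
        # Check 1: reject anything coming from a blocked (data center) range.
        if any(addr in net for net in self.blocked):
            return False
        # Check 2: sliding window, forget attempts older than the window.
        recent = self.attempts[ip]
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False
        recent.append(now)
        return True


guard = LoginGuard(max_attempts=3, window_seconds=60,
                   blocked_networks=["203.0.113.0/24"])

# Four quick attempts from the same address: the fourth is rejected.
results = [guard.allow("198.51.100.7", now=t) for t in (0, 1, 2, 3)]
# A minute later the oldest attempts have expired, so logging in works again.
after_window = guard.allow("198.51.100.7", now=61)
# An address inside the blocked range is rejected right away.
from_datacenter = guard.allow("203.0.113.9", now=0)
```

In this sketch rejected attempts are not counted, so a legitimate user who mistypes a few times is let back in as soon as the window has passed.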

We also generated new, long passwords for all inactive clubs, so that there would be less chance of success if someone were to try the same thing in the future.

All in all, this attack cost us about 1.5 months of development time.

I highly encourage you to use a unique password for your Virtual Manager account, instead of one you have used elsewhere.

New features related to trading

Despite all the hassles, we still managed to introduce some new game features this spring.

Among other things, the wish list was refreshed and the "standing bid" feature was added.

All of the new stuff was described in this blog post.

Switch to new SMS payment system

Then, yet another one of our providers informed us that they were shutting down their old system and that we had a short deadline to move to a new one. This time it was SMS payments, which we offer in Denmark, and again it required a complete rewrite.

It turned out to be exceedingly difficult to work with, and it led to yet another long delay in my development plans.

Refreshing the odds game

In connection with Euro 2020, we wanted to get the odds game going again. However, as it had been some years since it had last been used, there was some old code that did not work with our upgraded framework. So these things had to be fixed first, after which we refreshed a lot of other things in the game and implemented some more automation, so that it no longer requires nearly as much manual work to run.

Ad-free subscription - VM Pro

Throughout this period, we've also been working on VM Pro, but the work has been interrupted again and again. All the functionality has been finished for a while now, but it has been waiting for us to make the necessary changes in the shop to be able to sell it.

New Framework Agreement on SMS payments from 4T

After we had already rewritten our entire SMS payment system once, we then had to do it again.

This time it was because the Danish association of telecommunications companies, 4T, had entered into a new Framework Agreement on SMS payments, which required further rewriting and a new procedure for SMS payments.

These changes were deployed last week, but the procedure is still a bit of a hassle and needs some refinement. We had a deadline to meet, though, so we had to get it deployed.

The near future

It seems that we are now FINALLY over the big hump of annoying tasks that others have imposed on us, and which time and time again have ruined my development plan. In the near future, the plan looks like this:

Support for VM Pro in the shop

We have now finally returned to the task of adapting the shop to be able to sell VM Pro - and it's just around the corner.

Release and follow-up on VM Pro

Once the shop is ready, we will release VM Pro. Then we'll need to spend some time following up and correcting any mistakes, and maybe see if we can come up with more small, fun things that can be added to Pro.

Framework upgrade

Before we start working on responsive design, we will first upgrade our framework (Ruby on Rails) to the latest version (6.1).

At first glance, it doesn't look like it should present huge challenges, but it always requires some adjustments and a lot of testing.

Blog posts, communication

I know there are some who want me to write more often here in the blog, but I don't think it makes sense to write about framework agreements, cookie consent, fragment caching, and log analysis.

There will also be long periods in the future without many new game features. That's just how it is; it cannot be avoided.

And when we are buried under a mountain of the kind of tasks I've written about here, then blog posts are the last thing on my mind.

Blog posts do not make development go faster.

But, I recognize the need for a bit more life in the game - it just can't be based solely on new game features.

Therefore, I am exploring the possibility of getting someone to help me structure my communication both with you and with the crew, and to create events and buzz in the game, in ways that don't require a lot of extra development.

After VM Pro, I will post a discussion about these things on the forum, to hear whether you really want to read about technical details in the blog, and what things could make it exciting to log in each day, even when there are no new game features for a period of time.