Proof positive that Dice is indeed testing backend servers during the Beta

There are more than a handful of people here and elsewhere claiming that Dice couldn't possibly be testing servers as they say, because there's no destruction, yada yada yada.

I found this blog post by Mikael Kalms. From what I can gather, Dice changed database engines to address the issues below, so their explanation that they need to test backend servers would seem to be valid.

 

What follows is a description of how the stats backend functions for BFBC2, what happens during high load, and what we are doing to resolve it. Consider it a peek 'under the hood' of BFBC2.


System overview

When playing online, all game clients and game servers are permanently connected to the game's backend servers.

There is a separate backend for each of the PC/PS3/360 versions of BFBC2.

A backend is split into two portions - one group of machines which run some custom software, and a database. The database is not directly accessible by game clients/servers; they can only reach it by sending requests to the custom software portion, which in turn talks to the database.

Each database is a cluster of machines which run Oracle 9i with RAC enabled.

There are a few modules in the backend, and a few tables in the database, which are shared between multiple platforms / titles. Those are generally rather low-intensity processes. However those have to be cared for if one wants to perform changes to the physical configuration of the machines that run the backend.


Stats

A stat is a short identifier with an accompanying value. Stats are tracked for each player, and they are saved between game sessions. For BC2 there are approximately 2000 unique stats values. Some of the stats have a direct meaning - your current score with a specific kit, number of kills with a specific weapon and so on - whereas other stats are meaningless on their own and track your progress toward various achievements/trophies, pins and insignias.

The stats are kept in a couple of big tables in the Oracle database.


Game client and stats

The game client only reads from the stats database; it never writes.

Stats reads happen on two occasions: when a player logs in, and when a player exits from a server back to the main menu. The client has a local cache of all stats. When either of those events occurs, the game client requests a handful of stats (for instance, the player's total score and accumulated online playtime). If any of those stats differ from the locally cached values, the game client goes out and grabs all stats (approximately 2000 values).
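The probe-then-fetch check described above can be sketched roughly as follows. This is only an illustration: the stat names, `refresh_cache`, `fetch_some` and `fetch_all` are hypothetical, since the post does not name the probe stats or any API.

```python
# Hypothetical names throughout; the post does not specify them.
PROBE_STATS = ("total_score", "online_playtime")  # assumed "handful" of stats


def refresh_cache(cache, fetch_some, fetch_all):
    """Refresh the local stats cache; return True if a full fetch was needed."""
    probe = fetch_some(PROBE_STATS)  # small request
    if all(cache.get(name) == value for name, value in probe.items()):
        return False  # cache still matches the backend, skip the big read
    cache.clear()
    cache.update(fetch_all())  # roughly 2000 values in BC2
    return True


# Tiny in-memory stand-in for the backend.
backend = {"total_score": 1500, "online_playtime": 7200, "kills_m16": 42}
cache = dict(backend)


def fetch_some(names):
    return {name: backend[name] for name in names}


def fetch_all():
    return dict(backend)


unchanged = refresh_cache(cache, fetch_some, fetch_all)  # nothing new yet
backend["total_score"] += 100   # the player finished a round
backend["kills_m16"] += 3
changed = refresh_cache(cache, fetch_some, fetch_all)    # probe differs, full fetch
```

The point of the design is that the expensive read only happens when the cheap probe shows that something has changed.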

The game client uses these stats to display information in the main menu. They are not used in-game in multiplayer.


Game server and stats

The game server reads and writes to the stats database.

When a player enters a server, the server requests approximately 1000 stats for that player from the database. Anything that has to do with stats and ranks is controlled by the server (for instance, which weapons are unlocked for a specific player).

The server writes back a player's stats when the player leaves the server. Also, all players' stats are written to the database at the end of each round. This is to minimize the risk that player progress is lost because of a server crash. When writing stats, the server will only write those stats that have changed. In addition, whenever possible the server will issue commands like "add 3 to stat named ABCD" rather than "write 27 to stat named ABCD". This minimizes the risk that any bugs in the code or network communications problems will trample stats; the worst that can occur is that a stat is not increased, it will not get lowered or set to zero inadvertently.
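A minimal sketch of the "write only what changed, and write it as an increment" idea, assuming stats are plain integer counters. The function names and the stats other than ABCD are made up for illustration.

```python
def changed_deltas(at_join, at_end):
    """Return {stat: increase} for only the stats that changed during play."""
    return {
        name: at_end[name] - at_join.get(name, 0)
        for name in at_end
        if at_end[name] != at_join.get(name, 0)
    }


def apply_deltas(db_row, deltas):
    """Backend side: 'add N to stat X' rather than 'write V to stat X'."""
    for name, delta in deltas.items():
        db_row[name] = db_row.get(name, 0) + delta


db_row = {"ABCD": 24, "kills_m16": 10, "score_assault": 900}
at_join = dict(db_row)  # what the server read when the player joined
at_end = {"ABCD": 27, "kills_m16": 10, "score_assault": 950}

deltas = changed_deltas(at_join, at_end)  # only the two changed stats
apply_deltas(db_row, deltas)
```

If the write is lost on the way, `db_row` simply keeps its old values. With absolute writes, a buggy or garbled message could overwrite a stat with a lower value or zero.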

Usually the game server will write far fewer than 1000 stats. I don't have figures at hand, but perhaps 100 stats are usually updated after a player has played a full round.


High load scenarios and the backend

Normally the database responds to the custom software's read/write queries very quickly. The database can service requests from a couple of game clients/servers in parallel; if there are too many requests made at once, new ones are put into a queue. Normal turnaround time for retrieving 2000 stats is approximately a second. Requesting 2000 stats takes a bit more time than requesting 1000 stats - probably about twice as long. The database completes the queued-up entries as quickly as it can.

The requests do not come in a steady flow however. Sometimes many servers and clients will ask for stats data at nearly the same time. The database will then service some of those requests a bit slower than usual.

The database is the weaker portion of the BFBC2 backend; that is, the custom software can handle more simultaneously active players than the database can.

If the clients/servers are doing a lot of requests to the database over a long period of time, then the backlog of queries in the database's queue will get longer and longer. When the queue is so long that the database is unable to service queries in 10 seconds, the custom software will give up on those queries and respond with an error to those clients/servers.
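To make the queue-and-cutoff behaviour concrete, here is a toy simulation. It assumes a strict first-in-first-out queue, one request served at a time, and a fixed service time per request. The real database serves a couple of requests in parallel, so treat the numbers as illustrative only.

```python
TIMEOUT_S = 10.0  # cutoff named in the post


def simulate(arrivals, service_s):
    """Return a list of (latency_s, ok) for FIFO requests served one at a time."""
    results = []
    free_at = 0.0  # time at which the database has worked off its backlog
    for t in sorted(arrivals):
        start = max(t, free_at)       # wait until earlier requests are done
        free_at = start + service_s
        latency = free_at - t
        results.append((latency, latency <= TIMEOUT_S))
    return results


# Steady trickle: one request every 2 s, each taking 1 s to serve.
light = simulate([2.0 * i for i in range(20)], service_s=1.0)

# Burst: the same 20 requests all arriving at the same instant.
burst = simulate([0.0] * 20, service_s=1.0)
```

In the trickle case every request is answered in one second. In the burst case the same 20 requests pile up, and everything behind the tenth position in the queue misses the 10 second cutoff.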


High load scenarios and the game client/server

With the above in mind, let's imagine what happens when the number of simultaneous players increases.

At first, there are not a lot of players. The database will handle any requests quickly and its queue is nearly empty all the time.

As the number of players goes up, the database will still be able to keep up with most requests. However, occasionally a lot of servers/clients will happen to perform stat requests at nearly the same time. This causes the queue to fill up a bit more than usual. Some of those queries will then time out when they hit the 10 second cutoff. Since clients normally request more data, it is usually the game client's requests that fail first.

If the game client's request fails, the game client will attempt to retrieve stats for 10 or 20 seconds - and then give up, and the game's main menu will claim that the player is Rank 1 and has zero score etc.

As the load increases further, the game server read requests will also fail more often. When game server read requests fail, the players who are affected will play with rank 1 and no stats-related unlocks. When this happens, the game server will not record & write back progress for the affected players either.

Finally, with a really high load, all requests from game clients & game servers will fail.


High load versus too high load

One important thing to notice about some online systems and load, is that the load does not behave like you would intuitively expect it to. Usually it rises slowly... until it gets to a certain point, and then it all spirals out of control and horror ensues. There are several reasons for this.

One is the human factor. When the load is at such a level that stats requests are failing intermittently, it appears to the player as if he/she has lost all progression, but either logging out and back in (in the case of no stats in the main menu) or disconnecting/reconnecting (in the case of no stats in the game) has some chance of getting the stats back. People will then naturally do this over and over until they either get stats, or are frustrated enough to give up. This behaviour will cause more load on the backend than normal gameplay behaviour, which worsens the problem overall.

Another can be in the code; sometimes game client/game server code is written to retry a couple of times when an operation fails. This is a good thing when the backend is not under high load - after all, the error might be due to a momentary hiccup. However, when the load is high this will make the problem worse (in just the same way as the "human factor" example).
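A rough way to see how retries amplify load: if each attempt fails independently with probability p and the code makes up to r retries, the expected number of requests sent per operation is 1 + p + p^2 + ... + p^r. The independence assumption and the function name below are mine, not from the post.

```python
def expected_attempts(p_fail, max_retries):
    """Expected requests sent per operation, assuming independent failures."""
    return sum(p_fail ** k for k in range(max_retries + 1))


healthy = expected_attempts(0.0, 3)   # no failures: exactly one request each
strained = expected_attempts(0.5, 3)  # half of all attempts fail
down = expected_attempts(1.0, 3)      # everything fails: every retry is used
```

With three retries, a backend that is failing half the time sees nearly twice the request volume, and a backend that is failing completely sees four times as much. That is the wrong moment to receive extra load.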

There are also some things happening in the background on databases - like backups, or regularly scheduled maintenance / data-processing jobs - which add load on top of the player traffic.

This means that some online systems can seem to be running fine, with a steady load, and then something happens and within minutes they grind to a halt.

How well-behaving the system is depends on what functions it performs, and the behaviours of the users of the system.

BFBC2's custom backend software is well-behaved in most respects. The database suffers a bit from the problems described above - the step between "players are occasionally not getting stats" and "players are never getting stats" is smaller than theory would predict.


A closer look at the database itself

The stats database used to handle considerably more players back when it launched than it does now. In other words, reads/writes against the database take more time to complete. There are two main reasons for this.
  • There are stats for many more players in the database now than back when we started. Databases are good at servicing requests like, "give me the contents for user with ID=1234, it is somewhere in that huge table", but performance does go down as the tables grow in size.
  • The tables themselves are becoming fragmented. Several years and several games ago, when the database administrators designed the database setup for the system, they asked what the priorities were for the database. The response was runtime performance; the database should be set up to be able to service as many reads/writes per second as possible. One deliberate tradeoff of the highest-performance setup they could create was that the database would gradually acquire small gaps in it. These gaps would not get reclaimed automatically. The amount of "lost" space in the database would grow over time, and after a while the lost space would result in performance loss (due to disk caches not being as efficient anymore). This is sorted out by taking the database offline once every couple of months and rebuilding it - thereby squeezing out all the gaps. However, for some reason these regular rebuilds have not been happening for any BFBC2 title.


Defining the problem

The problem we will tackle is the following: the current player population is suffering from stats outages. That shouldn't be happening. Stats should be reliable with roughly the player numbers that we have now, plus a bit of headroom. We will not attempt to make it handle 100,000 concurrent users on a single backend.


Tackling the problem

One can attempt to make individual database accesses faster.

Taking the database offline, and rebuilding the tables.
This is certain to help. That is also the first thing that we will do. (And schedule new rebuilds whenever necessary in the future.)

Making disk cache sizes larger.
Memory is faster than disk, so if more of the database is kept in memory then accesses will go faster.
The PC and 360 database clusters have as much memory as is possible. The PS3 cluster has room for more memory though.
We will add it.

Redesigning the tables.
The table layout is not designed specifically for BFBC2; the same design is used by many other EA titles. Changing the design would improve performance for most requests by a fair bit. However, the time required for getting such a modification implemented, tested, and live is far too long.
We will therefore not do it.

Adding more machines to the database clusters.
One might think that doubling the number of machines in a database cluster will also double the performance of a cluster. In reality, all those machines need to coordinate their work with each other. Therefore, adding more machines only helps sometimes. In some cases, performance actually gets worse.
We will therefore not do it.

Moving to a newer Oracle version or another database altogether.
Again, the turnaround time for doing this to a live system is far too long.
We will therefore not do it.


Or one can reduce the amount of database accesses.

Making game clients request fewer stats.
The game client is already doing a small fetch before doing a full fetch (in case score/time or a couple of other stats have changed). If the client doesn't update all the stats in its cache, the main menu will not be able to show the player's in-game progression correctly. It is perhaps possible to split the stats fetching into two portions - one portion for showing the most important stuff in the main menu (in the case of BC2 PC, the stats-related items in the main screen), and another portion for showing all the achievements/trophies etc.
It is under consideration.

Making the game servers cache stats for players.
The servers could have a cache like the game clients, but cache stats for many different players. This would help with people who play near-exclusively on one server. It is doubtful if it would make a difference (I don't have statistics on this, just guessing).
We will not do it.

Making the game servers request fewer stats.
Fetching fewer stats will make the game server unable to evaluate the full player progression.
We will therefore not do it.

Making the game servers write fewer stats.
If the game servers would write stats to the backend at each Nth round instead of at each round, then there would be fewer unique stats written. There is a tradeoff here - is there a risk that players lose their progression due to server crashes? - but N=2 or N=3 keeps both risk and impact very small.
We have already implemented this change for both consoles, and will implement it for PC.
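As a sketch of what "write at each Nth round" could look like, the class below accumulates per-player increments and only sends them every N rounds, or when a player leaves. `StatsWriter` and its methods are hypothetical names, not the actual server code.

```python
class StatsWriter:
    """Accumulates per-player deltas and flushes them every n-th round."""

    def __init__(self, write, every_n_rounds=2):
        self.write = write            # hypothetical callable(player, deltas)
        self.n = every_n_rounds
        self.rounds_since_flush = 0
        self.pending = {}             # player -> {stat: accumulated delta}

    def record(self, player, stat, delta):
        stats = self.pending.setdefault(player, {})
        stats[stat] = stats.get(stat, 0) + delta

    def end_round(self):
        self.rounds_since_flush += 1
        if self.rounds_since_flush >= self.n:
            self.flush()

    def player_left(self, player):
        # Stats are also written when a player leaves the server.
        deltas = self.pending.pop(player, None)
        if deltas:
            self.write(player, deltas)

    def flush(self):
        for player, deltas in self.pending.items():
            self.write(player, deltas)
        self.pending = {}
        self.rounds_since_flush = 0


writes = []
writer = StatsWriter(lambda player, deltas: writes.append((player, deltas)),
                     every_n_rounds=2)
writer.record("alice", "kills_m16", 3)
writer.end_round()                    # round 1: nothing written yet
writes_after_round_1 = len(writes)
writer.record("alice", "kills_m16", 2)
writer.end_round()                    # round 2: one merged write
```

A stat that changes in both rounds is written once instead of twice, which is where the saving comes from. The cost is that a crash before the flush loses up to N rounds of progress.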

Once one set of changes is in place, we will then reassess the situation. Etc.
 

Discussion Info


Last updated July 3, 2018


Very interesting read! Has almost nothing to do with how they are inefficiently testing the servers (We're talking about the flawed amount of data they're testing for the gameplay, not that they aren't testing stat-tracking, which I'm sure they are), but it was a good explanation of Battlefield: Bad Company 2's system.

-

What it does explain, however, is how Battlefield: Bad Company 2 had a very flawed stat counter. I always wondered why it was so poorly coded, so this makes perfect sense. Time and time again it would count the wrong hours, wrong weapons used, wrong kills/deaths, etc. or just record nothing at all.

-

Thanks!

Apparently backend DB function plays a bigger role than you might think. What you want addressed is the netcode, which is something else entirely.

Here's a thread - http://forums.electronicarts.co.uk/battlefield-3-pc/1397402-frostbite-2-netcode-info-dice-plox-comment.html

And the one I pulled the db info from:  forums.electronicarts.co.uk/.../1387445-stats-system-performance-perspective.html  

I really DO try to be informed, Matt.

TLDR, but we'll know how it plays soon enough.

Nice read, but I fail to see how this proves anything. While it is clear they did develop a database (backend) that will help track user stats, menu items and game state, I see little information about true gameplay data. The question people are asking is: even if they get the current beta running smoothly now, how will it change once you add vehicles, destruction and many other things to the mix? The issues (concerns) people are having are not about tracking stats and earning ribbons; they are about lag and other possible limitations the online play will have.

Like Matt said, you are still testing very limited data with a completely new engine.

^^The beta you are playing is a two or so month old build. Meaning that's all from two months ago (the glitches, etc.) and has all been fixed. The beta is where the devs had the game several months back (or a piece of the game)............../facepalm. You guys read some of the threads around here?

After reading the link it appears this information only relates to stat tracking in a game based on the old engine. While tracking these does take server resources, it does little to address the concerns some people are having.

We can just hope it all goes ok

Uneasy

Three questions

1. What does your comment have to do with what I said? I simply said they are testing old code now, and the OP does not answer the concern some of us have about testing server load with old (limited functioning) code.

2. Why does the age of the beta keep changing? People claim anything from 1 to 8 months. Dice themselves say it's only 1 month.

3. (since you discuss code age) If all this is fixed why don't they fix the little things (like the big hole near point a)? For having all the bugs fixed one would think they would have patched the beta a little

What else you want for free? The beta is fun and good enough to play around on. If you don't like it, don't buy the game. I don't know the exact "age" of the beta, but I do know it's where the game was "at least" a month ago, "cuz thats wat i hurd". BTW, you were saying something about getting the current beta running smooth or something like that, and I was stating it's an old build. Chill, peace out.

Uneasy

  As I always said I plan on getting the game. My point about how smooth the beta runs has to do with server load. Dice has said they are testing server load, yet they are doing this with old, glitchy code with limited game features; it is only natural that some will show concerns about how this will impact their testing. This is what I meant when I said, even if this beta runs smoothly how are they sure the final code will.
