Debugging Live Games: Finding and Fixing Issues in Live Games Before Users Find Them

Most top-grossing mobile games today are free-to-play (F2P) and make money from ongoing operation. However, technical issues arise often and do serious damage to user retention and monetization. This article addresses how most technical issues can be solved before users find them.

Michael Turner, Blogger

September 8, 2014

20 Min Read

Today, the games that see the biggest success on mobile, console, and PC app stores are treated as persistent online entertainment services that are always updated and live. The developers behind these games make use of analytics tools to understand their users and work to deliver a constant stream of targeted content that keeps users engaging and monetizing well.

Illustrative example of how developers work to understand users in order to improve their experience

 

Developers who can maintain this process of continued improvement over the long term have the best chance of keeping user interest and achieving success in today's competitive market.

 

However, success brings challenges. As a game scales, the severity of technical challenges increases and can cause service interruptions (bugs, crashes, downtime, etc.) that disturb user gameplay. If these challenges are not managed properly, the resulting technical issues can cause significant damage to key retention, engagement, and revenue numbers and greatly diminish the level of success a game can achieve.

 

DeltaDNA cites technical issues as one of the leading barriers to engagement

 

In this article, we will examine how to use logging analytics to manage a game's technical challenges so that users have a smooth and uninterrupted gameplay experience.

 

Logging Analytics: A Tool for Monitoring System Health & Maintaining an Uninterrupted Gameplay Experience

Over a game’s lifetime, unmanaged technical issues can add up to thousands or millions of dollars in lost revenue and many lost users. Therefore, developers ideally want to implement tools that give them a detailed picture of their system so that they can detect issues as soon as they occur and fix them quickly. There are many tools for this, but one of the most essential is logging.

 

For those who don't know, logging is the practice of collecting select data in log format about the behavior of the game client and/or server application (crashes, exceptions, HTTP response time, etc.), any external services it interacts with, and the hardware and network it resides on. Properly instrumented, logging can tell you in granular detail what is happening within your system and indicate whether it is healthy or has technical issues that need attention.

 

 

The (Poor) Usage of Logging Analytics in the Game Industry Today

Today many developers, even larger ones, don't use logging or don't manage it well. As a result, they still often detect technical issues only when game KPIs drop or when users report problems to them.

 

 

 

 

Finding out about errors through your users is a bad, bad, BAD way to do things, but with proper logging, it is avoidable. Below we’ll illustrate how to use server and client log aggregation to move from being the last to find out about your system issues to being the first.

 

How to Instrument Logging to Prevent Issues Before They Affect Users

Logs show you where the issue is

When an error occurs anywhere in your system, the cause of that error can almost always be found in system logs generated by the client or server.

System log example

 

How you collect logs will depend on what languages and frameworks you’re using to develop your game. Most often, developers make use of existing logging libraries such as Log4j (Java) and log4net (.NET) to send logs to their chosen destination (console, file, Loggly) over a RESTful HTTP API or the syslog protocol.
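As a rough illustration of the idea, here is a minimal sketch using Python's standard `logging` module rather than Log4j or log4net. It sends every record to two destinations, the console and a file. The function name `build_logger`, the logger name, and the log format are our own assumptions for the example, not something prescribed by any particular tool.

```python
import logging
import sys


def build_logger(name: str, log_path: str) -> logging.Logger:
    """Create a logger that writes each record to the console and to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate output if called more than once

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")

    # Destination 1: the console (standard output).
    console = logging.StreamHandler(sys.stdout)
    console.setFormatter(formatter)
    logger.addHandler(console)

    # Destination 2: a local log file.
    to_file = logging.FileHandler(log_path, encoding="utf-8")
    to_file.setFormatter(formatter)
    logger.addHandler(to_file)
    return logger
```

A call such as `build_logger("game.server", "game-server.log").warning("lobby connection lost")` would then write the same line to both destinations. Python's standard library also ships `logging.handlers.SysLogHandler` and `logging.handlers.HTTPHandler`, which could be added in the same way to forward records over syslog or HTTP.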

 

Exactly what developers will want to log varies from game to game, but in general they will want to think about logging the following information.

 

  • Information about the performance and health (CPU usage, memory allocation, etc.) of the servers your game exists on

  • Information about your server-side application’s performance and behavior

  • Information about your database operation

  • Information about your client’s behavior, the state of the device it’s being used on,  client-side network conditions, and code that interacts with your server application.

  • If your game is client-only, you should still log crashes, exceptions, and select information on the application’s behavior to help you quickly determine what’s causing bugs and performance issues in your application.
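To make the list above a little more concrete, the following sketch shows one way to log exceptions (with stack traces) and call durations as one JSON object per line, which is a format most log tools can parse. It uses only Python's standard library. The `JsonFormatter` class, the `timed_call` helper, and the `fields` convention for extra data are hypothetical names chosen for this example.

```python
import json
import logging
import time
import traceback


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": record.created,
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra data passed as extra={"fields": {...}} (our own convention).
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["stack_trace"] = "".join(
                traceback.format_exception(*record.exc_info)
            )
        return json.dumps(payload)


def timed_call(logger, name, func, *args, **kwargs):
    """Run func, logging how long it took and any exception it raised."""
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    except Exception:
        elapsed_ms = round((time.perf_counter() - start) * 1000, 2)
        # logger.exception logs at ERROR level and attaches the stack trace.
        logger.exception(
            "call failed",
            extra={"fields": {"call": name, "duration_ms": elapsed_ms}},
        )
        raise
    elapsed_ms = round((time.perf_counter() - start) * 1000, 2)
    logger.info("call ok", extra={"fields": {"call": name, "duration_ms": elapsed_ms}})
    return result
```

Attaching `JsonFormatter` to a handler and wrapping a request handler in `timed_call` would give you response times and exceptions in a searchable form. What else you put in `fields` (device state, network conditions, and so on) depends on your game.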

 

 

The Problem: There Are LOTS of Logs, and It’s Difficult to Find the Right One

Most games generate gigabytes or terabytes of log data that need to be stored and searched to find an issue. For most game developers, properly storing and searching gigabytes of logs in-house requires allocating separate servers and personnel to create a data model for storing all of these logs and custom search tools to get the required log data out. It costs a lot in hardware and manpower and generally does not provide the speed required to stay ahead of system issues.

Log Management Tools Speed Up Issue Response Time

Using tools like Loggly (a SaaS log management tool), you avoid the need to manage logs in-house. Tools like Loggly allow you to centralize your logs from both game clients and game servers in one place and provide you with the following tools.

 

  • Log Organization: Logs will automatically be organized by what agent they were sent by (JSON.Error, JSON.Response, JavaGC, Syslog, etc.) OR by custom parameters you define

  • Log search: Ability to search all logs with custom search parameters to identify the causes of issues

  • Visualization: Visualize log results to spot trends

  • Log Monitoring & alerts: Monitor logs automatically and alert developers when issues arise

 

These tools allow your team to find and fix system issues MUCH more quickly and, in general, maintain a good picture of system health.

 

How to Use Logging to Prevent Issues That Negatively Affect Retention, Engagement, and Monetization

Below, we give an overview of the best practices for implementing logging for games to ensure each step in a user's lifetime is error-free. Loggly will be our reference tool as we step through each part of the user experience.

 

Retention - Monitor Your Initial Experience

When acquiring new users, you want to ensure you’re monitoring logs from the systems responsible for your initial experience.

 

Systems often related to the initial experience include:

  • Servers that serve initial gameplay: Monitor any servers key to game loading and initial user experience. This includes any CDN, lobby or proxy servers used.

  • Initial tasks: Game loading time and game events that occur in the first five minutes

Long-Term Retention & Engagement - Monitor Game Events and User Data Management

Technical issues that cause committed players to churn out of the game usually stem from players being unable to perform their favorite actions or losing data related to in-game items or status they’ve worked hard to earn. To prevent this, you’ll want to log:

 

  • Server-side game events: Log information about server-side game events key to user experience such as error exceptions, response time, or sync errors.

    • Player synchronization & PvP interaction issues: If your game has real-time elements or PvP play, you want to log key client and server events surrounding this. This area is a major source of frustration for users when it fails.

    • Task completion & reward failures: Log server and client events surrounding rewards. Not being rewarded after task completion is a major user complaint.

  • Game client logs: Log any interruption of user experience on the client side, such as game crashes, game reloads, or client-side exceptions. Many times, you can match up this client-side data with back-end problems.

  • Game database servers: Monitor the general health of database servers very closely as any loss of user data causes huge retention issues. Also, log events that have to do with data transactions, data consistency, and data integrity.  
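Matching client-side data with back-end problems, as mentioned in the list above, is easiest when both sides log a shared identifier. The sketch below assumes that each client and server event is a dictionary carrying a `request_id` field. That field name and the `correlate_by_request` helper are assumptions for this example, so adapt them to whatever identifier your game already passes between client and server.

```python
from collections import defaultdict


def correlate_by_request(client_events, server_events, key="request_id"):
    """Pair each client event with the server events that share its request ID.

    Returns a list of (client_event, matching_server_events) tuples, in the
    same order as client_events. Events without an ID never match anything.
    """
    by_id = defaultdict(list)
    for event in server_events:
        if event.get(key) is not None:
            by_id[event[key]].append(event)

    # .get() does not create new entries, so unmatched IDs yield an empty list.
    return [(event, by_id.get(event.get(key), [])) for event in client_events]
```

A client-side sync error paired with a server-side timeout for the same request points at the back end, while a client error with no matching server event suggests the request never arrived.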

Monetization - Monitor Monetization Logic and Data

After a user has paid, they’re likely to spend more in the future, so the first paid conversion is important. It is also important that if a user spends, their item is delivered properly.

  • Transaction failures: Log transaction failures on the front end and any back-end verification steps.  

  • Currency & item data: Sometimes when users spend, money is removed from their account but they do not receive the items they paid for. Log currency deductions and item grants so that these mismatches can be detected.
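One way to catch the "paid but not delivered" case from logs is to reconcile debit events against grant events. The sketch below assumes each logged event is a dictionary with a `type` and a `transaction_id`. The event type names `currency_debited` and `item_granted` are hypothetical and stand in for whatever your game actually logs.

```python
def find_undelivered_purchases(events):
    """Return the transaction IDs that were debited but never granted an item."""
    debited = set()
    granted = set()
    for event in events:
        if event.get("type") == "currency_debited":
            debited.add(event["transaction_id"])
        elif event.get("type") == "item_granted":
            granted.add(event["transaction_id"])
    # Any debit without a matching grant is a purchase to investigate.
    return sorted(debited - granted)
```

Running a check like this on a schedule, and alerting when the result is not empty, lets you refund or re-deliver before the user files a complaint.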

 

 

Third Party Integrations - Log Your Social & Third-party Integrations

Third-party integrations put your reliability in the hands of an outside party’s service. It’s important to log any of your own systems connected to these integrations and consume any logs the third-party tool provides. If any third-party tool you use communicates with outside servers, log response times from these servers.
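Logging third-party response times can be as simple as wrapping each outbound call in a timer. Here is a minimal sketch using a Python context manager. The name `log_response_time`, the log message format, and the one-second default for what counts as slow are assumptions chosen for illustration.

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def log_response_time(logger, service, slow_ms=1000.0):
    """Log how long the wrapped block took; warn if it exceeded slow_ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs whether the call succeeded or raised, so failures are timed too.
        elapsed_ms = (time.perf_counter() - start) * 1000
        level = logging.WARNING if elapsed_ms > slow_ms else logging.INFO
        logger.log(
            level,
            "third_party_call service=%s duration_ms=%.1f",
            service,
            elapsed_ms,
        )
```

You would use it as `with log_response_time(logger, "social_api"): ...` around the call to the outside service, so slow responses show up as warnings in your logs.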

 

Game Scaling - Monitor for Performance Degradation

When a new game gets popular, the number of users playing it can swell to tens or hundreds of thousands of daily active users, all of whom your game’s back-end architecture will need to support. Unless you’ve proven your server architecture can scale, it will likely experience performance issues as your user base grows.

 

To predict scaling issues, you’ll want to monitor the following:

  • Server Load: memory allocation, CPU usage

  • User-facing variables: response times, error exceptions, stack-traces, timeouts, etc.
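For user-facing variables such as response times, averages tend to hide the slow requests that users actually notice, so a high percentile is often a better signal. The sketch below compares the 95th-percentile response time of a current window against a baseline window. The `percentile` and `degraded` helpers, the nearest-rank method, and the 1.5x tolerance are assumptions for this example rather than recommended values.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, for pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def degraded(baseline_ms, current_ms, pct=95, tolerance=1.5):
    """True if the current percentile exceeds the baseline by the tolerance factor."""
    return percentile(current_ms, pct) > tolerance * percentile(baseline_ms, pct)
```

Feeding this with response times pulled from last week's logs (baseline) and the last hour's logs (current) gives an early warning as load grows, before timeouts start to appear.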

 

Four Steps to Integrate Logging Into Your Processes

1. Define Healthy Performance & Enforce Internal Service-Level Agreements.

Across your game’s entire architecture, you should define what healthy performance looks like for each component of your system. This healthy performance should be reinforced with internal SLAs that keep your team focused on the right goals.

 

2. Make Use of Alerts to Inform You of Issues Immediately.

Define custom alerts. Set these alerts to monitor any error or violation of internal SLAs so that when they happen, your operations team knows there’s an issue immediately and can solve it.
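Steps 1 and 2 can be sketched together: write each internal SLA down as a metric name with a limit, then check fresh measurements against those limits and raise an alert for each violation. The `Sla` class, the `check_slas` function, and the metric names in the usage note are hypothetical. A log management tool would normally do this checking for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    """An internal service-level target: a metric must stay at or below max_value."""
    metric: str
    max_value: float


def check_slas(slas, measurements):
    """Return one alert message per SLA that is violated or has no data."""
    alerts = []
    for sla in slas:
        value = measurements.get(sla.metric)
        if value is None:
            # Missing data is itself worth an alert: the logs may have stopped.
            alerts.append(f"{sla.metric}: no data")
        elif value > sla.max_value:
            alerts.append(f"{sla.metric}: {value} exceeds limit {sla.max_value}")
    return alerts
```

For example, with `Sla("login_p95_ms", 500.0)` and a measured value of `750.0`, `check_slas` returns a single alert for that metric, which your operations team could receive by email or chat.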

 

3. Use Logging Before and After Updates.

When an update is pushed live, system state should be checked before and after the update to ensure no issues have been introduced.  
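A simple before-and-after check is to compare the share of error-level log events on each side of the deployment time. This sketch assumes each event is a dictionary with a numeric timestamp `ts` and a `level` field. Those field names, the helper names, and the one-percentage-point threshold are assumptions for illustration.

```python
def error_rate(events):
    """Fraction of events logged at ERROR level (0.0 for an empty list)."""
    if not events:
        return 0.0
    errors = sum(1 for event in events if event.get("level") == "ERROR")
    return errors / len(events)


def regression_after_update(events, deployed_at, max_increase=0.01):
    """Compare error rates before and after deployed_at.

    Returns (regressed, delta), where delta is the after-rate minus the
    before-rate and regressed is True when delta exceeds max_increase.
    """
    before = [event for event in events if event["ts"] < deployed_at]
    after = [event for event in events if event["ts"] >= deployed_at]
    delta = error_rate(after) - error_rate(before)
    return delta > max_increase, delta
```

Comparing windows of equal length, such as the hour before and the hour after the update, keeps the two rates comparable.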

 

4. When Things Show Up in Analytics, Cross-Reference System Logs.

If performance issues show up in your game’s analytics, cross-reference your system logs to find the cause.

 

Conclusion: Try an Experiment

The best way to test the benefit of logging is to see if it makes a difference in your game’s numbers. Here are the first steps you can take down this path.

 

1. Send Logs from Your Existing Problem Areas

Loggly can be integrated for free, and an integration that begins measuring meaningful log data takes a few hours at most. Most people send logs within just 20 minutes. Identify a few areas where you have historically had technical problems and send those logs.

 

2. Provide Access to Logging Tools to Your Developers & Operations Team & Let Them Monitor These Logs

Give every team member access to the logging tools.

 

3. Measure Changes

After 1-2 weeks, compare how quickly issues were solved before and after the integration and examine your behavioral analytics to determine if your KPIs have improved.


If you see improvements, consider a deeper integration of logging & good log management practices into your process!
