The ability to visualize, analyze and interpret data is a key competency for driving commercial innovation and success, maximizing return on investment (ROI) and improving operational efficiency. Understanding your user base and key performance indicators (KPIs) requires tools that let you analyze the data insightfully. Businesses in general, and the gaming industry in particular, now collect data on a previously unseen scale, thanks to technological breakthroughs and the falling cost of disk space and cloud services.
Faced with these vast amounts of data, where should one start the analysis? For a basic exploratory analysis, many choose Excel’s statistical and data manipulation tools. Not only are they available as part of the Microsoft Office suite, but the spreadsheet interface means that most users find them easy to use. Excel is no doubt a relatively powerful yet simple tool for any analyst or data scientist, whether it is used solely for visualization, for pivot tables or for basic t-tests.
However, there are a number of drawbacks to using Excel. The amount of data it can handle is limited (since Excel 2007, a single worksheet holds at most 1,048,576 rows, just over a million), and there is little flexibility over the methods used and the output produced. You may decide to extend Excel’s capabilities with VBA, but you will still be constrained by Excel’s inherent limits. More sophisticated tools are available: the finance and banking industry predominantly uses SAS, and in market research SPSS is commonly adopted. However, if you speak to data scientists and statisticians, you will generally find that R is their software of choice.
What is R, and what makes it unique? R is a language and environment designed specifically for statistical computing and visualization. It is free and open source. R provides a flexible analysis toolkit in which the standard statistical techniques are built in. Its greatest advantage, as I see it, is the R community, which regularly contributes new functionality through add-on packages.
Another area where R excels is communicating your data and results to support data-driven decision making. R was developed with visualization in mind, and you can choose from a wide array of charts, graphs and plots, including interactive Google Chart Tools charts through the googleVis package. ggplot2 is another very useful package for creating elegant and complex plots (as in the example below).
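To give a flavour of ggplot2’s layered syntax, here is a minimal sketch. It assumes the ggplot2 package is installed, and the player segments, column names and revenue figures are hypothetical values simulated purely for illustration.

```r
# Requires the ggplot2 package: install.packages("ggplot2")
library(ggplot2)

# Hypothetical, simulated data: daily revenue for three player segments
set.seed(1)
revenue <- data.frame(
  segment = factor(rep(c("New", "Casual", "Core"), each = 100),
                   levels = c("New", "Casual", "Core")),
  daily_revenue = c(rexp(100, rate = 1 / 2),
                    rexp(100, rate = 1 / 5),
                    rexp(100, rate = 1 / 12))
)

# ggplot2 builds a plot in layers: data + aesthetic mappings + geoms + labels
p <- ggplot(revenue, aes(x = segment, y = daily_revenue, fill = segment)) +
  geom_boxplot() +
  labs(title = "Daily revenue by player segment (simulated data)",
       x = "Player segment", y = "Daily revenue") +
  theme_minimal() +
  theme(legend.position = "none")  # colours already match the x-axis labels

print(p)
```

Each `+` adds a layer or setting to the plot object, so a chart can be refined step by step rather than redrawn from scratch.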
For many users, the main difference between R and other statistical software packages is that R is itself a programming language: analyses are run by typing commands at a console (the command line) rather than by choosing options from drop-down menus.
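As a minimal sketch of this command-driven workflow, the snippet below simulates session lengths for two hypothetical versions of a game, summarizes them and runs a t-test, using only base R functions. The data, variable names and group sizes are assumptions made for illustration.

```r
# Hypothetical example: session lengths (minutes) for two versions of a game.
# The data are simulated for illustration only.
set.seed(42)
version_a <- rnorm(200, mean = 30, sd = 8)
version_b <- rnorm(200, mean = 32, sd = 8)

# Combine into a data frame, the structure most R modelling functions expect
sessions <- data.frame(
  minutes = c(version_a, version_b),
  version = factor(rep(c("A", "B"), each = 200))
)

# Mean session length per version
aggregate(minutes ~ version, data = sessions, FUN = mean)

# Two-sample t-test; t.test uses Welch's test (unequal variances) by default
result <- t.test(minutes ~ version, data = sessions)
print(result)
result$p.value
```

Because the whole analysis is a script, it can be rerun unchanged on new data, which is harder to guarantee with a sequence of menu clicks.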
There are many online resources to help you master R, ranging from dedicated web pages such as Quick-R, through online books, to online courses such as those offered by Code School, Udacity and Coursera. You will find that after investing a relatively small amount of time in learning R, you will have far more functionality and flexibility at your fingertips to explore and understand your data than you had before.
Resources for learning R
http://www.statmethods.net/ - Quick R
https://www.coursera.org/course/rprog - Coursera
http://tryr.codeschool.com/ - Code school
http://www.cyclismo.org/tutorial/R/ - Department of Mathematics and Computer Science
Other useful resources
http://blog.yhathq.com/posts/10-R-packages-I-wish-I-knew-about-earlier.html - 10 R packages I wish I knew about earlier
http://www.r-bloggers.com/ - aggregation of R related blogs. A great source of information and ideas
Disclaimer: Please note that the author does not represent, and is not affiliated with, R or any other software mentioned in this post.
(Many thanks to Micky Daniels for his linguistic and editorial advice)