R is a system for statistical computation and graphics. It provides, among other things, a programming language, high-level graphics, interfaces to other languages, and debugging facilities. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis.
In this assessment I learn how to use the R language for statistical analysis of data. Before writing this blog, I needed to complete seven chapters covering the basic operation of the R language, using the Try R course from Code School (http://tryr.codeschool.com/), to gain some basic knowledge of R:
A gentle introduction to R expressions, variables, and functions.
Grouping values into vectors, then doing arithmetic and graphs with them.
Creating and graphing two-dimensional data sets.
Calculating and plotting some basic statistics: mean, median, and standard deviation.
Creating and plotting categorized data.
Organizing values into data frames, loading frames from files and merging them.
Testing for correlation between data sets, linear models, and extending R with additional libraries.
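As a quick taste of what those chapters cover, here is a small sketch. The vector `goals` and its values are invented for illustration; they come neither from the course nor from my data.

```r
# Hypothetical values, invented purely for illustration
goals <- c(12, 7, 30, 25, 18)

goals_mean   <- mean(goals)     # arithmetic mean
goals_median <- median(goals)   # middle value once sorted
goals_sd     <- sd(goals)       # sample standard deviation

# Vector arithmetic works element by element
goals_doubled <- goals * 2

# A simple bar chart of the values, one bar per element
barplot(goals, names.arg = c("A", "B", "C", "D", "E"))
```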
My Heat Map in RStudio
After completing this tutorial, and with some basic knowledge of the R language, I decided to create a heat map to visualize my data. Step one was to download RStudio and prepare some data. For this assignment I will be using the statistics of the top 20 players from the North American National Hockey League (NHL) in the 2015/2016 season, ordered by the points they scored (goals + assists) from greatest to least, taken from www.nhl.com.
A simple copy and paste into my Excel spreadsheet gives me my data table, which has to be converted into a CSV file so it can be easily imported into RStudio.
There are a few different ways to load data into RStudio: you can use a command like read.csv(), or import it manually through the Import Dataset button in RStudio, which is what I did. To make the data easier to read, I replaced the row numbers with the players' names using the following command:
row.names(nhl) <- nhl$Name
It makes a lot more sense to label the rows with the player's name rather than a number.
The data comes sorted by points from greatest to least. The order() function sorts in ascending order by default, so we can easily flip the table the other way around, to least to greatest, with the following code if we want:
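For anyone who prefers the command route over the import button, the loading and renaming steps might look roughly like the sketch below. The CSV here is a tiny stand-in written to a temporary file with made-up rows ("Player A" and so on), not my real NHL table, and the column names other than Name and Points are assumptions.

```r
# Write a tiny stand-in CSV to a temporary file (made-up rows, not real NHL data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Goals,Assists,Points",
             "Player A,30,40,70",
             "Player B,25,50,75",
             "Player C,40,20,60"), csv_path)

# Load the file into a data frame; keep Name as plain text
nhl <- read.csv(csv_path, stringsAsFactors = FALSE)

# Use the player names as row labels instead of 1, 2, 3
row.names(nhl) <- nhl$Name

print(nhl)
```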
nhl <- nhl[order(nhl$Points),]
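To make the sorting direction concrete, here is a minimal sketch on a made-up three-row data frame (the players and point totals are invented). It shows both directions side by side; `decreasing = TRUE` is a standard argument of order().

```r
# Made-up data frame for illustration only
nhl <- data.frame(Name = c("Player A", "Player B", "Player C"),
                  Points = c(70, 75, 60))

# order() returns the row positions that sort Points from least to greatest
ascending <- nhl[order(nhl$Points), ]

# decreasing = TRUE flips the direction: greatest to least
descending <- nhl[order(nhl$Points, decreasing = TRUE), ]

print(ascending)
print(descending)
```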
In the next step, in order to create the heat map, we need to convert the data frame into a numeric matrix.
So I typed in the following statement:
nhl_matrix <- data.matrix(nhl)
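One thing worth knowing: data.matrix() converts every column, so a text column such as Name would be turned into numeric codes and would show up as an extra, meaningless column in the heat map. A possible way around that, assuming the Name column is still in the data frame, is to keep only the numeric columns first. The data below is made up for illustration.

```r
# Made-up data frame for illustration only
nhl <- data.frame(Name = c("Player A", "Player B", "Player C"),
                  Goals = c(30, 25, 40),
                  Assists = c(40, 50, 20))
row.names(nhl) <- nhl$Name

# Keep only the numeric columns so the text column does not end up in the matrix
numeric_cols <- sapply(nhl, is.numeric)
nhl_matrix <- data.matrix(nhl[, numeric_cols])

print(nhl_matrix)
```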
Finally, one last line of code and we can generate our heat map with cool heat-looking colors:
nhl_heatmap <- heatmap(nhl_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column", margins=c(5,10))
Various color schemes are available for presenting the data, and the choice really comes down to individual preference. All we have to do is change the col argument. A few examples: topo.colors, terrain.colors, cm.colors, etc.
nhl_heatmap <- heatmap(nhl_matrix, Rowv=NA, Colv=NA, col = terrain.colors(256), scale="column", margins=c(5,10))
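The scale="column" argument deserves a word, because it is what makes columns with very different ranges comparable. As a rough sketch on a made-up matrix (the players and numbers are invented), each column is centered on its mean and divided by its standard deviation before the colors are assigned, which is the same thing the base scale() function does. With heat.colors, low values come out red and high values come out yellow to white.

```r
# Made-up matrix for illustration only (rows = players, columns = stats)
nhl_matrix <- matrix(c(30, 25, 40,
                       40, 50, 20,
                       300, 200, 250),
                     nrow = 3,
                     dimnames = list(c("Player A", "Player B", "Player C"),
                                     c("Goals", "Assists", "Shots")))

# What scale = "column" does before coloring: center each column on its
# mean and divide by its standard deviation, so columns with large values
# (Shots) do not drown out columns with small ones (Goals)
scaled <- scale(nhl_matrix)
print(round(scaled, 2))

# Draw the heat map without dendrograms, as in the call above
heatmap(nhl_matrix, Rowv = NA, Colv = NA,
        col = heat.colors(256), scale = "column", margins = c(5, 10))
```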
I will start with my favorite player, Alex Ovechkin. He had by far the most shots on goal (S) last season, so it is no surprise that he was the top scorer (Goals) in the league, but the big red square in the assists column shows that he was last in this category among the top 20 players I picked. So he does not pass, he likes to shoot. Is he selfish? Maybe, but no big deal: he is a “sniper”, and every great goal scorer has to be a little bit selfish now and then. As we can see on the heat map, there is another clear yellow square for him in the power play goals (PPG) column; together with Patrick Kane and Jamie Benn, he is one of the top dogs in this category.
Patrick Kane was the total points leader (Points) last season and had the best points per game ratio in the league (P/GP). He was voted the Most Valuable Player in the league last season, and we can see why: all his columns are heating up to yellow. He, Sidney Crosby, and Jamie Benn were well above average in almost every category.
Plus/minus is a statistic used to measure a player’s impact on the game: the difference between the goals their team scores and the goals the opponent scores while the player is on the ice. The king of this category is Anze Kopitar, who scored only 25 goals but still finished at +34. That means his team scored more goals than it conceded when he was on the ice. In ice hockey, so-called two-way players, who can both score goals and defend, are very valuable. Some players with a huge number of goals still finish the season with a negative plus/minus.
Joe Pavelski scored the most game-winning goals. Almost every third goal he scored was a game winner, which is fantastic. He also had the best shooting percentage (S%) last season: nearly every fifth shot on goal ended up in the net.
Only 6 players achieved a point or more per game. The season is long: to be precise, 82 games are played in one season. Every player on our heat map played at least 71 games, and for those 6 players with a point-plus per game ratio, that kind of consistency is unbelievable. But in the end, they are getting paid top dollar to do so. And to get that multimillion-dollar contract, every point matters.
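For readers who are new to hockey stats, the ratios mentioned above are simple arithmetic. The sketch below uses a hypothetical season line that I made up (it is not a real player), and the plus/minus part is a simplified version of the definition given above.

```r
# Hypothetical season line, invented for illustration (not a real player)
goals        <- 40
assists      <- 45
shots        <- 200
games_played <- 80

points          <- goals + assists        # Points = goals + assists
shooting_pct    <- goals / shots * 100    # S%: share of shots that went in
points_per_game <- points / games_played  # P/GP

# Simplified plus/minus: goals for minus goals against while on the ice
on_ice_goals_for     <- 90
on_ice_goals_against <- 70
plus_minus <- on_ice_goals_for - on_ice_goals_against

print(c(points = points, shooting_pct = shooting_pct,
        points_per_game = points_per_game, plus_minus = plus_minus))
```

A shooting percentage of 20 is the "every fifth shot" case, and a points per game value above 1 is the "point-plus per game" case.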
As a total rookie in the R language, I really enjoyed this assignment. From my research I learned that R is a leading open source programming language for statistics and data analysis, and its popularity is still growing! It is definitely a skill I would like to pick up in the near future. Another advantage is that R runs on all the major platforms (Windows, Mac, Linux) and has more than 2000 libraries for many areas, such as cluster analysis and prediction. It’s not that hard to learn, and it amazes me that with only a few lines of code I was able to generate my heat map, plots, and charts.