Try R

Introduction

R is a system for statistical computation and graphics. It provides, among other things, a programming language, high-level graphics, interfaces to other languages, and debugging facilities. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis.

Code School

In this assessment I learn how to use the R language for statistical analysis of data. Before writing this blog I had to complete seven chapters covering the basic operations of the R language, using the Try R course from Code School: http://tryr.codeschool.com/


Course Overview

Using R

A gentle introduction to R expressions, variables, and functions.

Vectors

Grouping values into vectors, then doing arithmetic and graphs with them.

Matrices

Creating and graphing two-dimensional data sets.

Summary Statistics

Calculating and plotting some basic statistics: mean, median, and standard deviation.

Factors

Creating and plotting categorized data.

Data Frames

Organizing values into data frames, loading frames from files and merging them.

Real-World Data

Testing for correlation between data sets, linear models, and extending R with additional libraries.

My Heat Map in RStudio

After completing this tutorial, and with some basic knowledge of the R language, I decided to create a heat map to visualize my data. Step one was to download RStudio and create some data. For this assignment I will be using the statistics of the top 20 players from the American National Hockey League (NHL) in the 2015/2016 season, ordered from greatest to least by the points they scored (goals + assists), taken from www.nhl.com.

Alex Ovechkin (Washington Capitals)

A simple copy and paste into my Excel spreadsheet gives me my data table, which has to be converted into a CSV file so that it can be easily imported into RStudio.

There are a few different ways to load data into RStudio: you can use a command like read.csv(), or import it manually through the Import Dataset button in RStudio, which is what I did. To prepare my data, I replaced the row numbers with the players' names using the following command:

row.names(nhl) <- nhl$Name

It makes a lot more sense to name the rows by player name rather than by number.
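For anyone who prefers typing commands over using the import button, here is a minimal sketch of the same two steps. The inline CSV text and all of its values are made up for illustration (they are not my real NHL table); with a real file you would pass the file path to read.csv() instead of text =.

```r
# Made-up sample in the same shape as the NHL table (illustrative values only).
csv_text <- "Name,GP,Goals,Assists,Points
Player A,82,40,50,90
Player B,80,30,45,75
Player C,79,50,21,71"

# read.csv() normally takes a file path; text = lets us read from a string.
nhl <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Use the player names as row labels instead of 1, 2, 3, ...
row.names(nhl) <- nhl$Name

print(row.names(nhl))
print(nhl["Player B", "Points"])
```

Once the rows are named, a single player can be looked up by name, as the last line shows.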

[Figure: the imported data in RStudio]

My data came sorted from most to least points. The order() function sorts in ascending order by default, so the following code re-sorts the rows from least to most points (adding decreasing = TRUE to order() would sort them the other way around):

nhl <- nhl[order(nhl$Points),]
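As a quick check of which direction order() sorts in, here is a small sketch on made-up numbers (the player names and points are placeholders, not my real data). It shows the default ascending sort next to the decreasing = TRUE variant.

```r
# Hypothetical three-row table; values are illustrative only.
pts <- data.frame(Name = c("Player A", "Player B", "Player C"),
                  Points = c(90, 71, 75),
                  stringsAsFactors = FALSE)

# order() returns the row positions that would sort the column.
asc  <- pts[order(pts$Points), ]                      # least to most
desc <- pts[order(pts$Points, decreasing = TRUE), ]   # most to least

print(asc$Points)
print(desc$Points)
```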

In the next step, in order to create the heat map, we need to convert the data frame into a numeric matrix.

So I typed in the following statement:

nhl_matrix <- data.matrix(nhl)
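One thing worth keeping in mind here: a text column such as Name does not hold meaningful numbers for a heat map. A cautious variant, which is my own assumption and not one of the steps I actually ran, is to keep only the numeric columns before converting. The small data frame below is made up for illustration.

```r
# Hypothetical data frame with one text column and two numeric columns.
df <- data.frame(Name  = c("Player A", "Player B"),
                 Goals = c(40L, 30L),
                 SPct  = c(12.5, 10.1),
                 stringsAsFactors = FALSE)
row.names(df) <- df$Name

# Keep only the columns that are already numeric, then convert.
num_only <- df[, sapply(df, is.numeric), drop = FALSE]
m <- data.matrix(num_only)

print(is.matrix(m))
print(dim(m))
print(rownames(m))   # the player names survive as row labels
```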

Finally, with one last line of code we can generate our heat map with cool heat-looking colors:

nhl_heatmap <- heatmap(nhl_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column", margins=c(5,10))
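The scale = "column" argument matters because the columns are on very different scales (shots run into the hundreds, game-winning goals stay in single digits). With this setting the values are centered and scaled within each column before they are coloured. A rough sketch of that idea, using base R's scale() on made-up numbers rather than my real table:

```r
# Hypothetical matrix: two columns with very different magnitudes.
m <- matrix(c(300, 200, 250,    # shots
                9,   3,   6),   # game-winning goals
            ncol = 2,
            dimnames = list(c("Player A", "Player B", "Player C"),
                            c("S", "GWG")))

# Centre each column on 0 and divide by its standard deviation.
z <- scale(m)

print(round(z, 2))
print(round(colMeans(z), 10))   # each column now averages 0
```

After scaling, both columns tell the same story (Player A highest, Player B lowest) on a comparable scale, which is what lets one colour palette work across every column.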

[Figure: heat map of the NHL top players, heat.colors palette]

Various color schemes are available for presenting data, and the choice is really up to individual preference; all we have to do is change the col argument. A few examples: topo.colors, terrain.colors, cm.colors, etc.

nhl_heatmap <- heatmap(nhl_matrix, Rowv=NA, Colv=NA, col = terrain.colors(256), scale="column", margins=c(5,10))
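To get a feel for what these palette functions hand to the col argument, here is a small sketch. Each one simply returns a vector of n colour codes; heat.colors runs from red towards pale yellow, which is why the lowest values show up as red on my first map. The value n = 5 is just an arbitrary choice to keep the output short.

```r
n <- 5

heat    <- heat.colors(n)      # red towards pale yellow
terrain <- terrain.colors(n)   # an alternative built-in palette

print(heat)
print(terrain)

# rev() flips a palette if you would rather start from the other end.
print(rev(heat))
```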

[Figure: the same heat map with the terrain.colors palette]

Analysis

I will start with my favorite player, Alex Ovechkin. He had by far the most shots on goal (S) last season, so it is no surprise that he was the top scorer (Goals) in the league. But the big red square in the assists column shows that he was last in this category among the top 20 players I picked. So he does not pass, he likes to shoot. Is he selfish? Maybe, but it is no big deal: he is a “sniper”, and every great goal scorer has to be a little bit selfish now and then. As we can see on the heat map, there is another clear yellow square for him in the power play goals (PPG) column; together with Patrick Kane and Jamie Benn, he is one of the top dogs in this category.

Patrick Kane was the total points leader (Points) last season and had the best points-per-game ratio in the league (P/GP). He was voted the Most Valuable Player in the league last season and we can see why: all his columns are heating up to yellow. He, Sidney Crosby and Jamie Benn were well above average in almost every category.

Plus/minus is a statistic used to measure a player’s impact on the game, represented by the difference between the goals his team scores and the goals the opponent scores while the player is on the ice. The king in this category is Anze Kopitar, who scored only 25 goals but still finished with +34 in plus/minus. That means his team scored more often than it conceded when he was on the ice. In ice hockey, so-called two-way players, who can both score goals and defend, are very valuable. Some players with a huge number of goals still finish their season with a minus in the plus/minus category.

Joe Pavelski scored the most game-winning goals. Almost every third goal he scored was a game winner, which is fantastic. He also had the best shooting percentage last season (S%): nearly every fifth shot on goal ended up in the net.

Only 6 players achieved at least one point per game. The season is long: to be precise, 82 games are played in one season. Every player on our heat map played at least 71 games, and for those 6 players with a ratio of a point or more per game, that kind of consistency is unbelievable. But in the end, they are getting paid top dollar to do so. And to get that multimillion-dollar contract, every point matters.
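Several of the columns discussed above are simple ratios of the others. As a rough sketch, here is how they can be worked out in R for one imaginary player. Every number below is made up, and plus/minus follows the simplified definition I gave above (team goals minus opponent goals while the player is on the ice).

```r
# Hypothetical season line for one player (not a real NHL player).
goals   <- 30
assists <- 50
games   <- 80
shots   <- 240
on_ice_for     <- 70   # team goals scored while he was on the ice
on_ice_against <- 55   # opponent goals scored while he was on the ice

points       <- goals + assists               # Points column
ppg          <- points / games                # P/GP column
shooting_pct <- 100 * goals / shots           # S% column
plus_minus   <- on_ice_for - on_ice_against   # +/- column

print(c(points = points, ppg = ppg,
        shooting_pct = shooting_pct, plus_minus = plus_minus))
```

This imaginary player lands exactly on one point per game, the threshold only 6 players on my map reached.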

Conclusion

As a total rookie in the R language, I really enjoyed this assignment. From what I have learned in my research, R is the leading open source programming language for statistics and data analysis, and its popularity is still growing! It is definitely a skill I would like to pick up in the near future. Another advantage is that R runs on all the platforms (Windows, Mac, Linux) and has more than 2000 libraries to use in many areas, such as cluster analysis, prediction, etc. It is not that hard to learn, and it amazes me that with only a few lines of code I was able to generate my heat map, plots and charts.


Google Fusion Tables

In this blog I will construct an intensity map of the Republic of Ireland with data from the 2011 census. I will also provide readers with an informational guide to creating their own map in Google Fusion Tables, along with information on merging, layering and gleaning data, and on concepts that can be drawn out of similar intensity maps.

Irish Population Intensity Map

Below is an image of an intensity map representing the Republic of Ireland’s population and county boundary lines. The data for each county is divided into male, female and total population. The most highly populated areas are represented by a vibrant red, with the colour gradually fading as the population reduces, whilst light blue represents the least populated counties.

[Figure: intensity map of the Republic of Ireland’s population by county]

Achieving Intensity Map

To achieve the intensity map for the Republic of Ireland I first had to source the correct data sets. I needed population files (Excel) and county boundary lines (KML). I downloaded the files needed to complete this task.

I was now ready to open Fusion Tables and begin to upload my population data. This data was opened in spreadsheet format. I read through the data sheet, checking that all the data had been input correctly for male, female and total population. I also noticed that Laois was spelled incorrectly and would not merge with the Laois boundary lines. After correcting this error I repeated the process for my KML file.

I now had two separate files uploaded to Fusion Tables and needed to merge my data into one single file. I matched both files on county name and began the merge.
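Fusion Tables does the merge through its own interface, but the underlying idea (joining two tables on a shared county name, and why one misspelled name breaks the join) can be sketched in R. This is only an analogy under my own assumptions: the population numbers are made up, the "Laoighis" spelling is an assumed example of a mismatch, and the Geometry strings are placeholders standing in for KML shapes.

```r
# Hypothetical population table with one county name that does not match.
population <- data.frame(County = c("Dublin", "Cork", "Laoighis"),
                         Total  = c(1270, 520, 80),
                         stringsAsFactors = FALSE)

# Hypothetical boundary table; Geometry is a placeholder for KML data.
boundaries <- data.frame(County   = c("Dublin", "Cork", "Laois"),
                         Geometry = c("<shape 1>", "<shape 2>", "<shape 3>"),
                         stringsAsFactors = FALSE)

# Names present in one table but not the other will not merge.
unmatched <- setdiff(population$County, boundaries$County)
print(unmatched)

# Correct the spelling, then join the two tables on county name.
population$County[population$County == "Laoighis"] <- "Laois"
merged <- merge(population, boundaries, by = "County")
print(merged)
```

Checking for unmatched names before merging is the code equivalent of reading through the sheet by eye, as I did.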

When the merge was complete I opened the map geometry and checked that the merge was successful. I still needed to make my intensity map more visually effective.

I began to bucket-fill counties depending on total population. Red was visually prominent against the blue and was used for the most heavily populated areas. All other counties ranged from dark blue down to a lighter blue as population decreased.

I also wanted to see the stops and coverage of the transport offered by the Luas trams in Dublin city centre. I downloaded the Luas stops as a KML file and uploaded it to Fusion Tables. This file needed to be layered over the merge I created earlier, which was achievable by using the Fusion Tables wizard to layer both files one over the other.
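The bucketing idea itself is easy to sketch outside Fusion Tables. Here is a minimal R version using cut(); the population figures (in thousands) and the bucket boundaries are made up for illustration and are not the ones I used on my map.

```r
# Hypothetical total populations in thousands (illustrative only).
total <- c(Dublin = 1200, Cork = 500, Leitrim = 30, Galway = 250)

# Assumed bucket edges: up to 100, 100 to 400, above 400.
breaks <- c(0, 100, 400, Inf)
labels <- c("light blue", "dark blue", "red")

# cut() assigns each county to the bucket its population falls into.
bucket <- cut(total, breaks = breaks, labels = labels)

print(data.frame(County = names(total),
                 Total  = as.numeric(total),
                 Colour = as.character(bucket)))
```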


Gleaning Data from the Intensity Map

Now that my intensity map has relevant data and structure, we can begin to glean information that can be used in relevant situations. The map highlights in red the most highly populated counties, in this case those containing Ireland’s three main cities (Galway, Cork, Dublin). It is also clear that the coastal counties contain the highest populations, and that population gradually reduces the further inland you move. The Luas data within Dublin city allows the end user to view stop points and facilities, which can aid future expansion and development where population is high and transport provision is low. Other data that may be relevant are employment rates and vehicles registered, if the goal is to reduce vehicle congestion and facilitate transport to employment.


What Other Ideas/Concepts Could Be Represented in the Intensity Map

Intensity maps have the ability to show large clusters of data in a simple but effective way. Data formatted this way has many uses, such as merging data for business or simply for visual effect.

Intensity maps can be used to chart the progression of governmental elections, religion, education, wealth and much more. Suppose a large grocery chain merges data for all the major grocery chains within an area. This data could be very useful if its future plan is to find a location for a new retail store: the population of the area and the proximity to its nearest rivals could be critical in the decision making. It also gives the ability to create a map with the locations of its preferred customer segments.

I sourced a map highlighting Hispanic origin within the various counties of Texas. This information is key if your agenda is to open numerous Mexican restaurants within the state of Texas.

https://blogs.journalism.co.uk

Google Fusion Tables

Let’s Start


In this project we were asked to create a heat map outlining the population density of every county in the Republic of Ireland. It is my first experience working with Google Fusion Tables, and it is a really good one. The Google Fusion Tables app allows me to create thematic web maps, in which geographic areas are filled in with color or shade according to data values. Thanks to this application I was able to bring my own data out of hiding, combine it with other data on a web site, collaborate, visualize and share it with the world.

My First Map

The geographical data providing the county boundaries (a KMZ data file) and the 2011 population of each Irish county were given to us in our assessment brief, so no research was required there. However, some data cleansing had to be done on the population table, which not only contains county information but also divides some counties into South and North and in some cases includes a breakdown by city. This is what my cleaned-up data set looks like now, copied into an Excel spreadsheet, ready to be added to Fusion Tables and merged with the county boundary geographical KML (Keyhole Markup Language) file to create a population heat map by county.
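I did this clean-up by hand in Excel, but the same roll-up of sub-county rows into one row per county can be sketched in R with aggregate(). The area names, the County mapping column and all population figures (in thousands) below are made up for illustration.

```r
# Hypothetical raw table: some counties are split into several areas.
raw <- data.frame(
  Area       = c("Dublin City", "South Dublin", "Fingal",
                 "Cork City", "Cork County", "Leitrim"),
  County     = c("Dublin", "Dublin", "Dublin",
                 "Cork", "Cork", "Leitrim"),
  Population = c(500, 260, 270, 120, 400, 30),
  stringsAsFactors = FALSE)

# Sum the population of every area that belongs to the same county.
by_county <- aggregate(Population ~ County, data = raw, FUN = sum)

print(by_county)
```

The result has exactly one row per county, which is the shape needed to match one boundary per county in the KML file.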

Not Done Yet

To make my new map look even prettier I decided to add another table. Because I have been living in Dublin for more than a decade now and have had to change my address a few times during that time, I’m familiar with rent and house prices in “my city”. Not so much with the rest of Ireland, though. So I created another table with 2016 average house prices and average rent prices for each county in the Republic. All the data were gathered from www.daft.ie, copied into my Excel table and merged again with the existing population map. And here is my final map, where it is clearly seen that rent and house prices in Dublin and its commuter counties are the most expensive in Ireland. That is really no big surprise, but what strikes me the most is that the average rent per month ranges between 500 and 700 euro in every county except Dublin and its surrounding counties (Louth, Meath, Wicklow and Kildare), where the average rent is well above 1000 euro.

<iframe width="500" height="300" scrolling="no" frameborder="no" src="https://fusiontables.google.com/embedviz?q=select+col2%3E%3E2+from+1I0HncBfKPQyxZg9c7eI3LXktD_mHauSQqpbCNVWD&amp;viz=MAP&amp;h=false&amp;lat=53.330062550152&amp;lng=-8.29716635156251&amp;t=1&amp;z=6&amp;l=col2%3E%3E2&amp;y=2&amp;tmplt=2&amp;hml=KML"></iframe>

What Else Can I Do

The possibilities are endless, but as a huge sports fan in general it’s a no-brainer for me. The next map will be about GAA titles.

Maybe I’m just jumping on the bandwagon because Dublin won their second All-Ireland Football Championship title in a row a few days ago, and to be honest with you, I do not even fully understand the rules of the game yet. But as a fan of every sport where a ball is involved and a citizen of this city, I claim that title to be mine as well 🙂 Virtually every town and village in Ireland has a GAA club, which plays hurling or Gaelic football, or usually both. Each club is affiliated with one of the relevant county GAA boards, of which there are 26 in the Republic of Ireland and 6 in Northern Ireland. With huge respect to the Northern Ireland GAA clubs, I’m going to leave them out of my table and count just the senior football and hurling titles won by each county in the Republic of Ireland since 1887. In addition, another column with the number of affiliated GAA clubs in each county will be created and again merged with the county boundary geographical KML file. My new heat map looks like this:

<iframe width="500" height="300" scrolling="no" frameborder="no" src="https://fusiontables.google.com/embedviz?q=select+col2%3E%3E1+from+1Sn4smkRmrv8r1N6-Hp4X2te30LXgBc97-xbMAcJx&amp;viz=MAP&amp;h=false&amp;lat=52.575525241002005&amp;lng=-0.23869467187500382&amp;t=1&amp;z=5&amp;l=col2%3E%3E1&amp;y=2&amp;tmplt=2&amp;hml=KML"></iframe>

GAA Clubs By Numbers

24 - Number of affiliated GAA clubs in Leitrim, the fewest of any county, but just two fewer than Sligo. Neither county has ever won a single All-Ireland GAA championship.

101 - Number of affiliated GAA clubs in Limerick, the third highest in the country, but with only 9 All-Ireland GAA championships to their name.

134 - Number of affiliated GAA clubs in Dublin, the second highest in the country, but still in 4th place in the All-Ireland GAA championship rankings.

259 - Number of affiliated GAA clubs in Cork, the highest in the country by far, yet Cork still has one fewer All-Ireland GAA championship than Kerry.

38 - Number of All-Ireland Senior GAA titles won by Kerry, the highest in the country.

37 - Number of All-Ireland Senior Football Championships won by Kerry, the highest in the country.

36 - Number of All-Ireland Senior Hurling Championships won by Kilkenny, the highest in the country, with only 41 affiliated GAA clubs in the county and without a single All-Ireland Senior Football Championship trophy.
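Putting the club counts next to the title counts suggests one more number to look at: titles per club. A quick sketch in R, using only the figures listed above for the three counties where both numbers are given (Cork's 37 titles follow from being one behind Kerry's 38, and Kilkenny's total is its 36 hurling titles since it has no football title):

```r
# Figures taken from the list above.
gaa <- data.frame(County = c("Cork", "Limerick", "Kilkenny"),
                  Clubs  = c(259, 101, 41),
                  Titles = c(37, 9, 36),
                  stringsAsFactors = FALSE)

# Senior titles won per affiliated club, rounded to two decimals.
gaa$TitlesPerClub <- round(gaa$Titles / gaa$Clubs, 2)

# Sort from the most to the fewest titles per club.
gaa <- gaa[order(gaa$TitlesPerClub, decreasing = TRUE), ]
print(gaa)
```

On this measure Kilkenny comes out far ahead, so the number of clubs in a county clearly does not decide how many titles it wins.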

Game Over


Google Fusion Tables are a great tool to help ease the process of data management, as well as data analysis. While Microsoft Excel, which I have been using forever for data management, is very useful and offers a wide range of options for data manipulation, many of the steps needed to create useful tables and graphs are boring and tiresome to set up. Google Fusion Tables makes this entire process much easier. I’ve been aware of Google’s Fusion Tables for a couple of months now, but used to be a little suspicious of them, since visualization tools sometimes require technical knowledge or are just too expensive. After this exercise, where I finally gave it a go, I feel much better about it: it’s very user friendly and free.

Links to my Fusion Maps

  • https://fusiontables.google.com/DataSource?docid=1Sn4smkRmrv8r1N6-Hp4X2te30LXgBc97-xbMAcJx#map:id=3
  • https://fusiontables.google.com/DataSource?docid=1I0HncBfKPQyxZg9c7eI3LXktD_mHauSQqpbCNVWD#map:id=3

References

  • http://www.cso.ie/en/statistics/population/populationofeachprovincecountyandcity2011/
  • http://www.independent.ie/editorial/test/map_lead.kml
  • http://www.daft.ie
  • http://www.gaa.ie
  • http://europegaa.eu

Level 8 BA (Hons) Business Information Systems 2016-2017

Here lieth the data projects for the students of Dublin Business School 2016-2017 BA (Hons) Business Information Systems as they journey through big data, data management, and data analytics.

Level 8 BA (Hons) Business Information Systems
Name Blog
aaron http://aaron.dbsdataprojects.com/
agata http://agata.dbsdataprojects.com/
alessandra http://alessandra.dbsdataprojects.com/
annsofie http://annsofie.dbsdataprojects.com/
aoifec http://aoifec.dbsdataprojects.com/
atilla http://atilla.dbsdataprojects.com/
awalsh http://awalsh.dbsdataprojects.com/
awut http://awut.dbsdataprojects.com/
chiedu http://chiedu.dbsdataprojects.com/
conorw http://conorw.dbsdataprojects.com/
darragh http://darragh.dbsdataprojects.com/
davidg http://davidg.dbsdataprojects.com/
dermot http://dermot.dbsdataprojects.com/
edit http://edit.dbsdataprojects.com/
grahamd http://grahamd.dbsdataprojects.com/
janos http://janos.dbsdataprojects.com/
jason http://jason.dbsdataprojects.com/
jonathanr http://jonathanr.dbsdataprojects.com/
jozsef http://jozsef.dbsdataprojects.com/
keithe http://keithe.dbsdataprojects.com/
keithh http://keithh.dbsdataprojects.com/
lysiane http://lysiane.dbsdataprojects.com/
malcolm http://malcolm.dbsdataprojects.com/
markg http://markg.dbsdataprojects.com/
marta http://marta.dbsdataprojects.com/
michaels http://michaels.dbsdataprojects.com/
mivory http://mivory.dbsdataprojects.com/
monika http://monika.dbsdataprojects.com/
natasha http://natasha.dbsdataprojects.com/
nikola http://nikola.dbsdataprojects.com/
padhraic http://padhraic.dbsdataprojects.com/
pawel http://pawel.dbsdataprojects.com/
qualab http://qualab.dbsdataprojects.com/
rory http://rory.dbsdataprojects.com/
ross http://ross.dbsdataprojects.com/
shane http://shane.dbsdataprojects.com/
shery http://shery.dbsdataprojects.com/
sineadh http://sineadh.dbsdataprojects.com/
siobhanr http://siobhanr.dbsdataprojects.com/
stepheno http://stepheno.dbsdataprojects.com/
stephenp http://stephenp.dbsdataprojects.com/
stephenr http://stephenr.dbsdataprojects.com/
steven http://steven.dbsdataprojects.com/
tugrul http://tugrul.dbsdataprojects.com/
vusumuzi http://vusumuzi.dbsdataprojects.com/
zdenko http://zdenko.dbsdataprojects.com/

Level 8 Certificate in Data Management and Analytics 2016 Block 2

Here lieth the data projects for the students of Dublin Business School 2016 block 2 as they journey through big data, data management, and data analytics.

Level 7 Diploma in Big Data for Business

Here lieth the data projects for the students of Dublin Business School as they journey through big data, data management, and data analytics.

Level 8 Higher Diploma in Data Analytics 2016 Block 1

Here lieth the data projects for the students of Dublin Business School’s Higher Diploma in Data Analytics class as they journey through big data, data management, and data analytics.

Level 8 Certificate in Data Management and Analytics 2016 Block 1

Here lieth the data projects for the students of Dublin Business School 2016 block 1 as they journey through big data, data management, and data analytics.

Level 8 Certificate in Data Management and Analytics
Name Blog
abraham http://abraham.dbsdataprojects.com/
alberto http://alberto.dbsdataprojects.com/
aleksandrs http://aleksandrs.dbsdataprojects.com/
alison http://alison.dbsdataprojects.com/
arron http://arron.dbsdataprojects.com/
celina http://celina.dbsdataprojects.com/
claire http://claire.dbsdataprojects.com/
damilola http://damilola.dbsdataprojects.com/
deirdre http://deirdre.dbsdataprojects.com/
diana http://diana.dbsdataprojects.com/
eamonn http://eamonn.dbsdataprojects.com/
eoghan http://eoghan.dbsdataprojects.com/
evaldas http://evaldas.dbsdataprojects.com/
geraldine http://geraldine.dbsdataprojects.com/
graham http://graham.dbsdataprojects.com/
hannu http://hannu.dbsdataprojects.com/
helena http://helena.dbsdataprojects.com/
herve http://herve.dbsdataprojects.com/
jekaterina http://jekaterina.dbsdataprojects.com/
joanne http://joanne.dbsdataprojects.com/
johanna http://johanna.dbsdataprojects.com/
karolina http://karolina.dbsdataprojects.com/
kevin http://kevin.dbsdataprojects.com/
khaled http://khaled.dbsdataprojects.com/
lorna http://lorna.dbsdataprojects.com/
mantas http://mantas.dbsdataprojects.com/
markd http://markd.dbsdataprojects.com/
michaeli http://michaeli.dbsdataprojects.com/
miyei http://miyei.dbsdataprojects.com/
monica http://monica.dbsdataprojects.com/
niall http://niall.dbsdataprojects.com/
noel http://noel.dbsdataprojects.com/
puneet http://puneet.dbsdataprojects.com/
rachel http://rachel.dbsdataprojects.com/
rosemary http://rosemary.dbsdataprojects.com/
sandrac http://sandrac.dbsdataprojects.com/
seamus http://seamus.dbsdataprojects.com/
sinead http://sinead.dbsdataprojects.com/
stefan http://stefan.dbsdataprojects.com/
tara http://tara.dbsdataprojects.com/
tehseen http://tehseen.dbsdataprojects.com/
temitope http://temitope.dbsdataprojects.com/
tezo http://tezo.dbsdataprojects.com/
tony http://tony.dbsdataprojects.com/
william http://william.dbsdataprojects.com/

Level 8 Certificate in Data Management and Analytics 2015 Block 2

Here lieth the data projects for the students of Dublin Business School 2015 block 2 as they journey through big data, data management, and data analytics.

Level 8 Certificate in Data Management and Analytics 2015 Block 1

Here lieth the data projects for the students of Dublin Business School 2015 block 1 as they journey through big data, data management, and data analytics.