Data Analytics with R
In this post I will explain what the R programming language is and demonstrate how RStudio can be used for data analytics.
R is a language for statistical computing and graphics. With R you can use a wide variety of statistical and graphical techniques. One of the best features of R is the range of plots available for visualizing data, but R can also be used for general mathematical calculations and analysis. Another very good thing about R is that it is free: it does not require any license fee and it runs on Linux, OS X, and Windows. The R programming language includes data handling and storage facilities, tools for data analysis, graphical facilities for display, input-output facilities, and many built-in functions.
RStudio is an integrated development environment (IDE) for the R programming language. It includes a console, a syntax-highlighting editor that supports direct code execution, and a variety of robust tools for plotting, viewing history, debugging and managing your workspace.
Using R can be really frustrating until you have the knowledge you need to work by yourself. However, you can do really incredible things using just a few commands. So I recommend trying R if you are interested in programming, data analysis or data mining. If you like statistics and would like to compare different data sets to find correlations, then R is for you. Data analysis is also widely used by enterprises to improve their business. If you would like to know more about R, here is a tutorial that takes you through the basics: http://tryr.codeschool.com/
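To give a flavor of what "just a few commands" looks like, here is a minimal sketch. It uses mtcars, a sample dataset that ships with R (my choice for illustration only, it is not part of the tutorial above), to summarize one column and check how two columns are correlated.

```r
# mtcars is a built-in sample dataset (chosen here only for illustration)
data(mtcars)

# Summary statistics for fuel efficiency (miles per gallon)
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Pearson correlation between car weight and fuel efficiency
weight_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
print(weight_mpg_cor)
```

A negative correlation here means that heavier cars tend to travel fewer miles per gallon.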
The Titanic assignment
After completing the tutorial above, I decided to challenge my skills in RStudio. Our lecturer mentioned a website where you can find different challenges for people who are enthusiastic about data analytics and data mining. You can find the website here: https://www.kaggle.com/c/titanic
I chose a challenge where I have to predict the chances of survival on the Titanic. More precisely, the task is to predict what sort of people survived this tragedy. If you would like more information about the challenge or the tragedy, please follow the link above.
To be honest, at the beginning I did not have a clue where to start or what to do, but fortunately there are a ton of enthusiastic people online who share their techniques. I watched a lot of videos before I could start the project by myself, but in the end I was able to produce this basic plot.
In this plot you can see the number of passengers who traveled on the Titanic in first, second and third class. 0 represents people who did not survive, 1 represents people who survived. As you can see, if you traveled in first class your chance of survival was roughly 70%, if you traveled in second class it was only a little more than 50%, and if you traveled in third class you were more likely to die than to survive.
I will show you how I got this simple analysis.
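If you want exact percentages rather than reading them off a plot, a cross-table does the job. The sketch below is not part of my original script: the helper name survival_rate_by_class is made up for illustration, and the tiny data frame is invented toy data (not the Titanic data) so that the snippet runs on its own. It assumes the lowercase column names pclass and survived used in this post.

```r
# Hypothetical helper: share of survivors (in percent) per passenger class.
# Assumes columns named pclass and survived, as used in this post.
survival_rate_by_class <- function(passengers) {
  # Fix the levels so both outcomes (0 and 1) always get a column
  outcome <- factor(passengers$survived, levels = c(0, 1))
  counts <- table(passengers$pclass, outcome)
  # margin = 1 turns each row (class) into proportions that sum to 1
  rates <- prop.table(counts, margin = 1)
  round(100 * rates[, "1"], 1)
}

# Invented toy data, NOT the real Titanic passengers
toy <- data.frame(
  pclass   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
  survived = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1)
)

toy_rates <- survival_rate_by_class(toy)
print(toy_rates)
```

On the real data you would call survival_rate_by_class(train) once the training file has been loaded.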
1. Download the required data from here: https://www.kaggle.com/c/titanic/data
2. Download and install RStudio from here: https://www.rstudio.com/products/rstudio/download/
3. Create a new R script and set the working directory (via the Session menu)
4. Read the CSV files:
train <- read.csv("train.csv", header = TRUE)
test <- read.csv("test.csv", header = TRUE)
5. Have a look at your data
6. Convert your pclass and survived variables to factors:
train$pclass <- as.factor(train$pclass)
train$survived <- as.factor(train$survived)
7. Check how many people survived the tragedy
8. Check how many passengers traveled in each class
9. Load the ggplot2 package to use for visualizations:
library(ggplot2)
10. Run ggplot:
ggplot(train, aes(x = pclass, fill = factor(survived))) +
  geom_bar() +  # bar layer added here; without a geom nothing is drawn
  ylab("Total Count") +
  labs(fill = "Survived")
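Steps 5 to 8 only say what to check, so here is one way they might look in code. This is a sketch rather than my exact script: str(), head() and table() are standard base R functions, but the toy data frame is invented so that the example runs without the Kaggle files. With the real data you would use train from step 4 instead of toy. Note that newer versions of the Kaggle files may capitalize the column names (for example Pclass and Survived), so adjust the names if you get an error.

```r
# Invented stand-in for the train data frame (NOT real Titanic records)
toy <- data.frame(
  pclass   = c(1, 1, 2, 3, 3, 3),
  survived = c(1, 0, 1, 0, 0, 1)
)

# Step 5: look at the structure and the first rows
str(toy)
print(head(toy))

# Step 6: treat class and survival as categories, not numbers
toy$pclass   <- as.factor(toy$pclass)
toy$survived <- as.factor(toy$survived)

# Step 7: how many people survived (1) and how many did not (0)
survival_counts <- table(toy$survived)
print(survival_counts)

# Step 8: how many passengers traveled in each class
class_counts <- table(toy$pclass)
print(class_counts)
```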
So this is my basic idea of what affected your chances of surviving the tragedy of the Titanic. I was not satisfied with my knowledge, so I decided to go deeper into this analysis with some help, and I was able to create another plot which is more detailed. As you can see on the chart below, class was not the only factor: gender, age and marital status also played a big role. But I have still only scratched the surface.
So now you probably know more about R. I hope I have successfully demonstrated the power of RStudio, and I encourage you to try R because you can do amazing things if you keep practicing.
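To give an idea of what "more detailed" can mean, one option is to split the survival counts by a second variable such as gender. The sketch below is an assumption about how such a breakdown could be done, not the exact code behind my chart: the toy data is invented, and the sex column name is assumed. xtabs() and ftable() are base R; the ggplot2 part only runs if the package is installed.

```r
# Invented toy data (NOT real Titanic records); column names are assumed
toy <- data.frame(
  pclass   = factor(c(1, 1, 1, 2, 2, 3, 3, 3)),
  sex      = factor(c("female", "male", "female", "male",
                      "female", "male", "male", "female")),
  survived = factor(c(1, 0, 1, 0, 1, 0, 0, 1))
)

# Three-way count table: class x sex x survival
by_class_sex <- xtabs(~ pclass + sex + survived, data = toy)
print(ftable(by_class_sex))

# Same idea as a plot: one panel per gender (built only if ggplot2 is available)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  detailed_plot <- ggplot(toy, aes(x = pclass, fill = survived)) +
    geom_bar() +
    facet_wrap(~ sex) +
    ylab("Total Count") +
    labs(fill = "Survived")
}
```

Calling print(detailed_plot) draws the chart. Swapping facet_wrap(~ sex) for another column gives the same breakdown for a different variable.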