CA2 Try R


In our second assessment we were instructed to use R, a language and environment for statistical computing and graphics. We were first tasked with completing the “Try R” course from Code School. This was a relatively easy and self-explanatory assignment that gave me enough basic knowledge to try RStudio on my own. Below is the badge showing that I completed the course.

After completing this course, we were instructed to apply what we had learned to a different set of data. I found my data in RStudio, which comes with a number of built-in datasets. Because I am an American, I thought it would be interesting to use the dataset titled USArrests. Below is the link to the data:

This dataset has murder, assault, and rape arrest rates and the urban population percentage for each state. I decided to make the data even more specific by eliminating all states except the ones on the Eastern Seaboard. These states are: Maine, New Hampshire, Massachusetts, Rhode Island, Connecticut, New York, New Jersey, Delaware, Maryland, Virginia, North Carolina, South Carolina, Georgia, and Florida. These states collectively have a higher population density than any other region in the United States, and 18 of the country's 100 largest cities are in these 14 states. I wanted to see if there was a relationship between a large urban population and violent crime, so I needed to use the R language to create a graph that could accurately illustrate the data.
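USArrests ships with R's built-in datasets package, so it does not need to be downloaded; its help page (?USArrests) describes Murder, Assault and Rape as arrests per 100,000 residents in 1973, and UrbanPop as the percentage of the population living in urban areas. A minimal base-R sketch of narrowing it down to the 14 states above, using R's built-in state.name and state.abb vectors to switch between full names and postal codes (east_abb and east are illustrative variable names, not part of any package):

```r
# USArrests is part of the built-in "datasets" package; data() just makes it explicit
data(USArrests)

# Postal abbreviations of the 14 Eastern Seaboard states listed above
east_abb <- c("ME", "NH", "MA", "RI", "CT", "NY", "NJ",
              "DE", "MD", "VA", "NC", "SC", "GA", "FL")

# USArrests row names are full state names, so translate the abbreviations
# with the built-in state.name / state.abb vectors before subsetting rows
east <- USArrests[state.name[match(east_abb, state.abb)], ]
rownames(east) <- east_abb   # relabel rows with the shorter postal codes

east
```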

I thought that a bar graph would be useful for illustrating this data. I typed in the code library(ggplot2) to load ggplot2, the R package I used to draw the graph. To easily compare the amount of crime committed in each state, I thought that a bar graph with states on the x-axis and the number of arrests on the y-axis would be the most suitable. Each state would have three bars grouped by the specific violent crime. The code to obtain this graph is:

library(ggplot2)

dat1 <- data.frame(
  # Each state appears three times, once for each crime
  State = factor(rep(c("CT", "DE", "FL", "GA", "ME", "MD", "MA",
                       "NH", "NJ", "NY", "NC", "RI", "SC", "VA"), each = 3)),
  Crime = factor(rep(c("Murder", "Assault", "Rape"), times = 14),
                 levels = c("Murder", "Assault", "Rape")),
  # Arrests per 100,000 residents, state by state, in the order Murder, Assault, Rape
  Crimeperhunderedthousand = c(3.3, 110, 11.1,   5.9, 238, 15.8,   15.4, 335, 31.9,
                               17.4, 211, 25.8,  2.1, 83, 7.8,     11.3, 300, 27.8,
                               4.4, 149, 16.3,   2.1, 57, 9.5,     7.4, 159, 18.8,
                               11.1, 254, 26.1,  13, 337, 16.1,    3.4, 174, 8.3,
                               14.4, 279, 22.5,  8.5, 156, 20.7)
)

ggplot(data = dat1, aes(x = State, y = Crimeperhunderedthousand, fill = Crime)) +
  # stat = "identity" plots the values as given; position_dodge() places
  # the three crime bars side by side instead of stacking them
  geom_bar(stat = "identity", position = position_dodge(),
           colour = "black", linewidth = 0.3) +  # use size = 0.3 before ggplot2 3.4
  scale_fill_hue(name = "Crimes") +
  xlab("State") + ylab("Arrests per 100,000") +
  ggtitle("Arrests per 100,000 in Eastern Seaboard States")
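Typing 42 numbers by hand makes it easy for a value or a state label to land in the wrong row. As a sketch of an alternative (not the code used for the graphs in this post), the same long-format data frame can be built straight from USArrests with base R's stack(); the names east_abb, crimes, east and long are illustrative.

```r
# Same 14 states, in the same order as the hand-typed data frame
east_abb <- c("CT", "DE", "FL", "GA", "ME", "MD", "MA",
              "NH", "NJ", "NY", "NC", "RI", "SC", "VA")
crimes <- c("Murder", "Assault", "Rape")
east <- USArrests[state.name[match(east_abb, state.abb)], crimes]

# stack() turns the three crime columns into one long "values" column,
# with "ind" recording which column (crime) each value came from.
# Rows come out as all 14 Murder values, then Assault, then Rape.
long <- stack(east)

dat1 <- data.frame(
  State = factor(rep(east_abb, times = length(crimes))),  # states repeat once per crime
  Crime = factor(as.character(long$ind), levels = crimes),
  Crimeperhunderedthousand = long$values
)

head(dat1)
```

The resulting dat1 has the same three columns as the hand-typed version, so it can be passed to the same ggplot() call. Its rows are grouped by crime rather than by state, which does not change the plot.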


In the code above, the data.frame() call enters the data and lines up each number with its crime and state. The ggplot() call then draws the graph and formats it, adding the axis labels and the title. Below are the two graphs that were created:


The first graph shows the arrests per 100,000 people for murder, assault, and rape in the Eastern Seaboard states. Looking at the graph, there are a few things that we can deduce. First, assault is the most common of the three crimes for arrests: every state has more assault arrests than murder or rape arrests. In every state, rape is the second most common and murder is the third. Other than that, there is no real pattern we can see in the data.
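The code for the second graph (urban population) is not shown in this post. Assuming it was a simple bar chart of the UrbanPop column, a minimal base-R version might look like the sketch below; it is an assumed reconstruction, not necessarily how the original graph was drawn. Sorting from most to least urban makes the extremes easy to spot, and the dashed line marks 50%.

```r
east_abb <- c("ME", "NH", "MA", "RI", "CT", "NY", "NJ",
              "DE", "MD", "VA", "NC", "SC", "GA", "FL")

# Percent of each state's population living in urban areas
urban <- USArrests[state.name[match(east_abb, state.abb)], "UrbanPop"]
names(urban) <- east_abb

# Sort from most to least urban
urban <- sort(urban, decreasing = TRUE)

barplot(urban, las = 2, ylim = c(0, 100),
        ylab = "Percent of population in urban areas",
        main = "Urban population in Eastern Seaboard States")
abline(h = 50, lty = 2)  # dashed reference line at 50%
```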

The second graph shows the percentage of each state's population that lives in urban areas. New Jersey has the highest, at 89%, with New York close behind at 86% (which makes sense because of New York City). The lowest is North Carolina, at 45%. Once I had created these two graphs, I thought it would be easy to find a pattern between urban population and arrests, but after looking at the graphs side by side it was hard to find any correlation. North Carolina and Florida have the highest assault rates in the graph (337 and 335 per 100,000), yet North Carolina is the least urban of these states while Florida is 80% urban. New Hampshire, at 56% urban, has the lowest assault rate (57 per 100,000). Since it was hard to find a pattern, I decided to create another graph that might help show a correlation. With scatter plots it is easier to see a pattern, and a regression line can be added to each one. The scatter plot matrix I created is below.

Looking at this scatter plot matrix makes patterns much easier to see. I used all of the data in the dataset to create it. Each scatter plot has a regression line that shows the general trend of the data. In the Urban Population plots, each regression line is almost flat, which suggests there is no correlation between urban population and these crimes. The other plots do show some trends, however: assault and murder have a strong positive correlation, and rape and assault have a positive correlation as well. Other than that, there are no correlations I can see.
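The post does not say which function drew the scatter plot matrix. One way to get a similar plot in base R is pairs() with a custom panel function that adds a least-squares regression line to every panel; panel_with_lm is a name made up for this sketch. Printing the correlation matrix from cor() alongside it puts a number on each trend.

```r
# Panel function: draw the points, then add a least-squares line of y on x
panel_with_lm <- function(x, y, ...) {
  points(x, y, ...)
  abline(lm(y ~ x), col = "red")
}

# All four variables for all 50 states
pairs(USArrests, panel = panel_with_lm,
      main = "USArrests scatter plot matrix")

# Pearson correlation matrix, rounded to two decimals
r <- round(cor(USArrests), 2)
r
```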

If I had more time on this assignment, I would like to investigate possible causes of these violent crimes. It would be useful to make a box plot for each of the crimes to see its summary statistics. I would also like to make a heat map showing which states have the most crimes and which have the fewest.
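A rough sketch of the box plots and heat map suggested above, using only base R (summary(), boxplot() and heatmap()) on the same 14 states. Standardizing each crime in the heat map is an assumption on my part; without it, assault's much larger values would dominate the colors.

```r
east_abb <- c("ME", "NH", "MA", "RI", "CT", "NY", "NJ",
              "DE", "MD", "VA", "NC", "SC", "GA", "FL")
crimes <- c("Murder", "Assault", "Rape")
east <- USArrests[state.name[match(east_abb, state.abb)], crimes]
rownames(east) <- east_abb

# Summary statistics (min, quartiles, median, mean, max) for each crime
summary(east)

# One box plot per crime, side by side; each panel gets its own y-axis
# because assault rates are much larger than murder or rape rates
old_par <- par(mfrow = c(1, 3))
for (crime in crimes) {
  boxplot(east[[crime]], main = crime, ylab = "Arrests per 100,000")
}
par(old_par)

# Heat map of states x crimes: scale = "column" standardizes each crime so
# colors are comparable across crimes; Rowv/Colv = NA turn off reordering
heatmap(as.matrix(east), scale = "column", Rowv = NA, Colv = NA,
        main = "Arrest rates by state (standardized per crime)")
```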

In conclusion, I would say that urban population does not correlate with the amount of violent crime in a state. Other factors are likely behind the differences in violent crime between states.
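One way to check this conclusion numerically is a Pearson correlation test between UrbanPop and each crime for the 14 states, sketched below with base R's cor.test(). With so few states, a large p-value means the data cannot demonstrate a correlation, rather than proving that none exists.

```r
east_abb <- c("ME", "NH", "MA", "RI", "CT", "NY", "NJ",
              "DE", "MD", "VA", "NC", "SC", "GA", "FL")
east <- USArrests[state.name[match(east_abb, state.abb)], ]
crimes <- c("Murder", "Assault", "Rape")

# For each crime: Pearson r between UrbanPop and that crime, plus the p-value
# for the null hypothesis that the true correlation is zero
results <- t(sapply(crimes, function(crime) {
  ct <- cor.test(east$UrbanPop, east[[crime]])
  c(r = unname(ct$estimate), p = ct$p.value)
}))

round(results, 3)
```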
