R language and data management

Continuing from what I did in CA1, I decided to keep working on population data for CA2.

After collecting data on the population every five years from 1976 to 2016 (source: World Bank), I entered these data into R code in RStudio.

To create a graph with this software, I followed, step by step, the structure of the code seen in class and on the websites Cooking with R and Code School. To do this, I completed all the exercises offered on the site, to become more familiar with the R language and to get a better overall idea of the possibilities it offers.

By breaking down the code, I realized that the year and population values had to be entered at the very start of the code, in a data frame, with each column corresponding to one axis of the graph. The values also had to be entered in chronological order to obtain a logical table. This data frame then served as the basis for the graph.
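A minimal sketch of that first step in base R might look like the following. The population figures are illustrative placeholders, not the World Bank values used in this post, and the names `pop_df`, `year` and `population` are my own choice. The values are deliberately typed out of order to show how `order()` restores chronological order.

```r
# Illustrative placeholder values (millions), NOT the actual World Bank figures.
# They are deliberately entered out of order to show the sorting step.
year       <- c(1986, 1976, 1981, 1991, 1996, 2001, 2006, 2011, 2016)
population <- c(3.5,  3.2,  3.4,  3.5,  3.6,  3.8,  4.2,  4.6,  4.8)

# One column per axis: year for the x axis, population for the y axis
pop_df <- data.frame(year = year, population = population)

# Put the rows in chronological order so the table (and any line drawn
# through it later) follows the evolution over time
pop_df <- pop_df[order(pop_df$year), ]
rownames(pop_df) <- NULL

print(pop_df)
```

Keeping one row per year in a single data frame is also the layout that plotting functions and packages such as ggplot2 work with most easily.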

Next, I wanted to check that this worked. I first plotted the data as columns (a bar chart), then added elements one at a time until I obtained a line graph.
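The check described above could be sketched in base R as follows: first a bar chart, then an empty plotting area to which the line and the points are added one layer at a time. This is an assumption about how the steps might look; the original code is not shown in the post and may have used another package (ggplot2, for example). The data are the same illustrative placeholders, and the output goes to a temporary PDF file so the sketch also runs without a screen.

```r
# Illustrative placeholder data (millions), not the real World Bank figures
year       <- seq(1976, 2016, by = 5)
population <- c(3.2, 3.4, 3.5, 3.5, 3.6, 3.8, 4.2, 4.6, 4.8)
pop_df <- data.frame(year = year, population = population)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)  # write plots to a temporary PDF instead of the screen

# Step 1: columns (bar chart) to check that the data are read correctly;
# barplot() returns the x position of each bar
bar_mid <- barplot(pop_df$population, names.arg = pop_df$year)

# Step 2: build a line graph little by little
plot(pop_df$year, pop_df$population, type = "n")  # axes only, nothing drawn yet
lines(pop_df$year, pop_df$population)             # add the line
points(pop_df$year, pop_df$population, pch = 19)  # add a marker for each year

invisible(dev.off())  # close the PDF device
```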

Finally, I added titles to the graph to make it easier to read, which gave the final version.
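In base R, axis labels can be passed directly to `plot()` and a main title added afterwards with `title()`. The sketch below assumes base graphics and uses placeholder data and title text of my own; it is not the post's actual chart.

```r
# Illustrative placeholder data (millions), not the real World Bank figures
year       <- seq(1976, 2016, by = 5)
population <- c(3.2, 3.4, 3.5, 3.5, 3.6, 3.8, 4.2, 4.6, 4.8)
pop_df <- data.frame(year = year, population = population)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)  # draw to a temporary file

# type = "o" draws the line with the points over it; labels name each axis
plot(pop_df$year, pop_df$population, type = "o",
     xlab = "Year", ylab = "Population (millions)")

# Add a main title after the plot has been drawn (hypothetical wording)
title(main = "Population every five years, 1976-2016 (illustrative data)")

invisible(dev.off())
```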

Like Google Forms, R is useful for data processing because it can represent a complex data set simply, in the form of a graph. The language is free and accessible to most people, and it has been widely used in the scientific community for many years.

Here, combining the data gathered last time on the distribution of populations with their evolution over time in a specific geographical area produces a data set that can be represented schematically and concisely.

After a few hours of learning, R makes it possible to manage large amounts of data efficiently. To detect errors, one can, for example, use the Mahalanobis distance to flag observations that lie unusually far from the rest of a multivariate data set, such as a mistyped value.
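R provides the Mahalanobis distance through `mahalanobis()` in the built-in stats package. The sketch below is only an illustration under my own assumptions: it uses placeholder data in which 36 was typed instead of 3.6, and treats year and population as the two variables. With only nine points a formal cut-off is not reliable, so it simply reports the observation with the largest distance as the one to double-check; a large distance suggests, but does not prove, an error.

```r
# Illustrative placeholder data with one deliberate typo:
# 36 was entered instead of 3.6 for 1996.
year       <- seq(1976, 2016, by = 5)
population <- c(3.2, 3.4, 3.5, 3.5, 36, 3.8, 4.2, 4.6, 4.8)
x <- cbind(year = year, population = population)

# Squared Mahalanobis distance of each row from the column means,
# scaled by the covariance between year and population
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# The row furthest from the rest is the first candidate to check by hand
suspect <- which.max(d2)
cat("Most unusual observation:", year[suspect], "with value", population[suspect], "\n")
```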

In the end, R follows a logic that can be grasped fairly easily after a few hours of work. If I had had more time, I would have liked to create a graph combining the curve of the total population of Ireland with two or three other curves showing population change in several large Irish cities, to match the subject of CA1 more closely and to see which cities drive the country's demographic change.

Another possibility offered by the language intrigued me: three-dimensional "volcano"-type mapping. This kind of representation offers original ways of presenting data in the scientific field, and I plan to look at it more closely when I have the opportunity.
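Assuming the "volcano" example refers to the `volcano` dataset that ships with R (a matrix of heights for Maunga Whau, Auckland, on an 87 x 61 grid), a first 3D surface can be drawn with base R's `persp()`. The viewing angles and colours below are arbitrary choices of mine, and the plot again goes to a temporary PDF file.

```r
# `volcano` ships with base R (package datasets): an 87 x 61 matrix of heights
data(volcano)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)  # draw to a temporary file

z <- volcano
x <- seq_len(nrow(z))  # grid row indices
y <- seq_len(ncol(z))  # grid column indices

# persp() draws a 3D perspective surface; theta and phi set the viewing angles.
# It invisibly returns the 4 x 4 viewing transformation matrix.
view <- persp(x, y, z, theta = 135, phi = 30, col = "lightblue",
              shade = 0.5, border = NA, expand = 0.5,
              xlab = "Rows", ylab = "Columns", zlab = "Height")

invisible(dev.off())
```

The same function would accept any matrix of values on a regular grid, so population figures arranged by region and year could in principle be drawn the same way.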
