R vs. Python

R and Python are two of the major programming languages in the world of data science, and both happen to be open source (i.e. not proprietary and free to use). But which one should you use? As always, there is no clear-cut answer, but hopefully after reading this you will have a better idea of which one to pick.

Python

Python is an object-oriented programming (OOP) language. What does this mean? It means that programs are organized around objects, which bundle data together with the operations on that data, rather than around the logic alone. Other languages that fall under this categorization include Java and C++. However, unlike Java’s, Python’s syntax is much simpler. Additionally, while Python is open source, it takes a “batteries included” approach: much of the functionality you need is built into the standard library to begin with. In terms of interface, Python is often run from the command line, but you can also write it in Jupyter notebooks or use an IDE such as PyCharm if you prefer that interface instead.
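As a rough illustration of what “objects” means in practice, here is a minimal sketch. The class name Dataset and its methods are made up for this example and do not come from any library.

```python
class Dataset:
    """A tiny, hypothetical container that bundles data with its behavior."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def add(self, value):
        # Change the object's own state
        self.values.append(value)

    def mean(self):
        # The object knows how to summarize its own data
        return sum(self.values) / len(self.values)


heights = Dataset("heights_cm", [150, 160, 170])
heights.add(180)
print(heights.name, heights.mean())
```

The point is only that the data (values) and the operations on it (add, mean) travel together in one object.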

There is high demand in industry for Python in areas such as web development, AI, and machine learning, because Python is more focused on deployment and production. Learning this language makes you a very attractive candidate to employers. If you are interested in getting a programming job in Python, then you should learn the syntax, common modules (e.g. NumPy), the differences between Python 2 and 3, and the uses of Python in industry.
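To make the Python 2 versus Python 3 point a little more concrete, here is a short sketch of three well-known differences, written for Python 3. The variable names are illustrative only, and this is far from a complete list.

```python
# In Python 3, print is a function; in Python 2 it was a statement (print "hello").
print("hello")

# In Python 3, / is true division and // is floor division.
# In Python 2, 7 / 2 with two integers gave 3.
true_div = 7 / 2
floor_div = 7 // 2
print(true_div, floor_div)

# In Python 3, str is Unicode text and bytes is a separate type.
text = "caf\u00e9"
raw = text.encode("utf-8")
print(type(text).__name__, type(raw).__name__, len(text), len(raw))
```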

R

Unlike Python, R is not primarily an object-oriented language but a procedural one (that being said, R does offer ways to do object-oriented programming, such as its S3 and S4 systems and packages like R6). R can be hard to learn at first for people who are not familiar with programming. However, it has an extensive and active online community and a wide variety of packages. The main interface used for R is RStudio, which allows easy viewing of data and variables. Additionally, R (and by extension R Shiny) is great at making visualizations.

R focuses on data analysis and statistics and, while certainly used in industry, is perhaps not valued as highly as Python. R tends to be more favored in academia and in research and development.

Conclusion

So which one should you learn? While you can accomplish many of the same things in both languages, I would recommend learning both. Why?

R (via RStudio) allows easy navigation of data and has a strong selection of data visualization packages, making it a strong workhorse of research and development.

Python has a wide variety of applications in industry and is an important component of production-based work.

Like any other skill, you should keep developing your programming skills and keep up with new developments. If you don’t have the opportunity to learn a language in your current job, then take the time to learn it on your own. It will pay dividends later on in your career.

Working with JSON in Python

It seems only fair that if we are going to talk about how to handle pseudo-JSON files in R, we should also talk about how to handle them in Python. Similar to our previous example in R, we will use the JIRA API to pull some JSON-like data from JIRA.

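import json
try:  # Python 3
    from urllib.request import urlopen
except ImportError:  # Python 2
    from urllib2 import urlopen

url = "http://jira.atlassian.com/rest/api/latest/issue/JRA-9"
data = json.load(urlopen(url))  # parse the response body into Python objects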

What we are doing here is essentially creating a representation of the JSON data as a Python object (nested dictionaries and lists). If you want to explore other ways to represent JSON in a Python object, I recommend taking a look at this page: https://pythonspot.com/json-encoding-and-decoding-with-python/
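As a small illustration of moving between JSON text and Python objects, here is a sketch using only the standard json module. The sample document is made up for this example; it is not real JIRA output.

```python
import json

# A small, made-up JSON document (not real JIRA output)
raw_text = '{"key": "DEMO-1", "fields": {"summary": "Example issue", "labels": ["a", "b"]}}'

# Decoding: JSON text -> Python dicts, lists, strings, and numbers
issue = json.loads(raw_text)
print(type(issue).__name__)
print(issue["fields"]["summary"])

# Encoding: Python object -> JSON text, indented for readability
pretty = json.dumps(issue, indent=2, sort_keys=True)
print(pretty)
```

Pretty-printing with indent is often the quickest way to see how the fields are nested.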

Let’s print this and see what we have.

print(data)

This isn’t the whole output, but you get the idea of the structure

Keep in mind that this isn’t a real JSON file; it is simply data in a JSON-like structure. If you don’t have nested objects, you might be able to convert it to a CSV pretty easily (http://blog.appliedinformaticsinc.com/how-to-parse-and-convert-json-to-csv-using-python/).
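For the flat case, a minimal sketch using only the standard library might look like the following. The records and field names are made up for illustration, and the output goes to an in-memory buffer rather than a real file.

```python
import csv
import io

# Made-up flat records (no nested objects); field names are illustrative
records = [
    {"key": "DEMO-1", "status": "Open", "votes": 3},
    {"key": "DEMO-2", "status": "Closed", "votes": 0},
]

# An in-memory buffer stands in for a file here; for a real file you
# could use open("issues.csv", "w", newline="") instead
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["key", "status", "votes"])
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```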

Unfortunately, this is not one of those cases. If you have a situation like this, I recommend taking the time to understand how the fields are nested within each other, because that will inform how you want to pull the information out and store it in a CSV (and maybe you don’t even want all the fields). If you are looking for inspiration, I recommend the following references:

https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv

https://stackoverflow.com/questions/40588852/pandas-read-nested-json

Most of these methods require a bit of hard-coding of the fields you want, which means your code won’t be very flexible if you try to use it for other applications. There is a promising answer under the first link, though, that describes how to create a function that will flatten JSON objects. Since I think both of these Stack Overflow questions have answers that provide a lot of detail about what you can do (more than I can provide), I’m not going to put an example here of how to convert pseudo-JSON (pulled using the JIRA API) into a CSV.
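That said, to give a rough idea of what a flattening function can look like, here is a generic sketch. It is not the code from the Stack Overflow answer and it is not specific to JIRA; the function name flatten, the sep parameter, and the sample data are all assumptions for illustration.

```python
def flatten(obj, parent_key="", sep="."):
    """Collapse nested dicts and lists into one flat dict keyed by path."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)  # list positions become part of the key
    else:
        return {parent_key: obj}  # a leaf value: nothing more to unpack

    flat = {}
    for key, value in items:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten(value, new_key, sep))
    return flat


# Made-up nested record, loosely shaped like an issue with nested fields
nested = {
    "key": "DEMO-1",
    "fields": {
        "summary": "Example issue",
        "votes": {"votes": 3},
        "labels": ["a", "b"],
    },
}
row = flatten(nested)
print(row)
```

Each flattened record can then be treated as one CSV row. Note that in this sketch empty dictionaries and empty lists simply disappear from the result, which may or may not be what you want.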

Personally, I think dealing with pseudo-JSON in R is easier than trying to deal with it in Python, especially if you want to visualize what the data looks like. There are good reasons, though, why you might want to work in Python instead of R. In the next post I will discuss the differences between Python and R, and why you might want to use one over the other.

NOTE: You might also run into a situation where the data is actually a stream of JSON-like data. This might be a good resource if you are in that situation:

https://stackoverflow.com/questions/19697846/how-to-convert-csv-file-to-multiline-json
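If the stream happens to arrive as one JSON object per line, a minimal sketch for reading it could look like this. The one-object-per-line layout, the helper name read_json_lines, and the sample data are assumptions for illustration; other stream layouts would need a different approach.

```python
import io
import json


def read_json_lines(handle):
    """Yield one parsed object per non-empty line of a file-like object."""
    for line in handle:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Made-up stream; an in-memory buffer stands in for a file or network response
stream = io.StringIO(
    '{"key": "DEMO-1", "votes": 3}\n'
    "\n"
    '{"key": "DEMO-2", "votes": 0}\n'
)

issues = list(read_json_lines(stream))
print(len(issues), issues[0]["key"])
```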