It seems only fair that if we are going to talk about how to handle pseudo JSON files in R, we should also talk about how to handle them in Python. As in our previous R example, we will use the JIRA API to pull some JSON-like data from JIRA.
import json
from urllib.request import urlopen
url = "http://jira.atlassian.com/rest/api/latest/issue/JRA-9"
data = json.load(urlopen(url))  # parse the JSON response into Python objects
What we are doing here is essentially parsing the JSON response into a Python object (nested dictionaries and lists). If you want to explore other ways to represent JSON in a Python object, I recommend taking a look at this page: https://pythonspot.com/json-encoding-and-decoding-with-python/
Let’s print this and see what we have.
print(data)
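The raw printout gets hard to read for a response of any size. Here is a minimal sketch of one way to make it easier to scan. It uses a small hand-made stand-in for the response; the `sample` dictionary and its field names are made up for illustration and are not the real JIRA output.

```python
import json

# Hypothetical stand-in for a parsed response; the real one has many more fields.
sample = {
    "key": "JRA-9",
    "fields": {
        "summary": "Example issue",
        "labels": ["ui", "bug"],
        "assignee": None,
    },
}

# json.dumps turns the Python object back into JSON text;
# indent and sort_keys only change how that text is laid out.
pretty = json.dumps(sample, indent=2, sort_keys=True)
print(pretty)

# The top-level keys give a quick first look at the structure.
top_level = sorted(sample.keys())
print(top_level)
```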
Keep in mind that this isn’t a real JSON file; it is simply data in a JSON-like structure. If you don’t have nested objects, you might be able to convert this to a CSV fairly easily (http://blog.appliedinformaticsinc.com/how-to-parse-and-convert-json-to-csv-using-python/).
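For the flat case, the standard library is usually enough. This is a small sketch, assuming every record is a dictionary of plain values; the `records` list and its column names are invented for the example.

```python
import csv
import io

# Hypothetical flat records: every value is a plain string or number, no nesting.
records = [
    {"key": "JRA-9", "status": "Open", "votes": 3},
    {"key": "JRA-10", "status": "Closed", "votes": 0},
]

# Write to an in-memory buffer here; for a real file,
# use open("issues.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["key", "status", "votes"])
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```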
Unfortunately, this is not one of those cases. If you have a situation like this, I recommend taking the time to understand how the fields are nested within each other, because that will inform how you pull the information out and store it in a CSV (and maybe you don’t even want all the fields). If you are looking for inspiration, I recommend the following references:
https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv
https://stackoverflow.com/questions/40588852/pandas-read-nested-json
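One way to act on that advice is to pick the fields you care about by hand. Here is a rough sketch using a made-up response shape; the field names and the `pick_fields` helper are illustrative assumptions, not taken from the real JIRA output.

```python
# Hypothetical nested response, loosely shaped like an issue record.
issue = {
    "key": "JRA-9",
    "fields": {
        "summary": "Example issue",
        "labels": ["ui", "bug"],
        "assignee": {"name": "alice"},
    },
}


def pick_fields(issue):
    """Return one flat row containing only hand-picked fields."""
    fields = issue.get("fields", {})
    assignee = fields.get("assignee") or {}  # assignee may be missing or None
    return {
        "key": issue.get("key"),
        "summary": fields.get("summary"),
        "assignee": assignee.get("name"),
        # Lists do not fit in one CSV cell, so join them into a single string.
        "labels": ";".join(fields.get("labels", [])),
    }


row = pick_fields(issue)
print(row)
```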
Most of the methods described in those answers require a bit of hard coding of the fields you want, which means your code won’t be very flexible if you try to use it for other applications. There is a promising answer under the first link, though, that describes how to create a function that will flatten JSON objects. Since both of these Stack Overflow questions have answers that go into more detail than I can provide, I’m not going to walk through a full conversion of the pseudo JSON pulled from the JIRA API into a CSV here.
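For orientation, the flattening idea can be sketched in a few lines. This is a rough illustration of the general technique, not the code from that answer; the `flatten` function, its parameters, and the sample data are all made up here.

```python
def flatten(obj, parent_key="", sep="."):
    """Collapse nested dicts and lists into one flat dict with joined keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        # Treat list positions as keys: "labels.0", "labels.1", ...
        items = ((str(index), value) for index, value in enumerate(obj))
    else:
        # A plain value: this is the end of the path.
        return {parent_key: obj}

    flat = {}
    for key, value in items:
        new_key = parent_key + sep + key if parent_key else key
        flat.update(flatten(value, new_key, sep))
    return flat


# Hypothetical nested record for illustration.
sample = {
    "key": "JRA-9",
    "fields": {
        "summary": "Example issue",
        "labels": ["ui", "bug"],
        "assignee": {"name": "alice"},
    },
}

flat_row = flatten(sample)
print(flat_row)
```

Each flattened dictionary can then serve as one row for `csv.DictWriter`. Note that empty dictionaries and lists simply disappear in this sketch, so you may need a placeholder for them if the column must always exist.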
Personally, I think dealing with pseudo JSON in R is easier than dealing with it in Python, especially if you want to visualize what the data looks like. There are good reasons, though, why you might want to work in Python instead of R. In the next post I will discuss the differences between Python and R, and why you might want to use one over the other.
NOTE: You might also run into a situation where the data is actually a stream of JSON-like data. This might be a good resource if you are in that situation:
https://stackoverflow.com/questions/19697846/how-to-convert-csv-file-to-multiline-json
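If the stream turns out to be one JSON object per line (that is an assumption; streams can be framed in other ways), a minimal sketch of reading it record by record might look like this. The `iter_records` helper and the sample stream are made up for the example.

```python
import io
import json


def iter_records(lines):
    """Yield one parsed object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# In-memory stand-in for a file or network stream.
stream = io.StringIO('{"key": "JRA-9"}\n\n{"key": "JRA-10"}\n')
records = list(iter_records(stream))
print(records)
```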