Working with JSON in python

It seems only fair that if we are going to talk about how to handle pseudo JSON files in R, that we should also talk about how to handle them in python. Similar to our previous example in R, we will use JIRA API to pull some JSON like data from JIRA.

import json, urllib2

url = "http://jira.atlassian.com/rest/api/latest/issue/JRA-9"
data = json.load(urllib2.urlopen(url))

What we are doing here is essentially creating a JSON representation of the data in a python object. If you want to explore other ways to represent JSON in a python object I recommend taking a look at this page https://pythonspot.com/json-encoding-and-decoding-with-python/

Let’s print this and see what we have.

print data
This isn’t the whole output, but you get the idea of the structure

This isn’t the whole output, but you get the idea of the structure

Keep in mind, this isn’t a real JSON file, this is simply in a JSON like structure. If you don’t have nested objects, you might be able to convert this pretty easily to a csv (http://blog.appliedinformaticsinc.com/how-to-parse-and-convert-json-to-csv-using-python/).

Unfortunately this is not one of those cases. If you have a situation like this, I recommend taking the time to understand how the fields are nested within each other because that will inform how you want to pull the information out and store it in a csv (and maybe you don’t even want all the fields). If you are looking for inspiration, I recommend the following references:

https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv

https://stackoverflow.com/questions/40588852/pandas-read-nested-json

Most of these methods require a bit of hard coding on which fields you want, which means your code won’t be very flexible if you try to use it for other applications. There is a promising answer under the first link though, that describes how to create a function that will flatten JSON objects. Since I think both these stack overflow questions have answers that provide a lot of detail about what you can do (more than what I can provide), I’m not going to put an example here on how to convert pseudo JSON (pulled using JIRA API) into a csv.

Personally, I think dealing with pseudo JSON in R is easier than trying to deal with it in python. Especially if you want to visualize what the data looks like. There are good reasons though why you might want to work in python instead of R. Next post I will discuss then differences between python and R, and why you might want to use one over the other.

NOTE: You might also run into a situation where the data is actually stream of JSON like data. This might be a good resource if you have that situation:

https://stackoverflow.com/questions/19697846/how-to-convert-csv-file-to-multiline-json

Working with JSON in R

I’m not sure if it is just the type of projects I’ve been taking on lately, or if there is some cosmic force out to get me, but I’ve run into JSON (JavaScript Object Notation) a couple of times in the last month.

JSON has it roots in the late 1990s and early 2000s, and has the format of two different data structures: collection of name/value pairs and ordered list of values. Now you are probably saying at this moment “Let’s hurry it up to where I can get a usable data frame out of this” but I think it is useful to review what this actually means. It means that there can be objects nested as one of the values and you can get a somewhat tree-like structure of data (for more detail on JSON in general, Louis Lazaris has a particularly good page that explains it: https://www.impressivewebs.com/what-is-json-introduction-guide-for-beginners/).

I’m not really a fan of JSON and I’m not the only one with strong feelings, but it is the reality of the world that JSON exists and that we have to deal with data in that format.

Now R does have a few packages to help you parse out JSON data (mainly JSON, RJSONIO, and jsonlite, although it looks like JSON is not available for the most recent version of R). Now, it should be as simple as:

library(httr)
library(jsonlite)

rm(list=ls())

# Using URI path for JIRA's public issue tracker
json <- httr::GET("http://jira.atlassian.com/rest/api/latest/issue/JRA-9")
mydf <- fromJSON(json)

However if you are running this yourself, you will notice that instead of getting a data frame, you got an error.

Error: Argument 'txt' must be a JSON string, URL or file.

As far as I’ve been able to tell, this is a particular issue with JSON from JIRA, and not necessarily reflective of all JSON data. But it does mean we need to find a way around it. So I did what most people do. I googled. Google provided some possible (albeit painful) solutions, but what quickly became apparent is that this is an issue that is frustrating many people.

The conclusion I finally came to is that when using JIRA API the file I’m getting is not truly a JSON file. It is psuedo JSON file. Thanks JIRA… This is why we can’t have nice things. To be fair to JIRA, apparently it not uncommon to use API to get data, and end up with something containing JSON lines but is not actually a JSON array. If you want to take a look at the stack overflow post I ended up basing my solution on: https://stackoverflow.com/questions/26519455/error-parsing-json-file-with-the-jsonlite-package

library(jsonlite)

rm(list=ls())

# Using URL path for JIRA's public issue tracker
df <- jsonlite::stream_in(file("https://jira.atlassian.com/rest/api/latest/issue/JRA-9"), verbose=FALSE)

Viola! You have data. Albeit maybe somewhat messily named data, but we can’t win at everything. Unfortunately, this method is somewhat problematic if you need to use GET for username and password authentication. While I’m not sure it is the best solution, I have used the following method when pulling in psuedo JSON files using GET.

#avoid hardcoding username/password for security purposes
query <- httr:GET("http://host:port/rest/api/latest/search?jql=project="Project Name"
    &maxResults="num of Results you want to limit to"", authenticate("username","password","basic")

query <- httr:content(query, as="raw")
json <- jsonlite::fromJSON(rawToChar(query))
df <- flatten(as.data.frame(json))

As you can see, there was a lot of things jsonlite::stream_in was handling on the backend that we actually had to specify after pulling in data using httr:GET. Again, this is probably not the most elegant solution, but it gets the job done.

When not dealing with psuedo JSON files I do recommend this link: https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html

What is particularly nice about this page is that it walks through some steps that tend to get skipped over in most example solutions for converting JSON into a data frame.

Using JIRA API in R

Recently at work, a problem with reading data from JIRA via R came up and I thought I would write up some of my thoughts about it here.

For those of you who aren’t familiar with API, it stand for Application Programming Interface and most websites have some form of API. You can use API to directly talk to a server.

In this specific post, I want to talk directly with the JIRA servers. If you are reading this post then you are probably already familiar with JIRA but it you aren’t, it is essentially a software provided by Atlassian for “agile teams”. Putting aside my feelings about the overuse of terms agile and teams together, let’s dig into the API.

Atlassian does actually provide a page on JIRA API.

The REST here means Representational State Transfer, which really is just fancy language describing the architectural style of the API. If you want to learn more about REST I recommend looking it up as I won’t go in more detail about it here.

Now, there are several ways to pull data into R using JIRA API (curl, GET, etc.). In this particular example I will use GET.

library(httr)

# Using URI path for JIRA's public issue tracker
json <- httr::GET("http://jira.atlassian.com/rest/api/latest/issue/JRA-9")

I like using GET because I do have the option of using a username/password to authenticate access built in to the function. For obvious reasons though you should avoid hardcoding username/password information into your code. If you are having trouble finding your company/institute’s specific URI path, you will want something like this:

http://host:port/rest/api/2

OR

http://host:port/rest/api/latest

You can also specify if you want to look up a specific project and how many results (JIRA API will automatically restrict you to 50 issues but you can overwrite this default).

http://host:port/rest/api/latest/search?jql=project="Project Name"&maxResults="num of Results you want to limit to"

Just make sure to replace the "Project Name" and "num of Results you want to limit to"
(including the quotes) with the appropriate inputs

And viola, you have use JIRA API to bring data from JIRA into R. Probably right about now you are asking yourself, “Wait a minute, what is this? How am I suppose to use this?”. What you are looking at is JSON (JavaScript Object Notation) data. Unfortunately, pretty much whenever you use API you will get JSON data. And frankly, as someone in analytics, I find it an awful format. While I understand there are some benefits to using it (lightweight format, easy to parse/generate, etc.), I like working in data frames and data tables. Which means the rare moments I do encounter JSON, it is the bane of my exist.

My following post will be on what are some of the best ways to try to convert the JSON data into a usable data frame.

EDIT: I’ve reviewed this post and found that the links I originally provided for username and password authentication are not the most helpful. I’ve updated accordingly and provided an example below.

query <- httr:GET("http://host:port/rest/api/latest/search?jql=project="Project Name"
    &maxResults="num of Results you want to limit to"", authenticate("username","password","basic")

Obviously, it isn’t a great idea to hardcode a password and username into your code. I would recommend only using this method if you are prompting the user for username and password input to use in this query (and not actually saving the input). If you need more ideas on how to use password authentication with JIRA API I would recommend checking the following post:

https://community.atlassian.com/t5/Jira-questions/Passing-username-and-password-via-URL-to-jira/qaq-p/16679