Working with JSON in R

I’m not sure if it is just the type of projects I’ve been taking on lately, or if there is some cosmic force out to get me, but I’ve run into JSON (JavaScript Object Notation) a couple of times in the last month.

JSON has it roots in the late 1990s and early 2000s, and has the format of two different data structures: collection of name/value pairs and ordered list of values. Now you are probably saying at this moment “Let’s hurry it up to where I can get a usable data frame out of this” but I think it is useful to review what this actually means. It means that there can be objects nested as one of the values and you can get a somewhat tree-like structure of data (for more detail on JSON in general, Louis Lazaris has a particularly good page that explains it: https://www.impressivewebs.com/what-is-json-introduction-guide-for-beginners/).

I’m not really a fan of JSON and I’m not the only one with strong feelings, but it is the reality of the world that JSON exists and that we have to deal with data in that format.

Now R does have a few packages to help you parse out JSON data (mainly JSON, RJSONIO, and jsonlite, although it looks like JSON is not available for the most recent version of R). Now, it should be as simple as:

library(httr)
library(jsonlite)

rm(list=ls())

# Using URI path for JIRA's public issue tracker
json <- httr::GET("http://jira.atlassian.com/rest/api/latest/issue/JRA-9")
mydf <- fromJSON(json)

However if you are running this yourself, you will notice that instead of getting a data frame, you got an error.

Error: Argument 'txt' must be a JSON string, URL or file.

As far as I’ve been able to tell, this is a particular issue with JSON from JIRA, and not necessarily reflective of all JSON data. But it does mean we need to find a way around it. So I did what most people do. I googled. Google provided some possible (albeit painful) solutions, but what quickly became apparent is that this is an issue that is frustrating many people.

The conclusion I finally came to is that when using JIRA API the file I’m getting is not truly a JSON file. It is psuedo JSON file. Thanks JIRA… This is why we can’t have nice things. To be fair to JIRA, apparently it not uncommon to use API to get data, and end up with something containing JSON lines but is not actually a JSON array. If you want to take a look at the stack overflow post I ended up basing my solution on: https://stackoverflow.com/questions/26519455/error-parsing-json-file-with-the-jsonlite-package

library(jsonlite)

rm(list=ls())

# Using URL path for JIRA's public issue tracker
df <- jsonlite::stream_in(file("https://jira.atlassian.com/rest/api/latest/issue/JRA-9"), verbose=FALSE)

Viola! You have data. Albeit maybe somewhat messily named data, but we can’t win at everything. Unfortunately, this method is somewhat problematic if you need to use GET for username and password authentication. While I’m not sure it is the best solution, I have used the following method when pulling in psuedo JSON files using GET.

#avoid hardcoding username/password for security purposes
query <- httr:GET("http://host:port/rest/api/latest/search?jql=project="Project Name"
    &maxResults="num of Results you want to limit to"", authenticate("username","password","basic")

query <- httr:content(query, as="raw")
json <- jsonlite::fromJSON(rawToChar(query))
df <- flatten(as.data.frame(json))

As you can see, there was a lot of things jsonlite::stream_in was handling on the backend that we actually had to specify after pulling in data using httr:GET. Again, this is probably not the most elegant solution, but it gets the job done.

When not dealing with psuedo JSON files I do recommend this link: https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html

What is particularly nice about this page is that it walks through some steps that tend to get skipped over in most example solutions for converting JSON into a data frame.