Fun RESTful Paging with Bash and jq

I was pulling data from a RESTful API, and the returned data was paginated, as such data often is. I wanted to process it all in bash so I could share the result with a large community. But how to do that efficiently? I'm not worried about the time it takes bash to process the data; I just want to make sure we're not fetching the same data multiple times. What I came up with was pretty interesting.

First, let's look at the body of the RESTful response. Standard JSON, with the data in the payload as well as a nextLink key:

  {
    "events": [ { "dish": "Stevie is at Poppers!", "userId": "ralph" },
                { "dish": "Ralph is at Waffle House", "userId": "dorris" }
              ],
    "nextLink": "https://yoyo.com/api/rumors?page=4"
  }

where the events key holds the important data from the call, but because there was too much data to return in one (efficient) call, the service gives a complete link to hit for the next page of data. Simple.

Now, some APIs return a page number, or a placeholder token to pass back to the same URL that generated this data. That saves a few bytes in each response, but it's just a different way of getting the "next" page to the caller.
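
For illustration, a token-style response might look something like this (a hypothetical shape, not YoYo's actual API):

  {
    "events": [ { "dish": "Maria is at Pizza Hut", "userId": "stevie" } ],
    "nextPageToken": "abc123"
  }

and the caller builds the next URL itself, something like (assuming the response was saved to a file named in $page):

  hit="https://yoyo.com/api/rumors?pageToken=$(jq -r '.nextPageToken' "$page")"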

So... how do we put all this in bash? Let's start with jq, which you can get from Homebrew with:

  $ brew install jq
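
or, if you're on Ubuntu rather than a Mac, apt has it too:

  $ sudo apt-get install jq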

and then we can look at the basic loop over all pages:

  # this is my YoYo API Token - you can get one too
  tok="111111111111111111111"

  # dump all the rumors into a file
  hit="https://yoyo.com/api/rumors"
  # loop until the last page - it has no nextLink, so jq prints "null"
  while [ "$hit" != "null" ]; do
    # make a temp file for the details of this page of the rumor group
    rumors=$(mktemp /tmp/yoyo.rumors.XXXXXXXX)
    # get the page, and possible next page URL...
    curl -s "$hit" \
       -H "accept: application/json" \
       -H "X-API-Token: $tok" > "$rumors"
    # write out as CSV, the data from this file (-r strips jq's quotes)
    for usr in $(jq -r '.events[] | select(has("userId")) | .userId' "$rumors"); do
      echo "$title,$usr"
    done
    # get the next URL to load from the service (-r drops the quotes here too)
    hit=$(jq -r '.nextLink' "$rumors")
    # clean up the file as we don't need it any more
    rm "$rumors"
  done

In this example, the $title variable isn't defined, but it would presumably be set in the script before we get to the loop, whatever the point of gathering the data really is.
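
One refinement worth knowing about: if the userId (or $title) values could ever contain commas or quotes, jq's built-in @csv filter will do the escaping for you, and it can replace the inner for loop entirely. A quick sketch, run against the same temp file:

  jq -r --arg title "$title" \
     '.events[] | select(has("userId")) | [$title, .userId] | @csv' "$rumors"

Here --arg makes $title available inside the jq filter, and @csv turns each two-element array into a properly quoted CSV row.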

What I enjoyed about this is that we can get the data from the service with a simple curl command and then process it with jq, whose flexibility is really quite impressive. In just a few lines, I had a script that fetched a lot of data from a service and extracted what I needed into a CSV for processing in a spreadsheet, and it didn't take all that long. Winner. 🙂