
What’s the right way to pull a complete answer to an InfluxQL query over HTTP?

I’m using the acct_gather plugin for a Slurm cluster. It sends resource-usage data to an InfluxDB v1 database. So if I write

#SBATCH --profile=Task

in an sbatch file, it records things like memory, I/O, and CPU usage to the database.

But if I try to ask for that data as JSON, e.g.,...

jobid=12345
curl -G 'http://<ip address>:<port>/query?' \
  --data-urlencode "db=myslurmdatabase" \
  --data-urlencode 'q=select "value" from /./ where "job"='"'$jobid'"

...then I get a partial response with only one type of measurement ("CPUFrequency"):

{
  "results": [
    {
      "statement_id": 0,
      "series": [
        {
          "name": "CPUFrequency",
          "columns": [
            "time",
            "value"
          ],
          "values": [

...

          ],
          "partial": true
        }
      ]
    }
  ]
}

I think this happens for jobs that have recorded more than a certain number of data points.

What I've found

  • In this thread on GitHub somebody asked:

    So how does it work? Do you get a url with the second chunk or does the http response contain multiple json bodies? Is this compatible with the json decoders in browser?

    People replied to the effect that modern browsers can handle it, but I don’t think they answered the question directly.

  • There’s a “chunked” parameter for the /query endpoint. You can set it to true (in which case it chunks by series or every 10,000 points) or to a specific number of points (in which case it chunks every that many points). Either way the response is chunked, but it’s not clear to me how to request the next chunk.

  • It looks like somebody has written a third-party program that can stream the chunked results of a query. But is that possible with curl alone, or would I have to use something like this?
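For what it’s worth, my understanding (not something I’ve confirmed in the docs) is that with chunked=true InfluxDB 1.x writes each chunk as a separate JSON body, one after another, in the same HTTP response, so jq can consume the stream directly. A sketch with a simulated two-chunk response standing in for what curl would print (no server involved):

```shell
# two JSON bodies back to back, as a chunked /query response would arrive
chunks='{"results":[{"statement_id":0,"series":[{"name":"CPUFrequency","partial":true}]}]}
{"results":[{"statement_id":0,"series":[{"name":"CPUFrequency"}]}]}'

# jq -s (--slurp) gathers the whole stream into one array, one element per chunk
n=$(printf '%s\n' "$chunks" | jq -s 'length')
echo "$n"
```

In real use you would pipe curl straight into jq, e.g. `curl ... --data-urlencode "chunked=true" | jq -s '.'`.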

1 Answer

InfluxQL supports LIMIT and OFFSET clauses, so you can curl the first thousand points, then the next thousand points, and so on:

# first thousand
curl -G 'http://<ip address>:<port>/query?' \
  --data-urlencode "db=myslurmdatabase" \
  --data-urlencode 'q=select "value" from /./ where "job"='"'$jobid' limit 1000"

# next thousand
curl -G 'http://<ip address>:<port>/query?' \
  --data-urlencode "db=myslurmdatabase" \
  --data-urlencode 'q=select "value" from /./ where "job"='"'$jobid' limit 1000 offset 1000"

If there are no more points, the result will just be empty apart from a statement ID:

{"results":[{"statement_id":0}]}

Since what we’re after is just the "series" element (which might not exist), we can use jq -e '.results[0].series' to print it, exiting 1 if it doesn’t exist:
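You can see that exit-status behavior without touching the database by feeding jq -e the empty result from above (jq -e exits 1 when the filter’s output is null or false):

```shell
# jq -e prints the (null) result but signals "missing" via its exit status
printf '%s\n' '{"results":[{"statement_id":0}]}' | jq -e '.results[0].series'
status=$?
echo "exit status: $status"
```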

offset=0
# loop until jq exits non-zero because 'series' is missing
while curl -G 'http://<ip address>:<port>/query?' \
  --data-urlencode "db=myslurmdatabase" \
  --data-urlencode \
    'q=select "value" from /./ where "job"='"'$jobid' limit 1000 offset $offset" |
  jq -e '.results[0].series' > series_"$offset".json; do
  sleep 1                    # be polite to the server between requests
  offset=$((offset + 1000))  # advance to the next page
done

# the last file holds only the word "null" (jq -e printed null and exited 1)
rm series_"$offset".json
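If you then want all the pages in one file, jq can concatenate the saved arrays. A sketch using made-up chunk files in place of real query output:

```shell
# simulate two saved pages, each holding the "series" array from one offset
echo '[{"name":"CPUFrequency","values":[[1,100]]}]' > series_0.json
echo '[{"name":"CPUFrequency","values":[[2,200]]}]' > series_1000.json

# jq -s reads all inputs into an array of arrays; add concatenates them
merged=$(jq -s 'add' series_0.json series_1000.json)
count=$(printf '%s\n' "$merged" | jq 'length')
echo "$count"

rm series_0.json series_1000.json
```

Note that a glob like series_*.json sorts lexically, not numerically, so zero-pad the offsets in the filenames if the order of the merged points matters.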
