I'm working with the Diffbot API in Python. It has a feature that lets you send a batch request containing up to 50 URLs in a single HTTP request. The problem is that I don't know how to construct such a request in a script.

I'm getting stuck at the very start, but here is what I have.

import requests
import json

url = 'http://www.diffbot.com/api/'

batch = {"method": "GET", "relative_url": "/api/article?url=http%3A%2F%2Fblogs.wsj.com%2Fventurecapital%2F2012%2F05%2F31%2Finvestors-back-diffbots-visual-learning-robot-for-web-content%2F%3Fmod%3Dgoogle_news_blog%26token=XXX"},{"method": "GET", "relative_url": "/api/article?url=http%3A%2F%2Fgigaom.com%2Fcloud%2Fsilicon-valley-royalty-pony-up-2m-to-scale-diffbots-visual-learning-robot%2F%26token=XXX"}

r = requests.get(url + batch)  # fails: url is a str, batch is a tuple

Of course, this raises an error saying that a str and a tuple cannot be concatenated, but I'm lost as to how I'd pass a JSON object as part of a URL.

If anybody could point me in the right direction it would be greatly appreciated.

The documentation provides the following curl command as an example, but I can't seem to get the equivalent working. Any idea how to recreate it in Python?

curl \
    -d 'token=...' \
    -d 'batch=[
        {"method": "GET", "relative_url": "/api/article?url=http%3A%2F%2Fblogs.wsj.com%2Fventurecapital%2F2012%2F05%2F31%2Finvestors-back-diffbots-visual-learning-robot-for-web-content%2F%3Fmod%3Dgoogle_news_blog%26token=..."},
        {"method": "GET", "relative_url": "/api/article?url=http%3A%2F%2Fgigaom.com%2Fcloud%2Fsilicon-valley-royalty-pony-up-2m-to-scale-diffbots-visual-learning-robot%2F%26token=..."}
    ]' \
    http://www.diffbot.com/api/batch

2 Answers

1

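You probably want to serialize the batch to JSON and then Base64-encode it.

import base64
import json

# b64encode works on bytes, so encode the JSON string first (required on Python 3)
encoded = base64.urlsafe_b64encode(json.dumps(batch).encode("utf-8"))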

Now it's safe to embed in a URL.

To get it back to an object:

json.loads(base64.urlsafe_b64decode(encoded))

1

According to the documentation, you should be doing the following:

  1. Send the data in an HTTP POST, not a GET.
  2. Send the data in the request body, not the URL.
  3. Put your batch in a list, not an implicit tuple.
  4. Send your token as well.
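Point 3 is also what triggers the error in the question. As a small illustration (the variable names here are made up for the example), comma-separated dicts without brackets form a tuple, and a tuple cannot be added to a string:

```python
import json

# Comma-separated values without brackets create a tuple, not a list.
implicit = {"method": "GET"}, {"method": "GET"}
explicit = [{"method": "GET"}, {"method": "GET"}]

kind_implicit = type(implicit).__name__  # "tuple"
kind_explicit = type(explicit).__name__  # "list"

# Adding a tuple to a str raises TypeError, which is the error in the question.
try:
    "http://www.diffbot.com/api/" + implicit
    concat_error = None
except TypeError as exc:
    concat_error = exc

# Serialize the list to a JSON string before sending it as a form field.
batch_json = json.dumps(explicit)
```

Note that `json.dumps` serializes a tuple as a JSON array too, so the list is mainly about making the intent explicit and matching the `[...]` in the curl example.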

The correct code would look more like:

import requests
import json

batch = [{"method": "GET", "relative_url": "/api/article?url=http%3A%2F%2Fblogs.wsj.com%2Fventurecapital%2F2012%2F05%2F31%2Finvestors-back-diffbots-visual-learning-robot-for-web-content%2F%3Fmod%3Dgoogle_news_blog%26token=XXX"},{"method": "GET", "relative_url": "/api/article?url=http%3A%2F%2Fgigaom.com%2Fcloud%2Fsilicon-valley-royalty-pony-up-2m-to-scale-diffbots-visual-learning-robot%2F%26token=XXX"}]
batch_dumped = json.dumps(batch)

token = 'sample token'

r = requests.post('http://www.diffbot.com/api/batch', data={'token': token, 'batch': batch_dumped})
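If you have more than a handful of URLs, you probably don't want to write the batch by hand. Here is a rough sketch of building it from a plain list of URLs and splitting it into groups of 50 (the limit mentioned in the question). The helper names `build_batch`, `chunked` and `build_payloads` are made up for this example, and the `relative_url` layout, including the `%26token=` suffix, is simply copied from the examples above, so check it against the documentation:

```python
import json
from urllib.parse import quote

BATCH_LIMIT = 50  # URLs per batch request, per the question


def build_batch(urls, token, endpoint="/api/article"):
    """Build one batch list in the same shape as the documentation example."""
    return [
        {
            "method": "GET",
            # safe="" percent-encodes every reserved character, including "/" and ":"
            "relative_url": f"{endpoint}?url={quote(u, safe='')}%26token={token}",
        }
        for u in urls
    ]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_payloads(urls, token):
    """Return one form payload (token + JSON-encoded batch) per group of URLs."""
    return [
        {"token": token, "batch": json.dumps(build_batch(group, token))}
        for group in chunked(urls, BATCH_LIMIT)
    ]
```

Each payload can then be sent the same way as above, e.g. `requests.post('http://www.diffbot.com/api/batch', data=payload)` in a loop over `build_payloads(urls, token)`.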
