So... The problem is that the csv writer doesn't understand the concept of "subdictionaries" as mongo returns it.
If I understood correctly, when you query Mongo, you get a dictionary like this:
{
"_id": "a hex ID that correspond with the record that contains several answers",
"answers": [ ... a list with a bunch of dicts in it... ]
}
So when the csv.DictWriter tries to write that, it only writes one dictionary (the topmost). It doesn't know (or care) that answers is a list containing dictionaries whose values need to be written in columns as well (accessing fields in nested dictionaries using dot notation such as answers.order is only understood by Mongo, not by the csv writer).
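You can see this behavior with a minimal, self-contained sketch (the record shape and values here are made up to mimic what Mongo returns):

```python
import csv
import io

# A record shaped like the one Mongo returns: to DictWriter, the nested
# list under "answers" is just another value, so it gets written as its repr.
record = {"_id": "abc123", "answers": [{"order": 0, "text": "Yes"}]}

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["_id", "answers"])
writer.writeheader()
writer.writerow(record)
print(out.getvalue())
```

The whole list lands in a single quoted cell instead of being spread across columns, which is exactly the problem.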
What I understand you should do is "walk" the list of answers and create one dictionary out of each record (each dictionary) in that list. Once you have a list of "flattened" dictionaries, you can pass those and write them to your csv file:
import csv

from pymongo import MongoClient

client = MongoClient()  # adjust host/port as needed

cursor = client.stack_overflow.stack_039.find(
    {}, {'_id': 1, 'answers.order': 1, 'answers.text': 1, 'answers.answerId': 1})

# Step 1: Create the list of dictionaries (one dictionary per entry in the `answers` list)
flattened_records = []
for answers_record in cursor:
    answers_record_id = answers_record['_id']
    for answer_record in answers_record['answers']:
        flattened_record = {
            '_id': answers_record_id,
            'answers.order': answer_record['order'],
            'answers.text': answer_record['text'],
            'answers.answerId': answer_record['answerId']
        }
        flattened_records.append(flattened_record)

# Step 2: Iterate through the list of flattened records and write them to the csv file
with open('stack_039.csv', 'w') as outfile:
    fields = ['_id', 'answers.order', 'answers.text', 'answers.answerId']
    write = csv.DictWriter(outfile, fieldnames=fields)
    write.writeheader()
    for flattened_record in flattened_records:
        write.writerow(flattened_record)
Watch out for the use of plurals: answers_record is different from answer_record.
That creates a file like this:
$ cat ./stack_039.csv
_id,answers.order,answers.text,answers.answerId
580f9aa82de54705a2520833,0,{u'en': u'Yes'},527d65de7563dd0fb98fa28c
580f9aa82de54705a2520833,1,{u'en': u'No'},527d65de7563dd0fb98fa28b
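Note that the answers.text column holds the repr of a dict, because in this data the text field is itself a dict keyed by language. If you only want the plain English string in that column, you could unwrap it with a small helper before writing the row. This is a sketch under the assumption (based on the output above) that the dict uses an 'en' key; the helper name is my own:

```python
def english_text(text_field):
    # text_field looks like {'en': 'Yes'}; fall back to the raw value
    # if it is not a dict or has no 'en' entry.
    if isinstance(text_field, dict):
        return text_field.get('en', text_field)
    return text_field

print(english_text({'en': 'Yes'}))  # Yes
print(english_text('No'))           # No
```

You would then use english_text(answer_record['text']) when building each flattened record.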
EDIT:
Your query (the one that makes cursor = db.questions.find({}, {'_id': 1, 'answers.order': 1, 'answers.text': 1, 'answers.answerId': 1})) will return all the entries in the questions collection. If this collection is very large, you might want to use the cursor as an iterator.
As you might have already realized, the first for loop in my code above puts all the records in a list (the flattened_records list). You can do lazy loading by iterating through the cursor (instead of loading all the items in memory, fetch one, do something with it, get the next, do something with it...).
It's slightly slower, but more memory efficient.
import csv

from pymongo import MongoClient

client = MongoClient()  # adjust host/port as needed

cursor = client.stack_overflow.stack_039.find(
    {}, {'_id': 1, 'answers.order': 1, 'answers.text': 1, 'answers.answerId': 1})

with open('stack_039.csv', 'w') as outfile:
    fields = ['_id', 'answers.order', 'answers.text', 'answers.answerId']
    write = csv.DictWriter(outfile, fieldnames=fields)
    write.writeheader()
    for answers_record in cursor:  # Here we are using 'cursor' as an iterator
        answers_record_id = answers_record['_id']
        for answer_record in answers_record['answers']:
            flattened_record = {
                '_id': answers_record_id,
                'answers.order': answer_record['order'],
                'answers.text': answer_record['text'],
                'answers.answerId': answer_record['answerId']
            }
            write.writerow(flattened_record)
It will produce the same .csv file as shown above.
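If you end up doing this for several collections, the flattening step can be factored into a reusable generator. This is a sketch of my own (flatten_rows is not a library function); it yields one flat dict per element of the nested list, prefixing the element's keys with the list field's name plus a dot, matching the column names used above:

```python
def flatten_rows(record, list_field):
    """Yield one flat dict per element of record[list_field],
    prefixing each element's keys with '<list_field>.'."""
    base = {k: v for k, v in record.items() if k != list_field}
    for item in record.get(list_field, []):
        row = dict(base)
        for k, v in item.items():
            row[list_field + '.' + k] = v
        yield row

doc = {'_id': 'x', 'answers': [{'order': 0, 'text': 'Yes'}]}
rows = list(flatten_rows(doc, 'answers'))
print(rows)  # [{'_id': 'x', 'answers.order': 0, 'answers.text': 'Yes'}]
```

You could then loop over the cursor and call write.writerow on each dict that flatten_rows yields, without hard-coding the answer fields.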