Output from function to text/CSV file?

Question

I am counting the number of contractions in a certain set of presidential speeches, and I want to output these counts to a CSV or text file. Here's my code:

import urllib2,sys,os,csv
from bs4 import BeautifulSoup,NavigableString
from string import punctuation as p
from multiprocessing import Pool
import re, nltk
import requests
import math, functools
import summarize
reload(sys)

def processURL_short(l):
    open_url = urllib2.urlopen(l).read()
    item_soup = BeautifulSoup(open_url)
    item_div = item_soup.find('div',{'id':'transcript'},{'class':'displaytext'})
    item_str = item_div.text.lower()
    return item_str

every_link_test = ['http://www.millercenter.org/president/obama/speeches/speech-4427',
'http://www.millercenter.org/president/obama/speeches/speech-4424',
'http://www.millercenter.org/president/obama/speeches/speech-4453',
'http://www.millercenter.org/president/obama/speeches/speech-4612',
'http://www.millercenter.org/president/obama/speeches/speech-5502']

data = {}
count = 0
for l in every_link_test:
    content_1 = processURL_short(l)
    for word in content_1.split():
        word = word.strip(p)
        if word in contractions:
            count = count + 1
        splitlink = l.split("/")
        president = splitlink[4]
        speech_num = splitlink[-1]
        filename = "{0}_{1}".format(president,speech_num)
    data[filename] = count
    print count, filename

    with open('contraction_counts.csv','w',newline='') as fp:
        a = csv.writer(fp,delimiter = ',')
        a.writerows(data)

Running that for loop prints out:

79 obama_speech-4427
101 obama_speech-4424
101 obama_speech-4453
182 obama_speech-4612
224 obama_speech-5502

I want to export that to a text file where the numbers on the left are one column and the president/speech number is the second column. My with statement just writes each individual row to a separate file, which is definitely suboptimal.
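A note on the printed numbers: count is set to 0 once, before the loop, and is never reset, so each printed value is a running total across the speeches rather than a per-speech count. Below is a minimal sketch of per-speech counting, assuming the transcripts have already been downloaded. The CONTRACTIONS set, the sample texts, and the helper names speech_id and count_contractions are hypothetical stand-ins for the question's undefined contractions collection and for processURL_short.

```python
from string import punctuation

# Hypothetical stand-in for the question's undefined `contractions` collection.
CONTRACTIONS = {"don't", "can't", "it's", "we're", "i'm"}

def speech_id(url):
    """Build a label like 'obama_speech-4427' from a millercenter.org URL."""
    parts = url.split("/")
    return "{0}_{1}".format(parts[4], parts[-1])

def count_contractions(text):
    """Count the words in `text` that appear in CONTRACTIONS."""
    count = 0  # starts at zero on every call, i.e. once per speech
    for word in text.lower().split():
        if word.strip(punctuation) in CONTRACTIONS:
            count += 1
    return count

# Hypothetical sample data: URL -> already-downloaded transcript text.
speeches = {
    "http://www.millercenter.org/president/obama/speeches/speech-4427": "We're here. It's late, and I'm tired.",
    "http://www.millercenter.org/president/obama/speeches/speech-4424": "Don't stop. We can't stop.",
}

data = {speech_id(url): count_contractions(text) for url, text in speeches.items()}
print(data)  # {'obama_speech-4427': 3, 'obama_speech-4424': 2}
```

Keeping the counter inside a function makes the reset automatic, since each call starts from zero.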

If you google "write csv with python" you get plenty of answers; try this one. – Kyle Pittman, Commented Oct 8, 2015 at 14:21
Yeah, I've seen that. That output to CSV essentially put one letter in each column, and didn't even include the contraction count. – boot-scootin, Commented Oct 8, 2015 at 14:30
I would suggest editing this question or creating a new question about the code you tried to use to output the CSV. It's simpler for us to help you with the code you've already tried than to write something from scratch. – Kyle Pittman, Commented Oct 8, 2015 at 14:33
The code I tried is at the tail end of the code above. It starts with "with open('contraction_counts.csv'...". – boot-scootin, Commented Oct 8, 2015 at 14:46

reticentroot · Accepted Answer · 2015-10-09 00:23:13Z

1

You can try something like this. It's a generic method, so modify it as you see fit:

import csv
with open('somepath/file.txt', 'wb+') as outfile:
  w = csv.writer(outfile)
  w.writerow(['header1', 'header2'])
  for i in your_data_structure: # e.g. a list or dictionary; I'm assuming a list structure
    w.writerow([
      i[0],
      i[1],
    ])
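The 'wb+' mode is the Python 2 idiom for csv files. Under Python 3, the csv module documentation recommends opening the file in text mode with newline=''. Here is a minimal Python 3 sketch of the same list-based loop, assuming the data has been collected as hypothetical [count, filename] pairs (the numbers are the ones printed in the question):

```python
import csv

# Hypothetical list of [count, filename] pairs, using the numbers printed in the question.
rows = [
    [79, "obama_speech-4427"],
    [101, "obama_speech-4424"],
    [101, "obama_speech-4453"],
    [182, "obama_speech-4612"],
    [224, "obama_speech-5502"],
]

# Python 3: text mode with newline='' lets the csv module control line endings.
with open("contraction_counts.csv", "w", newline="") as outfile:
    w = csv.writer(outfile)
    w.writerow(["count", "filename"])  # header row
    for i in rows:
        w.writerow([i[0], i[1]])  # i[0] -> first column, i[1] -> second column
```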

Or, if your data is a dictionary:

import csv
with open('somepath/file.txt', 'wb+') as outfile:
  w = csv.writer(outfile)
  w.writerow(['header1', 'header2'])
  for k, v in your_dictionary.items(): # iterate over (key, value) pairs
    w.writerow([
      k,
      v,
    ])
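For a dictionary shaped like the one in the question (filename keys, count values), the loop only needs to swap the key and value to put the count in the left column. Here is a Python 3 sketch with a hypothetical two-entry dictionary. Since writerows accepts any iterable of rows, a generator expression can replace the explicit loop:

```python
import csv

# Hypothetical dictionary in the question's shape: {filename: count}
data = {"obama_speech-4427": 79, "obama_speech-4424": 101}

with open("contraction_counts.csv", "w", newline="") as outfile:
    w = csv.writer(outfile)
    w.writerow(["count", "filename"])
    # Each row is a two-item tuple, so each item becomes one column.
    w.writerows((count, filename) for filename, count in data.items())
```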

edited Oct 9, 2015 at 0:23

answered Oct 8, 2015 at 15:07

reticentroot

boot-scootin Over a year ago

I still get each letter of filename output to individual columns in 'contraction_counts.csv'... I'm trying to get one column for counts and another for filename.

reticentroot Over a year ago

Then you'll have to modify your data structure accordingly. This snippet shows you how to do it; it's up to you to figure out how to apply it. For example, if I had a list of lists like [['hi', 'hello'], ['by', 'bye']], this code would iterate over that list, and i[0] would be 'hi' on the first pass and 'by' on the second pass.

boot-scootin Over a year ago

Ok. It's actually saved as a dictionary (e.g. {'washington_speech-3446': 8873, 'washington_speech-3447': 8874, ...}), so with a bit of modification I should be good, I suppose.

reticentroot Over a year ago

Okay, I added an example of how it would work with a dictionary.

boot-scootin Over a year ago

Okay, that definitely makes sense now that I can read the code for writing from a dictionary. Thanks!

Serge Ballesta · Accepted Answer · 2015-10-08 15:11:03Z

Your problem is that you open the output file inside the loop in 'w' mode, which means it is erased on each iteration. You can easily solve this in two ways:

Move the open outside of the loop (the normal way). You will open the file only once, add a line on each iteration, and close it when exiting the with block:

with open('contraction_counts.csv','w',newline='') as fp:
    a = csv.writer(fp,delimiter = ',')
    for l in every_link_test:
        content_1 = processURL_short(l)
        for word in content_1.split():
            word = word.strip(p)
            if word in contractions:
                count = count + 1
            splitlink = l.split("/")
            president = splitlink[4]
            speech_num = splitlink[-1]
            filename = "{0}_{1}".format(president,speech_num)
        data[filename] = count
        print count, filename
        a.writerows(data)

Open the file in 'a' (append) mode. On each iteration you reopen the file and write at the end instead of erasing it. This uses more I/O because of the repeated open/close, so it should be used only if the program can crash and you want to be sure that everything written before the crash has actually been saved to disk:

for l in every_link_test:
    content_1 = processURL_short(l)
    for word in content_1.split():
        word = word.strip(p)
        if word in contractions:
            count = count + 1
        splitlink = l.split("/")
        president = splitlink[4]
        speech_num = splitlink[-1]
        filename = "{0}_{1}".format(president,speech_num)
    data[filename] = count
    print count, filename

    with open('contraction_counts.csv','a',newline='') as fp:
        a = csv.writer(fp,delimiter = ',')
        a.writerows(data)
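Append mode never truncates, so rows accumulate across iterations and also across runs: a second run of the script adds a second copy unless the file is removed first. Here is a minimal sketch of the append pattern, assuming the hypothetical per-speech results below. It writes one row per iteration with writerow instead of re-sending the whole dictionary:

```python
import csv
import os

path = "contraction_counts.csv"
if os.path.exists(path):
    os.remove(path)  # start fresh; append mode never truncates an existing file

# Hypothetical per-speech results, produced one at a time inside a loop.
results = [("obama_speech-4427", 79), ("obama_speech-4424", 101)]

for filename, count in results:
    # Reopen in append mode on every iteration; leaving the with block
    # closes (and flushes) the file before the next speech is processed.
    with open(path, "a", newline="") as fp:
        csv.writer(fp).writerow([count, filename])
```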

Both of those solutions leave me back where I started: the output in contraction_counts.csv has each letter of filename in its own column, and the actual contraction counts are not included.
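The remaining problem in every version is the a.writerows(data) call. writerows expects an iterable of rows, and iterating over a dictionary yields only its keys. Each key is a string, and the csv writer treats a string row as a sequence of one-character fields. That produces exactly the one-letter-per-column output described above, and the counts (the dictionary values) are never written. Below is a minimal reproduction with io.StringIO and a hypothetical one-entry dictionary, followed by one way to write (count, filename) pairs instead:

```python
import csv
import io

data = {"obama_speech-4427": 79}  # hypothetical one-entry dictionary

# Reproduce the problem: iterating a dict yields its keys, and each key
# string is split into one-character fields.
buf = io.StringIO()
csv.writer(buf).writerows(data)
bad = buf.getvalue()
print(repr(bad))   # 'o,b,a,m,a,_,s,p,e,e,c,h,-,4,4,2,7\r\n'

# One fix: build explicit (count, filename) rows from the dict's items.
buf = io.StringIO()
csv.writer(buf).writerows((count, name) for name, count in data.items())
good = buf.getvalue()
print(repr(good))  # '79,obama_speech-4427\r\n'
```

In the loop-based versions above, the corresponding change is to replace a.writerows(data) with a.writerow([count, filename]). That way each pass writes only the current speech, whereas passing the whole dictionary on every pass would repeat earlier rows.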
