I am counting the number of contractions in a certain set of presidential speeches, and want to output these contractions to a CSV or text file. Here's my code:

import urllib2,sys,os,csv
from bs4 import BeautifulSoup,NavigableString
from string import punctuation as p
from multiprocessing import Pool
import re, nltk
import requests
import math, functools
import summarize
reload(sys)

def processURL_short(l):
    open_url = urllib2.urlopen(l).read()
    item_soup = BeautifulSoup(open_url)
    item_div = item_soup.find('div',{'id':'transcript'},{'class':'displaytext'})
    item_str = item_div.text.lower()
    return item_str

every_link_test = ['http://www.millercenter.org/president/obama/speeches/speech-4427',
'http://www.millercenter.org/president/obama/speeches/speech-4424',
'http://www.millercenter.org/president/obama/speeches/speech-4453',
'http://www.millercenter.org/president/obama/speeches/speech-4612',
'http://www.millercenter.org/president/obama/speeches/speech-5502']

data = {}
count = 0
for l in every_link_test:
    content_1 = processURL_short(l)
    for word in content_1.split():
        word = word.strip(p)
        if word in contractions:
            count = count + 1
        splitlink = l.split("/")
        president = splitlink[4]
        speech_num = splitlink[-1]
        filename = "{0}_{1}".format(president,speech_num)
    data[filename] = count
    print count, filename

   with open('contraction_counts.csv','w',newline='') as fp:
        a = csv.writer(fp,delimiter = ',')
        a.writerows(data)

Running that for loop prints out

79 obama_speech-4427
101 obama_speech-4424
101 obama_speech-4453
182 obama_speech-4612
224 obama_speech-5502

I want to export that to a text file, where the numbers on the left are one column, and the president/speech number are in the second column. My with statement just writes each individual row to a separate file, which is definitely suboptimal.

  • If you google write csv with python you get plenty of answers, try this one Commented Oct 8, 2015 at 14:21
  • Yeah, I've seen that. That output to CSV essentially put one letter in each column, and didn't even include the contraction count. Commented Oct 8, 2015 at 14:30
  • I would suggest editing this question or creating a new question regarding the code you tried to use to output the CSV. It's simpler for us to help you with the code you've already tried than for us to write you something from scratch. Commented Oct 8, 2015 at 14:33
  • The code I tried is at the tail end of the code above. It starts with with open('contraction_counts.csv'... Commented Oct 8, 2015 at 14:46

2 Answers

You can try something like this. It is a generic method, so modify it as you see fit:

import csv

# Python 2: open in binary mode; on Python 3 use open(path, 'w', newline='') instead
with open('somepath/file.txt', 'wb') as outfile:
  w = csv.writer(outfile)
  w.writerow(['header1', 'header2'])
  for i in your_data_structure:  # assuming a list of two-item rows
    w.writerow([i[0], i[1]])

Or, if your data is a dictionary:

import csv

# Python 2: open in binary mode; on Python 3 use open(path, 'w', newline='') instead
with open('somepath/file.txt', 'wb') as outfile:
  w = csv.writer(outfile)
  w.writerow(['header1', 'header2'])
  for k, v in your_dictionary.items():  # k is the key, v is the value
    w.writerow([k, v])
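
Applied to the data in the question, the dictionary variant might look like the minimal sketch below. It assumes Python 3 (so the file would be opened in text mode with newline='' rather than 'wb'). The write_counts helper and the header labels are hypothetical names chosen for illustration, and the sample counts are copied from the output printed in the question.

```python
import csv
import io


def write_counts(data, outfile):
    """Write one row per speech: count in column 1, filename in column 2."""
    w = csv.writer(outfile)
    w.writerow(['count', 'filename'])  # header labels are illustrative
    for filename, count in data.items():
        w.writerow([count, filename])


# Sample values taken from the printed output in the question.
data = {'obama_speech-4427': 79, 'obama_speech-4424': 101}

# An in-memory buffer stands in for the file here; for a real file, use
# open('contraction_counts.csv', 'w', newline='') as in the question.
buf = io.StringIO()
write_counts(data, buf)
print(buf.getvalue())
```

Swapping the order of count and filename inside writerow is all it takes to swap the columns.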

5 Comments

I still get each letter of filename output to individual columns in 'contraction_counts.csv'... I'm trying to get one column for counts and another for filename.
Then you'll have to modify your data structure accordingly. This snippet shows you how to do it; it's up to you to figure out how to use it. If I had a list of lists, for example [[hi, hello], [by, bye]], this code would iterate over that list, and i[0] would be hi on the first pass and by on the second pass.
Ok. It's actually saved as a dictionary (e.g. {'washington_speech-3446': 8873, 'washington_speech-3447': 8874, ...}), so a bit of modification and I should be good, I suppose.
okay, added an example of how it would work with a dictionary.
okay. that definitely makes sense now, me being able to read the code for writing from a dictionary. thanks!

Your problem is that you open the output file inside the loop in w mode, meaning that it is erased on each iteration. You can easily solve it in 2 ways:

  1. Move the open outside of the loop (the normal way). You open the file only once, add a line on each iteration, and close it when exiting the with block:

    with open('contraction_counts.csv','w',newline='') as fp:
        a = csv.writer(fp,delimiter = ',')
        for l in every_link_test:
            content_1 = processURL_short(l)
            for word in content_1.split():
                word = word.strip(p)
                if word in contractions:
                    count = count + 1
                splitlink = l.split("/")
                president = splitlink[4]
                speech_num = splitlink[-1]
                filename = "{0}_{1}".format(president,speech_num)
            data[filename] = count
            print count, filename
            a.writerows(data)
    
  2. Open the file in 'a' (append) mode. On each iteration you reopen the file and write at the end instead of erasing it. This approach uses more I/O resources because of the repeated open/close, and should be used only if the program may crash and you want to be sure that everything written before the crash has actually been saved to disk:

    for l in every_link_test:
        content_1 = processURL_short(l)
        for word in content_1.split():
            word = word.strip(p)
            if word in contractions:
                count = count + 1
            splitlink = l.split("/")
            president = splitlink[4]
            speech_num = splitlink[-1]
            filename = "{0}_{1}".format(president,speech_num)
        data[filename] = count
        print count, filename
    
        with open('contraction_counts.csv','a',newline='') as fp:
            a = csv.writer(fp,delimiter = ',')
            a.writerows(data)
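
The difference between the two modes can be seen in a small sketch like the one below. It assumes Python 3 and writes to a temporary directory; the file names w_mode.csv and a_mode.csv and the rows list are made up for the demonstration, with counts taken from the question's printed output.

```python
import csv
import os
import tempfile

rows = [[79, 'obama_speech-4427'], [101, 'obama_speech-4424']]
tmpdir = tempfile.mkdtemp()

# 'w' reopened on every iteration: each open truncates the file,
# so only the last row survives.
w_path = os.path.join(tmpdir, 'w_mode.csv')
for row in rows:
    with open(w_path, 'w', newline='') as fp:
        csv.writer(fp).writerow(row)

# 'a' reopened on every iteration: each row is appended to the end.
a_path = os.path.join(tmpdir, 'a_mode.csv')
for row in rows:
    with open(a_path, 'a', newline='') as fp:
        csv.writer(fp).writerow(row)

# Read both files back to compare what was kept.
with open(w_path, newline='') as fp:
    w_rows = list(csv.reader(fp))
with open(a_path, newline='') as fp:
    a_rows = list(csv.reader(fp))

print(w_rows)
print(a_rows)
```

Note that csv.reader returns every field as a string, so the counts come back as '79' and '101'.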
    

1 Comment

Both of those solutions leave me back where I started - with the output to contraction_counts.csv being each letter of filename in its own individual column, with no inclusion of the actual contraction counts.
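
The one-letter-per-column symptom is consistent with passing the dictionary itself to writerows: iterating a dict yields only its keys, and the csv writer treats each key string as a row of single-character fields, so the counts never reach the file. The sketch below is one way to see this and one possible fix. It assumes Python 3 and uses an in-memory buffer instead of a real file; the variable names letters_output and pairs_output are illustrative.

```python
import csv
import io

data = {'obama_speech-4427': 79}

# writerows(data) iterates over the dict's keys; each key is a string,
# and csv treats a string row as a sequence of one-character fields.
buf = io.StringIO()
csv.writer(buf).writerows(data)
letters_output = buf.getvalue()

# Passing (count, filename) pairs gives one two-column row per speech.
buf = io.StringIO()
csv.writer(buf).writerows((count, name) for name, count in data.items())
pairs_output = buf.getvalue()

print(letters_output)
print(pairs_output)
```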
