0

I have a large amount of data in CSV Format that looks like this:

(u'Sat Jan 17 18:56:05 +0000 2015', u'anx321', 'RT @ManojHarry27: If India loses 2015 worldcup, Karishma\ntanna will be held responsible !!! #BB8', '0.0453125', '0.325')
(u'Sat Jan 17 18:56:13 +0000 2015', u'FrancisKimberl3', 'Python form imploration overgrowth-the consummative the very best as representing construction upsurge: sDGy', '1.0', '0.39')
(u'Sat Jan 17 18:56:18 +0000 2015', u'AllTechBot', 'RT @ruby_engineer: A workshop on monads with C++14 http://t.co/OKFc91J0QJ #hacker #rubyonrails #python #AllTech', '0.0', '0.0')
(u'Sat Jan 17 18:56:22 +0000 2015', u'python_job', ' JOB ALERT  #ITJob #Job #New York - Senior Software Engineer Python Backed by First Round http://t.co/eqVxoMzYMG  view full details', '0.245454545455', '0.44595959596')
(u'Sat Jan 17 18:56:23 +0000 2015', u'weepingtaco', 'Python: basic but beautiful', '0.425', '0.5625')
(u'Sat Jan 17 18:56:27 +0000 2015', u'python_IT_jobs', ' JOB ALERT  #ITJob #Job #New York - Senior Software Engineer Python Backed by First Round http://t.co/gavWyraNqE  view full details', '0.245454545455', '0.44595959596')
(u'Sat Jan 17 18:56:32 +0000 2015', u'accusoftinfoway', 'RT @findmjob: DevOps Engineer http://t.co/NasdBEEnRp #aws #perl #mysql #linux #hadoop #python #Puppet #jobs #hiring #careers', '0.0', '0.0')
(u'Sat Jan 17 18:56:32 +0000 2015', u'accusoftinfoway', 'RT @arnicas: Very useful - end to end deploying python flask on AWS RT @matt_healy: Great tutorial: https://t.co/RsiM09qJsJ #flask #python ', '0.595', '0.375')
(u'Sat Jan 17 18:56:36 +0000 2015', u'denisegregory10', "Oh you can't beat a good 'python' argument! http://t.co/ELo3GvNsuE via @youtube", '0.875', '0.6')
(u'Sat Jan 17 18:56:38 +0000 2015', u'NoSQLDigest', 'RT @KirkDBorne: R and #Python starter code for participating in @BoozAllen #DataScience Bowl: http://t.co/Q5C01eya95 #abdsc #DataSciBowl #B', '0.0', '0.0')
(u'Sat Jan 17 19:00:05 +0000 2015', u'RedditPython', '"academicmarkdown": a Python module for academic writing with Markdown. Haven\'t tried it o... https://t.co/uv8yFaz6cv http://t.co/EhiIIO7uTW', '0.0', '0.0')
(u'Sat Jan 17 19:00:28 +0000 2015', u'shopawol', 'Only 8.5 and 12 left make sure to get yours \nhttp://t.co/4rxmHqP2Qs\n#wdywt #goawol #sneakerheads http://t.co/wACIOdlGwY', '0.166666666667', '0.62962962963')
(u'Sat Jan 17 19:00:31 +0000 2015', u'AuthorBee', "RT @_kevin_ewb_: I know what your girl won't she just wanna kick it like the #WorldCup ", '0.0', '0.0')
(u'Sat Jan 17 19:00:37 +0000 2015', u'g33kmaddy', 'RT @KirkDBorne: R and #Python starter code for participating in @BoozAllen #DataScience Bowl: http://t.co/Q5C01eya95 #abdsc #DataSciBowl #B', '0.0', '0.0')
(u'Sat Jan 17 19:00:45 +0000 2015', u'Altfashion', 'Photo: A stunning photo of Kaoris latex dreams beautiful custom python bra. Photographer: MagicOwenTog... http://t.co/KdWnr3I8xP', '0.675', '1.0')
(u'Sat Jan 17 19:00:46 +0000 2015', u'oh226twt', 'Python programming: Easy and Step by step Guide for Beginners: Learn Python (English Edition) http://t.co/9optdOCrtE 1532', '0.216666666667', '0.416666666667')
(u'Sat Jan 17 19:00:50 +0000 2015', u'DvSpacefest', 'RT @Pomerantz: Potential team in the Learning XPRIZE looking for Python coders. Details: https://t.co/nGgrmYmXCa', '0.0', '1.0')
(u'Sat Jan 17 19:01:04 +0000 2015', u'cun45', 'SPORTS And More: #Cycling #Ciclismo  U23 #Portugal #WorldCup team o... http://t.co/FBeqatfu85', '0.5', '0.5')
(u'Sat Jan 17 19:01:12 +0000 2015', u'insofferentexo', 'RT @FISskijumping: Dawid is already at the hill in Zakopane, in a larger than life format! #skijumping #worldcup http://t.co/SDOnxDwfIX', '0.0', '0.5')
(u'Sat Jan 17 19:01:17 +0000 2015', u'beuhe', 'Madrid Tawarkan Khedira ke Dortmund: Real Madrid dikabarkan telah menawarkan Sami Khedira ... http://t.co/R5YCKjECtm #football #worldcup', '0.2', '0.3')
(u'Sat Jan 17 19:01:18 +0000 2015', u'ITJobs_Karen', ' JOB ALERT  #ITJob #Job #Paradise Valley - Python / Django Developer http://t.co/0Xn1k0cL5B  view full details', '0.35', '0.55')
(u'Sat Jan 17 19:01:22 +0000 2015', u'DonnerBella', 'So confused about #meninist . Monty Python, is that you?', '-0.4', '0.7')
(u'Sat Jan 17 19:01:34 +0000 2015', u'DoggingTeens', '#Dogging,#OutdoorSex,#Sluts,#GangBang,#Stockings,#Uk_Sex: 13 Inch Black Python Being Sucked http://t.co/n9Yv4nhcxo', '-0.166666666667', '0.433333333333')
(u'Sat Jan 17 19:02:03 +0000 2015', u'WorldCupFNH', 'Soccer-La Liga results and standings:  #FNH #WorldCup #Russia2018 #WC2018 http://t.co/3JOOnBQzvG', '0.0', '0.0')
(u'Sat Jan 17 19:02:03 +0000 2015', u'WorldCupFNH', 'Soccer-La Liga summaries:  #FNH #WorldCup #Russia2018 #WC2018 http://t.co/AZgxr5Z9EV', '0.0', '0.0')
(u'Sat Jan 17 19:02:03 +0000 2015', u'WorldCupFNH', "Soccer-Late Congo goal spoils Equatorial Guinea's party:  #FNH #WorldCup #Russia2018 #WC2018 http://t.co/W6Ff4HikxH", '0.0', '0.0')
(u'Sat Jan 17 19:02:04 +0000 2015', u'WorldCupFNH', 'Soccer-Ligue 1 top scorers:  #FNH #WorldCup #Russia2018 #WC2018 http://t.co/WS2lcZnzKu', '0.5', '0.5')
(u'Sat Jan 17 19:02:04 +0000 2015', u'WorldCupFNH', 'Soccer-Pearce answers critics as Forest seal unlikely win:  #FNH #WorldCup #Russia2018 #WC2018 http://t.co/Qb5PKuls6z', '0.15', '0.45')
(u'Sat Jan 17 19:02:04 +0000 2015', u'WorldCupFNH', 'Soccer-Israeli championship results and standings:  #FNH #WorldCup #Russia2018 #WC2018 http://t.co/dce9Qn9oI5', '0.0', '0.0')
(u'Sat Jan 17 19:02:07 +0000 2015', u'Jeff88Ho', 'RT @artwisanggeni: #python jweede.recipe.template 1.2.3: Buildout recipe for making files out of Jinja2 templates http://t.co/dgeuuFWf19', '0.0', '0.0')
(u'Sat Jan 17 19:02:07 +0000 2015', u'Jeff88Ho', 'RT @artwisanggeni: #python aclhound 1.7.5: ACL Compiler http://t.co/fNOFSYd7FJ', '0.0', '0.0')
(u'Sat Jan 17 19:02:08 +0000 2015', u'Jeff88Ho', 'RT @artwisanggeni: #python Flask-Goat 0.2.0: Flask plugin for security and user administration via GitHub OAuth & organization http://t.co/', '0.0', '0.0')
(u'Sat Jan 17 19:02:08 +0000 2015', u'Jeff88Ho', 'RT @artwisanggeni: #python filewatch 0.0.6: Python File Watcher http://t.co/fIHLagCqvf', '0.0', '0.0')
(u'Sat Jan 17 19:02:16 +0000 2015', u'HeatherA789', "Programming Python: Start Learning Python Today, Even If You've Never Coded Before (A Beginner's Guide): http://t.co/3Ss4cwCvP6", '0.0', '0.0')
(u'Sat Jan 17 19:02:18 +0000 2015', u'HeatherA789', 'Python: Learn Python in One Day and Learn It Well. Python for Beginners with Hands-on Project.: Python: Learn http://t.co/zvLIpydd6V', '0.0', '0.0')
(u'Sat Jan 17 19:02:26 +0000 2015', u'AlexeiCherenkov', 'It looks like I should learn Python. Do you think I can do this during 3 hours tomorrow? Yes-Rt; No-Fav.', '0.0', '0.0')
(u'Sat Jan 17 19:02:33 +0000 2015', u'cleansheet', "#WorldCup Cricket World Cup: Australia should've picked a leg-spinner and named Steve Smith vice-captain ... http://t.co/kgXgUVbHDd", '0.0', '0.0')
(u'Sat Jan 17 19:02:34 +0000 2015', u'cleansheet', '#WorldCup Younger Northug earns 1st cross-country World Cup victory http://t.co/y7jozMriFG', '0.0', '0.0')
(u'Sat Jan 17 19:02:35 +0000 2015', u'cleansheet', '#WorldCup ICC World Cup 2015: School massacre survivors inspire Pakistan team http://t.co/Tj1jpCZsj6', '0.0', '0.0')
(u'Sat Jan 17 19:02:35 +0000 2015', u'cleansheet', '#WorldCup We Want to Win World Cup for Peshawar Schoolkids: Misbah-ul-Haq http://t.co/RbeBkrv69s', '0.8', '0.4')
(u'Sat Jan 17 19:02:38 +0000 2015', u'world_latest', 'New: Equatorial Guinea 1-1 Congo http://t.co/32sfrrbBOW #follow #worldcup world_latest world_latest', '0.136363636364', '0.454545454545')
(u'Sat Jan 17 19:02:39 +0000 2015', u'FAHAD_CTID', 'RT @fawadiii: @FAHAD_CTID @VeronaPerqukuu Hahaha. Hanw ;) bdw worldcup bhi hai 15 sy :D', '0.483333333333', '0.8')
(u'Sat Jan 17 19:02:43 +0000 2015', u'amazon_mybot', '#3: Python http://t.co/LLzeKQQBon', '0.0', '0.0')
(u'Sat Jan 17 19:02:45 +0000 2015', u'LarryMesast', '#javascript #html5 #UX #Python #agile #DDD', '0.5', '0.75')
(u'Sat Jan 17 19:02:46 +0000 2015', u'washim987', 'RT @anjali_damania: I was angry at @shaziailmi & @thekiranbedi My husband calms me down & says. Haame Worldcup jitna hai. Sirf Pakistan se ', '-0.327777777778', '0.644444444444')
(u'Sat Jan 17 19:03:02 +0000 2015', u'sksh_rana', '"@ManojHarry27: If India loses 2015 worldcup, Karishma\ntanna will be held responsible !!! #BB8"\n@TheFarahKhan @BeingSalmanKhan', '0.0453125', '0.325')
(u'Sat Jan 17 19:03:14 +0000 2015', u't_kohyama', '@_3mame PythonMatlabPython', '0.0', '0.0')
(u'Sat Jan 17 19:03:16 +0000 2015', u'AntonShipulin', '#photo #worldcup #flowerceremony #sprint #Ruhpolding http://t.co/fe9qpiwsqJ', '0.0', '0.0')
(u'Sat Jan 17 19:03:22 +0000 2015', u'karthik_vik', 'RT @ValaAfshar: Highest paying programming languages, ranked by salary:\n\n1 Ruby\n2 Objective C\n3 Python\n4 Java\n\nhttp://t.co/RudytdjFLC http:', '0.0', '0.1')

Right now I plot the data with the following script:

import matplotlib
matplotlib.use('Agg') 
from matplotlib.mlab import csv2rec
import matplotlib.pyplot as plt    
import matplotlib.dates as mdates  
from pylab import *             
from datetime import datetime
import dateutil
from dateutil import parser
import re
import os
import operator
import csv

input_filename="test_output.csv"
output_image_namep='polarity.png'
output_image_name2='subjectivity.png'

input_file = open(input_filename, 'r')

data = csv2rec(input_file, names=['time', 'name', 'message', 'polarity', 'subjectvity'])  

time_list = []
polarity_list = []

''' I am aware there's a much more concise way of doing this'''

for line in data:
    td = line['time']

    ''' stupid regex '''
    s = re.sub('\(\u', '', td)
    dtime = parser.parse(s)
    dtime = re.sub('-', '', str(dtime))
    dtime = re.sub(' ', '', dtime)
    dtime = re.sub('\+00:00', '', dtime)
    dtime = re.sub(':', '', dtime)
    dtime = dtime[:-2]

    try:
        subjectivity = float(line['subjectivity'].replace("'", '').replace(")", ''))

    except:
        pass

    print dtime, polarity

    time_list.append( str(dtime) )
    polarity_list.append( polarity )

rcParams['figure.figsize'] = 10, 4                              
rcParams['font.size'] = 8                                       

fig = plt.figure()                                              

plt.plot([time_list], [polarity_list], 'ro')
axes = plt.gca()
axes.set_ylim([-1,1])

plt.savefig(output_image_namep)

It ends up looking like:

enter image description here

Which is fine but I would like the X axis to display the date labels correctly. Right now I'm doing some ugly regex to strip the date down to YYYYMMDDHHMM.

1 Answer 1

1

What about this:

import time

def format_time_label(original):
    return time.strftime('%Y%m%d%H%M', 
                         time.strptime(original, "%a %b %d %H:%M:%S +0000 %Y"))

Example:

>>> format_time_label('Sat Jan 17 19:00:50 +0000 2015')
'201501171900'

This works only if every date in your data has timezone offset +0000, as there seems to be no code in Python standard library to recognize this.

You can change parsing format expression accordingly to account for leftovers from your data format:

def format_time_label(original):
    return time.strftime('%Y%m%d%H%M', 
                         time.strptime(original, "(u'%a %b %d %H:%M:%S +0000 %Y'"))

>>> format_time_label("(u'Sat Jan 17 18:56:05 +0000 2015'")
'201501171856'
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.