3

I have csv file that looks like this:

artist,year,id,video_name,new_video_id,file_root_name,video_type
,,,,,,
Clay Aiken,1,clay_aiken,Sorry Seems To Be...,sorry-seems-to-be,02_sc_ca_sorry,FLV
Clay Aiken,1,clay_aiken,Everything I Do (I Do It For You),everything-i-do-i-do-it-for-you,03_sc_ca_everything,FLV
Clay Aiken,1,clay_aiken,A Thousand Days,a-thousand-days,04_sc_ca_thousandda,FLV
Clay Aiken,1,clay_aiken,Here You Come Again,here-you-come-again,05_sc_ca_hereyoucom,FLV
Clay Aiken,1,clay_aiken,Interview,interview,06_sc_ca_intv,FLV

Each row from above would generate a separate xml file like below (5 to be precise):

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <meta base="rtmp://cp23636.edgefcs.net/ondemand" />
  </head>
  <body>
    <switch>
      <video src="mp4:soundcheck/%year/%id/%file_root_name_256.mp4" system-bitrate="336000"/>
      <video src="mp4:soundcheck/%year/%id/%file_root_name_512.mp4" system-bitrate="592000"/>
      <video src="mp4:soundcheck/%year/%id/%file_root_name_768.mp4" system-bitrate="848000"/>
      <video src="mp4:soundcheck/%year/%id/%file_root_name_1128.mp4" system-bitrate="1208000"/>
    </switch>
  </body>
</smil>

Naming it %new_video_id.smil

I've figured out how to parse the csv file:

import csv
import sys

f = open(sys.argv[1], 'rU')
reader = csv.reader(f)
for row in reader:
    year = row[1]
    id = row[2]
    file_root_name = row[5]
    print year, id, file_root_name

How to I take each of the variables and include when writing the xml file?

12
  • why don't you build the xml while iterating through the csv? Commented Dec 29, 2011 at 20:19
  • not a single xml, each row is a new xml. Commented Dec 29, 2011 at 20:30
  • do you need to build this in memory or save each xml file to the disk? Commented Dec 29, 2011 at 20:33
  • @DavidNeudorfer What you mean is that each row is video tag in a single xml file? Commented Dec 29, 2011 at 20:33
  • each row will be a new xml file written to disk. each file will be be named using the new_video_id variable Commented Dec 29, 2011 at 20:36

2 Answers 2

5

I'd start with something like this:

import csv
import sys

from xml.etree import ElementTree
from xml.dom import minidom

video_data = ((256, 336000),
              (512, 592000),
              (768, 848000),
              (1128, 1208000))

with open(sys.argv[1], 'rU') as f:
    reader = csv.DictReader(f)
    for row in reader:
        switch_tag = ElementTree.Element('switch')

        for suffix, bitrate in video_data:
            attrs = {'src': ("mp4:soundcheck/{year}/{id}/{file_root_name}_{suffix}.mp4"
                             .format(suffix=str(suffix), **row)),
                     'system-bitrate': str(bitrate),
                     }
            ElementTree.SubElement(switch_tag, 'video', attrs)
        print minidom.parseString(ElementTree.tostring(switch_tag)).toprettyxml()

Basically as the csv file is parsed, an xml document is created using the attributes in the row to create video tags one by one.

Example output (for one row):

<?xml version="1.0" ?>
<switch>
    <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_256.mp4" system-bitrate="336000"/>
    <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_512.mp4" system-bitrate="592000"/>
    <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_768.mp4" system-bitrate="848000"/>
    <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_1128.mp4" system-bitrate="1208000"/>
</switch>

Note: ElementTree doesn't support pretty printing, so I used the trick explained in PyMOTW.

Sign up to request clarification or add additional context in comments.

4 Comments

I'm trying to create an xml file for each row and not an a single xml file with all rows. How would I modify this to do that?
@DavidNeudorfer I've updated my answer to do that. You just need to move the part that creates the xml file into the for loop to generate a new xml for each row.
@DavidNeudorfer I see, I've changed the code to generate multiple video tags for each csv row.
this is exactly what I needed but I'm still having trouble grasping xml manipulation in python and have followed up your script with a new question here: stackoverflow.com/questions/8674610/…
1

I'd consider treating your XML template as a format string: create a string whose value is the XML, replace all of that %year, %id, and %file_root_name. with %s, and then you can do:

 print xml_template % [year, id, file_root_name] * 3

This will only work, mind you, if the data in your csv contains XML-legal characters that don't need to be escaped; you need to preprocess each value to convert markup characters (<, >, ', ") to entities (&lt;, &gt;, &apos;, &quot;).

It's safer to use minidom and ElementTree to build your XML, as jcollado suggests.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.