0

Write a program which reads a text file called input.txt, which contains an arbitrary number of lines of the form "<city>, <country>", then records this information using a dictionary, and finally outputs to the screen a list of the countries represented in the file and the number of cities each contains.

For example, if input.txt contained the following:

New York, US
Angers, France
Los Angeles, US
Pau, France
Dunkerque, France
Mecca, Saudi Arabia

The program would output the following (in some order):

Saudi Arabia : 1
US : 2
France : 3

My code:

from os import dirname

def parseFile(filename, envin, envout = {}):
    exec "from sys import path" in envin
    exec "path.append(\"" + dirname(filename) + "\")" in envin
    envin.pop("path")
    lines = open(filename, 'r').read()
    exec lines in envin
    returndict = {}
    for key in envout:
        returndict[key] = envin[key]
    return returndict

I get "SyntaxError: invalid syntax" when I call it with my file name, input.txt.

4
  • You forgot to post the code you've written so far, the problem with it, and how you've tried/been unable to solve it. Then, someone might be able to help you. Commented Apr 9, 2011 at 16:11
  • So, what have you tried so far? Commented Apr 9, 2011 at 16:11
  • How are you calling parseFile() and what is the syntax error you are seeing? Commented Apr 9, 2011 at 16:24
  • 1
    The normal way to do this would be to use a collections.Counter together with a csv.reader. You don't need the path manipulation either: you can just open the absolute path to the file. And for goodness sake don't use exec! What's that even meant to be doing? Commented Apr 9, 2011 at 16:32
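A rough sketch of the collections.Counter plus csv.reader approach mentioned in the last comment, assuming Python 2.7 or later and lines of the form "city, country". The function name count_countries_csv and the choice to ignore rows that do not have exactly two fields are assumptions made for illustration.

```python
import collections
import csv


def count_countries_csv(filename):
    """Count cities per country with csv.reader and collections.Counter."""
    counts = collections.Counter()
    with open(filename) as fh:
        # skipinitialspace drops the blank that follows each comma.
        for row in csv.reader(fh, skipinitialspace=True):
            if len(row) == 2:  # assumption: ignore blank or malformed rows
                counts[row[1].strip()] += 1
    return counts
```

A Counter behaves like a dictionary, so the result can be looped over and printed in the same way as a plain dict.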

4 Answers 4

4

I don't understand what you are trying to do, so I can't really explain how to fix it. In particular, why are you execing the lines of the file? And why write exec "foo" instead of just foo? I think you should go back to a basic Python tutorial...

Anyway, what you need to do is:

  • open the file using its full path
  • for line in file: process the line and store it in a dictionary
  • return the dictionary

That's it, no exec involved.
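Those three steps could be sketched roughly as below, assuming every useful line looks like "city, country" as in the question's example. The names count_countries and print_counts, and the decision to skip lines without a comma, are assumptions for illustration rather than part of the assignment.

```python
def count_countries(filename):
    """Return a dict mapping each country to the number of cities listed for it."""
    counts = {}
    with open(filename) as fh:
        for line in fh:
            # Split on the last comma so the country is always the final field.
            city, sep, country = line.strip().rpartition(',')
            if not sep:
                continue  # assumption: silently skip blank or malformed lines
            country = country.strip()
            counts[country] = counts.get(country, 0) + 1
    return counts


def print_counts(counts):
    # Mirrors the "Country : N" layout from the question's example output.
    for country in counts:
        print("%s : %d" % (country, counts[country]))
```

Calling print_counts(count_countries("input.txt")) would then print one line per country, in whatever order the dictionary yields them.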

3

Yup, that's a whole lot of crap you either don't need or shouldn't do. Here's how I'd do it prior to Python 2.7 (after that, use collections.Counter as shown in the other answers). Mind you, this'll return the dictionary containing the counts, not print it; you'd have to do that externally. I'd also prefer not to give a complete solution for homework, but it's already been done, so I suppose there's no real damage in explaining a bit about it.

def parseFile(filename):
  with open(filename, 'r') as fh:
    lines = fh.readlines()
    d={}
    for country in [line.split(',')[1].strip() for line in lines]:
      d[country] = d.get(country,0) + 1
    return d

Let's break that down a bit, shall we?

  with open(filename, 'r') as fh:
    lines = fh.readlines()

This is how you'd normally open a text file for reading. It will raise an IOError exception if the file doesn't exist, you don't have permission to read it, or the like, so you'll want to catch that. readlines() reads the entire file and splits it into lines; each line becomes an element in a list.
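Catching that exception might look something like the sketch below. The helper name read_lines and the decision to print a message and return an empty list are assumptions for illustration; a real program may well prefer to let the error propagate.

```python
def read_lines(filename):
    """Return the file's lines as a list, or an empty list if it cannot be opened."""
    try:
        with open(filename, 'r') as fh:
            return fh.readlines()
    except IOError as err:
        # Assumption: reporting the problem and carrying on with no data is acceptable.
        print("Could not read %s: %s" % (filename, err))
        return []
```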

    d={}

This simply initializes an empty dictionary.

    for country in [line.split(',')[1].strip() for line in lines]:

Here is where the fun starts. The bracket-enclosed part to the right is called a list comprehension, and it basically generates a list for you. What it pretty much says, in plain English, is "for each element 'line' in the list 'lines', take that element/line, split it on each comma, take the second element (index 1) of the list you get from the split, strip off any whitespace from it, and use the result as an element in the new list". Then the left part just iterates over the generated list, giving the name 'country' to the current element in the scope of the loop body.

      d[country] = d.get(country,0) + 1

Ok, ponder for a second what would happen if instead of the above line, we'd used the following:

      d[country] = d[country] + 1

It'd crash, right (KeyError exception), because d[country] doesn't have a value the first time around. So we use the get() method, which all dictionaries have. Here's the nifty part: get() takes an optional second argument, which is what we want to get from it if the element we're looking for doesn't exist. So instead of crashing, it returns 0, which (unlike None) we can add 1 to, and update the dictionary with the new count. Then we just return the lot of it.
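A tiny standalone demonstration of that difference, using 'France' from the question's example as the key. The variable missing_key_failed is only there to record what happened and is not part of the original code.

```python
d = {}

# Plain indexing fails for a key that has not been stored yet.
try:
    d['France'] = d['France'] + 1
    missing_key_failed = False
except KeyError:
    missing_key_failed = True

# get() falls back to the supplied default instead of raising.
d['France'] = d.get('France', 0) + 1   # no entry yet, so 0 + 1
d['France'] = d.get('France', 0) + 1   # entry exists, so 1 + 1

print(d)  # prints {'France': 2}
```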

Hope it helps.

3 Comments

Nice explanation! It's better practice to do for country in [line.split(',')[1].strip() for line in fh]:
You're quite right, it is. I didn't really want to complicate the list comp more than necessary by 'more magic' though, and for ease of explanation left it outside the loop.
I just logged in to add my "+1". On any subject, I have never read such a clear and educational answer. The list comprehension really is more Pythonic.
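For reference, the variant suggested in the first comment might look like this when applied to the answer's function. This is a sketch only; like the original, it assumes every line contains a comma and will raise IndexError on a blank line.

```python
def parseFile(filename):
    d = {}
    with open(filename, 'r') as fh:
        # Iterating over fh yields one line at a time, so the intermediate
        # list that readlines() builds is no longer needed.
        for country in [line.split(',')[1].strip() for line in fh]:
            d[country] = d.get(country, 0) + 1
    return d
```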
1

I would use a defaultdict plus a list to maintain the structure of the information, so that additional statistics can be derived.

import collections

def parse_cities(filepath):
    countries_cities_map = collections.defaultdict(list)
    with open(filepath) as fd:
        for line in fd:
            # Strip each field so " US" and "US" are counted as the same country.
            values = [value.strip() for value in line.split(',')]
            if len(values) == 2:
                city, country = values
                countries_cities_map[country].append(city)
    return countries_cities_map

def format_cities_per_country(countries_cities_map):
    # items() and print(...) with a single argument work on both Python 2 and 3.
    for country, cities in countries_cities_map.items():
        print(" {ncities} Cities found in {country} country".format(country=country, ncities=len(cities)))


if __name__ == '__main__':
    import sys
    filepath = sys.argv[1]
    format_cities_per_country(parse_cities(filepath))
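As an illustration of the kind of additional statistics such a mapping allows, here is a small sketch. The sample data is taken from the question's example and built by hand in the shape parse_cities() returns; the variable names city_counts, largest and french_cities are assumptions for illustration.

```python
import collections

# Sample data from the question's example, in the shape parse_cities() returns.
countries_cities_map = collections.defaultdict(list)
for city, country in [("New York", "US"), ("Angers", "France"),
                      ("Los Angeles", "US"), ("Pau", "France"),
                      ("Dunkerque", "France"), ("Mecca", "Saudi Arabia")]:
    countries_cities_map[country].append(city)

# Number of cities per country, as the question asks.
city_counts = dict((country, len(cities))
                   for country, cities in countries_cities_map.items())

# Country with the most cities listed.
largest = max(countries_cities_map, key=lambda c: len(countries_cities_map[c]))

# Alphabetical list of cities for one country.
french_cities = sorted(countries_cities_map["France"])
```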

1
import collections

def readFile(fname):
    with open(fname) as inf:
        return [tuple(s.strip() for s in line.split(",")) for line in inf]

def countCountries(city_list):
    return collections.Counter(country for city,country in city_list)

def main():
    cities = readFile("input.txt")
    countries = countCountries(cities)

    print("{0} cities found in {1} countries:".format(len(cities), len(countries)))

    for country, num in countries.items():
        print("{country}: {num}".format(country=country, num=num))

if __name__=="__main__":
    main()

3 Comments

@Hugh Bothwell: I realize this is a matter of taste, but I like to dial down the 'give 100% complete answer' knob on homework questions.
"It's usually better not to provide a complete code sample if you believe it would not help the student, using your best judgment. You can use pseudo-code first, and, in the spirit of creating a programming resource, you may come back after a suitable amount of time and edit your response to include more complete code. This way, the student still has to write their own code, but a full solution can still become available after the assignment has ended." --Answering homework questions, Meta
@phooji: true; I gave a full answer because (a) it's such a simple question, (b) she was headed in such a weird direction, and (c) it allows me to demonstrate a variety of ideas (generators and comprehensions, positional and named arguments, difference between passed argument name and name in function, etc).
