1

My code so far:

import csv

myIds = ['1234','3456','76']
countries = []

# open the file
with open('my.csv', 'r') as infile:
  # read the file as a dictionary for each row ({header : value})
  reader = csv.DictReader(infile)
  data = {}
  for row in reader:
    for header, value in row.items():
      try:
        data[header].append(value)
      except KeyError:
        data[header] = [value]

# extract the variables and assign to lists
myFileIds = data['id']
myFileCountry = data['country']
listfromfile = [a + " " + b for a, b in zip(myFileIds, myFileCountry)]

The above produces the following listfromfile:

listfromfile = ['1 Uruguay', '2 Vatican', '1234 US', '3456 UK', '5678 Brazil','10111 Argentina','234567 Spain']

I want a list of the countries whose IDs occur in my.csv. An id from myIds may also be missing from my.csv, in which case that position in the list should hold 'Unsupported Country'. The two lists, myIds and countries, should have the same length, so that the first id in myIds corresponds to the first country in countries, and so on. Desired outcome:

myIds = ['1234','3456','76']
countries = ['US', 'UK', 'Unsupported Country']

Alternatively, I tried pandas, but also with no luck :(

import pandas as pd

df=pd.read_csv('my.csv')
myIds = ['1234','3456','76']

countries = df.loc[df["id"].isin(myIds),"country"].tolist()

my.csv:

id     country
1      Uruguay
2      Vatican
1234   US
3456   UK
5678   Brazil
10111  Argentina
234567 Spain

Could someone help me with this, please? Thanks in advance!

1
  • Assuming the text in the file is exactly as in your example, you can get a dataframe from the file with pd.read_csv("my.csv", sep=r'\s+'). You need to specify the separator. Check my answer for other alternatives. Commented Apr 22, 2020 at 21:56

3 Answers

1

You can achieve this using dataframes.

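import pandas as pd

# read the file from the question; the id column is parsed as integers
input_df = pd.read_csv("my.csv")
myIds = ['1234','3456','76']
# convert the ids to int so they match the dtype of input_df['id']
my_ids_df = pd.DataFrame(myIds, columns=['id']).astype(int)
# a right merge keeps every id from myIds; ids missing from the file get NaN
output_df = pd.merge(input_df, my_ids_df, on=['id'], how='right')
output_df['country'] = output_df['country'].fillna('Unsupported Country')
print(list(zip(output_df['id'].values.tolist(), output_df['country'].values.tolist())))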

1

Maybe this would be useful for your purposes:

This assumes your file data is as you have it in your example. Otherwise, you could split on another character.

>>> from collections import defaultdict
>>> country_data = defaultdict(lambda: 'Unsupported Country')
>>> 
>>> for line in open("my.csv", 'r'):
...     try:
...         id, country = line.split()
...         country_data[int(id)] = country
...         country_data[country] = int(id)
...     except ValueError:
...         pass # Row isn't in the right format. Skip it.
...         
>>> country_data['Vatican']
2
>>> country_data[2]
'Vatican'
>>> country_data['Moojoophorbovia']
'Unsupported Country'
>>> 

If you don't actually need two lists that have to be kept in sync (that assumption, and then trying to fit your file data into those lists, is a bit like fitting a square peg into a round hole), the above might solve the problem: it reads in the country data and makes each country accessible by its ID, and it also lets you get the ID from the name of the country.
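
If you do still need the two parallel lists, the same lookup idea can be sketched with a plain dict and dict.get. This is only a minimal sketch: it assumes the file is comma-separated with id and country headers, and the inline csv_text and the name country_by_id are stand-ins for illustration, not part of the original code.

```python
import csv
import io

# Stand-in for the contents of my.csv (assumed to be comma-separated).
csv_text = (
    "id,country\n"
    "1,Uruguay\n"
    "2,Vatican\n"
    "1234,US\n"
    "3456,UK\n"
    "5678,Brazil\n"
    "10111,Argentina\n"
    "234567,Spain\n"
)

myIds = ['1234', '3456', '76']

# Build an id -> country mapping; ids stay strings, like the entries in myIds.
with io.StringIO(csv_text) as infile:
    country_by_id = {row['id']: row['country'] for row in csv.DictReader(infile)}

# Look up each id in order; dict.get supplies the fallback for missing ids.
countries = [country_by_id.get(i, 'Unsupported Country') for i in myIds]
print(countries)
```

With a real file, replacing io.StringIO(csv_text) with open('my.csv', newline='') should behave the same way. Unlike a defaultdict, dict.get does not add the missing ids to the mapping.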

0
import pandas as pd

myIds = ['1234','3456','76']

# ids in my.csv are read as integers, so convert myIds to match before merging
df1 = pd.DataFrame(myIds, columns=['id']).astype(int)

fields = ['id', 'country']

df2 = pd.read_csv('my.csv', sep=',', usecols=fields)
# a left merge keeps every id from myIds, in order; missing ids get NaN
df3 = df1.merge(df2, on="id", how='left')
df3['country'] = df3['country'].fillna('Unsupported Country')
countries = df3['country'].tolist()

The above works for me. However, I'm still trying to find an easier solution.
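
One possibly simpler variant, as a sketch only: read the id column as strings and reindex a country Series by myIds, which avoids the merge. It assumes the data is comma-separated and that ids are unique; csv_text stands in for my.csv, and the name lookup is just illustrative.

```python
import io
import pandas as pd

# Stand-in for the contents of my.csv (assumed to be comma-separated).
csv_text = (
    "id,country\n"
    "1,Uruguay\n"
    "2,Vatican\n"
    "1234,US\n"
    "3456,UK\n"
    "5678,Brazil\n"
    "10111,Argentina\n"
    "234567,Spain\n"
)

myIds = ['1234', '3456', '76']

# Read ids as strings so they compare equal to the entries in myIds.
df = pd.read_csv(io.StringIO(csv_text), dtype={'id': str})

# Series of countries indexed by id (reindex requires the ids to be unique).
lookup = df.set_index('id')['country']

# reindex keeps the order of myIds and yields NaN for ids not in the file.
countries = lookup.reindex(myIds).fillna('Unsupported Country').tolist()
print(countries)
```

Because reindex follows the order of myIds, the result lines up position by position with the input list.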

1 Comment

A similar solution was provided earlier by @ketan-krishna-patil. Duplicate answer.
