How to write results generated from a DataFrame into a CSV file in Python?

Question

I am reading data from a TSV file into a DataFrame using the pandas module in Python:

import pandas as pd
df = pd.read_csv(filename, sep='\t', index_col=0)  # DataFrame.from_csv was removed in pandas 1.0

The file has around 5000 columns (4999 test parameters and 1 result / output value).

I iterate through every row and check whether the result value matches the expected value. If it does, I write that row to another CSV file:

import csv

expected_value = 'some_value'
with open(file_to_write, 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter='\t')  # note: tab-separated output
    for index, row in df.iterrows():  # iterrows() yields (index, Series) pairs
        result = row['RESULT']
        if expected_value.lower() in str(result).lower():
            csvwriter.writerow(row)

But the output file is not correct: the individual column values do not go into their respective columns/cells; instead they are appended as rows. How do I write this data correctly to the CSV file?
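Two things are worth checking here. If the loop is run as `for row in df.iterrows()`, each `row` is an `(index, Series)` tuple rather than a row, so it has to be unpacked. And with `delimiter='\t'`, a spreadsheet program that expects commas may put each whole line into a single cell. A minimal sketch with made-up data, writing comma-separated output to an in-memory buffer instead of a real file:

```python
import csv
import io

import pandas as pd

# Hypothetical data: two parameter columns plus a RESULT column.
df = pd.DataFrame(
    {"P1": [1, 2, 3], "P2": [4, 5, 6], "RESULT": ["ok", "FAIL", "ok"]},
    index=["a", "b", "c"],
)
expected_value = "ok"

buf = io.StringIO()  # stands in for open(file_to_write, "w", newline="")
writer = csv.writer(buf)  # comma-delimited, so spreadsheets split the cells
writer.writerow(["index"] + list(df.columns))  # header line
for index, row in df.iterrows():  # each item is an (index, Series) pair
    if expected_value.lower() in str(row["RESULT"]).lower():
        # One output line per matching row: the index, then every cell value.
        writer.writerow([index] + row.tolist())

print(buf.getvalue())
```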

The suggested answers work well; however, I need to check multiple conditions. I have a list of values:

vals = ['hello', 'foo', 'bar']

One of the columns holds, in every row, comma-separated values like 'hello,foo,bar'. I need a row to pass if either of two checks succeeds: any value in vals appears in that column, or the result value matches the expected value. I have written the following code:

import pandas as pd

df = pd.read_csv(filename, sep='\t', index_col=0)
for index, row in df.iterrows():
    # Split 'hello, foo, bar' into ['hello', 'foo', 'bar'], dropping spaces.
    values = [v.strip() for v in str(row['COL']).split(",")]
    if set(vals) & set(values) or expected_value.lower() in str(row['RESULT_COL']).lower():
        print(row['RESULT_COL'])
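The same two checks can be expressed without a Python-level loop over rows by building one boolean mask per check and combining them with `|`. A sketch, assuming `COL` holds comma-separated strings (possibly with spaces after the commas) and that a case-insensitive substring test on `RESULT_COL` is what is wanted, as in the loop above; the data is hypothetical:

```python
import pandas as pd

vals = ["hello", "foo", "bar"]
expected_value = "some_value"

# Hypothetical data; None simulates a missing cell.
df = pd.DataFrame(
    {
        "COL": ["hello, foo, bar", "baz,qux", None, "x,bar"],
        "RESULT_COL": ["other", "Some_Value", "other", "other"],
    }
)

wanted = set(vals)
# Check 1: split COL on commas, strip spaces, and test for any overlap with vals.
in_vals = df["COL"].fillna("").map(
    lambda s: bool(wanted & {part.strip() for part in s.split(",")})
)
# Check 2: case-insensitive substring match on the result column.
matches_expected = (
    df["RESULT_COL"].astype(str).str.lower().str.contains(expected_value.lower(), regex=False)
)

filtered = df[in_vals | matches_expected]  # keep rows passing either check
print(filtered)
# filtered.to_csv(file_to_write, sep="\t")  # file_to_write: the question's output path
```

The resulting `filtered` frame is also the "separate DataFrame with only the interesting rows" asked about in the comments.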

First, I wouldn't compare numbers using their string representation; that breaks as soon as the number of decimal places differs. Cast them to float and check for equality. Second, can't you just do the filtering in pandas and then write the result with df.to_csv(file_to_write)? — Julien Marrec, Nov 29, 2016 at 10:23

How do I do the filtering in pandas? And is it possible to create a separate DataFrame containing only the rows I'm interested in? — Spider Man, Nov 29, 2016 at 10:26
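The first point of that comment can be sketched as follows, assuming RESULT holds numbers that may be written with different decimal places. `pd.to_numeric(..., errors="coerce")` turns non-numeric cells into NaN instead of raising, and `numpy.isclose` is shown as a tolerance-based alternative to exact float equality (a swapped-in choice, not part of the comment). The data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical results: the same number written with different decimal places.
df = pd.DataFrame({"RESULT": ["1.5", "1.50", "1.500001", "abc", "2"]})
expected = 1.5

# As strings, "1.5" and "1.50" differ even though they are the same number.
string_match = df["RESULT"] == "1.5"

# Cast to float instead; non-numeric cells become NaN instead of raising.
numeric = pd.to_numeric(df["RESULT"], errors="coerce")
exact_match = numeric == expected
# isclose allows for floating-point noise; NaN compares as False.
close_match = np.isclose(numeric, expected)

print(string_match.tolist())
print(exact_match.tolist())
print(close_match.tolist())
```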

Julien Marrec · Accepted Answer · 2016-11-29 10:40:14Z

4

Create a DataFrame that has both a 'RESULT' column and an 'EXPECTED' column.

Then filter the rows where the two match and write only those to CSV:

df.loc[df['EXPECTED'] == df['RESULT']].to_csv(filename)  # .ix was removed in pandas 1.0; use .loc
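A minimal sketch of this approach, assuming a single expected value broadcast into an `EXPECTED` column (as suggested in a comment on the other answer) and a case-insensitive string comparison, since RESULT may not be numeric. The data is hypothetical, and `to_csv` writes to an in-memory buffer here instead of a filename:

```python
import io

import pandas as pd

df = pd.DataFrame({"P1": [1, 2, 3], "RESULT": ["pass", "FAIL", "Pass"]})
df["EXPECTED"] = "pass"  # hypothetical expected value, same for every row

# Boolean mask: True where RESULT equals EXPECTED, ignoring case.
mask = df["RESULT"].astype(str).str.lower() == df["EXPECTED"].str.lower()

buf = io.StringIO()
df.loc[mask].to_csv(buf)  # only the matching rows are written
print(buf.getvalue())
```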

answered Nov 29, 2016 at 10:40

Julien Marrec


3 Comments

Spider Man Over a year ago

Looks like a good approach. I am looking into a case where I need to check multiple conditions. @JulienMarrec

Spider Man Over a year ago

Also, some of the rows in the file contain NA/NaN values, and I get the error "Cannot index with vector containing NA / NaN values".

Julien Marrec Over a year ago

If you included an MCVE (minimal, complete, and verifiable example) in your question, it would be easier to test your specific case.
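The NA/NaN indexing error likely comes from a mask built with `.str` methods on a column that has missing values: those rows can come back as NaN instead of True/False, and pandas refuses to index with such a mask. A sketch of the usual fix, passing `na=False` to `str.contains`, on made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing RESULT.
df = pd.DataFrame({"RESULT": ["ok", np.nan, "OK", "bad"]})

# Without na=False, str.contains can return NaN for the missing cell, and a
# mask containing NaN cannot be used to index the DataFrame.
mask_contains = df["RESULT"].str.lower().str.contains("ok", na=False)

# An equality test never yields NaN: comparing a missing value gives False.
mask_equal = df["RESULT"].str.lower() == "ok"

print(df[mask_contains])
print(df[mask_equal])
```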

zipa · Accepted Answer · 2016-11-29 13:12:36Z

1

You can filter the values like this:

df[df['RESULT'].astype(str).str.lower().str.contains(expected_value.lower(), regex=False)].to_csv(filename)

This keeps the rows whose value contains expected_value, as your code does. If you want an exact match, you can use:

df.loc[df['RESULT'].astype(str).str.lower() == expected_value.lower()].to_csv(filename)
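A sketch contrasting the two filters on hypothetical data. `str.contains` interprets its pattern as a regular expression by default, so `regex=False` is used here to match the expected value literally; with a regex, `a+b` would also match `ab`:

```python
import pandas as pd

df = pd.DataFrame({"RESULT": ["a+b", "A+B extra", "ab", "xa+by"]})
expected_value = "A+B"

# Substring match, case-insensitive; regex=False makes "+" literal.
contains = df[df["RESULT"].str.lower().str.contains(expected_value.lower(), regex=False)]
# Exact match, case-insensitive.
exact = df.loc[df["RESULT"].str.lower() == expected_value.lower()]

print(contains.index.tolist())
print(exact.index.tolist())
```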

As you mentioned in a comment, for multiple criteria you will need something like this:

expected_values = [expected_value1, expected_value2, expected_value3]
df[df['RESULT'].isin(expected_values)]
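`isin` performs exact, case-sensitive matching. A sketch that converts to string and lower-cases both sides, so it behaves like the single-value checks above while accepting several values; the values are made up:

```python
import pandas as pd

df = pd.DataFrame({"RESULT": ["Pass", "fail", "SKIP", "error"]})
expected_values = ["pass", "skip"]  # hypothetical list of acceptable results

lowered = [v.lower() for v in expected_values]
mask = df["RESULT"].astype(str).str.lower().isin(lowered)
print(df[mask].index.tolist())
```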

UPDATE:

To filter on several criteria at once (some cell equal to one of vals, and RESULT equal to expected_value):

df.loc[df.isin(vals).any(axis=1) & (df['RESULT'].str.lower() == expected_value.lower())].to_csv(filename)
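A sketch, on hypothetical data, of building each condition as a named boolean mask and combining them explicitly: `&` keeps rows passing both, `|` keeps rows passing either (the updated question asks for "or"). Note that `df.isin(vals)` only matches cells exactly equal to one of the values, not a cell such as `'hello,foo,bar'` that contains them:

```python
import pandas as pd

vals = ["hello", "foo", "bar"]
expected_value = "ok"

df = pd.DataFrame(
    {
        "COL": ["hello", "baz", "foo", "hello,foo,bar"],
        "RESULT": ["OK", "ok", "fail", "fail"],
    }
)

any_val = df.isin(vals).any(axis=1)  # some cell equals one of vals exactly
result_ok = df["RESULT"].str.lower() == expected_value.lower()

# When a condition is written inline, wrap it in parentheses:
# & and | bind more tightly than ==.
both = df[any_val & result_ok]
either = df[any_val | result_ok]
print(both.index.tolist())
print(either.index.tolist())
```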

edited Nov 29, 2016 at 13:12

answered Nov 29, 2016 at 10:36

zipa


7 Comments

zipa Over a year ago

Are you sure that RESULT is numeric? He didn't mention that anywhere in the question.

zipa Over a year ago

Why? Maybe you are right :) By the way, just add df['EXPECTED'] = expected_value to your code: if you are right, your solution is better, but that step is missing, if you ask me.

Spider Man Over a year ago

Yes, the RESULT field may or may not be numeric, which is why I am treating it as a string. This approach looks good; I am trying out the case where I have multiple check conditions. @Boris

zipa Over a year ago

Sure, I added another approach that might help. You can also combine the first and last approaches to both convert to string and check multiple criteria.

Spider Man Over a year ago

I have updated the question; could you take another look at it? @Boris
