I am reading data from a TSV file into a pandas DataFrame in Python.

df = pandas.DataFrame.from_csv(filename, sep='\t')

The file has around 5000 columns (4999 test parameters and 1 result / output value).

I iterate through the entire TSV file and check whether the result value matches the expected value. If it does, I write that row to another CSV file.

expected_value = 'some_value'
with open(file_to_write, 'w') as csvfile:
  csvwriter = csv.writer(csvfile, delimiter='\t')
  for row in df.iterrows():
    result = row['RESULT']
    if expected_value.lower() in str(result).lower():
        csvwriter.writerow(row)

But the output CSV file is not laid out correctly: the individual column values do not go into their respective columns / cells and are appended as rows instead. How do I write this data correctly to the CSV file?

The suggested answers work well; however, I need to check multiple conditions. I have a list of values:

vals = ['hello', 'foo', 'bar']

One column holds, for every row, a comma-separated string such as 'hello,foo,bar'. I need to keep a row if either of two checks passes: any value in the vals list is present in that column, or the result value matches the expected value. I have written the following code:

df = pd.DataFrame.from_csv(filename, sep='\t')
for index, row in df.iterrows():
  csv_vals = row['COL']
  values = str(csv_vals).split(",")
  if len(set(vals).intersection(values)) > 0 or expected_value.lower() in str(row['RESULT_COL']).lower():
    print(row['RESULT_COL'])
  • First, I wouldn't try to compare numbers using their string representation... It won't work if you have more or fewer decimal places etc. Cast them to float and check for equality. Second, can't you just do the modification in pandas and then output a full csv file using df.to_csv(file_to_write)? Commented Nov 29, 2016 at 10:23
  • How do I do the modification in pandas? Moreover, is it possible to create a separate dataframe with the rows that I would be interested in? Commented Nov 29, 2016 at 10:26

2 Answers

You should create a dataframe where you have a column 'RESULT' and one 'EXPECTED'.

Then you can filter the rows where both match and output only these to csv using:

df.loc[df['EXPECTED'] == df['RESULT']].to_csv(filename)
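To make the idea above concrete, here is a minimal sketch rather than a definitive implementation. The three-row sample, the column names P1 and P2, and the value "pass" are made up for illustration; only RESULT and EXPECTED come from the answer. It adds the expected value as a column, keeps the matching rows, and writes them back out tab-separated so each value lands in its own column.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for the real TSV file.
tsv_text = "P1\tP2\tRESULT\n1\t2\tpass\n3\t4\tfail\n5\t6\tpass\n"
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Store the expected value in its own column, then keep the rows where it matches.
df["EXPECTED"] = "pass"
matching = df.loc[df["EXPECTED"] == df["RESULT"]]

# Write the kept rows as tab-separated text. A file path can be passed
# instead of the in-memory buffer used here.
out = io.StringIO()
matching.drop(columns="EXPECTED").to_csv(out, sep="\t", index=False)
tsv_out = out.getvalue()
```

Passing sep="\t" to to_csv matters here, since the default separator is a comma.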

3 Comments

Looks like a good approach. I am checking out a case where I need to check on multiple conditions. @JulienMarrec
Moreover, some of the rows in the file contain NA / NAN values. It throws an error "Cannot index with vector containing NA / NaN values"
If you included an MCVE in your question, that would help to test your specific application.
You can filter the values like this:

df[df['RESULT'].str.lower().str.contains(expected_value.lower())].to_csv(filename)
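If the RESULT column contains NaN or numeric entries, a substring mask can come back with missing values and indexing with it may fail. A hedged sketch of one way around that, using made-up sample data: convert the column to text first and pass na=False so missing entries count as "no match". The regex=False flag treats expected_value as literal text, which is an assumption about what is wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: RESULT mixes text, a missing value and a number.
df = pd.DataFrame({
    "P1": [1, 2, 3, 4],
    "RESULT": ["Some_Value here", np.nan, "other", 12.5],
})
expected_value = "some_value"

# astype(str) lets .str work on non-string entries; case=False ignores case;
# na=False makes any remaining missing value evaluate to False.
mask = df["RESULT"].astype(str).str.contains(
    expected_value, case=False, regex=False, na=False
)
filtered = df[mask]
```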

This will work for filtering values that contain your expected_value, as you did in your code. If you want an exact match, you can use:

df.loc[df['RESULT'].str.lower() == expected_value.lower()].to_csv(filename)

As you suggested in your comment, for multiple criteria you will need something like this:

expected_values = [expected_value1, expected_value2, expected_value3]
df[df['RESULT'].isin(expected_values)]
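Note that isin compares whole values and is case-sensitive. If the match should ignore case, one possible sketch (the sample values are hypothetical) is to lower-case both sides before comparing:

```python
import pandas as pd

# Hypothetical sample data and expected values.
df = pd.DataFrame({"P1": [1, 2, 3], "RESULT": ["Pass", "FAIL", "skipped"]})
expected_values = ["pass", "skipped"]

# Lower-case the candidates and the column so the comparison ignores case.
lowered = [v.lower() for v in expected_values]
mask = df["RESULT"].astype(str).str.lower().isin(lowered)
selected = df[mask]
```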

UPDATE:

To filter on multiple criteria at once, keeping rows where some column holds one of vals and the result column matches:

df.loc[df.isin(vals).any(axis=1) & (df['RESULT'].str.lower() == expected_value.lower())].to_csv(filename)
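The updated question asks for rows where either check passes, and where the first check looks inside a comma-separated cell. A rough sketch of that variant follows. COL and RESULT_COL are the names used in the question; the sample rows are invented, and splitting on "," with surrounding whitespace stripped is an assumption about the cell format.

```python
import pandas as pd

vals = ["hello", "foo", "bar"]
expected_value = "some_value"

# Hypothetical sample rows; COL holds comma-separated tokens.
df = pd.DataFrame({
    "COL": ["hello,x", "y,z", "a,b", None],
    "RESULT_COL": ["nope", "Some_Value", "nope", "nope"],
})

wanted = set(vals)

# True where at least one comma-separated token in COL is in vals.
in_vals = df["COL"].fillna("").astype(str).apply(
    lambda cell: not wanted.isdisjoint(tok.strip() for tok in cell.split(","))
)

# True where RESULT_COL contains expected_value, ignoring case.
has_expected = df["RESULT_COL"].astype(str).str.contains(
    expected_value, case=False, regex=False, na=False
)

# "|" combines the two masks with OR; "&" would require both checks to pass.
selected = df[in_vals | has_expected]
```

The selected frame can then be written out with to_csv, passing sep="\t" if the output should stay tab-separated.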

7 Comments

Are you sure that Result is numerical? He didn't mention that anywhere in question.
Why? Maybe you are correct :) BTW, just add df['EXPECTED']=expected_value to your code, because if you are correct your solution is better and that part is missing if you ask me.
Yeah, the result field may or may not be numerical that is why I am treating it as a string. This approach looks good, I am trying out the case where I have multiple check conditions @Boris
Sure, I added another approach that might help. You can also combine first and last approach to both convert to string and look for multiple criteria.
I have updated my problem definition if you can still take a look at it @Boris