Let's say I have a root directory (folder) z with three subdirectories (folders): a, b, and c.

Each of a, b, and c contains one CSV file. The files hold similar data and have similar names (a_data, b_data, and c_data).

Out of the three CSV files, only one contains the integer value 100 inside its data frame.

How can I design a loop that scans every CSV inside the three subfolders and tells me which CSV has the value 100?

Thanks a lot!

1
  • Include your code showing what you've tried so far. Commented Apr 6, 2020 at 5:15

3 Answers

1

I can't profile this at the moment, but I assume it will be faster to open each file with pandas than to search through the text of the CSV before opening it in pandas. It will probably read better, too.

So, assuming it's faster to open everything with pandas than to use something like the csv library, let's do:

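import pandas as pd

# "column" is a placeholder for the column you want to search.
df = pd.read_csv("~/z/a/a_data.csv")

# .any() is True if at least one cell equals 100
# (.all() would require every cell in the column to equal 100).
if not df["column"].isin([100]).any():
  df = pd.read_csv("~/z/b/b_data.csv")

  if not df["column"].isin([100]).any():
    df = pd.read_csv("~/z/c/c_data.csv")

    if not df["column"].isin([100]).any():
      print("No value")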

Ultimately, nested ifs aren't pretty, but it's hard to find the right fit without seeing your code. If you can post your code, that would help. Otherwise, I hope the above helps you get started.


3 Comments

I have tried this way before, but the problem is: what if I have more than three subfolders? Wouldn't it be cleaner to use a loop to find the CSV in each subfolder? I can't really come up with such a loop.
@JunyoungBae Correct, at some point you definitely don't want nested ifs. You'd probably want something recursive and/or something that leverages os. So you'd end up with something that looks at the folder structure and checks each CSV by opening it with pandas, like df = pd.read_csv(root_folder_str + "/" + sub_folder_str + "/" + file_name_str). If you need more help, edit your question to show your exact project structure and I will edit my answer based on that. Hope that helps!
Thank you for your well explained answer!
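
The os-based traversal suggested in the comment above could look roughly like the sketch below. It is only an illustration under a few assumptions: the function name find_csvs_with_value and its parameters are made up for this example, every file ending in .csv is assumed to be readable by pandas, and the value is searched for in every column rather than in one named column.

```python
import os

import pandas as pd


def find_csvs_with_value(root_folder, value):
    """Return the paths of all CSV files under root_folder that contain value.

    Hypothetical helper for illustration; not part of pandas or os.
    """
    matches = []
    # os.walk visits root_folder and every subdirectory below it,
    # so the number of subfolders does not matter.
    for dir_path, _dir_names, file_names in os.walk(root_folder):
        for file_name in file_names:
            if not file_name.lower().endswith(".csv"):
                continue
            csv_path = os.path.join(dir_path, file_name)
            df = pd.read_csv(csv_path)
            # isin(...) marks matching cells; the first .any() reduces per
            # column, the second reduces across columns.
            if df.isin([value]).any().any():
                matches.append(csv_path)
    # os.walk does not guarantee an order, so sort for a stable result.
    return sorted(matches)
```

Calling find_csvs_with_value("z", 100) on the layout from the question would return a list with the path of the one file that holds the value, or an empty list if no file does.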
1
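import glob
import pandas as pd

val = 100
folder_path = 'z'  # root directory from the question; adjust to your own path

# '**' with recursive=True matches CSV files at any depth below folder_path.
subdir_files = glob.glob(folder_path + '/**/*.csv', recursive=True)
for file in subdir_files:
    df = pd.read_csv(file)
    # 'column_name' is a placeholder for the column to search.
    if val in df['column_name'].values:
        print(file)
        break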

7 Comments

Might want to add a break to avoid looping after you've found the file. It may also be helpful to include a flag for whether anything was found.
Thanks for your answer. In the line if val in df['column_name'].values:, what if I want to find the value 100 anywhere, regardless of column name? Is it okay if I use df.values?
@JunyoungBae That would also be okay
@Mohnish Quick question. How can I change the glob.glob call if there are other CSV files that I do not want to read into df? For example, I only want to include a_data and b_data, not other data.
@JunyoungBae You can add if file.split("/")[-2] == file.split("/")[-1].replace('_data.csv', ""): after the for line and put the rest of the code inside it. That checks that a file named 'a_data.csv' is in folder 'a'.
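
The two follow-ups in these comments (searching every column via df.values, and skipping files that are not named after their folder) can be combined in one sketch. This version swaps the split("/") check for pathlib so it does not depend on the path separator. The names is_folder_data_file and first_csv_containing are hypothetical, and the '<folder>_data.csv' naming rule is taken from the question's example.

```python
from pathlib import Path

import pandas as pd


def is_folder_data_file(csv_path):
    """True when the file is named '<parent folder>_data.csv', e.g. a/a_data.csv."""
    csv_path = Path(csv_path)
    return csv_path.name == f"{csv_path.parent.name}_data.csv"


def first_csv_containing(folder_path, value):
    """Return the first '<folder>_data.csv' under folder_path that holds value, else None."""
    # rglob searches folder_path recursively; sorting makes "first" deterministic.
    for csv_path in sorted(Path(folder_path).rglob("*.csv")):
        if not is_folder_data_file(csv_path):
            continue  # skip CSV files that do not follow the naming rule
        df = pd.read_csv(csv_path)
        # df.values is a 2-D array of every cell, so this ignores column names.
        if value in df.values:
            return csv_path
    return None
```

Returning None instead of printing plays the role of the "found" flag mentioned above: the caller can test the result and decide what to report.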
0

You can loop over your csv_files list like this, reading each file with pandas.read_csv and stopping at the first one with the desired value. The else clause of the for loop runs only if the loop ended normally (i.e. not via break), which here means that none of the files contained the desired value.

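import pandas as pd

csv_files = ["a/a.csv", "b/b.csv", "c/c.csv"]
found_df = None
for csv_file in csv_files:
    df = pd.read_csv(csv_file)
    # "column" is a placeholder for the column to search.
    if 100 in df["column"].values:
        found_df = df
        print(csv_file)  # report which file holds the value
        break
else:
    # Runs only when the loop finished without hitting break.
    print("No value found")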
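
For comparison, the same for/else pattern also works without pandas. The sketch below uses the standard csv module and returns the matching file's path. The function name first_file_with_value is hypothetical, and the code assumes that a plain string comparison is good enough: a cell written as 100.0 would not match 100 here, whereas pandas would parse both as numbers.

```python
import csv


def first_file_with_value(csv_files, value):
    """Return the first path in csv_files with a cell equal to str(value), else None."""
    target = str(value)
    found = None
    for csv_file in csv_files:
        with open(csv_file, newline="") as handle:
            # csv.reader yields each row as a list of strings.
            if any(cell.strip() == target
                   for row in csv.reader(handle)
                   for cell in row):
                found = csv_file
                break
    else:
        # Reached only when the loop ran to completion without break.
        print("No value found")
    return found
```

Because the function returns the path rather than the data frame, it answers the "which CSV" part of the question directly.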
