I need to write a Python generator that yields tuples (X, Y) coming from two different CSV files.

It should receive a batch size on init, read the two CSVs line by line, and yield a tuple (X, Y) for each line, where X and Y are arrays (the columns of the CSV files).

I've looked at examples of lazy reading, but I'm finding it difficult to adapt them to CSVs.

Also, unfortunately Pandas Dataframes are not an option in this case.

Any snippet I can start from?

Thanks

  • Did I understand you correctly that you want a generator that yields pairs of lines from two different CSV files? Commented Jul 26, 2016 at 8:21
  • I've added references to solutions I've tried, and corrected y to Y (both X and Y are arrays of floats). Commented Jul 26, 2016 at 8:44

1 Answer

You can write a generator that reads rows from two csv readers in parallel and yields each pair of rows as a pair of arrays:

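import csv
import numpy as np

def getData(filename1, filename2):
    # In Python 3 the csv module expects text-mode files opened with newline=""
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        reader1 = csv.reader(csv1)
        reader2 = csv.reader(csv2)
        # zip is lazy in Python 3, so only one row per file is in memory at a time;
        # it stops at the end of the shorter file
        for row1, row2 in zip(reader1, reader2):
            # This gives arrays of floats; for other types change dtype
            # (the np.float alias was removed from NumPy, so use the builtin float)
            yield (np.array(row1, dtype=float),
                   np.array(row2, dtype=float))

for tup in getData("file1", "file2"):
    print(tup)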
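
The generator above yields one pair per line and does not use the batch size mentioned in the question. If what you want is several lines grouped into each (X, Y), here is a minimal sketch. It assumes Python 3, that every row in a file has the same number of columns, and that all values parse as floats. The name get_batches and its batch_size parameter are made up for illustration.

```python
import csv
from itertools import islice

import numpy as np


def get_batches(filename1, filename2, batch_size):
    """Yield (X, Y) pairs of 2-D float arrays with up to batch_size rows each.

    Hypothetical helper: assumes rectangular, purely numeric CSV files.
    """
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        # zip pairs the rows lazily and stops at the end of the shorter file
        pairs = zip(csv.reader(csv1), csv.reader(csv2))
        while True:
            # Pull at most batch_size row pairs without reading further ahead
            chunk = list(islice(pairs, batch_size))
            if not chunk:
                return
            # Split the list of (row1, row2) pairs back into two groups of rows
            rows1, rows2 = zip(*chunk)
            yield np.array(rows1, dtype=float), np.array(rows2, dtype=float)
```

Calling get_batches("file1", "file2", 32) would then yield arrays with 32 rows each, except for the last batch, which may be shorter. Only one batch is held in memory at a time, so it stays lazy in the same way as the per-line version.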