
I have written C# code that copies data from a CSV file into a DataTable. The CSV file contains 5 million rows, and I read it line by line to prevent memory issues, yet I still get an OutOfMemoryException. I added breakpoints to make sure the right strings are copied into my variables, and they work correctly. Any ideas?

int first_row_flag = 0;   // the first row holds column names, so it is skipped
string temp;
foreach (var row in File.ReadLines(path3))
{
    if (!string.IsNullOrEmpty(row))
    {
        int i = 0;
        if (first_row_flag != 0)
        {
            dt.Rows.Add();
            foreach (string cell in row.Split(','))
            {
                if (i < 9)
                {
                    temp = cell.Replace("\n", "");
                    temp = temp.Replace("\r", "");
                    dt.Rows[dt.Rows.Count - 1][i] = temp;   // fill the row just added
                    i++;
                }
            }
        }
        else
        {
            first_row_flag++;   // skip the header row
        }
    }
}

Each row has 9 columns; that is why I use i, to make sure I do not read unexpected data into a 10th column.

Here is the stack trace:

[stack trace screenshot not included]

  • You're loading all of this data into a DataTable at once? That is going to use a lot of memory. Commented Feb 15, 2017 at 16:19
  • You're still adding and adding and adding to (I'm guessing) a DataTable. Commented Feb 15, 2017 at 16:19
  • How exactly does reading line by line help here if you still hold all the rows in memory? Commented Feb 15, 2017 at 16:20
  • Why would this not eventually run out of memory? If you do nothing but "put in" without ever "taking out", you are eventually going to run out. Commented Feb 15, 2017 at 16:20
  • I think the OP's concern is why creating 5 million rows (as only the first one is filled) causes OOM. Most likely it is due to running the code in a 32-bit process (see stackoverflow.com/questions/14186256/…), ignoring the practical value of creating millions of empty rows. Commented Feb 15, 2017 at 16:28
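If the 32-bit theory in the last comment is worth ruling out, here is a minimal sketch using only standard .NET APIs. A 32-bit process is capped at roughly 2 GB of addressable memory, which a 5-million-row DataTable can exhaust long before the machine itself runs out of RAM:

using System;

class MemoryCheck
{
    static void Main()
    {
        Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
        Console.WriteLine("64-bit OS:      " + Environment.Is64BitOperatingSystem);

        // The managed heap size gives a rough sense of how close the
        // process is to the 32-bit ceiling while the import runs.
        Console.WriteLine("Managed heap:   " + GC.GetTotalMemory(false) / (1024 * 1024) + " MB");
    }
}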

1 Answer


5 million rows can simply be too much data to handle; it depends on the number of columns and the size of the values. For a rough idea, compare the file size with the memory available to your process. The point is, with this much data held in memory at once, you will end up with an out-of-memory exception with most other techniques too.

You should reconsider using a DataTable here. If you are holding the records so that you can later insert them into a database, process the data in small batches instead.
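Here is a minimal sketch of that batching idea, assuming a SQL Server target via SqlBulkCopy; the connection string, file path, destination table, and column names are all placeholders. A small reusable DataTable serves as the batch buffer, so at most one batch is ever held in memory:

using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

class CsvBatchLoader
{
    const int BatchSize = 10000;   // flush to the database every 10,000 rows

    static void Main()
    {
        string connStr = "Server=...;Database=...;Integrated Security=true";   // placeholder
        string path = @"C:\data\big.csv";                                      // placeholder

        var batch = new DataTable();
        for (int c = 0; c < 9; c++)
            batch.Columns.Add("col" + c, typeof(string));

        using (var bulk = new SqlBulkCopy(connStr))
        {
            bulk.DestinationTableName = "dbo.MyTable";   // placeholder

            foreach (var line in File.ReadLines(path).Skip(1))   // skip the header row
            {
                if (string.IsNullOrEmpty(line)) continue;

                string[] cells = line.Split(',');
                DataRow row = batch.NewRow();
                for (int i = 0; i < 9 && i < cells.Length; i++)
                    row[i] = cells[i].Trim('\r', '\n');
                batch.Rows.Add(row);

                // Once the batch is full, push it to the server and clear it,
                // so memory stays bounded no matter how large the file is.
                if (batch.Rows.Count >= BatchSize)
                {
                    bulk.WriteToServer(batch);
                    batch.Clear();
                }
            }

            if (batch.Rows.Count > 0)
                bulk.WriteToServer(batch);   // final partial batch
        }
    }
}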

If you handle the data in batches, you could even avoid the DataTable entirely and accumulate each batch in a List&lt;T&gt;.

Also, look at other techniques for reading CSV files: Reading CSV files using C#
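For instance, the TextFieldParser that ships with the .NET Framework (add a reference to the Microsoft.VisualBasic assembly) handles quoted fields and embedded commas that a plain Split(',') would break on; a minimal sketch with a placeholder path:

using Microsoft.VisualBasic.FileIO;

class CsvReadDemo
{
    static void Main()
    {
        string path = @"C:\data\big.csv";   // placeholder

        using (var parser = new TextFieldParser(path))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            parser.ReadFields();   // discard the header row

            while (!parser.EndOfData)
            {
                string[] cells = parser.ReadFields();
                // process one row at a time here instead of
                // accumulating everything into a DataTable
            }
        }
    }
}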


1 Comment

Side note: you do know the "number of columns and values" - all of the rows are empty except for at most 9 cells... :)
