I have data like this in a pipe-delimited text file (CSV):

column1|column2|column3|column4|column5 (\r\n)
column1|column2|column3|column4|column5 (\r\n)
column1|column2 (\r\n)
column2 (\r\n)
column2|column3|column4|column5 (\r\n)

I would like to delete the \r\n at the end of lines 3 and 4 to get:

column1|column2|column3|column4|column5 (\r\n)
column1|column2|column3|column4|column5 (\r\n)
column1|column2/column2/column2|column3|column4|column5 (\r\n)

My idea: if a row doesn't contain 4 column separators ("|"), delete its CRLF, and repeat until only complete rows remain.

This is my code:

String path = "test.csv";

// Read file
string[] readText = File.ReadAllLines(path);

// Empty the file
File.WriteAllText(path, String.Empty);

int x = 0;
int countheaders = 0;
int countlines;
using (StreamWriter writer = new StreamWriter(path))
{
    foreach (string s in readText)
    {
        if (x == 0)
        {
            countheaders = s.Where(c => c == '|').Count();
            x = 1;
        }

        countlines = 0;
        countlines = s.Where(d => d == '|').Count();
        if (countlines == countheaders)
        {
            writer.WriteLine(s);
        }
        else
        {
            string s2 = s;
            s2 = s2.ToString().TrimEnd('\r', '\n');
            writer.Write(s2);
        }
    }
}

The problem is that I'm reading the file in a single pass, so the line break on line 4 is removed and lines 4 and 5 end up joined together...

2 Comments

  • Post your code. Commented May 28, 2020 at 12:32
  • Hi, I edited my post to include the code. Commented May 28, 2020 at 15:08

1 Answer

You could probably do the following (I can't test it right now, but it should work):

// Re-batches the separator-delimited values of all lines into rows of `size` values.
IEnumerable<string> batchValuesIn(
    IEnumerable<string> source,
    string separator,
    int size)
{
    var counter = 0;
    var buffer = new StringBuilder();

    foreach (var line in source)
    {
        // Skip empty lines entirely.
        if (line.Length == 0)
            continue;

        foreach (var value in line.Split(separator))
        {
            buffer.Append(value);
            counter++;

            if (counter % size == 0)
            {
                // A full row has been collected: emit it and start over.
                yield return buffer.ToString();
                buffer.Clear();
            }
            else
            {
                buffer.Append(separator);
            }
        }
    }

    // Emit any trailing, incomplete row.
    if (buffer.Length != 0)
        yield return buffer.ToString();
}

And you'd use it like this:

var newLines = batchValuesIn(File.ReadLines(path), "|", 5);

The good thing about this solution is that you never load the entire original source into memory. You simply build the lines on the fly.
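Because File.ReadLines reads lazily, the source file is still open while the result is being enumerated, so the output presumably cannot be written straight back to the same path. Below is a minimal sketch of one way around this: write to a temporary file and swap it in afterwards. FileRewrite, ReplaceLines and the ".tmp" suffix are hypothetical choices made for this example.

```csharp
using System.Collections.Generic;
using System.IO;

public static class FileRewrite
{
    // Hypothetical helper: streams newLines into a temporary file next to
    // the target, then replaces the target with it.
    public static void ReplaceLines(string path, IEnumerable<string> newLines)
    {
        var tempPath = path + ".tmp"; // assumed naming scheme

        // WriteAllLines enumerates newLines to the end, which also lets a
        // lazy reader on the original file finish and close it.
        File.WriteAllLines(tempPath, newLines);

        // Swap the rewritten file into place.
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

It could then be called as FileRewrite.ReplaceLines(path, newLines), with newLines being the lazily built sequence from above.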

DISCLAIMER: this may behave weirdly with malformed input strings.
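For input like the sample in the question, where a row is broken in the middle of a value, counting separators rather than values may be more robust. The following is a minimal sketch, not a tested drop-in. RowMerger and MergeBrokenRows are names made up for this example. It assumes every complete row contains exactly the same number of separators, and that fragments should be concatenated with nothing between them.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class RowMerger
{
    // Joins consecutive physical lines until the accumulated text contains
    // the expected number of separators, then emits it as one logical row.
    public static IEnumerable<string> MergeBrokenRows(
        IEnumerable<string> source,
        char separator,
        int expectedSeparators)
    {
        var buffer = new StringBuilder();
        var seen = 0;

        foreach (var line in source)
        {
            // Append the fragment without any line break in between.
            buffer.Append(line);
            seen += line.Count(c => c == separator);

            if (seen >= expectedSeparators)
            {
                // The row is complete: emit it and start a new one.
                yield return buffer.ToString();
                buffer.Clear();
                seen = 0;
            }
        }

        // Emit whatever is left if the input ends in the middle of a row.
        if (buffer.Length != 0)
            yield return buffer.ToString();
    }
}
```

With the question's sample and expectedSeparators set to 4 (the number of "|" in the header row), the three fragment lines are joined into a single row. A line break inside the last column cannot be detected this way, because the row already looks complete when the break occurs.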

1 Comment

Hi, thanks for your answer, but it's not working. Also, remember to keep the "\r\n" on line four, otherwise you'll end up with line four and line five together. Because normally if you delete line 2 and 3, you'd have line 4 with the other two, so there's no need to delete the line break.
