19

Suppose I have this CSV file:

NAME,ADDRESS,DATE
"Eko S. Wibowo", "Tamanan, Banguntapan, Bantul, DIY", "6/27/1979"

I would like to store each token enclosed in double quotes in an array. Is there a safe way to do this instead of using the String Split() function? Currently I load the file into a RichTextBox and then, using its Lines[] property, loop over each Lines[] element and do this:

string[] line = s.Split(',');

s is a reference to an element of RichTextBox.Lines[]. As you can clearly see, a comma inside a token easily messes up the Split() function. So, instead of ending up with the three tokens I want, I end up with six.

Any help will be appreciated!

5
  • stackoverflow.com/questions/2081418/… Commented Jun 20, 2013 at 7:07
  • Unless you want to display anything, do not (ab)use GUI components for data storage. If you need the contents of the file line by line, use the File.ReadLines method. Commented Jun 20, 2013 at 7:07
  • stackoverflow.com/questions/769621/… Commented Jun 20, 2013 at 7:11
  • @O.R.Mapper You're absolutely right! I'll change my code design for that Commented Jun 21, 2013 at 13:03
  • @chancea CsvHelper and CsvReader in that link should be good, but I think I will go with the solution that uses RegEx. :) Thanks! Commented Jun 21, 2013 at 13:13

6 Answers

30

You could use regex too:

string input = "\"Eko S. Wibowo\", \"Tamanan, Banguntapan, Bantul, DIY\", \"6/27/1979\"";
string pattern = @"""\s*,\s*""";

// input.Substring(1, input.Length - 2) removes the first and last " from the string
string[] tokens = System.Text.RegularExpressions.Regex.Split(
    input.Substring(1, input.Length - 2), pattern);

This will give you:

Eko S. Wibowo
Tamanan, Banguntapan, Bantul, DIY
6/27/1979
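
The split above relies on every field being quoted. As a minimal sketch of a variant that matches fields instead of splitting on the separator, the following accepts both quoted and unquoted fields on a single line. The class name CsvRegexSketch and the method SplitLine are hypothetical names chosen for illustration, and the sketch assumes no field spans multiple lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvRegexSketch
{
    // Hypothetical helper, not part of the original answer.
    // A field starts at the beginning of the line or right after a comma and
    // ends right before a comma or at the end of the line. It is either a
    // quoted run (group "q", where "" stands for a literal quote) or an
    // unquoted run without commas (group "u").
    private static readonly Regex FieldPattern = new Regex(
        @"(?<=^|,)\s*(?:""(?<q>(?:[^""]|"""")*)""\s*|(?<u>[^,]*))(?=,|$)");

    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldPattern.Matches(line))
        {
            if (m.Groups["q"].Success)
            {
                // Quoted field: unescape doubled quotes, keep inner whitespace.
                fields.Add(m.Groups["q"].Value.Replace("\"\"", "\""));
            }
            else
            {
                // Unquoted field: trim the whitespace around the value.
                fields.Add(m.Groups["u"].Value.Trim());
            }
        }
        return fields;
    }
}
```

Called with the sample line from the question, SplitLine should return the three fields without their surrounding quotes. Malformed input (for example, text after a closing quote) is not rejected; it falls through to the unquoted branch and is returned as raw text.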

3 Comments

I accepted this as the answer, as I always want to improve my RegEx skills and, actually, this solution will be part of a PHP solution, which also depends heavily on RegEx for this purpose. A .NET-only solution would not be a good idea. I'm sorry that I didn't elaborate enough on that. I got this idea when I read the answer by @unlimit: a simple RegEx is the way to go!
This is a fine solution, but a word of caution: not every CSV file puts quotes around each value. A CSV file exported from Excel, for example, only quotes values that contain commas, quotes, etc.
A better pattern would be ""?\s*,\s*""?, so that it also matches columns that don't have double quotes. Sometimes CSV files have numerical values without the double quotes.
9

I've done this with my own method. It simply counts the number of " and ' characters.
Adapt it to your needs.

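    // Requires: using System.Collections.Generic;
    // Splits on commas that lie outside quotes. An even quote count means we are
    // outside a quoted field. Tokens keep their surrounding quotes and whitespace.
    public List<string> SplitCsvLine(string s) {
        int i;
        int a = 0;      // start index of the current token
        int count = 0;  // number of quote characters seen so far
        List<string> str = new List<string>();
        for (i = 0; i < s.Length; i++) {
            switch (s[i]) {
                case ',':
                    if ((count & 1) == 0) {  // even count: this comma is a separator
                        str.Add(s.Substring(a, i - a));
                        a = i + 1;
                    }
                    break;
                case '"':
                case '\'': count++; break;
            }
        }
        str.Add(s.Substring(a));  // the last token has no trailing comma
        return str;
    }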
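
As a rough sketch of how the same idea could be tightened, the variant below tracks only double quotes with an explicit in-quotes flag, treats a doubled quote ("") inside a quoted field as a literal quote, and strips the surrounding quotes from the result. The class name CsvStateSketch is hypothetical, and the sketch assumes one record per line and that a quote only opens a field when nothing but whitespace precedes it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvStateSketch
{
    // Hypothetical helper for illustration; not the answer author's actual method.
    public static List<string> SplitCsvLine(string s)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;  // currently inside a quoted field
        bool quoted = false;    // the current field was opened with a quote

        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);            // commas are ordinary text here
                }
                else if (i + 1 < s.Length && s[i + 1] == '"')
                {
                    current.Append('"');          // "" is an escaped quote
                    i++;
                }
                else
                {
                    inQuotes = false;             // closing quote
                }
            }
            else if (c == ',')
            {
                // Separator: quoted values are kept verbatim, others are trimmed.
                fields.Add(quoted ? current.ToString() : current.ToString().Trim());
                current.Clear();
                quoted = false;
            }
            else if (c == '"' && !quoted && current.ToString().Trim().Length == 0)
            {
                current.Clear();                  // drop whitespace before the quote
                inQuotes = true;
                quoted = true;
            }
            else if (!quoted)
            {
                current.Append(c);                // text of an unquoted field
            }
            // Characters after a closing quote and before the comma are ignored.
        }

        fields.Add(quoted ? current.ToString() : current.ToString().Trim());
        return fields;
    }
}
```

Because single quotes are no longer counted, an apostrophe inside a double-quoted field does not flip the state. The trade-off is that single-quoted fields are not recognized at all.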

2 Comments

By including both " and ' in the counter you'll incorrectly parse something with mixed quotes: " \"The quote's break\", this "
@drzaus: That's correct. The actual method is more complicated and has a lot of counters for different things. The code shown is only meant to illustrate the basic idea.
3

It's not an exact answer to your question, but why don't you use an existing library to manipulate CSV files? A good example would be LinqToCsv. CSV fields can be delimited with various punctuation characters. Moreover, there are gotchas that library authors have already addressed, such as dealing with the header row, handling different date formats, and mapping rows to C# objects.


1

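You can replace each "," separator with ; and then split on ; (this assumes the separators contain no spaces and the data itself never contains a semicolon):

var values = s.Replace("\",\"", ";").Split(';');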


1

Five years old but there is always somebody new who wants to split a CSV.

If your data is simple and predictable (i.e. never has any special characters like commas, quotes and newlines) then you can do it with split() or regex.

But to support all the nuances of the CSV format properly without code soup you should really use a library where all the magic has already been figured out. Don't re-invent the wheel (unless you are doing it for fun of course).

CsvHelper is simple enough to use:

https://joshclose.github.io/CsvHelper/2.x/

// textReader is any TextReader over the CSV data, e.g. a StreamReader
using (var parser = new CsvParser(textReader))
{
    while(true)
    {
        string[] line = parser.Read();

        if (line != null)
        {
            // do something
        }
        else
        {
            break;
        }
    }
}

More discussion / same question: Dealing with commas in a CSV file


0

If your CSV line is tightly packed (no spaces around the separating commas), it's easiest to strip the first and last quote as mentioned earlier and then do a simple split on the joining string ",":

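 // The string[] overload also compiles on .NET Framework, where Split(string) is not available.
 string[] tokens = input.Substring(1, input.Length - 2)
     .Split(new[] { "\",\"" }, StringSplitOptions.None);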

This will only work if ALL fields are double-quoted, even those that don't (officially) need to be. It will be faster than RegEx, but only under those conditions.

Really useful if your data looks like "Name","1","12/03/2018","Add1,Add2,Add3","other stuff"
