3

I thought this would be simple, but I can't figure out how to do it.

Scenario

I have a single .csv file with the columns id_user, text and id_group, where the columns are delimited by tabs, like this:

"123456789"        "Here's the field of the text, also contains comma"        "10"
"987456321"        "Here's the field of the text, also contains comma"        "10"
"123654789"        "Here's the field of the text, also contains comma"        "11"
"987456123"        "Here's the field of the text, also contains comma"        "11"

How can I extract just the text column?

Attempt

awk

I was looking for a way to tell awk which delimiter to use when splitting fields for print $n. If that were possible, one option would be:

$ awk -d '\t' '{print $2}' file.csv | sed -e 's/"//gp'

where -d would set the field delimiter for awk and the sed removes the double quotes (").

5
  • 1
    See stackoverflow.com/questions/5374239/tab-separated-values-in-awk Commented Sep 8, 2015 at 15:09
  • 1
    Use awk -F '\t' .... Commented Sep 8, 2015 at 15:15
  • @Thor please update your answer with this option to accept it, this is what I was looking for Commented Sep 8, 2015 at 16:00
  • @tachomi: Glad it worked, added two awk alternatives. Commented Sep 8, 2015 at 19:39
  • @tachomi: You could also use the double-quotes as the delimiter. See my answer. Commented Sep 9, 2015 at 9:11

5 Answers

10

TAB delimiter

cut

You do not need sed or awk; a simple cut will do, because cut uses TAB as its default delimiter:

cut -f2 infile

awk

If you want to use awk, you can supply the delimiter either through the -F option or as an FS= assignment placed after the program, before the file name:

awk -F '\t' '{ print $2 }' infile

Or:

awk '{ print $2 }' FS='\t' infile

Output in all cases:

"Here's the field of the text, also contains comma"
"Here's the field of the text, also contains comma"
"Here's the field of the text, also contains comma"
"Here's the field of the text, also contains comma"
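To try the commands above without the original file, here is a minimal sketch that builds a two-line sample with real tab characters and checks that cut and awk select the same field. The temporary file and the variable names (sample, with_cut, with_awk) are assumptions made for illustration only.

```shell
# Build a small sample; printf expands \t into a real tab and reuses
# the format once for every group of three arguments.
sample=$(mktemp)
printf '"%s"\t"%s"\t"%s"\n' \
    123456789 "Here's the field of the text, also contains comma" 10 \
    987456321 "Here's the field of the text, also contains comma" 10 > "$sample"

# Both commands select the second tab-separated field, quotes included.
with_cut=$(cut -f2 "$sample")
with_awk=$(awk -F '\t' '{ print $2 }' "$sample")

printf '%s\n' "$with_cut"
[ "$with_cut" = "$with_awk" ] && echo "cut and awk agree"

rm -f "$sample"
```

Because the comma inside the text is not a delimiter here, it needs no special handling.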

Quote delimiter

If the double-quotes in the file are consistent, i.e. no embedded double-quotes in fields, you could use them as the delimiter and avoid having them in the output, e.g.:

cut

cut -d\" -f4 infile

awk

awk -F\" '{ print $4 }' infile

Output in both cases:

Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
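The field number 4 may look surprising. With " as the delimiter, the leading quote creates an empty first field, and the tabs between the quoted values end up as fields of their own. The sketch below shows this on one sample line; the variable names are illustrative and not part of the original answer.

```shell
# One sample line with real tabs between the quoted values.
line=$(printf '"%s"\t"%s"\t"%s"' 123456789 "Here's the field of the text, also contains comma" 10)

# Count the fields awk sees when the double quote is the separator.
field_count=$(printf '%s\n' "$line" | awk -F '"' '{ print NF }')

# Field 4 holds the text; fields 2 and 6 hold the two ids.
# Fields 1 and 7 are empty, fields 3 and 5 contain only a tab.
text=$(printf '%s\n' "$line" | awk -F '"' '{ print $4 }')
ids=$(printf '%s\n' "$line" | awk -F '"' '{ print $2 "," $6 }')

echo "fields: $field_count"
echo "text:   $text"
echo "ids:    $ids"
```

An embedded double quote in any value would shift these positions, which is why this approach relies on the quoting being consistent.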
4

You can use grep with PCRE (-P) :

grep -Po '\s"\K[^"]+(?="\s)' file.txt
  • \s" matches a whitespace character followed by a "; \K then discards everything matched so far, so it is left out of the output

  • [^"]+ matches the desired portion between the two "s

  • (?="\s) is a zero width positive lookahead pattern ensuring the required portion is followed by " and any whitespace character.

Example :

$ grep -Po '\s"\K[^"]+(?="\s)' file.txt 
Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
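The -P option is not specified by POSIX and may be missing from some grep builds. Where that is a concern, a similar extraction can be sketched with a basic regular expression in sed, capturing whatever sits between the third and fourth double quote. This is a swapped-in alternative rather than the method of the answer above, and the sample file and variable names are assumptions for illustration.

```shell
# One-line sample with real tabs between the quoted values.
sample=$(mktemp)
printf '"%s"\t"%s"\t"%s"\n' \
    123654789 "Here's the field of the text, also contains comma" 11 > "$sample"

# Skip up to and including the third double quote, capture everything
# up to the fourth, and replace the whole line with the captured group.
text=$(sed 's/^[^"]*"[^"]*"[^"]*"\([^"]*\)".*/\1/' "$sample")

printf '%s\n' "$text"
rm -f "$sample"
```

Like the quote-delimiter approaches, this assumes no field contains an embedded double quote.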
2

To specify the tab as the delimiter:

$ awk -F '\t' '{print $2}' file.csv

To remove the unwanted double quotes:

$ awk -F '\t' '{print $2}' file.csv | sed 's/"//g'

Another option is to use awk -F with the double quote as the delimiter:

$ awk -F '"' '{print $4}' file.csv
1

I would use perl for this, because Text::CSV is really good for handling non-trivial CSV (e.g. involving quotes):

#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

open ( my $input, '<', "file.csv" ) or die $!;   
my $csv = Text::CSV -> new ( { binary => 1, 
                               sep_char => "\t", } );

while ( my $row = $csv -> getline ( $input ) ) {
    print $row -> [1],"\n";
}
close ( $input );

Prints:

Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
1

Your sed part is almost right: drop the p flag, because without -n it prints every changed line twice. For the delimiter, you can either use awk -F '\t' or the following:

awk 'BEGIN{FS="\t"} {print $2}' file.csv | sed 's/"//g'

Or, if you do not want to use sed, pipe the output of the first awk into a second awk that uses " as the field delimiter and prints the second field:

awk 'BEGIN{FS="\t"} {print $2}' file.csv | awk -F "\"" '{print $2}'
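The two-stage pipelines above can also be collapsed into a single awk call by deleting the quotes with gsub before printing. This is only a sketch under the same assumption of tab-separated, consistently quoted fields; the sample file and the variable name text are made up for illustration.

```shell
# One-line sample with real tabs between the quoted values.
sample=$(mktemp)
printf '"%s"\t"%s"\t"%s"\n' \
    987456123 "Here's the field of the text, also contains comma" 11 > "$sample"

# Split on tabs, delete every double quote in field 2, then print it.
text=$(awk -F '\t' '{ gsub(/"/, "", $2); print $2 }' "$sample")

printf '%s\n' "$text"
rm -f "$sample"
```

Note that gsub removes every double quote in the field, so a quote that is legitimately part of the text would be lost as well.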
