5

I am trying to print only the matched pattern in a CSV file. Example: all the columns value starting with 35=its value. Thanks.

CSV file:

35=A,D=35,C=129,ff=136
D=35,35=BCD,C=129,ff=136
900035=G,D=35,C=129,ff=136
35=EF,D=35,C=129,ff=136,35=G
36=o,D=35,k=1

Output:

35=A
35=BCD
35=EF
35=G

The command I used did not work:

sed -n '/35=[A-Z]*?/ s/.*\(35=[A-Z]*?\).*/\1/p' filename
2
  • try to remove space before s in sed expression. it might be difficult to catch both 35in third line. Commented Jun 13, 2017 at 11:38
  • Use the following: sed -e 'y/,/\n/; /^35=/P; D' filename Commented Jun 15, 2017 at 13:57

5 Answers 5

14

Using tr to replace all commas with newlines, and then grep to get all lines that start with the string 35=:

$  tr ',' '\n' <data.in | grep '^35='
35=A
35=BCD
35=EF
35=G
12

With GNU grep which supports -o option to print only matched string, each on its own line

$ grep -oE '\b35=[^,]+' ip.csv 
35=A
35=BCD
35=EF
35=G
  • \b is word boundary, so that 900035 won't match
  • [^,]+ to match one or more non, characters
  • assumes the values do not contain ,


With awk

$ awk -F, '{ for(i=1;i<=NF;i++){if($i~/^35=/) print $i} }' ip.csv 
35=A
35=BCD
35=EF
35=G
  • -F, set , as input field separator
  • for(i=1;i<=NF;i++) iterate over all fields
  • if($i~/^35=/) if field starts with 35=
    • print $i print that field

Similar with perl

perl -F, -lane 'foreach (@F){print if /^35=/}' ip.csv 
4
  • 1
    Thanks a lot. In Solaries the grep option o is not available. used the awk for the solution. Commented Jun 13, 2017 at 12:04
  • Note that \b isn't the best choice of boundary ; it would for instance match 35=value from somethingsomething, 42.35=value, somethingelse. If the grep versions supports -Perl regex and the input contains no spaces, I'd use a lookbehind to assert that we match from the start of a field. If there can be extra spaces this would require variable-length lookbehinds, which aren't implemented in any regex flavour available to grep AFAIK. Commented Jun 13, 2017 at 13:55
  • @Aaron agreed.. steeldriver's solution based on csv module is most robust... for variable length lookbehinds \K can be used stackoverflow.com/documentation/regex/639/… Commented Jun 13, 2017 at 14:05
  • @Sundeep thanks for the \K trick, I knew about the meta-character but had never thought about its ability to emulate basic variable-length lookbehinds :) Commented Jun 13, 2017 at 14:35
6

With perl:

$ perl -lne 'print for /(\b35=[^,]+)/g' filename
35=A
35=BCD
35=EF
35=G

or perhaps more generally/robustly using the Text::CSV module

$ perl -MText::CSV -lne '
  BEGIN{$p = Text::CSV->new()} 
  print for grep { /^35=/ } $p->fields(), $p->parse($_)
' filename
35=A
35=BCD
35=EF
35=G
2

Perl lookarounds with grep work really well.

grep -oP '(?<=35\=).*?(?=,)'

This returns the exact information minus the 35= bit

grep -oP '(?<=35\=).*?(?=,)' file.csv will return this

A
BCD
G
EF

0

Pure Bash solution:

(                                                  # Use parentheses as scope for IFS
    IFS=$',\n'                                     # Split on both , or \n
    for c in $(</tmp/file.csv)                     # For every column or row
    do
        [[ "$c" =~ ^35= ]] && echo ${line##35=}    # Find ^35= and print while removing ^35=
    done
) # Optionally >/tmp/filtered-output.txt

Note, only use this for its readability and flexibility - if you can read it, otherwise you can use the following approach:

# Read            | Replace     | Find        | Remove
cat /tmp/file.csv | tr ',' '\n' | grep '^35=' | sed 's/^35=//'

which is more intuitive and efficient.

Input (/tmp/file.csv):

35=A,D=35,C=129,ff=136
D=35,35=BCD,C=129,ff=136
900035=G,D=35,C=129,ff=136
35=EF,D=35,C=129,ff=136,35=G
36=o,D=35,k=1

Output:

A
BCD
EF
G

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.