Find only the matched pattern in a CSV file

Question

I am trying to print only the matched pattern in a CSV file. Example: all the columns value starting with 35=its value. Thanks.

CSV file:

35=A,D=35,C=129,ff=136
D=35,35=BCD,C=129,ff=136
900035=G,D=35,C=129,ff=136
35=EF,D=35,C=129,ff=136,35=G
36=o,D=35,k=1

Output:

35=A
35=BCD
35=EF
35=G

The command I used did not work:

sed -n '/35=[A-Z]*?/ s/.*\(35=[A-Z]*?\).*/\1/p' filename

try to remove space before s in sed expression. it might be difficult to catch both 35in third line. — Archemar
– Archemar, Commented Jun 13, 2017 at 11:38

Kusalananda · Accepted Answer · 2017-06-13 12:38:54Z

14

Using tr to replace all commas with newlines, and then grep to get all lines that start with the string 35=:

$  tr ',' '\n' <data.in | grep '^35='
35=A
35=BCD
35=EF
35=G

answered Jun 13, 2017 at 12:38

Kusalananda♦

356k42 gold badges735 silver badges1.1k bronze badges

Add a comment |

Sundeep · Accepted Answer · 2017-06-13 11:44:50Z

12

With GNU grep which supports -o option to print only matched string, each on its own line

$ grep -oE '\b35=[^,]+' ip.csv 
35=A
35=BCD
35=EF
35=G

\b is word boundary, so that 900035 won't match
[^,]+ to match one or more non, characters
assumes the values do not contain ,

With awk

$ awk -F, '{ for(i=1;i<=NF;i++){if($i~/^35=/) print $i} }' ip.csv 
35=A
35=BCD
35=EF
35=G

-F, set , as input field separator
for(i=1;i<=NF;i++) iterate over all fields
if($i~/^35=/) if field starts with 35=
- print $i print that field

Similar with perl

perl -F, -lane 'foreach (@F){print if /^35=/}' ip.csv

edited Jun 13, 2017 at 11:44

answered Jun 13, 2017 at 11:38

Sundeep

12.2k3 gold badges28 silver badges73 bronze badges

1

Thanks a lot. In Solaries the grep option o is not available. used the awk for the solution.

user102299
– user102299

2017-06-13 12:04:48 +00:00
Commented Jun 13, 2017 at 12:04
Note that \b isn't the best choice of boundary ; it would for instance match 35=value from somethingsomething, 42.35=value, somethingelse. If the grep versions supports -Perl regex and the input contains no spaces, I'd use a lookbehind to assert that we match from the start of a field. If there can be extra spaces this would require variable-length lookbehinds, which aren't implemented in any regex flavour available to grep AFAIK.

Aaron
– Aaron

2017-06-13 13:55:39 +00:00
Commented Jun 13, 2017 at 13:55
@Aaron agreed.. steeldriver's solution based on csv module is most robust... for variable length lookbehinds \K can be used stackoverflow.com/documentation/regex/639/…

Sundeep
– Sundeep

2017-06-13 14:05:05 +00:00
Commented Jun 13, 2017 at 14:05
@Sundeep thanks for the \K trick, I knew about the meta-character but had never thought about its ability to emulate basic variable-length lookbehinds :)

Aaron
– Aaron

2017-06-13 14:35:36 +00:00
Commented Jun 13, 2017 at 14:35

Add a comment |

steeldriver · Accepted Answer · 2017-06-13 12:01:01Z

6

With perl:

$ perl -lne 'print for /(\b35=[^,]+)/g' filename
35=A
35=BCD
35=EF
35=G

or perhaps more generally/robustly using the Text::CSV module

$ perl -MText::CSV -lne '
  BEGIN{$p = Text::CSV->new()} 
  print for grep { /^35=/ } $p->fields(), $p->parse($_)
' filename
35=A
35=BCD
35=EF
35=G

edited Jun 13, 2017 at 12:01

answered Jun 13, 2017 at 11:50

steeldriver

83.8k12 gold badges124 silver badges175 bronze badges

Add a comment |

rcjohnson · Accepted Answer · 2017-06-13 11:51:00Z

2

Perl lookarounds with grep work really well.

grep -oP '(?<=35\=).*?(?=,)'

This returns the exact information minus the 35= bit

grep -oP '(?<=35\=).*?(?=,)' file.csv will return this

A
BCD
G
EF

answered Jun 13, 2017 at 11:51

rcjohnson

9296 silver badges12 bronze badges

Add a comment |

Yeti · Accepted Answer · 2017-06-13 21:27:52Z

Pure Bash solution:

(                                                  # Use parentheses as scope for IFS
    IFS=$',\n'                                     # Split on both , or \n
    for c in $(</tmp/file.csv)                     # For every column or row
    do
        [[ "$c" =~ ^35= ]] && echo ${line##35=}    # Find ^35= and print while removing ^35=
    done
) # Optionally >/tmp/filtered-output.txt

Note, only use this for its readability and flexibility - if you can read it, otherwise you can use the following approach:

# Read            | Replace     | Find        | Remove
cat /tmp/file.csv | tr ',' '\n' | grep '^35=' | sed 's/^35=//'

which is more intuitive and efficient.

Input (/tmp/file.csv):

35=A,D=35,C=129,ff=136
D=35,35=BCD,C=129,ff=136
900035=G,D=35,C=129,ff=136
35=EF,D=35,C=129,ff=136,35=G
36=o,D=35,k=1

Output:

A
BCD
EF
G

Stack Exchange Network

Find only the matched pattern in a CSV file

5 Answers 5

You must log in to answer this question.

Hot Network Questions

Find only the matched pattern in a CSV file

5 Answers 5

You must log in to answer this question.

Related

Hot Network Questions