
How to capture the string that comes after a specific word in a CSV line

For example, this is the CSV line. From each field, we want to cut out the string that comes right after /data/:

status=true /data/sdb/hadoop/hdfs/log,/data/sdc/hadoop/hdfs/log,/data/sdd/hadoop/hdfs/log,/data/sde/hadoop/hdfs/log,/data/sdf/hadoop/hdfs/log

Example of the expected results:

sdb
sdc
sdd
sde
sdf
  • Just for completeness: is the status=true part not separated by a ,? Commented Mar 3, 2020 at 11:29
  • No, it's just words that come before the CSV. Commented Mar 3, 2020 at 11:32
  • What should the output be if /data/sdb/foo/data/bar existed in the CSV? What if a field was /foo/bar/data/ (i.e. nothing after /data/)? Commented Mar 3, 2020 at 15:57

6 Answers

4

Use grep:

With PCRE (GNU grep's -P option), where \K drops the /data/ prefix from the printed match:

grep -Po '/data/\K[^/]*'

If -P is not available (for example, with a non-GNU grep), use plain grep and cut:

grep -o '/data/[^/]*' | cut -d'/' -f3
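A minimal sketch for trying both commands on the sample line from the question. The function names extract_pcre and extract_portable are made up for this example, and the -P variant assumes a GNU grep built with PCRE support.

```shell
# Hypothetical helper names; both read the CSV text on stdin.

# GNU grep with PCRE: \K discards "/data/" from the printed match,
# so only the following run of non-slash characters is output.
extract_pcre() {
    grep -Po '/data/\K[^/]*'
}

# Portable fallback: grep prints "/data/sdb" etc., then cut keeps field 3
# when splitting on "/" (field 1 is empty, field 2 is "data").
extract_portable() {
    grep -o '/data/[^/]*' | cut -d'/' -f3
}

line='status=true /data/sdb/hadoop/hdfs/log,/data/sdc/hadoop/hdfs/log,/data/sdd/hadoop/hdfs/log,/data/sde/hadoop/hdfs/log,/data/sdf/hadoop/hdfs/log'
printf '%s\n' "$line" | extract_pcre
printf '%s\n' "$line" | extract_portable
```

For a field such as /data/sdb/foo/data/bar (the case raised in the comments on the question), both functions print sdb and then bar, because every occurrence of /data/ is matched.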
1

@pLumo absolutely has the right answer. If, for whatever reason, you wanted to use awk and bash's builtin parameter expansion, all the while being slightly convoluted...

while IFS= read -r line; do
    # Keep only the commas; their count + 1 is the number of columns
    COUNT_SEP="${line//[^,]}"
    # Start a fresh temp file for every input line
    : > /tmp/splitCSV
    # Start at column 1: it holds "status=true /data/sdb/..." and must not be skipped
    for col in $(seq 1 $((${#COUNT_SEP}+1))); do
        COLUMN=$(echo "${line}" | awk -v variable="${col}" -F, '{ print $variable }')
        echo "${COLUMN}" >> /tmp/splitCSV
    done
    # Keep the text after /data/, up to the next /
    while IFS= read -r splitCol; do
        echo "${splitCol}" | awk -F'/data/' '{ print $2 }' | awk -F'/' '{ print $1 }'
    done < /tmp/splitCSV
done < test.csv
  • You should never do that. See why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons. Commented Mar 3, 2020 at 15:59
  • Thanks! I didn't know that was best practice. Very interesting. Commented Mar 3, 2020 at 18:51
  • Yeah, the guys who invented shell to manipulate files and processes also invented tools like awk for shell to call to manipulate text. So, horses for courses... never write a shell loop just to manipulate text and you can't go wrong. Commented Mar 4, 2020 at 14:41
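Following the advice in these comments, here is a sketch of the same per-column logic (split on commas, take the text after /data/, cut at the next /) as a single awk call, with no shell loop and no temp file. The function name extract_after_data is hypothetical. One difference from the loop: columns without /data/ are skipped instead of producing empty lines.

```shell
# Hypothetical function name; reads the files given as arguments, or stdin.
extract_after_data() {
    awk -F, '{
        # Visit every comma-separated column of the current line
        for (i = 1; i <= NF; i++) {
            # Split the column on "/data/"; parts[2] is whatever follows it
            n = split($i, parts, "/data/")
            if (n < 2) continue        # no /data/ in this column
            sub(/\/.*/, "", parts[2])  # cut parts[2] at the next /
            print parts[2]
        }
    }' "$@"
}
```

Usage: extract_after_data test.csv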
1

Just to add an option with grep and sed, relying on the fact that the only place where exactly three characters sit between two slashes is the device name:

grep -o '/.../' file | sed 's;/;;g'

Output:

sdb
sdc
sdd
sde
sdf
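The assumption stated in this answer matters. A sketch wrapping the pipeline in a hypothetical function, fed a made-up line whose second directory (nvme0n1, not from the question) is not three characters long: only sdb is printed, and nvme0n1 is silently missed.

```shell
# The answer's pipeline, reading files or stdin (hypothetical function name)
extract_three_char() {
    # Match "/" + any three characters + "/", then delete the slashes
    grep -o '/.../' "$@" | sed 's;/;;g'
}

# Made-up input: the second data directory has a longer name
printf '%s\n' '/data/sdb/hadoop/hdfs/log,/data/nvme0n1/hadoop/hdfs/log' | extract_three_char
```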
1

For the input above, the following command works:

perl -pe 's/,/\n/g' filename | awk -F '/data/' '{gsub("/.*","",$2);print $2}'

Output:

sdb
sdc
sdd
sde
sdf
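The perl step here only turns every comma into a newline, which the standard tr utility also does. A sketch of that swap, keeping the answer's awk step unchanged; the function name extract_tr_awk is hypothetical.

```shell
# Hypothetical function name; reads stdin
extract_tr_awk() {
    # tr puts one path per line; awk keeps what follows /data/,
    # and gsub removes everything from the next / onward
    tr ',' '\n' | awk -F '/data/' '{ gsub("/.*", "", $2); print $2 }'
}
```

Usage: extract_tr_awk < filename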
1

This works for me with awk:

awk -F'/' '{for(i=1;i<=NF;i++) if($i=="data") print $(i+1)}' <file>

1: -F'/' sets the field separator to /

2: loop over every field of each line

3: if a field equals "data", print the next field
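To see how these three steps handle the cases asked about in the comments under the question, here is a sketch that wraps the one-liner in a hypothetical function. A nested /data/ prints a name for every "data" field (sdb, then bar), and a trailing /data/ prints an empty line, because the "next field" after it is empty.

```shell
# The awk one-liner from this answer, wrapped in a hypothetical function (files or stdin)
after_data_field() {
    awk -F'/' '{for(i=1;i<=NF;i++) if($i=="data") print $(i+1)}' "$@"
}

# Nested /data/: every field equal to "data" triggers a print
printf '%s\n' '/data/sdb/foo/data/bar' | after_data_field
# Trailing /data/: the field after "data" is empty, so an empty line is printed
printf '%s\n' '/foo/bar/data/' | after_data_field
```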

1

You can choose from any of the following:

awk -F/ '
     BEGIN { OFS = RS }
     {
       N = split($0, a, /\//)
       $0 = "" 
        for ( i=j=1; i<N; i++ ) 
            if ( a[i] == "data" ) 
                 $(j++) = a[++i]
      }N>1' file.csv


perl -F/ -lane '
   shift(@F) eq q(data) and print(shift(@F)) 
      while(@F && m{/data/});
' file.csv


perl -lne 'print for m{/data/([^/,]+)}g' file.csv


sed -re '
    /\n/{P;D;}
    s:/data/([^/,]+):\n\1\n:
   D
' file.csv
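The sed variant packs a loop into three commands. A commented copy, wrapped in a hypothetical function name; GNU sed is assumed, since -r and \n in the replacement text are GNU features.

```shell
# Hypothetical function name; reads the files given as arguments, or stdin (GNU sed)
extract_sed() {
    sed -re '
        # If an earlier substitution left a newline, the first line is a
        # captured name: print it, delete it, and rerun on the rest
        /\n/{P;D;}
        # Put the first name after /data/ on its own line, between newlines
        s:/data/([^/,]+):\n\1\n:
        # Drop the text before the first newline and rerun on the remainder;
        # with no newline left, D just ends the cycle for this input line
        D
    ' "$@"
}
```

Usage: extract_sed file.csv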
