0

I want to capture the word that comes after "on" in several lines of a file, and if the word already appears in the file, I want to skip it. This is what I tried:

#!/bin/bash
echo "" > missig_packages.txt
cat log_file.txt | grep depends > dependsLog.txt
function createListOfPackages {
    if grep "$1" missig_packages.txt; then
        continue
    else
        echo "$1" >> missig_packages.txt
    fi  
}
while read line; do
    package=`cat dependsLog.txt | cut -d" " -f5`
    createListOfPackages $package
done < dependsLog.txt

The file dependsLog.txt contains lines like this:

  libgcc1:amd64 depends on **gcc-4.9-base** (= 4.9.1-0ubuntu1); however:
  cinder-volume depends on **cinder-common** (= 1:2015.1.1-0ubuntu2~cloud2); 
  python-cryptography depends on **python-cffi**.
  python-pycadf depends on **python-netaddr**.

How can I grep the words between ** and ** (which are not themselves in the text)? Each line begins with "".

1
  • Sure you don't want to call that file missing_packages.txt (with an 'n') instead? Commented Sep 20, 2015 at 15:59

3 Answers

3

This is a job for awk.

To select the lines that have "on" as the third word:

awk '$3 == "on" '

To print the word you are looking for:

awk '$3 == "on" { print $4 ;}'
  • $3 == "on" will get the line with "on" as third word
  • { print $4 ;} will print fourth word
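
As a rough check, here is that one-liner run against the sample lines from the question (with the ** markers left out, since the question says they are not in the real text). The function names `sample_lines` and `word_after_on` are made up for this sketch.

```shell
#!/bin/sh
# Sample lines copied from the question's dependsLog.txt, without the ** markers.
sample_lines() {
    cat <<'EOF'
  libgcc1:amd64 depends on gcc-4.9-base (= 4.9.1-0ubuntu1); however:
  cinder-volume depends on cinder-common (= 1:2015.1.1-0ubuntu2~cloud2); 
  python-cryptography depends on python-cffi.
  python-pycadf depends on python-netaddr.
EOF
}

# Print the fourth field of every line whose third field is exactly "on".
# awk splits on runs of whitespace by default, so the leading spaces do not
# shift the field numbers.
word_after_on() {
    awk '$3 == "on" { print $4 }'
}

sample_lines | word_after_on
```

Note that on lines without version information the fourth field still carries the trailing ".", so it would need to be stripped separately.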
1

Your entire shell script fragment can be replaced with:

awk '/depends on/ { print $4}' log_file.txt | sed -e 's/\.$//' | sort -u > missing_packages.txt

The sed script strips the trailing . from package names where the input line doesn't have version information.
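
If the order in which packages first appear matters (sort -u reorders them), a possible variation is to let awk do the de-duplication with an associative array. This is only a sketch: the function name `unique_packages` and the array name `seen` are my own, and it assumes "depends on" always sits in fields 2 and 3 as in the sample lines.

```shell
#!/bin/sh
# Print each package name once, in order of first appearance.
# Reads the files given as arguments, or standard input if none are given.
unique_packages() {
    awk '/depends on/ {
        pkg = $4
        sub(/\.$/, "", pkg)            # drop a trailing "." when there is no version info
        if (!seen[pkg]++) print pkg    # print only the first time a name is seen
    }' "$@"
}

# Example: two of the three lines name the same package.
printf '%s\n' \
    '  python-pycadf depends on python-netaddr.' \
    '  cinder-volume depends on cinder-common (= 1:2015.1.1-0ubuntu2~cloud2);' \
    '  python-foo depends on python-netaddr.' | unique_packages
```

With a real log it would be called as `unique_packages log_file.txt > missing_packages.txt`.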

2
  • Or just awk '/depends on/ { sub(/\.$/, "", $4); print $4 }' log_file.txt | sort -u > missing_packages.txt Commented Sep 21, 2015 at 2:52
  • True, and that's a valuable addition to my answer but it's easier for a novice like the OP to understand a simple pipeline using multiple tools. Your version would be the next step after understanding the simple version - understanding comes first, optimisation later. Commented Sep 21, 2015 at 3:25
1

Try the following command, which greps the word after 'on' from dependsLog.txt and then inserts "" at the start of each line.

cat dependsLog.txt | grep -oP "(?<=on )[^ ]+" | sed 's/^/\"\"/' >> missig_packages.txt

To make sure lines are not duplicated, you can sort and uniq with the following command.

cat dependsLog.txt | grep -oP "(?<=on )[^ ]+" | sed 's/^/\"\"/' | sort | uniq >> missig_packages.txt
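
One caveat with a lookbehind on just "on ": it matches after any text that ends in "on ", so on the cinder-common line from the question, grep -o would also print "(=" (the word following "cinder-common "). Also, -P is a GNU grep extension. A more portable sketch that anchors on the whole phrase "depends on " could look like this; the function name `packages_after_depends_on` is made up, and it leaves out the "" prefix and strips the trailing "." instead.

```shell
#!/bin/sh
# Extract the word right after "depends on ", strip a trailing ".",
# and de-duplicate. Reads the files given as arguments, or standard input.
packages_after_depends_on() {
    sed -n 's/.*depends on \([^ ]*\).*/\1/p' "$@" | sed 's/\.$//' | sort -u
}

# Example with the line that trips up the shorter lookbehind.
printf '%s\n' \
    '  cinder-volume depends on cinder-common (= 1:2015.1.1-0ubuntu2~cloud2); ' \
    '  python-pycadf depends on python-netaddr.' | packages_after_depends_on
```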
1
  • 1) `cat foo | grep pattern | ...` can be replaced by `grep pattern foo | ...`. 2) `sort | uniq` can be replaced by `sort -u`. Commented Sep 21, 2015 at 6:31
