
My problem is to read a text file word by word and recover each word into a variable. I tried:

while read ligne; 
do {
for ( i=1; i<= length ($ligne); i++); do
 { var=$(awk '{print $i}' test.txt)}

}done < test.txt

But it doesn't work, and I get this error:

Couldn't parse this for loop
  • You're missing a done for your second do. You only have one done, matching your while ... do loop. Commented Mar 7, 2016 at 19:10

7 Answers


It depends on how you define words.

If words are separated by one or more spaces, you can do:

tr -s '[:blank:]' '[\n*]' < file |
  while IFS= read -r word; do
    : echo "$word" here
  done

If words are sequences of characters containing A-Z, a-z, and _:

tr -cs 'A-Za-z_' '[\n*]' < file | ...

On historical System V systems, you need the square brackets: [A-Za-z_].
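As a quick illustration of the first pipeline (the file name sample.txt and its contents are assumptions made for this sketch):

```shell
# Hypothetical sample file, created only for this illustration
printf 'hello   world\nfoo bar\n' > sample.txt

# Squeeze each run of blanks into a single newline: one word per line
count=$(tr -s '[:blank:]' '[\n*]' < sample.txt | wc -l)
echo "words: $count"
```

With that sample input, the tr stage emits four lines (hello, world, foo, bar), one word each.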

  • In addition to blank characters, the first tr also splits words on brackets. And the second tr allows brackets in words, which is disallowed by the explanatory text. I tried to remove the erroneous [ and ] in each tr, but because of brevity, the site rejected the edit. Commented Mar 7, 2016 at 20:18
  • @BarefootIO: It's [:blank:]. The second one, [A-Za-z_], is intended, as required by POSIX. Commented Mar 8, 2016 at 1:31
  • Unlike their globbing and regular expression counterparts, tr range expressions are not bracketed. Commented Mar 8, 2016 at 1:45
  • @BarefootIO: Brackets are required on SysV systems, but not on BSD, and POSIX standardized the unbracketed form; I added a note. Commented Mar 8, 2016 at 1:51

Just

while read -ra line; 
do
    for word in "${line[@]}";
    do
        echo "$word";
    done;
done < test.txt

will split up the file word by word. Change the echo to whatever you want to do with the words.

The semicolons are added so this can be put into a one-liner.
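For example, a minimal sketch of the loop counting words (the file name test.txt and its two lines are assumed here):

```shell
# Hypothetical input file for this sketch
printf 'one two\nthree\n' > test.txt

count=0
while read -ra line; do
    for word in "${line[@]}"; do
        count=$((count + 1))   # replace with whatever you do per word
    done
done < test.txt
echo "$count words"
```

Because the loop reads from a redirection rather than a pipe, $count survives the loop; here it ends up as 3.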

  • @Stéphane, this script just worked without the -ra, and the double quotes are in my opinion not necessary since the words do not contain spaces or other delimiters Commented Apr 9, 2015 at 9:58
  • One can consider using ${word}; the braces make the boundaries of the variable name explicit... Commented Apr 9, 2015 at 10:00
  • The fact that it worked on simple test input does not mean it can work in the wild where you don't know what the input will be. What if the line contained a \ character? Commented Apr 9, 2015 at 10:32
  • According to what @terdon mentioned, it is indeed safer to use double quotes around $word, e.g. "${word}" Commented Apr 9, 2015 at 10:43
  • If you're using ${line[@]}, that suggests you want to loop over array elements. However, if you don't use -a, that's not what happens. Instead, the line (with backslash processing without -r, and with leading and trailing blanks removed) is stored in $line, aka ${line[0]}. And when you do for w in ${line[@]}, the same as for i in $line, you're invoking the split+glob operator (when omitting quotes around variables), which splits $line and expands globs (see what happens, for instance, if you have a * word in there). Commented Apr 9, 2015 at 11:59

You will get a list of "words" that were separated by spaces, but with punctuation marks attached:

while read -a tmp_var; 
do
    for i in "${tmp_var[@]}"
    do
        var[${#var[*]}]=$i
    done
done < test.txt

But, as usual, it is simpler to first transform test.txt into a one-word-per-line list with tr or sed, etc., and then read it line by line.

  • @Stéphane Chazelas what do you mean by 'IFS'? Commented Apr 9, 2015 at 9:08
  • @AomineDaiki IFS= sets the Input Field Separator to null so that it does not interfere with in-word separators, if any are present Commented Apr 9, 2015 at 9:10
  • in my file the only separator that exists is the " " separator Commented Apr 9, 2015 at 9:14
  • I want to receive each word alone; I have a test to perform on it Commented Apr 9, 2015 at 9:16

A simpler way, using whitespace as the separator and everything else as words:

set -o noglob
words=($(cat text_file)) # use split+glob operator with glob disabled
                         # above. Splits on space tab and newline with
                         # the default value of $IFS.

If the words contain punctuation, and punctuation attached to words doesn't cause much trouble, you could try this way.
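A short sketch of that split+glob approach (the file text_file and its contents are assumed for illustration):

```shell
# Hypothetical file contents, assumed for this sketch
printf 'alpha beta\ngamma\n' > text_file

set -o noglob                 # keep *, ? and [ literal during the split
words=($(cat text_file))      # unquoted expansion splits on default $IFS
set +o noglob                 # restore globbing afterwards

echo "${#words[@]}"           # prints 3
```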

read -a WORDS -d "" < file.txt
for word in "${WORDS[@]}"
do
        echo "$word"
done

Option -a stores the words in an array.

Option -d specifies the delim string. This tells read where to stop processing the file. I've specified an empty string which causes read to continue until it gets an EOF. That is, the whole file is processed regardless of line endings.
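As a sketch of how this behaves (file.txt and its contents are assumptions for this example):

```shell
# Hypothetical file, created only for this illustration
printf 'first second\nthird\n' > file.txt

# -d "" makes read consume the whole file; -a fills the array
read -d "" -a WORDS < file.txt
echo "${#WORDS[@]}"           # prints 3
```

Note that read returns a non-zero status here because it reaches EOF before finding the delimiter, so append || true if running under set -e.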


Considering that you used awk in your question, here's another alternative to tr and fmt:

awk '{ for ( i = 1; i <= NF; ++i ) print $(i); }' test.txt |
while IFS= read -r var 
do 
    echo processing: "$var" 
done

Note that, as with fmt and unlike tr, awk accepts input filenames as arguments.
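For instance, on a hypothetical one-line test.txt, the awk stage alone produces one word per line:

```shell
# Hypothetical input file for this sketch
printf 'a b c\n' > test.txt

out=$(awk '{ for (i = 1; i <= NF; ++i) print $i }' test.txt)
printf '%s\n' "$out"
```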


I have written the following sample script, which reads the file /etc/passwd word by word.

#!/bin/bash

COUNT=1

FCOUNT=`cat /etc/passwd|wc -l`

while [ $COUNT -le $FCOUNT ]
do
    FTCOUNT=`awk -F ":" '{print NF,$0}' /etc/passwd|awk '{print $1}'|head -$COUNT|tail -1`
    TCOUNT=1
    while [ $TCOUNT -le $FTCOUNT ]
    do
            OUTPUT=`head -$COUNT /etc/passwd|cut -d ":" -f $TCOUNT|tail -1`
            echo -n "${OUTPUT} "
            sleep 2
            TCOUNT=$(( TCOUNT + 1 ))
    done
    echo ""
    COUNT=$(( COUNT + 1 ))
done
