
My problem is to read a text file word by word and recover each word into a variable. I tried:

while read ligne; 
do {
for ( i=1; i<= length ($ligne); i++); do
 { var=$(awk '{print $i}' test.txt)}

}done < test.txt

But it doesn't work, and I get this error:

Couldn't parse this for loop
  • You're missing a done for your second do. You only have one done, matching your while ... do loop. Commented Mar 7, 2016 at 19:10

7 Answers


It depends on how you define words.

If words are separated by one or more spaces, you can do:

tr -s '[:blank:]' '[\n*]' < file |
  while IFS= read -r word; do
    : echo "$word" here
  done

If words are sequences of characters containing A-Z, a-z, and _:

tr -cs 'A-Za-z_' '[\n*]' < file | ...

On historical System V systems, you need the square brackets: [A-Za-z_].
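As a quick illustration of the first pipeline (the file name sample.txt and its contents are assumptions made for this sketch):

```shell
# Hypothetical sample file, created only for this illustration
printf 'hello   world\nfoo bar\n' > sample.txt

# Squeeze each run of blanks into a single newline: one word per line
count=$(tr -s '[:blank:]' '[\n*]' < sample.txt | wc -l)
echo "words: $count"
```

With that sample input, the tr stage emits four lines (hello, world, foo, bar), one word each.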

  • In addition to blank characters, the first tr also splits words on brackets. And the second tr allows brackets in words, which is disallowed by the explanatory text. I tried to remove the erroneous [ and ] in each tr, but because of brevity, the site rejected the edit. Commented Mar 7, 2016 at 20:18
  • @BarefootIO: It's [:blank:]. The second one, [A-Za-z_], is intended, as required by POSIX. Commented Mar 8, 2016 at 1:31
  • Unlike their globbing and regular expression counterparts, tr range expressions are not bracketed. Commented Mar 8, 2016 at 1:45
  • @BarefootIO: Brackets are required on SysV systems, but not on BSD, and POSIX standardized the unbracketed form; I added a note. Commented Mar 8, 2016 at 1:51

Just

while read -ra line; 
do
    for word in "${line[@]}";
    do
        echo "$word";
    done;
done < test.txt

will split up the file word by word. Change the echo to whatever you want to do with the words.

The semicolons are added so this can be put into a one-liner.
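For example, a minimal sketch of the loop counting words (the file name test.txt and its two lines are assumed here):

```shell
# Hypothetical input file for this sketch
printf 'one two\nthree\n' > test.txt

count=0
while read -ra line; do
    for word in "${line[@]}"; do
        count=$((count + 1))   # replace with whatever you do per word
    done
done < test.txt
echo "$count words"
```

Because the loop reads from a redirection rather than a pipe, $count survives the loop; here it ends up as 3.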

  • @Stéphane, this script just worked without the -ra, and the double quotes are in my opinion not necessary since the words do not contain spaces or other delimiters Commented Apr 9, 2015 at 9:58
  • One can consider using ${word}; the braces make the boundaries of the variable name explicit... Commented Apr 9, 2015 at 10:00
  • The fact that it worked on simple test input does not mean it can work in the wild where you don't know what the input will be. What if the line contained a \ character? Commented Apr 9, 2015 at 10:32
  • According to what @terdon mentioned, it is indeed safer to use double quotes around $word, e.g. "${word}" Commented Apr 9, 2015 at 10:43
  • If you're using ${line[@]}, that suggests you want to loop over array elements. However, if you don't use -a, that's not what happens. Instead, the line (with backslash processing without -r, and with leading and trailing blanks removed) is stored in $line, aka ${line[0]}. And when you do for w in ${line[@]}, the same as for i in $line, you're invoking the split+glob operator (when omitting quotes around variables), which splits $line and expands globs (see what happens, for instance, if you have a * word in there). Commented Apr 9, 2015 at 11:59

You will get a list of "words" that were separated by spaces, but with punctuation marks attached:

while read -a tmp_var; 
do
    for i in "${tmp_var[@]}"
    do
        var[${#var[*]}]=$i
    done
done < test.txt

But, as usual, it is simpler to first transform test.txt into a one-word-per-line list with tr or sed, etc., and then read it line by line.

  • @Stéphane Chazelas what do you mean by 'IFS'? Commented Apr 9, 2015 at 9:08
  • @AomineDaiki IFS= sets the Input Field Separator to null so that it does not interfere with in-word separators, if any are present Commented Apr 9, 2015 at 9:10
  • in my file the only separator that exists is the " " separator Commented Apr 9, 2015 at 9:14
  • I want to receive each word alone; I have a test to perform on it Commented Apr 9, 2015 at 9:16

A simpler way, using whitespace as the separator and everything else as words:

set -o noglob
words=($(cat text_file)) # use split+glob operator with glob disabled
                         # above. Splits on space tab and newline with
                         # the default value of $IFS.

If the words contain punctuation, and punctuation attached to words doesn't cause much trouble, you could try this way.
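A short sketch of that split+glob approach (the file text_file and its contents are assumed for illustration):

```shell
# Hypothetical file contents, assumed for this sketch
printf 'alpha beta\ngamma\n' > text_file

set -o noglob                 # keep *, ? and [ literal during the split
words=($(cat text_file))      # unquoted expansion splits on default $IFS
set +o noglob                 # restore globbing afterwards

echo "${#words[@]}"           # prints 3
```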

read -a WORDS -d "" < file.txt
for word in "${WORDS[@]}"
do
        echo "$word"
done

Option -a stores the words in an array.

Option -d specifies the delim string. This tells read where to stop processing the file. I've specified an empty string which causes read to continue until it gets an EOF. That is, the whole file is processed regardless of line endings.
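As a sketch of how this behaves (file.txt and its contents are assumptions for this example):

```shell
# Hypothetical file, created only for this illustration
printf 'first second\nthird\n' > file.txt

# -d "" makes read consume the whole file; -a fills the array
read -d "" -a WORDS < file.txt
echo "${#WORDS[@]}"           # prints 3
```

Note that read returns a non-zero status here because it reaches EOF before finding the delimiter, so append || true if running under set -e.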


Considering that you used awk in your question, here's another alternative to tr and fmt:

awk '{ for ( i = 1; i <= NF; ++i ) print $(i); }' test.txt |
while IFS= read -r var 
do 
    echo processing: "$var" 
done

Note that, as with fmt and unlike tr, awk accepts input filenames as arguments.
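For instance, on a hypothetical one-line test.txt, the awk stage alone produces one word per line:

```shell
# Hypothetical input file for this sketch
printf 'a b c\n' > test.txt

out=$(awk '{ for (i = 1; i <= NF; ++i) print $i }' test.txt)
printf '%s\n' "$out"
```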


I have written the following sample script, which reads the file /etc/passwd word by word.

#!/bin/bash

COUNT=1

FCOUNT=`cat /etc/passwd|wc -l`

while [ $COUNT -le $FCOUNT ]
do
    FTCOUNT=`awk -F ":" '{print NF,$0}' /etc/passwd|awk '{print $1}'|head -$COUNT|tail -1`
    TCOUNT=1
    while [ $TCOUNT -le $FTCOUNT ]
    do
            OUTPUT=`head -$COUNT /etc/passwd|cut -d ":" -f $TCOUNT|tail -1`
            echo -n "${OUTPUT} "
            sleep 2
            TCOUNT=$(( TCOUNT + 1 ))
    done
    echo ""
    COUNT=$(( COUNT + 1 ))
done
