
I am trying to count the number of lines in some large files whose length is at most 300 characters.

My current approach is the following command (but it is slow):

awk "length<=300" *.log | wc -l

Is there a better way to get only the count of those lines?

  • Does your input contain only single-byte characters, or can it contain multi-byte Unicode characters? If Unicode, do you want to count characters or bytes? Commented Jun 3, 2022 at 11:35
  • They are UTF-8 files Commented Jun 3, 2022 at 11:56
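
A note on that distinction: with GNU awk, length counts characters in a UTF-8 locale but bytes in the C locale, so the locale decides which lines pass the length<=300 test (mawk, by contrast, always counts bytes). A quick illustration:

printf 'héllo\n' | awk '{print length}'             # GNU awk: 5 characters in a UTF-8 locale
printf 'héllo\n' | LC_ALL=C awk '{print length}'    # 6 bytes (é is 2 bytes in UTF-8)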

3 Answers


Use awk to count the lines:

awk 'length<=300{c++} END { print c }' *.log

where

  • c++ increments the counter
  • END { print c } is executed after the last line and prints the value of c.

I am not sure this will be faster (but at least there is no pipe, and wc -l doesn't have to parse and count the lines a second time).
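
If in doubt, both variants are easy to time on your own data (the outcome depends entirely on the input):

time awk 'length<=300' *.log | wc -l                 # original: pipe to wc
time awk 'length<=300{c++} END {print c}' *.log      # count inside awk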


To get per-file subtotals (can be written on one line; note that ENDFILE requires GNU awk):

awk 'length<=300{t++;s++} 
     ENDFILE { printf "%s:%d\n",FILENAME,s ; s=0 ; } 
     END { printf "TOTAL:%d\n",t }' *.log
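
If your awk lacks ENDFILE, a portable sketch prints the subtotal whenever a new file begins, and once more at the end (files from which awk reads no lines simply won't appear in the output):

awk 'FNR==1 && NR>1 { printf "%s:%d\n", prev, s; s=0 }   # flush previous file
     length<=300    { t++; s++ }
                    { prev=FILENAME }
     END { printf "%s:%d\n", prev, s; printf "TOTAL:%d\n", t }' *.log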
  • On my data set this command was faster than using the solution with grep Commented Jun 3, 2022 at 12:54
  • You should mention that to get the subtotal that way requires GNU awk for ENDFILE. Also, printf "TOTAL:%s\n",t should be printf "TOTAL:%d\n",t so you get numeric output (0 vs null) even if no lines are shorter than 301 chars. Commented Jun 3, 2022 at 16:56

With grep:

cat *.log | grep -vc '^.\{301\}'

To match lines with length <= 300, we grep with -v (invert match) for any 301 characters; since a grep pattern cannot match across lines, .\{301\} only ever matches within a single line. The pattern is anchored at the beginning of the line with ^, and -c counts the selected lines.
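
One performance note: in a UTF-8 locale, . matches a (possibly multi-byte) character, which costs time on big inputs. If a 300-byte limit is acceptable for your data (for example, the lines are effectively ASCII), forcing the C locale is usually much faster; this is an assumption to verify against your own files:

cat *.log | LC_ALL=C grep -vc '^.\{301\}'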


If you want to have a basic progress indicator, you can use the pv (pipe viewer) utility:

pv *.log | grep -vc '^.\{301\}'

If you want to get the count per file:

grep -vc '^.\{301\}' *.log

and if you want to get the total from the above command:

grep -vc '^.\{301\}' *.log | awk -F':' '{c+=$NF} END {print c}'

Although we don't usually pipe grep into awk, depending on the data this could be faster than the cat and grep version when there are many very long input lines: the pipe here carries only a small amount of data (file names and counts).
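
For a very large corpus (the comments mention ~60GB of logs), if the storage can keep up, here is a sketch that runs one grep per file in parallel, assuming GNU xargs for -0 and -P (tune -P to your core count):

# each grep prints a bare count; awk sums them
printf '%s\0' *.log |
  xargs -0 -n1 -P4 grep -vc '^.\{301\}' |
  awk '{c+=$1} END {print c}'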

  • Both solutions work well. I like yours also because I can see the count per file Commented Jun 3, 2022 at 11:57
  • @thanasisp, grep would print lines like hello.txt:1, foo.txt:3 if it's given multiple filenames. cat *.log | grep ... would give the total, though Commented Jun 3, 2022 at 12:04
  • I used this: grep -vc '^.\{301\}' *.log | awk -F: '{s+=$2} END {print s}', but in case you need the count per file: grep -vc '^.\{301\}' *.log > 300grep.txt and then awk Commented Jun 3, 2022 at 12:24
  • I have ~ 60GB of logs to search in Commented Jun 3, 2022 at 12:25

Using Raku (formerly known as Perl_6)

Dependent on shell-globbing:

raku -ne 'state $i; $i++ if .chars <= 300; END say $i // 0;'

# OR

raku -ne 'state $i; if .chars <= 300 {$i++}; END say $i // 0;'
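
Invoked with the log files supplied by the shell, for example:

raku -ne 'state $i; $i++ if .chars <= 300; END say $i // 0;' *.log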

Files determined via regex (independent of shell-globbing):

raku -e 'for dir(test => / .+ \.log $ /) {state $i; $i++ if .chars <= 300 for .lines; END say $i // 0};'
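
Since the files are UTF-8, note that .chars counts characters. If you would rather apply the 300 limit to bytes (an assumption about the requirement), one sketch measures the encoded length instead:

raku -ne 'state $i; $i++ if .encode.bytes <= 300; END say $i // 0;' *.log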

https://docs.raku.org/syntax/state
https://docs.raku.org/routine/dir
https://raku.org
