Finding the number of times a particular number occurs in a file where ranges are also specified

Question

I have a file with numbers separated by commas (,). It also contains number ranges such as 300-400. For example, I have a text file named testme.txt that looks like this:

200,300,234,340-350,400,360,333-339
409-420
4444-31231231
348

I want to find out whether the number 348 is present. It is present in 2 places:

the range 340-350 on the first line
the last line (348 itself)

How can I find it? I tried using regular expressions in sed and awk, but I could not write a script that handles the number ranges. Is there another way to do it?

UPDATE: I found a brute-force solution, but it only works for ranges:

count=0
num1=348
# Wrap every range (e.g. 340-350) in colons, then let awk split on ':'
# and print only the fields that are ranges. Single numbers are ignored.
for i in $(sed 's/[0-9][0-9]*-[0-9][0-9]*/:&:/g' testme.txt |
    awk -F: '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+-[0-9]+$/) print $i }')
do
    lh=${i%-*}    # low end of the range
    rh=${i#*-}    # high end of the range
    if [ "$lh" -le "$num1" ] && [ "$rh" -ge "$num1" ]; then
        count=$((count + 1))
    fi
done
echo "$count"

Regular expressions cannot easily handle ranges of values. You should do numerical comparisons in a language that supports them (even awk can do that, and by setting FS you can split on both spaces and commas).
– vinc17, Commented Jul 21, 2014 at 15:12
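A minimal sketch of vinc17's suggestion, assuming every item is either a single integer N or a range LO-HI, separated by commas and/or blanks. The helper name count_fs is hypothetical. FS is set to a regular expression so one split handles both separators, each item is compared numerically, and a lone number N is treated as the range N-N so a single inclusive check covers both cases.

```shell
# Hypothetical helper: count_fs NUM FILE
# Assumes items are integers N or ranges LO-HI, separated by commas and/or blanks.
count_fs() {
    awk -v num="$1" '
    BEGIN { FS = "[ \t,]+" }              # split on runs of blanks or commas
    {
        for (i = 1; i <= NF; i++) {
            if ($i == "") continue        # empty field from a leading separator
            n = split($i, r, "-")         # "340-350" -> r[1] = 340, r[2] = 350
            lo = r[1] + 0
            hi = (n == 2) ? r[2] + 0 : lo # a lone number N acts like N-N
            if (lo <= num && num <= hi) count++
        }
    }
    END { print count + 0 }' "$2"
}
```

For the sample testme.txt from the question, count_fs 348 testme.txt should print 2.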

iruvar · Accepted Answer · 2014-07-21 15:20:52Z

4

A GNU awk solution that treats , or \n as the record separator and - as the field separator. Depending on the number of fields, either an equality check or a range check is applied:

awk -v num=348 -v RS=',|\n' -F'-' 'NF == 2 && $1 <= num && $2 >= num{c++};
           NF == 1 && $0 == num{c++};
           END{print c+0}' file
2
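A regular expression as RS is a GNU awk extension. A portable sketch, assuming only a POSIX awk and tr: tr turns every comma into a newline so each item becomes its own record, and the same NF-based checks are applied. The helper name count_items is hypothetical.

```shell
# Hypothetical helper: count_items NUM FILE
# tr puts every comma-separated item on its own line, so a POSIX awk
# only has to split each record on "-".
count_items() {
    tr ',' '\n' < "$2" |
    awk -v num="$1" '
        BEGIN { FS = "-" }
        NF == 2 && $1 + 0 <= num && num <= $2 + 0 { c++ }   # range LO-HI
        NF == 1 && $1 + 0 == num                   { c++ }   # single number
        END { print c + 0 }'
}
```

For the sample testme.txt, count_items 348 testme.txt should print 2, matching the GNU awk command.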

answered Jul 21, 2014 at 15:20

iruvar


cuonglm · Accepted Answer · 2014-07-21 15:57:47Z

3

If you can use perl:

$ perl -F',' -anle '
for (@F) {
    ($l,$h) = split "-";                
    $count++ if $l == 348 || ($l < 348 and $h >= 348);
}
END {print $count}
' file
2

edited Jul 21, 2014 at 15:57

answered Jul 21, 2014 at 15:18

cuonglm


Your script doesn't give the right answer if one replaces the first comma with a dash.

– vinc17, Commented Jul 21, 2014 at 15:29

Take the example given by the OP, then replace the first comma (between 200 and 300) with a dash. Your script gives 1 instead of 2.

– vinc17, Commented Jul 21, 2014 at 15:34

@vinc17: Sorry, my bad, I forgot to add the loop. Fixed.

– cuonglm, Commented Jul 21, 2014 at 15:39

for (@F) { ($l,$h) = split "-"; defined $h or $h = $l; $count++ if (sort {$a <=> $b} $l,348,$h)[1] == 348; } END {print $count} is shorter and more robust (if you have 348-348 in the file).

– vinc17, Commented Jul 21, 2014 at 15:47

There is more than one way to do it! See my update.

– cuonglm, Commented Jul 21, 2014 at 15:49
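vinc17's sort-based check generalizes well: NUM lies between LO and HI exactly when it is the middle of the three values, whichever order LO and HI are written in. Below is a sketch of the same idea in POSIX awk, with a hypothetical mid() function standing in for Perl's sort and a hypothetical helper name count_median. As in the Perl comment, a lone number N is treated as the range N-N, so 348 and 348-348 each count once.

```shell
# Hypothetical helper: count_median NUM FILE
# Assumes comma-separated items that are integers N or ranges LO-HI.
count_median() {
    awk -v num="$1" '
    # Return the middle value of a, b, c (numeric comparison).
    function mid(a, b, c) {
        if ((a <= b && b <= c) || (c <= b && b <= a)) return b
        if ((b <= a && a <= c) || (c <= a && a <= b)) return a
        return c
    }
    BEGIN { FS = "," }
    {
        for (i = 1; i <= NF; i++) {
            if ($i == "") continue
            n = split($i, r, "-")
            lo = r[1] + 0
            hi = (n == 2) ? r[2] + 0 : lo     # lone N behaves like N-N
            if (mid(lo, num + 0, hi) == num + 0) count++
        }
    }
    END { print count + 0 }' "$2"
}
```

Because only the middle value matters, a reversed range such as 350-340 is also counted, which the explicit lo <= num <= hi test would miss.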


glenn jackman · Accepted Answer · 2014-07-21 16:11:58Z

2

This answer will provide the fields that contain the specified number, not just the lines, if you are after that level of detail (and if the ranges in your data might contain overlaps):

awk -v num=348 -F, '{
  for (i=1; i<=NF; i++) {
    if ($i == num || (split($i, a, /-/) == 2 && (a[1] <= num && num <= a[2]))) {
      print $i
    }
  }
}' <<END
200,300,234,340-350,400,360,333-339
409-420
4444-31231231
348
1-400,100-1000
END

For giggles, golfed:

awk -F, '{for(i=1;i<=NF;i++)if($i==n||(split($i,a,/-/)==2&&a[1]<=n&&n<=a[2]))print $i}' n=348 file
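If you also want to know where each match is, a small variation prints the line number NR in front of each matching field. This is a sketch; find_fields is a hypothetical name, and it assumes the same comma-separated N or LO-HI format.

```shell
# Hypothetical helper: find_fields NUM FILE
# Prints "LINE: FIELD" for every field equal to NUM or covering it.
find_fields() {
    awk -v num="$1" -F, '{
        for (i = 1; i <= NF; i++) {
            n = split($i, a, "-")
            if ((n == 1 && $i == num) || (n == 2 && a[1] + 0 <= num && num <= a[2] + 0))
                print NR ": " $i
        }
    }' "$2"
}
```

With the five lines of sample data from the here-document saved in a file, it should print 1: 340-350, 4: 348, 5: 1-400 and 5: 100-1000, one per line.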

edited Jul 21, 2014 at 16:11

answered Jul 21, 2014 at 16:06

glenn jackman


ShellGame · Accepted Answer · 2014-07-21 15:09:01Z

One possible approach (and I am sure there are many ways to get this done) is to simplify the checks for the number.

Split the values on the comma delimiter and use nested if statements to work through the logic.

If a value contains a "-", split it into its two numbers at the "-". Then it is simply a matter of checking whether the number you are looking for is greater than or equal to the first number AND less than or equal to the second number; if so, it is in the range.

For values without a "-", a simple equality check is enough.

Perhaps not an elegant approach, but it would get the job done. (It seemed to me that you were looking for a way to do the comparisons rather than a finished script, so I hope the above helps with that brainstorming.)
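The method described in this answer can be sketched in plain POSIX sh, without awk. This is an illustration only: count_sh is a hypothetical name, it assumes every item is a non-negative integer or LO-HI with no embedded spaces, and it does no input validation. Since POSIX sh has no local, it leaves its working variables set in the calling shell, and a shell loop like this is much slower than awk on large files.

```shell
# Hypothetical helper: count_sh NUM FILE
count_sh() {
    num=$1 count=0
    while IFS= read -r line || [ -n "$line" ]; do
        old_ifs=$IFS
        IFS=,                       # split the line on commas
        set -f                      # no globbing while splitting
        for item in $line; do
            if [ -z "$item" ]; then
                continue
            fi
            case $item in
                *-*)                # range: check LO <= NUM <= HI
                    lo=${item%%-*} hi=${item#*-}
                    if [ "$lo" -le "$num" ] && [ "$num" -le "$hi" ]; then
                        count=$((count + 1))
                    fi ;;
                *)                  # single number: check equality
                    if [ "$item" -eq "$num" ]; then
                        count=$((count + 1))
                    fi ;;
            esac
        done
        set +f
        IFS=$old_ifs
    done < "$2"
    echo "$count"
}
```

For the sample testme.txt, count_sh 348 testme.txt should print 2.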

Rhim · Accepted Answer · 2014-07-21 19:11:09Z

0

This example uses GNU awk's three-argument match() function, which stores the captured groups in an array:

awk -v num=348 -F ',' '{for (i = 1; i <= NF; i++) if (match($i, /([0-9]+)-?([0-9]*)/, arr)) {lo = arr[1] + 0; hi = (arr[2] == "" ? lo : arr[2] + 0); if (lo <= num && num <= hi) count++}} END {print count + 0}' file

edited Jul 21, 2014 at 19:11

answered Jul 21, 2014 at 19:03

Rhim


user78256 · Accepted Answer · 2014-07-21 22:27:48Z

Assuming your input is well-formed, with the file name and the number passed as parameters, this should work in PHP:

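<?php
// Usage: php script.php FILE NUMBER
$count = 0;
$num = (int)$argv[2];
foreach (explode("\n", trim(file_get_contents($argv[1]))) as $line)
    foreach (explode(",", $line) as $col)
    {
        $data = explode("-", $col);   // "340-350" => ["340", "350"]
        // single number equal to $num, or $num between the two ends (inclusive)
        if ((count($data) == 1 && (int)$data[0] == $num) ||
            (count($data) == 2 && ((int)$data[0] - $num) * ((int)$data[1] - $num) <= 0))
            $count++;
    }
echo $count, "\n";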

Put the code in a file named script.php and call it from bash like this:

php script.php testme.txt 348
