I have the following data. I am processing it to get the 1st and 5th columns, convert the D exponent format to E format, and delete rows that contain gibberish numbers such as 9.410-316.
```
DEG = 1.500
2.600D+01 0.000D+00 0.000D+00 0.000D+00 0.000D+00
2.700D+01 8.720-304 2.369-316 7.556-316 9.410-316
4.300D+01 1.208D-83 4.156D-96 7.360D-96 6.984D-96
1.590D+02 8.002D-07 6.555D-19 7.748D-19 7.376D-19
1.600D+02 1.173D-06 9.669D-19 1.143D-18 1.089D-18
1.610D+02 1.709D-06 1.417D-18 1.676D-18 1.596D+01
1.620D+02 2.468D-06 2.058D-18 2.436D-18 2.320D-10
DEG = 18.500
2.700D+01 2.794-314 0.000D+00 0.000D+00 0.000D+00
2.800D+01 4.352-285 1.224-297 3.685-297 4.412-297
8.800D+01 1.371D-02 6.564D-15 7.852D-15 7.275D-15
```
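The malformed entries share one shape: digits followed directly by a sign and a three-digit exponent, with no D in between. The D exponent suggests Fortran output, and Fortran's D and E edit descriptors drop the exponent letter when the exponent needs three digits, so 9.410-316 most likely stands for 9.410D-316. Assuming that pattern holds for the whole file, here is a minimal sketch that lists the offending rows with their line numbers. The helper name `flag_bad_rows` is hypothetical, and the regex also flags a `+` exponent (e.g. 1.0+300) as an assumed variant.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "lineno:row" for every row containing a field
# like 9.410-316, i.e. a digit followed directly by a sign and exactly three
# exponent digits that end the field. Valid fields such as 2.600D+01 never
# match, because there the sign follows the letter D, not a digit.
flag_bad_rows() {
    # -n prefixes each matching row with its line number; -E enables ERE syntax
    grep -nE '[0-9][-+][0-9]{3}([[:space:]]|$)' "$1"
}
```

If the sample above were the whole file, `flag_bad_rows fileinput` would report lines 3, 10 and 11.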
My problem is identifying the malformed numbers that I want to delete. So far, I have tried:
```shell
maxa=18.5
maxangle=$(printf "%.3f" "$maxa")
# The header has 5 spaces between "=" and the value if DEG < 10, and 6 if DEG >= 10
if (( $(echo "$maxa < 10" | bc -l) )); then
    txt2search="DEG =     $maxangle"
else
    txt2search="DEG =      $maxangle"
fi
line=$(grep -n "$txt2search" fileinput | cut -d : -f 1)
# Once the line number of the header is found, skip a few lines (4) and read the next ~1000 lines
beginline=$((line + 4))
endline=$((line + 1002))
awk -v a="$beginline" -v b="$endline" 'NR==a, NR==b {print $1, $5}' fileinput > fileoutput
sed -i 's/D/E/g' fileoutput
```
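The search, the row filter, the column selection and the D-to-E conversion could also be done in a single awk pass. This is a sketch under two assumptions not confirmed by the question: the data rows follow the DEG line directly (as in the sample shown, rather than after the fixed 4-line offset), and each block ends at the next line whose first field is `DEG`. The function name `extract_block` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print columns 1 and 5 of the block that follows "DEG = <angle>",
# skipping rows with a malformed field (a digit directly followed by "-",
# e.g. 9.410-316) and converting D exponents to E.
# Assumes data rows follow the DEG line directly and a block ends at the next DEG line.
extract_block() {
    local angle=$1 infile=$2
    awk -v angle="$angle" '
        # Header line: compare numerically, so padding and 18.5 vs 18.500 do not matter
        $1 == "DEG" { inblock = ($NF + 0 == angle + 0); next }
        # Data row inside the wanted block that has no malformed field
        inblock && !/[0-9]-/ {
            row = $1 " " $5
            gsub(/D/, "E", row)
            print row
        }
    ' "$infile"
}

# Usage, with the file names from the question:
# extract_block 18.5 fileinput > fileoutput
```

Comparing the header value numerically with `$NF + 0 == angle + 0` sidesteps the 5-versus-6-space padding, so the `if`/`else` around `txt2search` is no longer needed.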
Then, to discard the rows with the nonsense numbers, I tried the following commands (one at a time), but none of them worked:
```shell
sed -ni '/E/p' fileoutput
sed -E '/(E)/!d' fileoutput > spec2.tempdata
sed '/E/!d' fileoutput > spec2.tempdata
awk '!/E/' fileoutput > spec2.tempdata
```
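These attempts fail because they only test whether an E appears anywhere on the line, and after the conversion every row, bad ones included, has an E in its first column. A small demonstration on two rows shaped like `fileoutput` (the helper `sample_rows` is hypothetical):

```shell
#!/usr/bin/env bash
# Two rows as they would look in fileoutput after extracting columns 1 and 5
# and replacing D with E: the first is valid, the second is a bad row.
sample_rows() {
    printf '%s\n' '2.600E+01 0.000E+00' '2.700E+01 9.410-316'
}

# Keeps BOTH rows: column 1 always contains an "E", so /E/ also matches the bad row.
# sed '/E/!d' and sed -E '/(E)/!d' behave the same way.
sample_rows | sed -n '/E/p'

# Prints NOTHING: !/E/ selects rows without any "E", the opposite of the intent.
sample_rows | awk '!/E/'

# Keeps only the valid row: in a valid field the sign follows D or E, never a digit.
sample_rows | grep -v '[0-9]-'
```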
How can I identify and remove the lines that contain such nonsense numbers? The tool versions are:
- sed (GNU sed) 4.7
- grep (GNU grep) 3.4
- GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
The expected output for the first block (DEG = 1.500), shown before the column extraction and the D-to-E conversion, is:
```
2.600D+01 0.000D+00 0.000D+00 0.000D+00 0.000D+00
4.300D+01 1.208D-83 4.156D-96 7.360D-96 6.984D-96
1.590D+02 8.002D-07 6.555D-19 7.748D-19 7.376D-19
1.600D+02 1.173D-06 9.669D-19 1.143D-18 1.089D-18
1.610D+02 1.709D-06 1.417D-18 1.676D-18 1.596D+01
1.620D+02 2.468D-06 2.058D-18 2.436D-18 2.320D-10
```
EDIT: The solution I was looking for (from the first comment) is:
```shell
grep -v '[0-9]-'
```
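If the tiny values should be kept rather than dropped (an assumption about intent, not what the question asks), an alternative is to restore the missing exponent letter, since the D and E output formats omit it when the exponent needs three digits. A minimal sketch with GNU sed `-E`; the function name `repair_exponents` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert D exponents to E and restore the dropped letter,
# e.g. 2.600D+01 -> 2.600E+01 and 9.410-316 -> 9.410E-316.
repair_exponents() {
    # 1st expression: only a D between a digit and a sign is an exponent, so "DEG" is untouched.
    # 2nd expression: a digit followed directly by a sign and exactly three digits
    #                 that end the field gets an "E" inserted before the sign.
    sed -E \
        -e 's/([0-9])D([-+])/\1E\2/g' \
        -e 's/([0-9])([-+][0-9]{3})([[:space:]]|$)/\1E\2\3/g' \
        "$1"
}
```

For example, the row `2.700D+01 8.720-304 2.369-316 7.556-316 9.410-316` becomes `2.700E+01 8.720E-304 2.369E-316 7.556E-316 9.410E-316`.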