Timeline for exclude a string when a range of numbers is successfully found. Followup to a previous question about searching strings in a txt of house addresses
Current License: CC BY-SA 4.0
13 events
| when | what | | by | license | comment |
|---|---|---|---|---|---|
| May 10, 2021 at 14:33 | history | edited | YMGenesis | CC BY-SA 4.0 | deleted 272 characters in body |
| May 10, 2021 at 14:22 | comment | added | YMGenesis | @cas That also makes a lot of sense. I guess my head isn't at that stage yet where I can think of these kinds of solutions, so I appreciate the input. | |
| May 10, 2021 at 14:22 | comment | added | YMGenesis | @icarus That's very helpful to start 'sanitizing' my data for the range. | |
| May 10, 2021 at 14:15 | vote | accept | YMGenesis | ||
| May 10, 2021 at 6:41 | comment | added | cas | Then you either don't need to use `-F'[- ]'` as the field separator, or you can make awk split the input fields again with `{$1 = $1 "-" $1; $0=$0}`. The point is to have awk do the transformation on the fly, while it's reading the file(s). | |
| May 10, 2021 at 6:33 | comment | added | cas | @YMGenesis changing `$1` if it doesn't include a range is a good idea, but you don't have to change all your input files. Just do something like `$1 !~ /-/ {$1 = $1 "-" $1}`, or `$1 ~ /^[[:digit:]]+$/ {$1 = $1 "-" $1}` in your awk script. | |
| May 10, 2021 at 5:26 | answer | added | Kamil Maciorowski | timeline score: 1 | |
| May 10, 2021 at 4:44 | comment | added | icarus | `sed '/^[0-9]* /s/^\([0-9]*\)/\1-\1/' addresses.txt > newaddresses.txt` might be a good starting point. | |
| May 9, 2021 at 23:22 | history | edited | YMGenesis | CC BY-SA 4.0 | added 272 characters in body |
| May 9, 2021 at 23:16 | comment | added | YMGenesis | continued: But, I think if it makes more sense to add 1-1 ranges (two people have suggested it now), maybe I'll just bite the bullet and start changing the data. I appreciate the input. | |
| May 9, 2021 at 23:12 | comment | added | YMGenesis | Yes there will be an overlap in some cases. Kamil did suggest the 1-1 range solution. My addresses.txt data is around 2000 lines long, so I'd have to go in and change each single number to a single 1-1 range. I agree with the philosophy you mention, clean and simple. Would just take a while to change the data. But if it works, it works. The data itself doesn't change much. It's supposed to be a database of addresses which specify which route a letter carrier delivers to (the number after the colon), so it doesn't change much at all. Maybe once a year or less. Little changes here and there. | |
| May 9, 2021 at 22:58 | comment | added | icarus | Will there ever be an overlap between the range and the single numbers (or even a different range)? E.g. `3 fastest rd: 99` and `1-58 fastest rd: 98`. In general there are two ways to approach this: (1) have clean data and a simple program, or (2) have dirty data and a complicated program. For this case I think clean data is a good approach, so perhaps you can change your "1 test st: 1" line to "1-1 test st: 1" so there are never any cases where you don't have a range (even if the range is just 1 long). This is not a good approach if the data changes frequently, so can you tell us this as well? | |
| May 9, 2021 at 22:33 | history | asked | YMGenesis | CC BY-SA 4.0 |
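
The normalization ideas from the comments above can be tried side by side. The following is only a sketch: the sample lines are hypothetical, modeled on the `1 test st: 1` and `1-58 fastest rd: 98` examples quoted in the comments (the real `addresses.txt` is not shown on this page), and the function names `normalize_sed`, `normalize_awk` and `range_bounds` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data in the "NUMBER[-NUMBER] street: route" shape
# mentioned in the comments; not the asker's real file.
sample='1 test st: 1
1-58 fastest rd: 98
3 fastest rd: 99'

# icarus's sed command: a line starting with a bare number followed by a
# space gets that number turned into N-N; existing ranges do not match
# the address and pass through unchanged.
normalize_sed() {
    sed '/^[0-9]* /s/^\([0-9]*\)/\1-\1/'
}

# cas's first suggestion: rewrite $1 on the fly when it has no hyphen,
# so the input file itself never has to be edited.
normalize_awk() {
    awk '$1 !~ /-/ { $1 = $1 "-" $1 } { print }'
}

# cas's follow-up: with -F'[- ]' as the field separator, assigning
# $0 = $0 makes awk re-split the rebuilt line, so $1 and $2 end up as
# the low and high ends of the range on every line.
range_bounds() {
    awk -F'[- ]' '$0 !~ /^[0-9]+-/ { $1 = $1 "-" $1; $0 = $0 } { print $1, $2 }'
}

printf '%s\n' "$sample" | normalize_sed
printf '%s\n' "$sample" | normalize_awk
printf '%s\n' "$sample" | range_bounds
```

On this sample the sed and awk variants should both leave the `1-58` line untouched and turn the single numbers into `1-1` and `3-3`. Note that the awk variants rebuild each modified line with single spaces between fields, which only matters if the original spacing is significant.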