Timeline for exclude a string when a range of numbers is successfully found. Followup to a previous question about searching strings in a txt of house addresses

Current License: CC BY-SA 4.0

13 events

when	what		by	license	comment
May 10, 2021 at 14:33	history	edited	YMGenesis	CC BY-SA 4.0	deleted 272 characters in body
May 10, 2021 at 14:22	comment	added	YMGenesis		@cas That also makes a lot of sense. I guess I'm not yet at the stage where I can think of these kinds of solutions, so I appreciate the input.
May 10, 2021 at 14:22	comment	added	YMGenesis		@icarus That's very helpful to start 'sanitizing' my data for the range.
May 10, 2021 at 14:15	vote	accept	YMGenesis
May 10, 2021 at 6:41	comment	added	cas		Then you either don't need to use `-F'[- ]'` as the field separator, or you can make awk split the input fields again with `{$1 = $1 "-" $1; $0=$0}`. The point is to have awk do the transformation on the fly, while it's reading the file(s).
May 10, 2021 at 6:33	comment	added	cas		@YMGenesis changing $1 if it doesn't include a range is a good idea, but you don't have to change all your input files. Just do something like `$1 !~ /-/ {$1 = $1 "-" $1}`, or `$1 ~ /^[[:digit:]]+$/ {$1 = $1 "-" $1}` in your awk script.
May 10, 2021 at 5:26	answer	added	Kamil Maciorowski		timeline score: 1
May 10, 2021 at 4:44	comment	added	icarus		`sed '/^[0-9]* /s/^\([0-9]*\)/\1-\1/' addresses.txt > newaddresses.txt` might be a good starting point.
May 9, 2021 at 23:22	history	edited	YMGenesis	CC BY-SA 4.0	added 272 characters in body
May 9, 2021 at 23:16	comment	added	YMGenesis		continued: But if it makes more sense to add 1-1 ranges (two people have suggested it now), maybe I'll just bite the bullet and start changing the data. I appreciate the input.
May 9, 2021 at 23:12	comment	added	YMGenesis		Yes, there will be an overlap in some cases. Kamil did suggest the 1-1 range solution. My addresses.txt data is around 2000 lines long, so I'd have to go in and change each single number to a 1-1 style range. I agree with the philosophy you mention: clean and simple. It would just take a while to change the data, but if it works, it works. The data itself doesn't change much. It's supposed to be a database of addresses that specifies which route a letter carrier delivers to (the number after the colon), so it changes rarely: maybe once a year or less, with little changes here and there.
May 9, 2021 at 22:58	comment	added	icarus		Will there ever be an overlap between the range and the single numbers (or even a different range)? e.g. `3 fastest rd: 99` and `1-58 fastest rd: 98`. In general there are two ways to approach this: (1) have clean data and a simple program, or (2) have dirty data and a complicated program. For this case I think clean data is a good approach, so perhaps you can change your "1 test st: 1" line to "1-1 test st: 1" so there are never any cases where you don't have a range (even if the range is just 1 long). This is not a good approach if the data changes frequently, so can you also tell us how often it changes?
May 9, 2021 at 22:33	history	asked	YMGenesis	CC BY-SA 4.0
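The on-the-fly normalization cas describes (turning a bare house number into an N-N range while awk reads the file, instead of editing the file) can be sketched as a small shell function. This is a minimal sketch, assuming each line looks like `1 test st: 1` or `1-58 fastest rd: 98` with single spaces between fields; the function name `normalize_ranges` is hypothetical. It uses `[0-9]` instead of `[[:digit:]]` only because some older awk builds lack POSIX character classes.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a bare leading house number "N" as "N-N"
# while awk reads the input; the file on disk is not modified.
# Assumes fields are separated by single spaces, e.g. "1 test st: 1".
normalize_ranges() {
  # $1 ~ /^[0-9]+$/ : the first field is digits only (no range yet).
  # Assigning to $1 makes awk rebuild $0 using OFS (a single space).
  awk '$1 ~ /^[0-9]+$/ { $1 = $1 "-" $1 } { print }' "$@"
}

# Usage (file name taken from the question):
#   normalize_ranges addresses.txt
printf '%s\n' '1 test st: 1' '1-58 fastest rd: 98' | normalize_ranges
```

The demo line prints `1-1 test st: 1` followed by the unchanged `1-58 fastest rd: 98`.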
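cas's other option, keeping `-F'[- ]'` and forcing awk to re-split the record with `$0=$0`, can be combined with a range lookup. The sketch below is one possible shape, not the asker's actual script: the function name `route_for`, its arguments (house number, street name), and the assumption that street names never contain a hyphen are all illustrative. The condition tests the whole line rather than `$1`, because with `-` as a field separator `$1` never contains a hyphen.

```shell
#!/bin/sh
# Hypothetical helper: print the route(s) whose range covers house number $1
# on street $2, reading address lines from stdin.
# Assumes lines like "1 test st: 1" or "1-58 fastest rd: 98" and no hyphens
# inside street names (the field separator is "-" or a space).
route_for() {
  awk -F'[- ]' -v n="$1" -v street="$2" '
    BEGIN { num = n + 0 }
    # Bare number: make it "N-N" (awk rebuilds $0 with OFS), then
    # reassign $0 so awk re-splits it and $1/$2 become the range bounds.
    /^[0-9]+ / { $1 = $1 "-" $1; $0 = $0 }
    {
      # Street name = text between the "LO-HI " prefix and the colon.
      name = $0
      sub(/^[0-9]+-[0-9]+ /, "", name)
      sub(/:.*/, "", name)
      if (name == street && num >= $1 + 0 && num <= $2 + 0) {
        route = $0
        sub(/^[^:]*: */, "", route)   # keep only what follows the colon
        print route
      }
    }
  '
}

# Usage: route_for 3 "fastest rd" < addresses.txt
```

With icarus's overlapping example (`1-58 fastest rd: 98` and `3 fastest rd: 99`), `route_for 3 "fastest rd"` prints both 98 and 99 in the order the lines appear, so deciding which match wins is still left to the caller.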
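If the asker does convert the roughly 2000-line file once, as discussed in the comments, a one-time rewrite can be paired with a check for lines that still lack a range. This is a sketch with hypothetical function names (`to_ranges`, `check_ranges`); the `sed` expression follows the same idea as icarus's command, using a basic regular expression with a `\(...\)` capture group so it does not rely on GNU extensions.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a leading bare number "N " as "N-N ".
# Lines already starting with a range ("1-58 ...") do not match, because
# the digits there are followed by "-" rather than a space.
to_ranges() {
  sed 's/^\([0-9][0-9]*\) /\1-\1 /' "$@"
}

# Hypothetical helper: print, with line numbers, every line that does not
# start with "LO-HI ". No output means every line now has a range.
# Note: grep exits with status 1 when it prints nothing.
check_ranges() {
  grep -vn '^[0-9][0-9]*-[0-9][0-9]* ' "$@"
}

# Usage (file names taken from icarus's comment):
#   to_ranges addresses.txt > newaddresses.txt
#   check_ranges newaddresses.txt
```

Keeping the check separate makes it easy to rerun after the occasional yearly edit to the data.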