Finding and removing words beginning / ending with numbers

Question

I have a large file (>10000 lines) that contains one word per line, with a newline character after each word. The words contain no spaces.

I'd like to list (or even better, output to a new file) any words that start and/or end with a number, and then remove these from the original file. But I don't want to remove words that only have digits in the middle, or that consist entirely of digits.

For example, if I had the contents

789
hello
1hello
112121hello3323
he11o
hello9
88888

Then the strings 1hello, 112121hello3323, hello9 would get output and then removed from the file.

How can I do this?

123 · Accepted Answer · 2016-02-09 11:30:53Z

2

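With GNU grep (the -P option enables Perl-compatible regular expressions, so \d and \D are available):

grep -vP '^\d+\D|\D\d+$'

Run against the sample input, this produces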

789
hello
he11o
88888
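
The command above prints only the words to keep. To also save the words that would be removed, the same pattern can be run once without -v and once with it. This is a minimal sketch, not part of the original answer: the file names words.txt, removed.txt and kept.txt are placeholders, and the pattern is written with POSIX bracket expressions so that it does not depend on GNU grep's -P.

```shell
# Work in a scratch directory so the example is self-contained.
workdir=$(mktemp -d) && cd "$workdir"
printf '%s\n' 789 hello 1hello 112121hello3323 he11o hello9 88888 > words.txt

# Leading digits followed by a non-digit, or a non-digit followed by trailing digits.
pattern='^[[:digit:]]+[^[:digit:]]|[^[:digit:]][[:digit:]]+$'

# Words that start and/or end with digits (all-digit words do not match).
# grep exits with status 1 when it selects no lines, which is not an error here.
grep -E "$pattern" words.txt > removed.txt || [ "$?" -eq 1 ]

# Everything else.
grep -vE "$pattern" words.txt > kept.txt || [ "$?" -eq 1 ]

cat removed.txt
```

With the sample input, removed.txt holds 1hello, 112121hello3323 and hello9, while kept.txt holds the four lines shown above.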


1

or POSIXly, grep -vE '^[[:digit:]]+[^[:digit:]]|[^[:digit:]][[:digit:]]+$'

– iruvar, 2016-02-09 13:37:07 +00:00

@1_CR Yeah, I was going to add that, but I wasn't 100% sure that -E was completely POSIX compliant.

– 123, 2016-02-09 13:48:37 +00:00

glenn jackman · Accepted Answer · 2016-02-09 14:32:11Z

1

Actually editing the source file and creating a new file with the discards is a bit trickier. I would do this:

$ cat file
789
hello
1hello
112121hello3323
he11o
hello9
88888

$ perl -i -lne 'if (/^\d+\D|\D\d+$/) {warn "$_\n"} else {print}' file 2>file_nums

$ cat file
789
hello
he11o
88888

$ cat file_nums
1hello
112121hello3323
hello9

The matched lines are output on stderr, which is then redirected to a separate file. perl's -i flag takes care of saving the changes in-place.

The one-liner can be even trickier:

perl -i -lne 'print {/^\d+\D|\D\d+$/ ? STDERR : ARGVOUT} $_' file 2>file_nums
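
If routing the discards through stderr is undesirable, a similar result can be sketched with two grep passes and a temporary file instead of perl -i. This is an alternative technique, not the method used above. The names file and file_nums are taken from the example, the temporary file name is an assumption, and mktemp is widely available but not specified by POSIX.

```shell
# Work in a scratch directory so the example is self-contained.
workdir=$(mktemp -d) && cd "$workdir"
printf '%s\n' 789 hello 1hello 112121hello3323 he11o hello9 88888 > file

# POSIX ERE spelling of the pattern used in the perl one-liner.
pattern='^[[:digit:]]+[^[:digit:]]|[^[:digit:]][[:digit:]]+$'

# Pass 1: save the discards (exit status 1 only means "nothing matched").
grep -E "$pattern" file > file_nums || [ "$?" -eq 1 ]

# Pass 2: write the survivors to a temporary file, then replace the original.
tmp=$(mktemp ./file.XXXXXX)
if grep -vE "$pattern" file > "$tmp" || [ "$?" -eq 1 ]; then
    mv "$tmp" file
else
    rm -f "$tmp"   # grep failed (status > 1): leave the original untouched
fi
```

Only grep's standard output is redirected here, so diagnostics still go to the terminal rather than into file_nums. One trade-off: mktemp creates its file with restrictive permissions, so the replaced file may end up with a different mode than the original.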


1

Note that this will also write any warnings or errors from perl into file_nums.

– 123, 2016-02-09 14:53:04 +00:00

chaos · Accepted Answer · 2016-02-09 16:56:44Z

1

An awk solution:

awk '$0!~/.*[[:alpha:]][[:digit:]]+$/ && $0!~/^[[:digit:]]+[[:alpha:]]+/' words.txt
789
hello
he11o
88888
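
The command above only prints the words to keep. As a rough sketch of how awk could also write the discards to a second file in a single pass, the example below splits the input into two outputs and then replaces the original. The output names words.removed and words.kept are hypothetical. It also uses [0-9] and [^0-9], so any non-digit counts as the boundary, whereas [[:alpha:]] matches letters only.

```shell
# Work in a scratch directory so the example is self-contained.
workdir=$(mktemp -d) && cd "$workdir"
printf '%s\n' 789 hello 1hello 112121hello3323 he11o hello9 88888 > words.txt

# Pre-create both outputs so they exist even if one side receives no lines.
: > words.removed
: > words.kept

awk '
    # Leading digits followed by a non-digit, or a non-digit followed by trailing digits.
    /^[0-9]+[^0-9]/ || /[^0-9][0-9]+$/ { print > "words.removed"; next }
    # Every other line is kept.
    { print > "words.kept" }
' words.txt

# Replace the original with the kept lines.
mv words.kept words.txt
```

After this runs, words.txt contains the four lines shown above and words.removed contains 1hello, 112121hello3323 and hello9.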

answered Feb 9, 2016 at 16:36 by Sergiy Kolodyazhnyy

edited Feb 9, 2016 at 16:56 by chaos

