
I have a large file (>10000 lines) that contains one word per line, with a newline character after each word. The words contain no spaces.

I'd like to list (or, even better, output to a new file) any words that start and/or end with a number, and then remove those words from the original file. However, I don't want to remove words that consist only of digits or that merely contain digits in the middle.

For example, if I had the contents

789
hello
1hello
112121hello3323
he11o
hello9
88888

Then the strings 1hello, 112121hello3323, hello9 would get output and then removed from the file.

How can I do this?

3 Answers


GNU grep

grep -vP '^\d+\D|\D\d+$'

produces

789
hello
he11o
88888
  • Or POSIXly: grep -vE '^[[:digit:]]+[^[:digit:]]|[^[:digit:]][[:digit:]]+$' Commented Feb 9, 2016 at 13:37
  • @1_CR Yeah, I was going to add that but wasn't 100% sure that -E was completely POSIX-compliant. Commented Feb 9, 2016 at 13:48
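
For the full task in the question (saving the matches, then removing them from the original), one possible approach is two grep passes with the POSIX pattern from the comment above. This is only a sketch: words.txt, words_nums.txt and words.tmp are placeholder names that do not come from the question, and the first line just recreates the sample input.

```shell
# Recreate the sample input from the question (words.txt is a placeholder name).
printf '%s\n' 789 hello 1hello 112121hello3323 he11o hello9 88888 > words.txt

# POSIX ERE: digits followed by a non-digit at the start,
# or a non-digit followed by digits at the end.
pattern='^[[:digit:]]+[^[:digit:]]|[^[:digit:]][[:digit:]]+$'

# Pass 1: save the matching words to a separate file.
grep -E "$pattern" words.txt > words_nums.txt

# Pass 2: write the remaining words to a temporary file.
grep -vE "$pattern" words.txt > words.tmp

# grep exits with 1 when it selects no lines and with 2 or more on a real error,
# so only replace the original if the status was 0 or 1.
[ "$?" -le 1 ] && mv words.tmp words.txt
```

The temporary file is there because redirecting grep's output straight onto its own input would truncate the file before grep reads it.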

Actually editing the source file and creating a new file with the discards is a bit trickier. I would do this:

$ cat file
789
hello
1hello
112121hello3323
he11o
hello9
88888

$ perl -i -lne 'if (/^\d+\D|\D\d+$/) {warn "$_\n"} else {print}' file 2>file_nums

$ cat file
789
hello
he11o
88888

$ cat file_nums
1hello
112121hello3323
hello9

The matched lines are written to stderr by warn, and the shell's 2>file_nums redirects them to a separate file. perl's -i flag takes care of saving the remaining lines back to the original file in place.

The one-liner can be even trickier:

perl -i -lne 'print {/^\d+\D|\D\d+$/ ? STDERR : ARGVOUT} $_' file 2>file_nums
  • Note that this will also write any warnings or errors from perl into file_nums. Commented Feb 9, 2016 at 14:53

An awk solution:

awk '$0!~/.*[[:alpha:]][[:digit:]]+$/ && $0!~/^[[:digit:]]+[[:alpha:]]+/' words.txt
789
hello
he11o
88888
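
The command above only prints the words that are kept. A possible extension, offered as a sketch, lets awk write the discarded words to a second file in the same pass. Assumptions to note: words_nums.txt and words.tmp are placeholder names; the pattern tests for any non-digit next to the leading or trailing digits (as in the grep answer) rather than [[:alpha:]], so a word like 1-up would also be caught; and [0-9] is used in place of [[:digit:]] only for portability to older awk builds.

```shell
# Recreate the sample input from the question.
printf '%s\n' 789 hello 1hello 112121hello3323 he11o hello9 88888 > words.txt

# Words that start or end with a number go to words_nums.txt;
# everything else goes to stdout, which is collected in a temporary file
# that then replaces the original.
awk '
  /^[0-9]+[^0-9]/ || /[^0-9][0-9]+$/ { print > "words_nums.txt"; next }
  { print }
' words.txt > words.tmp && mv words.tmp words.txt
```

Note that words_nums.txt is only created if at least one word matches.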
