edited Oct 7, 2022 at 7:55

Assuming the input contains a single word per line, you may use

grep -x '\(.\).*\1' file

... to extract all lines that start and end with the same character. This works by capturing the first character on the line with \(.\), allowing the rest of the characters on the line to be anything (with .*), and then forcing a match of the captured character at the very end with the back-reference \1.

The -x option tells grep that the pattern must match a complete line, not just part of one. Without -x, you would have to insert explicit anchors in the regular expression to be sure you match complete lines: ^\(.\).*\1$
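To try the pattern on a handful of words, here is a small sketch. The sample words are made up for illustration. It runs the -x form and the explicitly anchored form on the same input so the two can be compared:

```shell
# Made-up sample input: one word per line.
words='aa
aba
abc
a
xyzzx'

# Whole-line match via -x: \(.\) captures the first character, \1 repeats it at the end.
printf '%s\n' "$words" | grep -x '\(.\).*\1'

# The same match written with explicit ^...$ anchors instead of -x.
printf '%s\n' "$words" | grep '^\(.\).*\1$'
```

Both commands should print aa, aba and xyzzx. The single-character line a is not matched, because \(.\) and \1 together need at least two characters.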

Example run on my system's dictionary, showing only the first 5 results:

$ grep -x '\(.\).*\1' /usr/share/dict/words | head -n 5
aa
aba
abaca
abasia
abepithymia

If you're dealing with input that contains multiple space-delimited words on each line, then you may pre-process that text by splitting it into one word per line first. Here, I additionally convert all characters to lower-case with tr at the same time as replacing spaces with newlines, and I remove duplicates by means of sort -u:

tr ' [:upper:]' '\n[:lower:]' <file | sort -u | grep -x '\(.\).*\1'
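As a small illustration of the pipeline, the sentence below is invented for the example. Setting LC_ALL=C is an addition of this sketch (not part of the command above), used to keep tr's character classes and sort's ordering predictable:

```shell
# Made-up text with mixed case and several words on one line.
printf 'Anna saw a Level kayak\n' |
  LC_ALL=C tr ' [:upper:]' '\n[:lower:]' |
  LC_ALL=C sort -u |
  grep -x '\(.\).*\1'
```

This should print anna, kayak and level. Anna and Level only match thanks to the case conversion, since A and a (or L and l) are different characters as far as the back-reference is concerned. The one-letter word a is not matched by this pattern.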

Note that this ignores the fact that an "ordinary text" may contain punctuation and other characters that are not part of words.
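One possible way to deal with punctuation, which goes beyond the answer above, is to let tr turn every run of non-letters into a single newline using its -c (complement) and -s (squeeze) options, and to do the case conversion in a separate tr call. This is only a sketch that assumes a "word" is a run of letters, so hyphenated words and contractions such as don't get split into pieces:

```shell
# Made-up text where punctuation is attached to the words.
# 1st tr: every run of non-letters becomes one newline.
# 2nd tr: fold upper case to lower case.
printf 'Level, Anna! Wow... "Stats" win.\n' |
  LC_ALL=C tr -cs '[:alpha:]' '\n' |
  LC_ALL=C tr '[:upper:]' '[:lower:]' |
  LC_ALL=C sort -u |
  grep -x '\(.\).*\1'
```

On this input it should print anna, level, stats and wow. With the space-only tr from the answer, level, and anna! would not match at all, while the quoted token "stats" would match only because of its surrounding quotation marks.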

It was pointed out in the comments that the grep command misses single-letter words, which technically start and end with the same character.

To get these too:

grep -x -e '\(.\).*\1' -e . file

This now returns lines starting and ending with the same character or lines only containing a single character.
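A quick check of the two-pattern version on a few sample lines (made up for the example). Because of -x, the second pattern . only matches lines that are exactly one character long:

```shell
# Either the first character repeats at the end, or the line is a single character.
printf '%s\n' a aa ab aba I |
  grep -x -e '\(.\).*\1' -e .
```

This should print a, aa, aba and I; only ab is filtered out.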

answered Oct 7, 2022 at 7:33

Kusalananda ♦