highsciguy

Join lines of text with repeated beginning

I have a long text file (a tab-file for stardict-editor) which consists of lines in the following format:

word1  some text
word1  some other text
word2  more text
word3  even more

and would like to convert it to

word1  some text<br>some other text
word2  more text
word3  even more

This means that consecutive lines (the file is sorted) which start with the same word should be merged into a single line (here the definitions are separated with <br>). The same word can also begin more than two lines in a row. The character which separates word and definition is a tab character, and it occurs exactly once on each line. word1, word2, word3 are of course placeholders for arbitrary strings (containing no tab or newline characters) which I don't know in advance.

I can think of a longer piece of Perl code which does this, but I wonder whether there is a short solution, either in Perl or with some other command-line tool. Any ideas?
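
To make the intended transformation concrete, here is a minimal sketch in Python (so not the short Perl or command-line solution I am after). The function name merge_lines and its sep parameter are only illustrative choices of mine, and the sketch assumes the input is already sorted, so that lines with the same word are adjacent.

```python
from itertools import groupby


def merge_lines(lines, sep="<br>"):
    """Merge consecutive tab-separated lines that share the same first field.

    Assumes the input is sorted, so equal words sit on adjacent lines.
    """
    # Split each non-blank line at its first tab: (word, "\t", definition).
    records = (
        line.rstrip("\n").partition("\t")
        for line in lines
        if line.strip()
    )
    # groupby only groups *adjacent* records with the same key, which is
    # exactly the "consecutive lines with the same word" rule.
    for word, group in groupby(records, key=lambda rec: rec[0]):
        yield word + "\t" + sep.join(rec[2] for rec in group)
```

Applied to a file, something like `for line in merge_lines(open("dict.tab")): print(line)` would write the merged lines to standard output; the file name dict.tab is a placeholder.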