added 145 characters in body

edited Apr 1, 2015 at 8:42

I have a long text file (a tab-file for stardict-editor) which consists of lines in the following format:

word1  some text
word1  some other text
word2  more text
word3  even more

and would like to convert it to

word1  some text<br>some other text
word2  more text
word3  even more

This means that consecutive lines (the file is sorted) which start with the same word should be merged into a single line (here the definitions are separated with <br>). Lines with the same beginning can also appear more than twice. The character which separates word and definition is a tab character and occurs exactly once on each line. word1, word2, word3 are of course placeholders for something arbitrary (except tab and newline characters) which I don't know in advance.

I can think of a longer piece of Perl code which does this, but wonder if there is a short solution in Perl or something for the command line. Any ideas?
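
To make the requirement concrete, here is a minimal sketch of the merging rule in Python rather than Perl. It is only an illustration under the assumptions stated above (sorted input, one tab per line); the function name `merge_lines` and the sample data are made up for this example.

```python
from itertools import groupby


def merge_lines(lines, sep="<br>"):
    """Yield one merged line per run of consecutive lines sharing a key.

    Assumes the key is everything before the first tab and that equal
    keys are adjacent (i.e. the input is sorted).
    """
    # partition() splits at the first tab only: (key, "\t", definition)
    records = (line.rstrip("\n").partition("\t") for line in lines)
    for key, group in groupby(records, key=lambda rec: rec[0]):
        # Join all definitions of this run with the separator
        yield key + "\t" + sep.join(rec[2] for rec in group)


# Hypothetical sample mirroring the example in the question
sample = [
    "word1\tsome text\n",
    "word1\tsome other text\n",
    "word2\tmore text\n",
    "word3\teven more\n",
]

for merged in merge_lines(sample):
    print(merged)
```

Because `groupby` only groups adjacent items, this depends on the file being sorted. For a real file, the same function could be fed an open file object or `sys.stdin` instead of the sample list.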

asked Apr 1, 2015 at 8:26

highsciguy

Join lines of text with repeated beginning

I have a long text file (a tab-file for stardict-editor) which consists of lines in the following format:

word1  some text
word1  some other text
word2  more text
word3  even more

and would like to convert it to

word1  some text<br>some other text
word2  more text
word3  even more

I can think of a longer piece of Perl code which does this, but wonder if there is a short solution in Perl or something for the command line. Any ideas?

command-line text-processing