0

I have headers starting with '>' and I want to fix each header by keeping only its first word and removing the rest, as shown in output.txt, and then print the result.

input.txt

>AGAJ01065549.1 scaffold:Xipmac4.4.2:AGAJ01065549.1:1:500:1 REF
CGCCAGGTGTCTGGCGTAATAGCGCCAGCGCCAGGTGTCATATACGTAATAGCGCCAGGT
>RGAMMT01065456.1 scaffold:Xipmac4.4.2:AGAJ01065595.1:1:500:1 REF
GACTAGTTTTTACATATAGTAATGGTTATTCGGAAGTGTACAGACGTTTTCAGGTTTTTT
TTTGGTAGGGGTTGAGGTGTTGAGGTGAGGGGACTATGTGGAGGGAACTTTCCATAGAGG

output.txt

>AGAJ01065549.1 
CGCCAGGTGTCTGGCGTAATAGCGCCAGCGCCAGGTGTCATATACGTAATAGCGCCAGGT
>RGAMMT01065456.1 
GACTAGTTTTTACATATAGTAATGGTTATTCGGAAGTGTACAGACGTTTTCAGGTTTTTT
TTTGGTAGGGGTTGAGGTGTTGAGGTGAGGGGACTATGTGGAGGGAACTTTCCATAGAGG

3 Answers

3

This might work for you (GNU sed):

sed -i '/^>/s/\s.*//' file
2

You can do this by passing the file to awk:

awk '{print $1}' input.txt

This prints the first field of every line (by default, awk splits fields on runs of spaces and tabs).
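As a hedged variation, in case sequence lines ever contain spaces, the rule can be limited to header lines. The sketch below writes a small made-up sample to a temporary file (the data and the variable names `tmp` and `result` are invented for illustration) and prints only the first field on lines beginning with '>', leaving all other lines unchanged.

```shell
# Hypothetical sample data, not taken from the original input.txt.
tmp=$(mktemp)
printf '%s\n' '>id1 scaffold:x:1:500 REF' 'ACGT ACGT' '>id2 other text' 'TTTT' > "$tmp"

# Header lines: print only the first field, then skip to the next line.
# All other lines: print unchanged.
result=$(awk '/^>/ {print $1; next} {print}' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Unlike the plain `{print $1}` form, this keeps a non-header line such as `ACGT ACGT` intact.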

2
  • Pass the file to awk directly; there is no need to cat and pipe it. Also, your solution would break if non-header lines contained embedded spaces. Commented Nov 2, 2012 at 21:28
  • True, but it addresses the example given and shows a solution that would work as long as the sample doesn't change. Commented Nov 2, 2012 at 21:34
1

cut works much like the awk answer:

cut -d' ' -f 1 input.txt > output.txt

The -d option sets the delimiter to a single space and -f 1 selects the first field. Lines that contain no space are printed unchanged.

However you can also use sed:

sed 's,^\([^ ]\+\) .*,\1,' input.txt > output.txt

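This is a substitute command that uses commas as delimiters. It anchors at the beginning of a line and captures the leading run of non-space characters in a group. It then matches one space and everything after it. sed replaces the matched text with the captured group. The command applies to every line that contains a space, not only to headers, and \+ is a GNU sed extension.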
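A minimal sketch of the same capture-group idea, restricted to header lines and written with extended regular expressions (`-E`, accepted by both GNU and BSD sed) so that `+` needs no backslash. The sample data and the variable names `sample` and `fixed` are invented for illustration.

```shell
# Hypothetical sample data, not taken from the original input.txt.
sample=$(printf '%s\n' '>id1 scaffold:x:1:500 REF' 'ACGT ACGT' '>id2 other text')

# Only on lines starting with '>': capture the leading run of non-space
# characters, match the first space and everything after it, and replace
# the whole match with the captured group.
fixed=$(printf '%s\n' "$sample" | sed -E '/^>/s/^([^ ]+) .*/\1/')
printf '%s\n' "$fixed"
```

The `/^>/` address is what keeps a sequence line containing a space from being truncated.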
