updated solution

edited Jun 14, 2017 at 15:39

The output should contain the first two columns and the AFLA_* ids.

awk solution:

awk -F'[[:space:]]+|\\|AFLA_' -v pfx="AFLA_" \
     '{ print $1,$2,pfx $9,pfx substr($10,0,index($10,"|")-1) }' yourfile

The output:

EQ963472 29264 AFLA_072280 AFLA_072280
EQ963472 31777 AFLA_072310 AFLA_072310
EQ963472 58523 AFLA_072370 AFLA_072370
EQ963472 171022 AFLA_072870 AFLA_072870
EQ963472 174382 AFLA_072890 AFLA_072890
EQ963472 185314 AFLA_072940 AFLA_072940
EQ963472 188490 AFLA_072960 AFLA_072960

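-F'[[:space:]]+|\\|AFLA_' treats runs of whitespace and the literal |AFLA_ sequence as field separators. Splitting on |AFLA_ strips that prefix from the ids, so pfx is prepended to put it back.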
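To see how that separator splits a line, it can help to print every field with its number. The following is only a sketch: the sample line and the split_fields helper are hypothetical (the real input layout is not shown in this answer), and it assumes an awk that supports POSIX character classes such as [[:space:]].

```shell
# Hypothetical sample line; the real input has more columns before the ids.
line='EQ963472 29264 x|AFLA_072280|AFLA_072280|y'

# split_fields is an illustrative helper, not part of the original answer.
# It prints each field on its own line, prefixed with its field number,
# using the same field separator as the answer.
split_fields() {
    printf '%s\n' "$1" |
        awk -F'[[:space:]]+|\\|AFLA_' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

split_fields "$line"
```

With this sample line, fields 4 and 5 carry the ids without their prefix, and field 5 still has the trailing |y attached. That leftover tail is what the substr/index call in the answer cuts off.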

answered Jun 14, 2017 at 15:26

RomanPerekhrest

The output should contain the first two columns and the AFLA_* ids.

awk solution:

awk -F'[[:space:]]+|\\|AFLA_' '{ print $1,$2,$9,substr($10,0,index($10,"|")-1) }' yourfile

The output:

EQ963472 29264 072280 072280
EQ963472 31777 072310 072310
EQ963472 58523 072370 072370
EQ963472 171022 072870 072870
EQ963472 174382 072890 072890
EQ963472 185314 072940 072940
EQ963472 188490 072960 072960

-F'[[:space:]]+|\\|AFLA_' treats runs of whitespace and the literal |AFLA_ sequence as field separators, so the ids are printed without the AFLA_ prefix.
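The second id is followed by more |-delimited text in the same field, which is why the command wraps it in substr and index. A minimal sketch of that trimming step on its own, assuming a hypothetical field value and a hypothetical helper name trim_at_pipe; it uses a start index of 1, since awk string positions start at 1:

```shell
# trim_at_pipe is an illustrative helper, not part of the original answer.
# It prints the part of its argument before the first "|",
# or the whole argument when no "|" is present.
trim_at_pipe() {
    printf '%s\n' "$1" |
        awk '{ p = index($0, "|"); print (p ? substr($0, 1, p - 1) : $0) }'
}

trim_at_pipe '072280|y'
```

The guard on p matters because index returns 0 when the character is missing, and a length of -1 would otherwise yield an empty string.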