Tag Info

Hot answers tagged bioinformatics

19 votes

Accepted

Find any line in VI that has something other than ATCG

First of all, you definitely do not want to open the file in an editor (it's much too large to edit that way). Instead, if you just want to identify whether the file contains anything other than A, T,...

Kusalananda♦

356k

answered Aug 31, 2018 at 15:47
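
As a rough illustration of checking a file without opening it in an editor, the sketch below uses grep with a negated bracket expression. The function name and the sample data are made up, and the excerpt above is truncated, so this is not necessarily the command the answer goes on to give. It assumes upper-case sequence lines with no FASTA header lines.

```shell
# Hypothetical helper (not from the answer): list every input line that
# contains a character other than A, T, C or G, prefixed by its line number.
# Assumes upper-case sequence lines and no FASTA header lines.
non_atcg_lines() {
    grep -n '[^ATCG]'
}

# Made-up sample data; only the second line contains a non-ATCG character.
printf 'ATCG\nATNG\nGGCC\n' | non_atcg_lines
```

Because grep streams its input, this avoids loading the whole file into memory the way an editor would.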

17 votes

Accepted

Explanation of a sed command

The command sed -e 's/\(.\)/\1\n/g' is a single GNU sed substitution command that replaces every character with itself, followed by a newline character. The effect of this is to fold the input into a ...

Kusalananda♦

356k

answered Oct 21, 2021 at 12:18

16 votes

Accepted

Removing rows containing NA in every column

With awk: awk '{ for (i=2;i<=NF;i++) if ($i!="NA"){ print; break } }' file Loop through the fields starting at the second field and print the line if a field not containing NA is found. Then break ...

Freddy

26.3k

answered Sep 16, 2019 at 20:00
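
To make the loop-and-break logic concrete, here is the same awk program wrapped in a function and run on a small table. The function name and the sample rows are assumptions for illustration, not the data from the question.

```shell
# Hypothetical wrapper around the awk program quoted above: starting at
# field 2, print the line as soon as one field differs from "NA", then
# stop scanning that line.
keep_rows_with_data() {
    awk '{ for (i = 2; i <= NF; i++) if ($i != "NA") { print; break } }'
}

# Made-up sample table: the g1 row holds only NA values and is dropped.
printf 'gene v1 v2\ng1 NA NA\ng2 NA 2\n' | keep_rows_with_data
```

The header line survives because its second field (v1) is not the string NA.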

15 votes

bash script quoting frustration

The shell uses quotes to identify where words (tokens) should be separated. They are not usually part of the text that a command sees. bcftools filter -i "INFO/RegionType='Core'" -Oz -o ...

Chris Davies

128k

answered Aug 21, 2024 at 21:27

13 votes

Accepted

How to count the number of characters in a line, except a specific character?

GNU awk solution: awk -v FPAT='[^N[:space:]]' '{ print NF }' file FPAT='[^N[:space:]]' - the pattern defining a field value (any character except N char and whitespace) The expected output: 1 1 1 0 ...

RomanPerekhrest

30.9k

answered Oct 6, 2017 at 20:45

12 votes

Accepted

How to convert all text into 1 except "0"s and first two fields in a csv file?

With awk you could do: awk 'BEGIN {FS=OFS=","} {for (i=3;i<=NF;i++) {$i==0?1:$i=1}} 1' test.csv BEGIN {FS=OFS=","} - set in- and output separator to comma for (i=3;i<=NF;i++) ...

FelixJN

14.1k

answered Jul 11, 2023 at 6:28

12 votes

Replace new lines with spaces using awk

Assuming that the lines are ordered in the manner that you show in the question, then the paste command can do that: $ paste - - < input_file A1_R1.fastq.gz A1_R2.fastq.gz A2_R1.fastq.gz A2_R2....

canupseq

1,974

answered Feb 26, 2024 at 9:03
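
A minimal sketch of the paste approach, assuming (as the answer does) that the two files of each pair sit on consecutive lines. The function name is made up, and the file names follow the pattern shown in the excerpt.

```shell
# Hypothetical helper: each "-" tells paste to read one line from standard
# input, so two dashes join every pair of consecutive lines with a tab.
pair_lines() {
    paste - -
}

printf 'A1_R1.fastq.gz\nA1_R2.fastq.gz\nA2_R1.fastq.gz\nA2_R2.fastq.gz\n' | pair_lines
```

If a space is wanted instead of a tab, paste accepts a delimiter option, as in paste -d ' ' - -.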

11 votes

Accepted

AWK to replace character for lines not starting with ">"

It seems more natural to do this with sed: sed '/^>/!y/./X/' Sfr.pep >Sfr2.pep This would match ^> against the current line ("does this line start with a > character?"). If that ...

Kusalananda♦

356k

answered Apr 23, 2020 at 17:25
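
The address-negation idea can be tried on a two-line sample. This is only a sketch: the function name and the sequence data are invented for the example.

```shell
# Hypothetical wrapper around the sed expression quoted above: on lines
# that do NOT start with ">", the y command transliterates every literal
# dot to X. Header lines are left untouched.
mask_dots() {
    sed '/^>/!y/./X/'
}

# Made-up FASTA-like input: the dot in the header stays, the one in the
# sequence line becomes X.
printf '>sequence.1\nGTCA.GTCA\n' | mask_dots
```

Unlike in the s command, the dot in y is a plain character, not a regular-expression wildcard, so no escaping is needed.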

11 votes

Explanation of a sed command

I hope this makes it clearer. "I know that it is compose of three substitute commands" It's just one substitute command (if you are referring to the sed command): s/<pattern to search>...

schrodingerscatcuriosity

12.8k

answered Oct 21, 2021 at 11:51

11 votes

bash loop to replace middle of string after a certain character

I think you were very close to a working command. This worked for me on the few examples you gave: sed -E 's/_[0-9]+ /|/' "$file" > "$file.1" I changed the match expression ...

Sotto Voce

7,211

answered Jul 20, 2022 at 13:14

10 votes

Accepted

Counting a specific consecutive character with its occurrence position and length

You could do that with awk, whose match() function, which sets the RSTART and RLENGTH variables, is quite useful for that: <mySequence.fasta awk -v C=N '{ i=0 while (match($0, C "+")) { printf "...

Stéphane Chazelas

585k

answered Aug 31, 2017 at 6:21

10 votes

Removing rows containing NA in every column

Using GNU sed sed -e '/g[0-9]\+\(\s*NA\s*\)\+$/d' filename Short explanation: g[0-9]\+\(\s*NA\s*\)\+$ is a regex matching g followed by at least one digit, then any number of NAs with optional ...

eike

answered Sep 16, 2019 at 19:46

9 votes

How to count the number of characters in a line, except a specific character?

awk '{ gsub("[ N]",""); print length() }'

Hauke Laging

94.6k

answered Oct 6, 2017 at 20:48
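
The one-liner above carries no explanation, so here is a hedged sketch of how it behaves: gsub deletes every space and N from the line, and length then counts what remains. The function name and sample lines are made up, and the bracket expression is extended with a tab as an assumption about the input.

```shell
# Hypothetical wrapper: remove spaces, tabs and N characters from each
# line, then print how many characters are left.
count_non_n() {
    awk '{ gsub(/[ \tN]/, ""); print length($0) }'
}

# Made-up sample lines: "ANN" leaves A, "NN N" leaves nothing, "A C" leaves AC.
printf 'ANN\nNN N\nA C\n' | count_non_n
```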

9 votes

extract lines that match a list of words in another file

grep -Fw -f words file This would extract the lines from file that contain any of the words in the words file. The strings in words are treated as fixed strings (not regular expressions) due to the -...

Kusalananda♦

356k

answered Jul 25, 2018 at 19:03

9 votes

Removing rows containing NA in every column

With all from the Perl List::Util module: $ perl -MList::Util=all -alne 'shift @F; print unless all { $_ eq "NA" } @F' file gene v1 v2 v3 v4 g2 NA NA 2 3 g4 1 2 3 2

steeldriver

83.8k

answered Sep 16, 2019 at 19:44

9 votes

Removing rows containing NA in every column

With grep: egrep -v -x 'g[0-9]+([[:blank:]]+NA)*[[:blank:]]*' filename This causes grep to not display (-v) lines where the entire line (-x) matches: lower case g in first column, followed by one or ...

Jim L.

8,785

answered Sep 16, 2019 at 21:00

9 votes

AWK to replace character for lines not starting with ">"

You can try with: awk '!/^>/ { gsub(/\./, "X") }1' Sfr.pep > Sfr2.pep Output: >sequence.1 GTCAGTCAGTCAXGTCAGTCA

schrodingerscatcuriosity

12.8k

answered Apr 23, 2020 at 17:16

9 votes

Replace new lines with spaces using awk

For the input you show where all the paired lines are next to each other all you need with any awk is: $ awk '{ORS=(NR%2 ? "\t" : RS)} 1' file A1_R1.fastq.gz A1_R2.fastq.gz A2_R1.fastq.gz ...

Ed Morton

35.9k

answered Feb 26, 2024 at 10:22

9 votes

Accepted

how to pass environment variables to singularity exec

The point of Singularity is that it runs software inside a container. The container is isolated from its host so that it works the same everywhere. The behavior of the container does not depend on ...

Gilles 'SO- stop being evil'

865k

answered Apr 13 at 15:39

8 votes

How to display the difference between two DNA Sequences via command line tools

Is this what you are after? awk '{$3=$1;sub($2,"",$3)}1' file $3=$1 copies the 1st field to the 3rd field and sub($2,"",$3) looks for the 2nd field in the 3rd field. If there is ...

Quasímodo

19.4k

answered Aug 4, 2020 at 11:16

7 votes

How to count the number of characters in a line, except a specific character?

assuming that count is needed for each line other than space character and N $ perl -lne 'print tr/N //c' ip.txt 1 1 1 0 1 2 2 return value of tr is how many characters were replaced c to complement ...

Sundeep

12.2k

answered Oct 7, 2017 at 4:52

7 votes

How to count the number of characters in a line, except a specific character?

Another awk approach (returns -1 for empty lines): awk -F'[^N ]' '$0=NF-1""' infile Or, a more elaborate variant that returns -1 on empty lines and 0 on lines containing only whitespace (tabs/spaces): awk -F'[^N \t]+' '$...

αғsнιη

41.9k

answered Oct 6, 2017 at 21:30

7 votes

Accepted

Replace pattern between two characters

.* is a greedy regexp, matching the longest possible match. You need to match the shortest match but match it globally on the whole line. Try sed 's/-[^:-]*:/:/g' 1.file > 2.file The character ...

NickD

3,028

answered Sep 29, 2017 at 20:51
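
The difference between the greedy and the restricted pattern is easiest to see side by side. The sketch below uses an invented input string and invented function names. The second expression is the one quoted in the answer; the first is an assumed example of the greedy form it warns against.

```shell
# Greedy form (assumed for contrast): ".*" runs from the first "-" to the
# LAST ":" on the line, so everything in between is swallowed.
strip_greedy() {
    sed 's/-.*:/:/'
}

# Form from the answer: "[^:-]*" cannot cross a ":" or "-", so each match
# stays short, and the g flag repeats it along the line.
strip_shortest() {
    sed 's/-[^:-]*:/:/g'
}

printf 'a-b:c-d:e\n' | strip_greedy     # prints a:e
printf 'a-b:c-d:e\n' | strip_shortest   # prints a:c:e
```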

7 votes

Cleaning a genes database polluted by non-numeric characters except plus and minus signs

With tr, transliterating characters from the complement of the wanted set to spaces, and squeezing repeats: $ tr -sc '[:alnum:][:space:]+-' ' ' < data chr2 74711 127472363 Pos1 0 + chr3 74723 ...

steeldriver

83.8k

answered Oct 7, 2018 at 0:05

7 votes

Accepted

awk cuts strings

GFF is a tab-separated format but you are not using tabs. Unless you use -F'\t' or BEGIN{FS="\t"}, awk will use any whitespace as the field delimiter, and that includes spaces. Since you are ...

terdon♦

252k

answered May 26, 2020 at 10:50
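
A small sketch of the field-splitting difference described above, using a made-up tab-separated line whose second column contains a space. The function names are hypothetical.

```shell
# Default splitting: any run of spaces or tabs separates fields, so a
# column that contains a space is cut in two.
second_field_default() {
    awk '{ print $2 }'
}

# Tab-only splitting, as the answer recommends for GFF.
second_field_tab() {
    awk -F'\t' '{ print $2 }'
}

printf 'chr1\tmy source\tgene\n' | second_field_default   # prints my
printf 'chr1\tmy source\tgene\n' | second_field_tab       # prints my source
```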

7 votes

Remove all sub-fields in column-organized datafile that contain "_XX"

Here is the solution: sed 's/;[^;]*_XX[^;]*//g' You need to look for _XX within two ;s and so, you should let every other character pass.

unxnut

6,124

answered Jul 2, 2021 at 16:03

7 votes

Identify strings between patterns and print entire region between pattern if string is found. Preferably using awk

Using Raku (formerly known as Perl_6) ~$ raku -MXML -e 'my $xml = open-xml($*ARGFILES.Str); \ .say for $xml.getElementsByTagName("entry").grep(/ TSPAN6 | TNMD /).pairs;' ...

jubilatious1

3,923

answered Oct 20, 2022 at 22:14

7 votes

Accepted

filter lines based on some criteria

Here's a perl way: $ perl -F'\t' -lane ' if(/^#/){ print; next }; $F[7] =~ /\bSVLEN=(\d+)/; $svlen=$1; $F[7] =~ /\bSVCALLERS=([^;]+)/; @callers=split(/,/,$1); print if $svlen > 100 ...

terdon♦

252k

answered May 18, 2023 at 18:00

7 votes

How to split a given column's string values in a text file

With perl: $ perl -lane 'printf "%6s %s\n", $F[0], join " ", split "", $F[1]' <your-file 12345 0 1 0 2 0 1 0 2 0 5 54322 2 2 2 1 1 1 0 0 5 1 123456 1 1 2 2 0 1 1 5 1 ...

Stéphane Chazelas

585k

answered Jan 26, 2024 at 19:25

Only top scored, non community-wiki answers of a minimum length are eligible

322

questions tagged

bioinformatics

bioinformatics × 322
text-processing × 187
awk × 129
sed × 66
linux × 46
shell-script × 36
bash × 31
grep × 27
scripting × 17
command-line × 13
perl × 13
files × 11
python × 10
regular-expression × 9
sort × 9
shell × 8
for × 7
text-formatting × 6
join × 6
pattern-matching × 6
cut × 5
columns × 5
ubuntu × 4
pipe × 3
filenames × 3
