19 votes
Accepted

Find any line in VI that has something other than ATCG

First of all, you definitely do not want to open the file in an editor (it's much too large to edit that way). Instead, if you just want to identify whether the file contains anything other than A, T,...
Kusalananda's user avatar
  • 356k
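
A minimal sketch of the idea in the question, not necessarily the approach the truncated answer goes on to give: grep with a negated bracket expression reports the lines that contain anything other than A, T, C, or G, without opening the file in an editor. The helper name and the sample data are made up for illustration. FASTA header lines (starting with >) would also be reported unless they are filtered out first.

```shell
# Hypothetical helper: print "line number:line" for every line that
# contains a character other than A, T, C or G.
non_atcg_lines() {
    grep -n '[^ATCG]' "$@"
}

# Usage with made-up sample data; only the second line is reported.
printf 'ATCG\nATNG\nGGCC\n' | non_atcg_lines
```

grep exits with a non-zero status when no line matches, which makes the same command usable as a yes/no check in a script.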
17 votes
Accepted

Explanation of a sed command

The command sed -e 's/\(.\)/\1\n/g' is a single GNU sed substitution command that replaces every character with itself, followed by a newline character. The effect of this is to fold the input into a ...
Kusalananda's user avatar
  • 356k
16 votes
Accepted

Removing rows containing NA in every column

With awk: awk '{ for (i=2;i<=NF;i++) if ($i!="NA"){ print; break } }' file Loop through the fields starting at the second field and print the line if a field not containing NA is found. Then break ...
Freddy's user avatar
  • 26.3k
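
To see the loop above in action, here is a small sketch that wraps the same awk program in a function and feeds it made-up data. The function name and the sample rows are assumptions for illustration only.

```shell
# Keep a row only if at least one field after the first is not "NA".
drop_all_na_rows() {
    awk '{ for (i = 2; i <= NF; i++) if ($i != "NA") { print; break } }' "$@"
}

# Made-up input: the g1 row is all NA and is dropped,
# the header and the g2 row are kept.
printf 'gene v1 v2\ng1 NA NA\ng2 NA 2\n' | drop_all_na_rows
```

The break matters: without it, a row with several non-NA fields would be printed once per such field.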
15 votes

bash script quoting frustration

The shell uses quotes to identify where words (tokens) should be separated. They are not usually part of the text that a command sees. bcftools filter -i "INFO/RegionType='Core'" -Oz -o ...
Chris Davies's user avatar
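
A quick way to check what a command really receives is to print each argument on its own line. The sketch below uses a hypothetical show_args helper (not part of the answer) to show that the outer double quotes are consumed by the shell while the inner single quotes are passed through as ordinary characters.

```shell
# Hypothetical helper: print every argument it receives, one per line,
# wrapped in angle brackets so word boundaries are visible.
show_args() {
    printf '<%s>\n' "$@"
}

# The double quotes delimit one word and are removed by the shell;
# the single quotes inside them reach the command unchanged.
show_args -i "INFO/RegionType='Core'"
```

This prints two arguments, `<-i>` and `<INFO/RegionType='Core'>`, which is what bcftools would see for the same command line.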
13 votes
Accepted

How to count the number of characters in a line, except a specific character?

GNU awk solution: awk -v FPAT='[^N[:space:]]' '{ print NF }' file FPAT='[^N[:space:]]' - the pattern defining a field value (any character except N char and whitespace) The expected output: 1 1 1 0 ...
RomanPerekhrest's user avatar
12 votes
Accepted

How to convert all text into 1 except "0"s and first two fields in a csv file?

With awk you could do: awk 'BEGIN {FS=OFS=","} {for (i=3;i<=NF;i++) {$i==0?1:$i=1}} 1' test.csv BEGIN {FS=OFS=","} - set the input and output field separators to a comma for (i=3;i<=NF;i++) ...
FelixJN's user avatar
  • 14.1k
12 votes

Replace new lines with spaces using awk

Assuming that the lines are ordered in the manner that you show in the question, then the paste command can do that: $ paste - - < input_file A1_R1.fastq.gz A1_R2.fastq.gz A2_R1.fastq.gz A2_R2....
canupseq's user avatar
  • 1,974
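
A small sketch of the paste approach with made-up file names, assuming, as the answer does, that the paired lines are adjacent. Each `-` makes paste read one line from standard input, so two of them join consecutive lines in pairs. The function name is hypothetical.

```shell
# Join every two consecutive input lines into one tab-separated line.
join_pairs() {
    paste - -
}

printf 'A1_R1.fastq.gz\nA1_R2.fastq.gz\nA2_R1.fastq.gz\nA2_R2.fastq.gz\n' | join_pairs
```

paste separates the joined lines with a tab by default; `paste -d ' ' - -` would use a space instead.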
11 votes
Accepted

AWK to replace character for lines not starting with ">"

It seems more natural to do this with sed: sed '/^>/!y/./X/' Sfr.pep >Sfr2.pep This would match ^> against the current line ("does this line start with a > character?"). If that ...
Kusalananda's user avatar
  • 356k
11 votes

Explanation of a sed command

I hope this makes it clearer. "I know that it is compose of three substitute commands" It's just one substitute command (if you are referring to the sed command): s/<pattern to search&...
schrodingerscatcuriosity's user avatar
11 votes

bash loop to replace middle of string after a certain character

I think you were very close to a working command. This worked for me on the few examples you gave: sed -E 's/_[0-9]+ /|/' "$file" > "$file.1" I changed the match expression ...
Sotto Voce's user avatar
  • 7,211
10 votes
Accepted

Counting a specific consecutive character with its occurrence position and length

You could do that with awk, whose match() function, which sets the RSTART and RLENGTH variables, is quite useful here: <mySequence.fasta awk -v C=N '{ i=0 while (match($0, C "+")) { printf "...
Stéphane Chazelas's user avatar
10 votes

Removing rows containing NA in every column

Using GNU sed sed -e '/g[0-9]\+\(\s*NA\s*\)\+$/d' filename Short explanation: g[0-9]\+\(\s*NA\s*\)\+$ is a regex matching g followed by at least one digit, then any number of NAs with optional ...
eike's user avatar
  • 548
9 votes

How to count the number of characters in a line, except a specific character?

awk '{ gsub("[ N]",""); print length() }'
Hauke Laging's user avatar
  • 94.6k
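
The one-liner above deletes every space and N from the line and prints the length of what is left. A sketch with made-up input, wrapped in a hypothetical function so it can be reused:

```shell
# Count the characters on each line that are neither a space nor "N".
count_non_n() {
    awk '{ gsub("[ N]", ""); print length() }' "$@"
}

# Made-up input; prints 1, 0 and 4 on separate lines.
printf 'A N\nNN\nACGT N\n' | count_non_n
```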
9 votes

extract lines that match a list of words in another file

grep -Fw -f words file This would extract the lines from file that contain any of the words in the words file. The strings in words are treated as fixed strings (not regular expressions) due to the -...
Kusalananda's user avatar
  • 356k
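
A sketch of the same grep call on two small made-up files, to show the effect of whole-word matching. The function name, file names, and gene names are assumptions for illustration.

```shell
# $1: file with one fixed string per line; $2: file to search.
filter_by_words() {
    grep -Fw -f "$1" "$2"
}

# Made-up data in a temporary directory.
dir=$(mktemp -d)
printf 'TP53\nBRCA1\n' > "$dir/words"
printf 'TP53 tumor\nTP530 other\nBRCA1 repair\n' > "$dir/file"

# The TP530 line is not printed: TP53 occurs in it, but not as a whole word.
filter_by_words "$dir/words" "$dir/file"
rm -rf "$dir"
```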
9 votes

Removing rows containing NA in every column

With all from the Perl List::Util module: $ perl -MList::Util=all -alne 'shift @F; print unless all { $_ eq "NA" } @F' file gene v1 v2 v3 v4 g2 NA NA 2 3 g4 1 2 3 2
steeldriver's user avatar
  • 83.8k
9 votes

Removing rows containing NA in every column

With grep: egrep -v -x 'g[0-9]+([[:blank:]]+NA)*[[:blank:]]*' filename This causes grep to not display (-v) lines where the entire line (-x) matches: lower case g in first column, followed by one or ...
Jim L.'s user avatar
  • 8,785
9 votes

AWK to replace character for lines not starting with ">"

You can try with: awk '!/^>/ { gsub(/\./, "X") }1' Sfr.pep > Sfr2.pep Output: >sequence.1 GTCAGTCAGTCAXGTCAGTCA
schrodingerscatcuriosity's user avatar
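
The same awk program, wrapped in a hypothetical function and run on made-up input, shows that header lines are left alone while dots in sequence lines are replaced:

```shell
# Replace every "." with "X" on lines that do not start with ">".
mask_dots() {
    awk '!/^>/ { gsub(/\./, "X") } 1' "$@"
}

# The header keeps its dot; the sequence line becomes GTCAXGTCA.
printf '>sequence.1\nGTCA.GTCA\n' | mask_dots
```

The trailing 1 is an always-true pattern with no action, so every line is printed whether or not it was modified.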
9 votes

Replace new lines with spaces using awk

For the input you show where all the paired lines are next to each other all you need with any awk is: $ awk '{ORS=(NR%2 ? "\t" : RS)} 1' file A1_R1.fastq.gz A1_R2.fastq.gz A2_R1.fastq.gz ...
Ed Morton's user avatar
  • 35.9k
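
A short sketch of the ORS trick with made-up input. The output record separator is set before each line is printed: a tab after odd-numbered lines, and the normal record separator (a newline) after even-numbered ones. The function name is hypothetical.

```shell
# Join each odd-numbered line with the even-numbered line that follows it.
pair_lines() {
    awk '{ ORS = (NR % 2 ? "\t" : RS) } 1' "$@"
}

printf 'A1_R1.fastq.gz\nA1_R2.fastq.gz\nA2_R1.fastq.gz\nA2_R2.fastq.gz\n' | pair_lines
```

With an odd number of input lines, the last output line ends in a tab rather than a newline.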
9 votes
Accepted

how to pass environment variables to singularity exec

The point of Singularity is that it runs software inside a container. The container is isolated from its host so that it works the same everywhere. The behavior of the container does not depend on ...
Gilles 'SO- stop being evil''s user avatar
8 votes

How to display the difference between two DNA Sequences via command line tools

Is this what you are after? awk '{$3=$1;sub($2,"",$3)}1' file $3=$1 copies the 1st field to the 3rd field and sub($2,"",$3) looks for the 2nd field in the 3rd field. If there is ...
Quasímodo's user avatar
  • 19.4k
7 votes

How to count the number of characters in a line, except a specific character?

Assuming that a count is needed, for each line, of the characters other than space and N: $ perl -lne 'print tr/N //c' ip.txt 1 1 1 0 1 2 2 The return value of tr is how many characters were replaced, and c serves to complement ...
Sundeep's user avatar
  • 12.2k
7 votes

How to count the number of characters in a line, except a specific character?

Another awk approach (it returns -1 for empty lines): awk -F'[^N ]' '$0=NF-1""' infile A more elaborate variant returns -1 on empty lines and 0 on lines containing only whitespace (tabs/spaces): awk -F'[^N \t]+' '$...
αғsнιη's user avatar
  • 41.9k
7 votes
Accepted

Replace pattern between two characters

.* is a greedy regexp, matching the longest possible match. You need to match the shortest match but match it globally on the whole line. Try sed 's/-[^:-]*:/:/g' 1.file > 2.file The character ...
NickD's user avatar
  • 3,028
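
A sketch of the negated-bracket approach on a made-up line. The sample input and the function name are assumptions, since the original data is not shown here.

```shell
# Remove the text from a "-" up to the next ":" (keeping the ":"),
# for every such span on the line.
strip_dash_to_colon() {
    sed 's/-[^:-]*:/:/g' "$@"
}

# Made-up input; prints a:c:e
printf 'a-b:c-d:e\n' | strip_dash_to_colon
```

On the same input, a greedy `s/-.*:/:/` would instead swallow everything from the first - to the last : and print a:e.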
7 votes

Cleaning a genes database polluted by non-numeric characters except plus and minus signs

With tr, transliterating characters from the complement of the wanted set to spaces, and squeezing repeats: $ tr -sc '[:alnum:][:space:]+-' ' ' < data chr2 74711 127472363 Pos1 0 + chr3 74723 ...
steeldriver's user avatar
  • 83.8k
7 votes
Accepted

awk cuts strings

GFF is a tab-separated format but you are not using tabs. Unless you use -F'\t' or BEGIN{FS="\t"}, awk will use any whitespace as the field delimiter, and that includes spaces. Since you are ...
terdon's user avatar
  • 252k
7 votes

Remove all sub-fields in column-organized datafile that contain "_XX"

Here is the solution: sed 's/;[^;]*_XX[^;]*//g' You need to look for _XX between two ; characters, so the pattern allows any character other than ; on either side of it.
unxnut's user avatar
  • 6,124
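
The same sed expression applied to a made-up record, as a sketch. The sample data and function name are assumptions for illustration.

```shell
# Delete every ";"-introduced sub-field that contains "_XX".
drop_xx_subfields() {
    sed 's/;[^;]*_XX[^;]*//g' "$@"
}

# Made-up input; prints id;b
printf 'id;a_XX1;b;c_XX\n' | drop_xx_subfields
```

Because the pattern starts with a ;, a matching sub-field that is first on the line, with no ; before it, would be left in place.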
7 votes

Identify strings between patterns and print entire region between pattern if string is found. Preferably using awk

Using Raku (formerly known as Perl_6) ~$ raku -MXML -e 'my $xml = open-xml($*ARGFILES.Str); \ .say for $xml.getElementsByTagName("entry").grep(/ TSPAN6 | TNMD /).pairs;' ...
jubilatious1's user avatar
  • 3,923
7 votes
Accepted

filter lines based on some criteria

Here's a perl way: $ perl -F'\t' -lane ' if(/^#/){ print; next }; $F[7] =~ /\bSVLEN=(\d+)/; $svlen=$1; $F[7] =~ /\bSVCALLERS=([^;]+)/; @callers=split(/,/,$1); print if $svlen > 100 ...
terdon's user avatar
  • 252k
7 votes

How to split a given column's string values in a text file

With perl: $ perl -lane 'printf "%6s %s\n", $F[0], join " ", split "", $F[1]' <your-file 12345 0 1 0 2 0 1 0 2 0 5 54322 2 2 2 1 1 1 0 0 5 1 123456 1 1 2 2 0 1 1 5 1 ...
Stéphane Chazelas's user avatar

Only top scored, non community-wiki answers of a minimum length are eligible