19 votes (accepted)
Find any line in VI that has something other than ATCG
First of all, you definitely do not want to open the file in an editor (it's much too large to edit that way).
Instead, if you just want to identify whether the file contains anything other than A, T,...
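One possible sketch, which may differ from the approach the truncated answer goes on to describe: grep can report the offending lines without opening the file at all. It assumes the file holds bare uppercase sequence lines. FASTA header lines starting with > would also be reported, and Windows line endings (a trailing \r) would make every line match. The function name non_atcg_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "line-number:line" for every line that contains
# a character other than A, T, C or G. Empty lines are not reported.
# Assumes bare uppercase sequence lines (no FASTA '>' headers, no '\r').
non_atcg_lines() {
  grep -n '[^ATCG]' "$@"
}

# Usage: only line 2 contains an N, so only "2:ACNT" is printed.
printf 'ACGT\nACNT\nGGCC\n' | non_atcg_lines
```

For a plain yes/no answer, grep -q with the same pattern sets only the exit status and stops at the first match.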

17 votes (accepted)
Explanation of a sed command
The command
sed -e 's/\(.\)/\1\n/g'
is a single GNU sed substitution command that replaces every character with itself, followed by a newline character. The effect of this is to fold the input into a ...
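The folded output is typically piped into sort and uniq -c to count characters. The sketch below swaps in fold -w1 for the folding step: some non-GNU seds (for example the BSD sed on macOS) may not treat \n in the replacement as a newline, and the sed version also leaves an empty line after each input line, because sed appends its own newline after the one the substitution added. The function name char_counts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count how often each character occurs on standard input.
# fold -w1 puts every character on its own line (the same folding idea as the
# GNU sed substitution); sort groups identical characters so uniq -c can count them.
char_counts() {
  fold -w1 | sort | uniq -c
}

printf 'GATTACA\n' | char_counts
```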

16 votes (accepted)
Removing rows containing NA in every column
With awk:
awk '{ for (i=2;i<=NF;i++) if ($i!="NA"){ print; break } }' file
Loop through the fields starting at the second field and print the line if a field not containing NA is found. Then break ...
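To see the loop-and-break logic at work, the same awk program can be wrapped in a function and fed a small made-up table (the rows are illustrative, not from the question). The header survives because v1 is not NA, and g1 is dropped because every value column is NA. The function name drop_all_na is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk one-liner above: print a line as soon as
# any field from the 2nd onward is not "NA"; all-NA lines print nothing.
drop_all_na() {
  awk '{ for (i = 2; i <= NF; i++) if ($i != "NA") { print; break } }' "$@"
}

# Made-up sample input: g1 is all NA and is removed.
printf 'gene v1 v2\ng1 NA NA\ng2 NA 3\n' | drop_all_na
```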

15 votes
bash script quoting frustration
The shell uses quotes to identify where words (tokens) should be separated. They are not usually part of the text that a command sees.
bcftools filter -i "INFO/RegionType='Core'" -Oz -o ...
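A quick way to check what a command actually receives is to print each argument between markers. In this sketch (show_args is a hypothetical helper), the outer double quotes only group the expression into one word and are removed by the shell, while the single quotes inside them reach the program as ordinary characters.

```shell
#!/bin/sh
# Hypothetical helper: print each argument it receives on its own line,
# wrapped in <...> so that spaces and quote characters are visible.
show_args() {
  printf '<%s>\n' "$@"
}

# The double quotes are removed; the single quotes survive inside the argument.
show_args filter -i "INFO/RegionType='Core'"
# <filter>
# <-i>
# <INFO/RegionType='Core'>
```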

13 votes (accepted)
How to count the number of characters in a line, except a specific character?
GNU awk solution:
awk -v FPAT='[^N[:space:]]' '{ print NF }' file
FPAT='[^N[:space:]]' - the pattern defining a field value (any character except N and whitespace)
The expected output:
1
1
1
0
...
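FPAT is a gawk extension, so the command above will not work in mawk or BSD awk. A portable sketch relies on gsub() returning the number of substitutions it made: replacing every non-N, non-blank character with itself (&) leaves the line unchanged but yields the count. The function name count_non_n is hypothetical, and only spaces and tabs are treated as whitespace here.

```shell
#!/bin/sh
# Hypothetical helper: for each input line, print how many characters are
# neither N nor a space/tab. gsub() returns its substitution count, and "&"
# re-inserts the matched character, so the line itself is not altered.
count_non_n() {
  awk '{ print gsub(/[^N \t]/, "&") }' "$@"
}

printf 'AN N\nNNN\nACGT\n' | count_non_n
# 1
# 0
# 4
```

Unlike the -F based approach further down, an empty line prints 0 here rather than -1.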

12 votes (accepted)
How to convert all text into 1 except "0"s and first two fields in a csv file?
With awk you could do:
awk 'BEGIN {FS=OFS=","} {for (i=3;i<=NF;i++) {$i==0?1:$i=1}} 1' test.csv
BEGIN {FS=OFS=","} - set the input and output field separators to a comma
for (i=3;i<=NF;i++) ...
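The conditional expression $i==0?1:$i=1 is evaluated only for its side effect, which can be hard to read. An equivalent sketch with an explicit if might look like this (to_binary is a hypothetical name). Fields that look numeric are compared numerically, so 0 stays 0, while any other value, including text, becomes 1.

```shell
#!/bin/sh
# Hypothetical rewrite of the one-liner above with an explicit if statement:
# from the 3rd field on, every value that is not (numerically) 0 becomes 1.
to_binary() {
  awk 'BEGIN { FS = OFS = "," }
       { for (i = 3; i <= NF; i++) if ($i != 0) $i = 1 }
       1' "$@"
}

printf 'id,name,0,5,abc\n' | to_binary
# id,name,0,1,1
```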

12 votes
Replace new lines with spaces using awk
Assuming that the lines are ordered as you show in the question, the paste command can do that:
$ paste - - < input_file
A1_R1.fastq.gz  A1_R2.fastq.gz
A2_R1.fastq.gz  A2_R2....
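By default paste separates the joined lines with a tab. Since the question title asks for spaces, the delimiter can be set with -d. This sketch assumes the R1 and R2 names always alternate, as in the question; the function name pair_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join every two consecutive input lines with a space.
# Each "-" makes paste read one more line from standard input per output row.
pair_lines() {
  paste -d ' ' - -
}

printf 'A1_R1.fastq.gz\nA1_R2.fastq.gz\nA2_R1.fastq.gz\nA2_R2.fastq.gz\n' | pair_lines
# A1_R1.fastq.gz A1_R2.fastq.gz
# A2_R1.fastq.gz A2_R2.fastq.gz
```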

11 votes (accepted)
AWK to replace character for lines not starting with ">"
It seems more natural to do this with sed:
sed '/^>/!y/./X/' Sfr.pep >Sfr2.pep
This would match ^> against the current line ("does this line start with a > character?").  If that ...
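Unlike s, the y command does not use regular expressions, so the . in y/./X/ is a literal dot rather than "any character". A small check on a made-up record (mask_dots is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical wrapper around the sed command above: on lines that do not start
# with ">", transliterate every literal "." to "X"; header lines are untouched.
mask_dots() {
  sed '/^>/!y/./X/' "$@"
}

printf '>sequence.1\nGTCA.GTCA\n' | mask_dots
# >sequence.1
# GTCAXGTCA
```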

11 votes
Explanation of a sed command
I hope this makes it clearer.
"I know that it is compose of three substitute commands"
It's just one substitute command (if you are referring to the sed command): s/<pattern to search&...

11 votes
bash loop to replace middle of string after a certain character
I think you were very close to a working command. This worked for me on the few examples you gave:
sed -E 's/_[0-9]+ /|/' "$file" > "$file.1"
I changed the match expression ...
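For the loop part of the question, the same sed command can be run once per file. This sketch assumes the input files are passed as arguments; replace_in_files is a hypothetical name, and each result goes to a new FILE.1 so the originals are kept.

```shell
#!/bin/sh
# Hypothetical helper: for each file given as an argument, replace the first
# "_<digits> " on each line with "|" and write the result to FILE.1.
replace_in_files() {
  for file in "$@"; do
    sed -E 's/_[0-9]+ /|/' "$file" > "$file.1"
  done
}

# Usage (file names are placeholders):
# replace_in_files sample1.txt sample2.txt
```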

10 votes (accepted)
Counting a specific consecutive character with its occurrence position and length
You could do that with awk, whose match() function (which sets the RSTART and RLENGTH variables) is quite useful for this:
<mySequence.fasta awk -v C=N '{
  i=0
  while (match($0, C "+")) {
    printf "...
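The program above is cut off. A complete sketch of the same match()/RSTART/RLENGTH idea might look like the following; it is not the author's code. It assumes each sequence sits on a single line, skips FASTA header lines, and prints the line number, the 1-based start position and the length of every run. The function name n_runs is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: report every run of the character C (set to N here) as
# "line start length". match() sets RSTART/RLENGTH for the first run in s;
# s is then cut after that run, and offset maps positions back to the line.
n_runs() {
  awk -v C=N '
    /^>/ { next }                          # skip FASTA header lines
    {
      s = $0; offset = 0
      while (match(s, C "+")) {
        print NR, offset + RSTART, RLENGTH
        offset += RSTART + RLENGTH - 1     # characters consumed so far
        s = substr(s, RSTART + RLENGTH)    # continue after this run
      }
    }' "$@"
}

printf '>seq1\nACNNGTN\n' | n_runs
# 2 3 2
# 2 7 1
```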

10 votes
Removing rows containing NA in every column
Using GNU sed:
sed -e '/g[0-9]\+\(\s*NA\s*\)\+$/d' filename
Short explanation:
g[0-9]\+\(\s*NA\s*\)\+$ is a regex matching g followed by at least one digit, then any number of NAs with optional ...

9 votes
How to count the number of characters in a line, except a specific character?
awk '{ gsub("[ N]",""); print length() }'

9 votes
extract lines that match a list of words in another file
grep -Fw -f words file
This would extract the lines from file that contain any of the words in the words file.
The strings in words are treated as fixed strings (not regular expressions) due to the -...
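The effect of -w shows up on a tiny example: a line containing TP531 is not matched by the word TP53, because the match must be a whole word. The file contents and the helper name match_words are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print lines of the data file ($2) that contain any word
# listed one per line in the words file ($1), as fixed strings and whole words.
match_words() {
  grep -Fw -f "$1" "$2"
}

tmp=$(mktemp -d)
printf 'TP53\nBRCA1\n' > "$tmp/words"
printf 'TP53 chr17\nTP531 x\nEGFR chr7\n' > "$tmp/file"
match_words "$tmp/words" "$tmp/file"   # prints only: TP53 chr17
rm -rf "$tmp"
```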

9 votes
Removing rows containing NA in every column
With the all function from the Perl List::Util module:
$ perl -MList::Util=all -alne 'shift @F; print unless all { $_ eq "NA" } @F' file
gene  v1  v2  v3  v4
g2    NA  NA  2   3
g4    1   2   3   2

9 votes
Removing rows containing NA in every column
With grep:
grep -E -v -x 'g[0-9]+([[:blank:]]+NA)*[[:blank:]]*' filename
This causes grep to not display (-v) lines where the entire line (-x) matches:
lower case g in first column, followed by one or ...

9 votes
AWK to replace character for lines not starting with ">"
You can try:
awk '!/^>/ { gsub(/\./, "X") }1' Sfr.pep > Sfr2.pep
Output:
>sequence.1
GTCAGTCAGTCAXGTCAGTCA

9 votes
Replace new lines with spaces using awk
For the input you show, where all the paired lines are next to each other, all you need with any awk is:
$ awk '{ORS=(NR%2 ? "\t" : RS)} 1' file
A1_R1.fastq.gz  A1_R2.fastq.gz
A2_R1.fastq.gz  ...

9 votes (accepted)
how to pass environment variables to singularity exec
The point of Singularity is that it runs software inside a container. The container is isolated from its host so that it works the same everywhere. The behavior of the container does not depend on ...

8 votes
How to display the difference between two DNA Sequences via command line tools
Is this what you are after?
awk '{$3=$1;sub($2,"",$3)}1' file
$3=$1 copies the 1st field to the 3rd field  and
sub($2,"",$3) looks for the 2nd field in the 3rd field. If there is ...
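Note that sub() treats $2 as a regular expression. For plain DNA letters that makes no difference, but if the second field could contain characters such as . or *, a literal search with index() and substr() is safer. A sketch of that variant (seq_diff is a hypothetical name), which, like sub(), removes only the first occurrence:

```shell
#!/bin/sh
# Hypothetical variant of the awk command above: copy field 1 to field 3, then
# remove the first literal (non-regex) occurrence of field 2 from field 3.
seq_diff() {
  awk '{
    $3 = $1
    i = index($3, $2)                      # 0 if field 2 does not occur
    if (i > 0)
      $3 = substr($3, 1, i - 1) substr($3, i + length($2))
  } 1' "$@"
}

printf 'ACGTAC CGT\n' | seq_diff
# ACGTAC CGT AAC
```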

7 votes
How to count the number of characters in a line, except a specific character?
Assuming that the count is needed for each line, ignoring spaces and the character N:
$ perl -lne 'print tr/N //c' ip.txt 
1
1
1
0
1
2
2
The return value of tr is the number of characters it matched.
c to complement ...

7 votes
How to count the number of characters in a line, except a specific character?
Another awk approach (it returns -1 for empty lines):
awk -F'[^N ]' '$0=NF-1""' infile
Or, a fuller version that returns -1 for empty lines and 0 for lines containing only whitespace (tabs/spaces):
awk -F'[^N \t]+' '$...

7 votes (accepted)
Replace pattern between two characters
.* is a greedy regexp, matching the longest possible match. You need to match the shortest match instead, but apply it globally across the whole line. Try
sed 's/-[^:-]*:/:/g' 1.file > 2.file
The character ...
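The difference shows up on a made-up line: the greedy .* version removes everything from the first - to the last :, while the negated bracket expression [^:-]* cannot run past a : or another -, so each -...: segment is replaced separately. The function name strip_between is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the answer's sed command: replace each "-...:"
# segment (containing no ":" or "-") with ":" throughout the line.
strip_between() {
  sed 's/-[^:-]*:/:/g' "$@"
}

printf 'chr1-abc:100-def:200\n' | strip_between      # chr1:100:200
printf 'chr1-abc:100-def:200\n' | sed 's/-.*:/:/g'   # greedy: chr1:200
```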

7 votes
Cleaning a genes database polluted by non-numeric characters except plus and minus signs
With tr, transliterating characters from the complement of the wanted set to spaces, and squeezing repeats:
$ tr -sc '[:alnum:][:space:]+-' ' ' < data
chr2 74711 127472363 Pos1 0 +
chr3 74723 ...

7 votes (accepted)
awk cuts strings
GFF is a tab-separated format but you are not using tabs. Unless you use -F'\t' or BEGIN{FS="\t"}, awk will use any whitespace as the field delimiter, and that includes spaces. Since you are ...
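A minimal sketch of the tab-only splitting, assuming standard 9-column GFF input: with -F'\t' the attribute column stays one field even when it contains spaces, and setting OFS to a tab keeps the output tab-separated. The sample line and the helper name gff_attributes are made up.

```shell
#!/bin/sh
# Hypothetical helper: print the feature type (column 3) and the attributes
# (column 9) of GFF lines, skipping "#" comment lines. Splitting only on tabs
# keeps spaces inside a field intact.
gff_attributes() {
  awk -F'\t' 'BEGIN { OFS = "\t" } !/^#/ { print $3, $9 }' "$@"
}

printf 'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1;Note=two words\n' | gff_attributes
```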

7 votes
Remove all sub-fields in column-organized datafile that contain "_XX"
Here is the solution:
sed 's/;[^;]*_XX[^;]*//g'
You need to look for _XX inside a ;-delimited sub-field, so let every character other than ; match on either side of it.
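A quick check on a made-up line (drop_xx is a hypothetical name). Because the pattern starts with ;, a sub-field containing _XX at the very start of the line, before any ;, is not removed; that case would need a separate rule.

```shell
#!/bin/sh
# Hypothetical wrapper around the answer's sed command: delete every
# ";sub-field" whose text contains _XX, leaving other sub-fields intact.
drop_xx() {
  sed 's/;[^;]*_XX[^;]*//g' "$@"
}

printf 'a;b_XX;c;d_XX1\n' | drop_xx    # a;c
printf 'b_XX;c\n' | drop_xx            # unchanged: b_XX;c
```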

7 votes
Identify strings between patterns and print entire region between pattern if string is found. Preferably using awk
Using Raku (formerly known as Perl 6):
~$ raku -MXML -e 'my $xml = open-xml($*ARGFILES.Str);  \
                  .say for $xml.getElementsByTagName("entry").grep(/ TSPAN6 | TNMD /).pairs;'  ...

7 votes (accepted)
filter lines based on some criteria
Here's a perl way:
$ perl -F'\t' -lane '
  if(/^#/){ print; next }; 
  $F[7] =~ /\bSVLEN=(\d+)/; 
  $svlen=$1; 
  $F[7] =~ /\bSVCALLERS=([^;]+)/; 
  @callers=split(/,/,$1); 
  print if $svlen > 100 ...

7 votes
How to split a given column's string values in a text file
With perl:
$ perl -lane 'printf "%6s %s\n", $F[0], join " ", split "", $F[1]' <your-file
 12345 0 1 0 2 0 1 0 2 0 5
 54322 2 2 2 1 1 1 0 0 5 1
123456 1 1 2 2 0 1 1 5 1 ...

Only top scored, non community-wiki answers of a minimum length are eligible
Related Tags
bioinformatics × 322
text-processing × 187
awk × 129
sed × 66
linux × 46
shell-script × 36
bash × 31
grep × 27
scripting × 17
command-line × 13
perl × 13
files × 11
python × 10
regular-expression × 9
sort × 9
shell × 8
for × 7
text-formatting × 6
join × 6
pattern-matching × 6
cut × 5
columns × 5
ubuntu × 4
pipe × 3
filenames × 3