35 events
when what by license comment
Apr 13, 2017 at 12:36 history edited CommunityBot
replaced http://unix.stackexchange.com/ with https://unix.stackexchange.com/
Apr 7, 2017 at 11:47 history edited user
edited tags
Apr 7, 2017 at 9:05 history edited dr_ CC BY-SA 3.0
Removed noise
Apr 7, 2017 at 8:43 vote accept gugy
Apr 7, 2017 at 8:25 vote accept gugy
Apr 7, 2017 at 8:42
Apr 7, 2017 at 8:25 vote accept gugy
Apr 7, 2017 at 8:25
Apr 7, 2017 at 8:24 vote accept gugy
Apr 7, 2017 at 8:25
Apr 7, 2017 at 8:24 history edited gugy CC BY-SA 3.0
added 380 characters in body
Apr 7, 2017 at 5:24 history tweeted twitter.com/StackUnix/status/850217720729657344
Apr 6, 2017 at 17:39 comment added Konrad Rudolph The dd solution works in a pinch but you need to realise that there are proper parsers for Fasta files. Do not hack your own parser using command line tools, use the proper tools. There are already enough crappy bioinformatics tools that randomly break because people hacked together a bad solution instead of using proper tools.
Apr 6, 2017 at 14:55 answer added user218374 timeline score: 2
Apr 6, 2017 at 13:59 answer added Stéphane Chazelas timeline score: 5
Apr 6, 2017 at 13:51 history edited Jeff Schaller CC BY-SA 3.0
typographical fixes; incorporated newline comment
Apr 6, 2017 at 13:48 history edited Stéphane Chazelas CC BY-SA 3.0
deleted 7 characters in body
Apr 6, 2017 at 13:46 history edited Jeff Schaller
edited tags
Apr 6, 2017 at 13:43 comment added user218374 @gugy Newlines should not be counted, but do they need to be present in the output for formatting, as your sample output shows?
Apr 6, 2017 at 13:20 answer added symcbean timeline score: 2
Apr 6, 2017 at 12:46 comment added gugy @ilkkachu: The file contains only GATC and newlines; newlines should not be counted. The source file is usually around 20MB (ca. 276185 lines) for a full genome and smaller for just a scaffold.
Apr 6, 2017 at 12:43 answer added hschou timeline score: 6
Apr 6, 2017 at 12:11 answer added Jeff Schaller timeline score: 3
Apr 6, 2017 at 12:03 history edited Jeff Schaller CC BY-SA 3.0
brought title in line with body
Apr 6, 2017 at 9:26 comment added ilkkachu @gugy, just to clarify, your count only includes the characters GCAT, but the source file still has newlines which should not be counted on output? Also, what size can the source file be?
Apr 6, 2017 at 9:10 comment added Kamaraj Check my answer; I hope it answers your question.
Apr 6, 2017 at 9:10 answer added Kamaraj timeline score: 7
Apr 6, 2017 at 9:07 comment added Kamaraj From the man page: substr(s, i [, n]) returns the at most n-character substring of s starting at i. If n is omitted, the rest of s is used.
Apr 6, 2017 at 9:07 comment added gugy The number of characters per line is always the same. I want to get characters 10 to 80 from the file as a whole, not characters 10 to 80 on each line.
Apr 6, 2017 at 9:04 comment added Kamaraj What exactly do you want? Say we have 100 characters in the first line, 150 characters in the second line, and 30 characters in the third line. What is your use case here?
Apr 6, 2017 at 9:01 comment added gugy @Kamaraj: this does not work the way I need it to. It will print the characters 10 to 70 for each line (if the lines do have that many characters, which they do not...)
Apr 6, 2017 at 8:55 comment added Kamaraj awk '{print substr($0,10,70)}' file
Apr 6, 2017 at 8:45 history edited gugy CC BY-SA 3.0
added 365 characters in body
Apr 6, 2017 at 8:41 comment added Kusalananda Sato's comment gives a neat solution if the file only contains DNA. If it's a fasta-formatted file with one or several headers, his solution sadly won't work.
Apr 6, 2017 at 8:39 comment added gugy @Spandan yes, this is DNA (whole genome, need to extract certain bits of it...)
Apr 6, 2017 at 8:37 comment added Satō Katsura dd if=file bs=1 count=71 skip=9 status=none
Apr 6, 2017 at 8:36 comment added Spandan Is this a file containing DNA information?
Apr 6, 2017 at 8:33 history asked gugy CC BY-SA 3.0
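
The requirement clarified in the comments above (a character range taken from the file as a whole, with newlines not counted) can be sketched as follows. This is only a minimal illustration, not any of the posted answers: the helper name extract_range and the file sample.txt are made up for the example, and it assumes the file holds nothing but sequence characters and newlines. It differs from the dd comment in that dd with bs=1 counts every byte, newlines included. As the comments note, a FASTA file with header lines would need a proper parser instead.

```shell
# extract_range FILE START END  (hypothetical helper, not from the thread)
# Prints characters START..END of FILE, counting from 1 and ignoring newlines.
extract_range() {
    # Remove all newlines so positions refer to sequence characters only,
    # then add one trailing newline so cut receives a well-formed line.
    { tr -d '\n' < "$1"; echo; } | cut -c "$2-$3"
}

# Small made-up sample: three lines of 10 bases each.
printf 'GATCGATCGA\nTTTTTCCCCC\nAAAAAGGGGG\n' > sample.txt

# Characters 8 to 13 span the first line break.
extract_range sample.txt 8 13
# prints: CGATTT
```

For the range discussed in the comments, the call would be extract_range file 10 80. The whole sequence passes through the pipe as one long line, which is workable for the file sizes mentioned (around 20MB) but is an assumption worth checking on other systems.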