  • This looks like FASTA files. How big are the files? The size matters for the solution: if there is too much data, you can't simply use the DNA sequences as keys in an associative array. Commented Mar 12, 2018 at 9:53
  • One of the files is in FASTA format; the other is a text file with a different pattern on each line. The FASTA file is very large (130 million reads, 260 million lines); the pattern file contains 2000 different patterns. Commented Mar 12, 2018 at 10:02
  • Do the sequences in the fasta (search) file span multiple lines, or are they all short sequences on single lines? Commented Mar 12, 2018 at 10:05
  • The sequences in the FASTA file are up to 260 characters (nucleotides) long. Commented Mar 12, 2018 at 10:08
  • Re 1): the FASTA file starts with >, but the test file does not; it just starts with the numbers (I could have been more precise here, sorry). Re 2): there is just one sequence per line, and no sequence spans multiple lines. Commented Mar 12, 2018 at 11:22