Tag Info


Hot answers tagged bioinformatics


13 votes

Filter out ambiguous bases from a DNA sequence

Perhaps there aren't enough test cases here, because I don't see why you can't just use: ...

Kraigolas

answered Jun 28, 2021 at 16:16

11 votes

Accepted

Implementing a DNA codon table in C

An array of char* is not a great data structure for storing AA or codon sequence data. A pointer takes 4 or 8 bytes per codon, but there are only 64 possible ...

Peter Cordes

3,761

answered Apr 11, 2018 at 10:10
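The point in the excerpt above, that there are only 64 possible codons, means a codon can be stored as a small integer instead of a pointer to a string. The sketch below shows that idea in Python rather than in the C of the reviewed code. The names BASES, AMINO_ACIDS, codon_index and translate are hypothetical, and the TCAG base order with the standard genetic code is an assumption, not something taken from the answer.

```python
# Hypothetical sketch: none of these names come from the reviewed C code.
BASES = "TCAG"  # assumed base order for the table below

# Standard genetic code, one letter per codon, in TCAG order ('*' marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

BASE_VALUE = {base: value for value, base in enumerate(BASES)}


def codon_index(codon: str) -> int:
    """Pack a three-base codon into an integer in range(64)."""
    first, second, third = (BASE_VALUE[base] for base in codon.upper())
    return 16 * first + 4 * second + third


def translate(dna: str) -> str:
    """Translate complete codons; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(
        AMINO_ACIDS[codon_index(dna[i:i + 3])] for i in range(0, usable, 3)
    )


print(translate("ATGGCCTAA"))  # methionine, alanine, then a stop codon
```

Each codon index fits in 6 bits, so a sequence of codons could be kept in a byte array rather than in an array of pointers.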

11 votes

Accepted

Counting relevant entries in a large bioinformatics file

To speed this up, you'll need to avoid as many string creation operations as possible, because they are expensive. The split operation is especially expensive. Not only does this create many ...

RoToRa

11.6k

answered Mar 8, 2018 at 9:35
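The advice above, to avoid creating strings that are never used, can be sketched as follows. The language of the reviewed code is not shown in this listing, so this is a Python illustration of the general idea only. The helper names field and count_matching, and the tab separator, are hypothetical. It finds one field by scanning for separators and builds a single substring instead of splitting the whole line.

```python
def field(line: str, index: int, sep: str = "\t") -> str:
    """Return the field at position `index` without splitting the whole line."""
    start = 0
    for _ in range(index):
        pos = line.find(sep, start)
        if pos == -1:
            raise IndexError("line has too few fields")
        start = pos + len(sep)
    end = line.find(sep, start)
    return line[start:] if end == -1 else line[start:end]


def count_matching(lines, index: int, wanted: str, sep: str = "\t") -> int:
    """Count the lines whose field at `index` equals `wanted`."""
    return sum(
        1 for line in lines if field(line.rstrip("\n"), index, sep) == wanted
    )


rows = ["chr1\tgene\t100\n", "chr1\texon\t120\n", "chr2\tgene\t300\n"]
print(count_matching(rows, 1, "gene"))
```

In CPython, str.split runs in C and may well beat this loop on short lines, so any real gain would have to be measured on the actual data.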

10 votes

Counting relevant entries in a large bioinformatics file

If you are after raw performance, try to avoid repeating potentially cost-intensive operations. In this case, you split the lines twice with the same parameter, which repeatedly applies a regular ...

mtj

5,002

answered Mar 8, 2018 at 8:37

10 votes

Accepted

Counting the occurrences of certain amino-acids in a file

Add functions: Everything is currently in the global namespace; you can add a few functions to split up the code and make it more readable. You could use a ...

Ludisposed

11.8k

answered Jul 5, 2018 at 9:42

9 votes

Translate nucleic acid sequence into its corresponding amino acid sequence

This covers an interesting topic. Great work! Because I am unfamiliar with this area, I utilized your unit testing to ensure changes I make did not break functionality. If they do, I apologize and ...

esote

3,800

answered Jan 8, 2019 at 7:41

9 votes

Accepted

Simple mutation simulation for use in science class

Naming: As already mentioned, you should follow the PEP 8 style guide. But simply converting the leading character of a name to lower case might not be what you want to do in all cases. In function <...

Booboo

3,666

answered Jun 3 at 11:28

8 votes

FASTA-to-tsv conversion script

Welcome to Code Review! I'll add to the other answer from Reinderien. PEP-8: In Python, it is common (and recommended) to follow the PEP-8 style guide for writing clean, maintainable and consistent ...

hjpotter92

8,921

answered Oct 13, 2020 at 18:55

8 votes

Filter out ambiguous bases from a DNA sequence

There's an inconsistency between the two functions. check_and_clean_sequence() has an alphabet parameter, but this isn't used ...

Toby Speight

88.4k

answered Jun 28, 2021 at 16:22

7 votes

Implementing a DNA codon table in C

The line (*aminoacid_string) = malloc(aminoacid_count); allocates aminoacid_count bytes. The code needs that many ...

vnp

58.7k

answered Apr 11, 2018 at 1:43

7 votes

Implementing a DNA codon table in C

You should try to refactor the code to have fewer hard-coded constants. E.g. nucleobase_to_aminoacid has both tcag and codon_table hard-coded. That is in general something that hinders re-use. You ...

Hans Olsson

answered Apr 11, 2018 at 9:20

7 votes

Accepted

Highly nested bioinformatics processing

Non-code / very-high-level considerations. The first thing you can do to speed up the performance would be to get a better computer. My computer is several years old, but it runs the unmodified code ...

Peter Taylor

24.5k

answered Jun 21, 2018 at 18:50

7 votes

FASTA-to-tsv conversion script

Path? Surely path is not a single path, since you loop through it. So at the least, this is poorly-named and should be paths. ...

Reinderien

71.1k

answered Oct 13, 2020 at 16:56

7 votes

Accepted

Filter out ambiguous bases from a DNA sequence

From the bioinformatics side, not the Python one: Your return value will not be useful for further processing whenever an ambiguous base has been present, because it changes index locations! You'll want to ...

Bennie

answered Jun 29, 2021 at 16:02
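The index problem described in the excerpt above can be shown with a small example. The answer's own recommendation is cut off in this listing, so masking with N below is an assumption used only for contrast. The function names remove_ambiguous and mask_ambiguous are hypothetical.

```python
def remove_ambiguous(seq: str, alphabet: str = "ACGT") -> str:
    """Drop every base outside the alphabet; later bases shift left."""
    return "".join(base for base in seq if base in alphabet)


def mask_ambiguous(seq: str, alphabet: str = "ACGT", mask: str = "N") -> str:
    """Replace every base outside the alphabet; positions are preserved."""
    return "".join(base if base in alphabet else mask for base in seq)


seq = "ACRGT"  # R is an ambiguity code (A or G)
print(seq.index("G"))                    # position of G in the input
print(remove_ambiguous(seq).index("G"))  # position after deletion
print(mask_ambiguous(seq).index("G"))    # position after masking
```

After deletion the G moves one position to the left, while masking leaves it where it was, so coordinates from the original sequence stay valid.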

7 votes

Simple mutation simulation for use in science class

The code adheres to many good coding practices already, and it should be simple for beginners to follow. Here are some minor suggestions. Documentation: The PEP 8 style guide recommends adding ...

toolic

15.9k

answered Jun 2 at 18:50

6 votes

Accepted

Genetic Sequence Visualizer - Generating large images

Parser: Your parser has a bug in line 62: raw = ''.join([n for n in file.readlines() if not n.startswith('>')]).replace('\n', "").lower() will ...

FirefoxMetzger

1,101

answered Apr 14, 2018 at 18:41

6 votes

Counting relevant entries in a large bioinformatics file

Possible bug: ...

Imus

4,387

answered Mar 8, 2018 at 8:42

6 votes

Implementing a DNA codon table in C

In addition to current (and future) answers: You use malloc(), but from what I can see, you do not free() it later on. In my ...

esote

3,800

answered Apr 11, 2018 at 1:59

6 votes

Accepted

Filtering FASTQ file based on read names from other file (how to increase performance) Python

Do not reinvent the wheel. There are bioinformatics tools that accomplish this task. To extract reads from fastq files by IDs, use seqtk subseq. Extract sequences ...

Timur Shtatland

answered Mar 10, 2021 at 18:51

6 votes

Accepted

Rust program to one hot encode genetic sequences from .fa files

Your programs aren't quite equivalent; one looks at whether a line starts with >, the other looks for chr in each line. That ...

AKX

answered Dec 28, 2021 at 13:17

6 votes

Simple mutation simulation for use in science class

I think the most confusing line for a beginner is: Mutate_mat = list(map(list, zip(*Mutate_mat))) I would personally remove the map for beginners in a course that ...

user286929

answered Jun 3 at 13:45
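The line quoted in the excerpt above transposes a list of lists. The answer's own replacement is truncated here, so the two alternatives below are only possible spellings of the same operation, shown on a hypothetical 2-by-3 matrix.

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]

# The compact form from the excerpt: zip(*matrix) yields the columns as
# tuples, and map(list, ...) turns each tuple into a list.
compact = list(map(list, zip(*matrix)))

# The same result without map, using a list comprehension over the columns.
without_map = [list(column) for column in zip(*matrix)]

# Fully spelled out with indices, which may be easier for beginners to trace.
spelled_out = [[row[col] for row in matrix] for col in range(len(matrix[0]))]

print(compact)  # [[1, 4], [2, 5], [3, 6]]
```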

5 votes

Accepted

Genomic Range Query in Python

I'm not sure that I trust Codility's detected time complexity. As far as I know, it's not possible to programmatically calculate time complexity, but it is possible to plot out a performance curve ...

mochi

1,144

answered Nov 9, 2017 at 9:11

5 votes

Accepted

DNA reverse complement as fast as possible

The main operation is substitution via a small table, which is also what _mm_shuffle_epi8 does. The low 4 bits of the indexes clash though, and I could not find an ...

user555045

12.4k

answered Aug 19, 2018 at 23:03

5 votes

Mapping DNA nucleotides into two-dimensional coordinates

First of all I'd like to say that your code is fast. From studying it, the major bottlenecks that I've found come from the input being a string, and from converting to numpy arrays and back. Since ...

maxb

1,582

answered Aug 29, 2018 at 8:41

5 votes

Accepted

Lazy Loading a Bioinformatic SAM record

Performance: There is one thing that I believe could increase the performance of your application. You often call findElement, which goes through the SAM record ...

IEatBagels

12.7k

answered Apr 23, 2019 at 14:13

5 votes

Accepted

Run an external program and extract a pattern match along with the result file

This doesn't look bad as far as I can see. If the example file is accurate for the lengths of the input files, then I don't foresee any real problems, though others may of course disagree. Naming: <...

Gloweye

1,746

answered Sep 11, 2019 at 19:38

5 votes

Accepted

GenBank to FASTA format using regular expressions without Biopython

Line iteration ...

Reinderien

71.1k

answered Jun 26, 2020 at 2:57

5 votes

Accepted

FASTA-to-tsv conversion script

In addition to the points raised in the other answers: Extraneous import: The first line is import sys, but I don't see sys used ...

Mike

answered Oct 14, 2020 at 4:37

5 votes

Accepted

Counting the number of k-mers like monomers, dimers to hexamers from the fasta file

The code can be simplified quite a bit. Using itertools.product, the code like this: ...

RootTwo

10.7k

answered Oct 24, 2020 at 1:36
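The answer's code is elided in the excerpt above. As a rough sketch of how itertools.product can help here, the function below enumerates every possible k-mer up front and then counts the ones that occur. The name count_kmers and the ACGT alphabet are assumptions for illustration.

```python
from itertools import product


def count_kmers(sequence: str, k: int, alphabet: str = "ACGT") -> dict:
    """Count overlapping k-mers, including k-mers that never occur (count 0)."""
    # product(alphabet, repeat=k) yields every k-tuple over the alphabet.
    counts = {"".join(letters): 0 for letters in product(alphabet, repeat=k)}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # windows containing other characters are skipped
            counts[kmer] += 1
    return counts


dimers = count_kmers("ACGTAC", 2)
print(dimers["AC"], dimers["CG"], dimers["TT"])  # 2 1 0
```

Calling the function for k from 1 to 6 would cover monomers through hexamers without writing out each table by hand.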

Only top scored, non community-wiki answers of a minimum length are eligible

153 questions tagged bioinformatics

Related Tags

bioinformatics × 153
python × 78
performance × 34
beginner × 28
python-3.x × 22
algorithm × 19
strings × 14
programming-challenge × 12
parsing × 12
java × 11
c++ × 9
c × 9
ruby × 9
regex × 9
csv × 9
r × 8
perl × 7
time-limit-exceeded × 6
rust × 6
file × 6
numpy × 6
statistics × 5
edit-distance × 5
c# × 4
object-oriented × 4