There are a couple of high-level problems with your code:

- You should remove your comments. They're 'micro' comments, so they tell the reader no more than the code they sit next to: `print(mRNA)` is without a doubt printing mRNA. A comment saying that you use lists in both forms, `['abc', 'def']` and `'abc def'`, would have been a good comment. Comments should describe oddities in your code or, if you're using optimized code, say what the code is doing from a high-level perspective.
- You should order your code so that you can read down the file and already know most things about the code at that point. To fully understand `main` you need to know every other function, so it should go at the bottom of the file.
- You should use `if __name__ == '__main__'`.
- You should pick one quotation mark style, `'` or `"`, and stick with it.
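As a small illustration of the points above (the function name `split_codons` and the sample input are made up, not taken from your code): the one comment explains an oddity rather than restating the code, the helper comes before `main`, and the guard means importing the file doesn't run the program.

```python
def split_codons(sequence):
    # Codons are kept as a list, e.g. ['abc', 'def']. They are only joined
    # into the spaced form 'abc def' when they are displayed.
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


def main():
    codons = split_codons('abcdef')
    print(' '.join(codons))


# main() only runs when the file is executed directly, not when imported.
if __name__ == '__main__':
    main()
```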
The above are simple and easy to fix. But you also have problems that take more effort to fix:

- You duplicate the code to 'chunk' strings into lists. This is when you change `'abcdef'` to either `'abc def'` or `['abc', 'def']`. You should make this a function.
- You should change `check_sequence` to take the valid characters, rather than `type`, as an argument. This gives the caller greater control over the function and lets you abstract your code.
- You can then move these inputs to global constants and use them when needed. Global constants mean that you're less likely to make a typo and unknowingly break your program.
- You could change `check_sequence` to use a comprehension and `all` to reduce the number of lines. It can also improve clarity, as you then don't need to explicitly return `True` or `False`.
- You should use lists rather than strings. As said above you use all three formats, `'abcdef'`, `'abc def'`, and `['abc', 'def']`. Instead you should only use the last format. This makes the code easier to read and removes the need for `n` in `convert_to_proteins`.
- As stated above, you should change `convert_sequence` to take a list. This lets you build the result with a short list comprehension. You can also use `str.translate` rather than implementing the translation yourself.
- You can change both `convert_to_proteins` and `convert_symbols` to be the same function. Moving `pickle.load` out of the function and passing lists simplifies the code to a single list comprehension that indexes a dictionary.
- You can move your print to `dump_data`, as you're printing and writing the same data. This lets you use `format` to build the string once, then both print it and write it.
- Always use `with` when you use `open`. It closes the file when you're done with it; without it, a file can be left open, which can lead to bugs and errors.
- Your code to get user input is confusing and duplicates logic. Offering the choice between reading from a file and typing the input is good. However, your code initially reads as though a user could pick to read from a file, but then if the file is malformed it reads from user input. On further reading you find out that's not possible, but it is still a problem for readers.
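To make the `check_sequence` points concrete, here is a rough sketch. The explicit-loop version is my guess at the general shape of such a check, not a copy of your code, and `check_sequence_loop` is a made-up name for it.

```python
VALID_INPUT_OTHER = 'ATCG'


def check_sequence_loop(sequence, valid_input):
    # Explicit loop: True and False have to be returned by hand.
    for char in sequence:
        if char not in valid_input:
            return False
    return True


def check_sequence(sequence, valid_input):
    # Same check with a generator expression and all().
    return all(char in valid_input for char in sequence)


# The caller decides what counts as valid by passing a constant,
# rather than a 'type' string that the function has to interpret.
print(check_sequence('ATTGCA', VALID_INPUT_OTHER))  # True
print(check_sequence('ATXGCA', VALID_INPUT_OTHER))  # False
```

If you mistype the constant's name you get a `NameError` straight away, whereas a mistyped string literal such as `'ATGC '` would quietly change what the program accepts.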
And so if you implement all the above you can get:
```python
import pickle

TRANSLATION_DNA = {'A': 'U', 'T': 'A', 'C': 'G', 'G': 'C'}
TRANSLATION_OTHER = {'A': 'U', 'U': 'A', 'C': 'G', 'G': 'C'}
VALID_INPUT_RNA = 'AUCG'
VALID_INPUT_OTHER = 'ATCG'


def grouper(sequence, n):
    return [sequence[i:i + n] for i in range(0, len(sequence), n)]


def check_sequence(sequence, valid_input):
    return all(i in valid_input for i in sequence)


def translate_sequence(sequence, conversion_dict):
    table = {ord(k): v for k, v in conversion_dict.items()}
    return [g.translate(table) for g in sequence]


def convert_sequence(sequence, conversion_dict):
    return [conversion_dict[i] for i in sequence]


def dump_data(*args):
    output = 'DNA: {}\nmRNA: {}\ntRNA: {}\n{}\n{}'.format(*map(' '.join, args))
    print(output)
    with open('results.txt', 'w') as f:
        f.write(output + '\n')


def safe_read(file_name):
    with open(file_name, 'rb') as f:
        return pickle.load(f)


def convert(sequence):
    mrna_to_protein = safe_read('mRNA_to_protein.p')
    protein_symbols = safe_read('symbols.p')

    original_sequence = grouper(sequence, 3)
    mRNA = translate_sequence(original_sequence, TRANSLATION_DNA)
    tRNA = translate_sequence(mRNA, TRANSLATION_OTHER)
    proteins = convert_sequence(mRNA, mrna_to_protein)
    symbols = convert_sequence(proteins, protein_symbols)
    return original_sequence, mRNA, tRNA, proteins, symbols


def read_sequence():
    while True:
        file_name = input('Enter file name: ')
        try:
            with open(file_name, 'r') as f:
                return f.read()
        except FileNotFoundError:
            print('File {!r} not found.'.format(file_name))


def remove_spaces(x):
    return (''.join(x.split())).strip()


def get_sequence():
    while True:
        open_choice = input('Do you want to load a file to translate [Y/N]').strip().upper()
        if open_choice in ('Y', 'N'):
            break

    while True:
        sequence = (
            read_sequence()
            if open_choice == 'Y' else
            input('Enter the DNA sequence to convert it: ')
        )
        sequence = remove_spaces(sequence.upper())
        if check_sequence(sequence, VALID_INPUT_OTHER):
            return sequence
        print('Invalid sequence.')


def main():
    sequence = get_sequence()
    original_sequence, mRNA, tRNA, proteins, symbols = convert(sequence)
    dump_data(original_sequence, mRNA, tRNA, proteins, symbols)
    input()


if __name__ == '__main__':
    main()
```
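Because the functions now take their dictionaries as arguments, you can try them without the pickle files. Here is a small sketch, where `MRNA_TO_PROTEIN` is a made-up two-entry table standing in for whatever `mRNA_to_protein.p` actually holds:

```python
TRANSLATION_DNA = {'A': 'U', 'T': 'A', 'C': 'G', 'G': 'C'}

# Hypothetical stand-in for the dictionary stored in 'mRNA_to_protein.p';
# the real file's keys and values may differ.
MRNA_TO_PROTEIN = {'AUG': 'Met', 'CCG': 'Pro'}


def grouper(sequence, n):
    return [sequence[i:i + n] for i in range(0, len(sequence), n)]


def translate_sequence(sequence, conversion_dict):
    # str.translate needs a table keyed by code points, hence ord().
    table = {ord(k): v for k, v in conversion_dict.items()}
    return [g.translate(table) for g in sequence]


def convert_sequence(sequence, conversion_dict):
    return [conversion_dict[i] for i in sequence]


codons = grouper('TACGGC', 3)
mrna = translate_sequence(codons, TRANSLATION_DNA)
proteins = convert_sequence(mrna, MRNA_TO_PROTEIN)

print(codons)    # ['TAC', 'GGC']
print(mrna)      # ['AUG', 'CCG']
print(proteins)  # ['Met', 'Pro']
```

Note that `convert_sequence` raises a `KeyError` for any codon missing from the dictionary, which includes a trailing chunk shorter than three characters.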