Recently, I came across a simple tutorial on using Biopython (https://www.kaggle.com/code/shtrausslearning/biopython-bioinformatics-basics#8-|-PHYLOGENETIC-ANALYSIS). After working through the material, I was inspired to build a few projects that combine these foundational bioinformatics concepts with AI. My goal was to gain hands-on experience while creating something both educational and interactive.
The first project in this series is called the "Central Dogma Explorer." It's an interactive educational tool designed to take a gene name, retrieve its DNA sequence, and visually demonstrate the biological processes of transcription and translation, ultimately showing how a functional protein is produced.
To make this more manageable, I divided the project into three core components:
- Biopython Integration – I wrote functions such as fetch_gene_sequence(gene_symbol, organism="Homo sapiens") and transcribe_translate_dna(dna_seq) to retrieve and process DNA sequences. These functions fetch gene data, transcribe DNA into mRNA, and translate it into an amino acid sequence.
- AI Explanation with an LLM – I incorporated a language model to explain the transcription and translation processes in simple terms. Here's the core function:
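# Classic LangChain import paths; newer releases may expose these elsewhere.
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# `llm` is assumed to be a LangChain LLM instance configured elsewhere in the app.
def generate_explanation(dna_seq):
    prompt = PromptTemplate(
        input_variables=["sequence"],
        template=(
            "You are an expert biology tutor. Explain the full process of transcription and translation "
            "for this DNA sequence: {sequence}. Your answer should be clear and easy to understand for a first-year biology student."
        )
    )
    chain = LLMChain(llm=llm, prompt=prompt)
    # Send only the first 500 bases to keep the prompt short
    explanation = chain.run(sequence=dna_seq[:500])
    return explanation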
- User Interface with Streamlit – I built a simple, user-friendly interface using Streamlit to display the DNA sequence, the corresponding mRNA transcript, the amino acid sequence, and an AI-generated explanation. Example UI elements include:
st.subheader("1️⃣ DNA Sequence")
st.code(dna_seq[:1000] + ("..." if len(dna_seq) > 1000 else ""), language="text")
st.subheader("2️⃣ mRNA Transcript")
st.code(rna_seq[:1000] + ("..." if len(rna_seq) > 1000 else ""), language="text")
st.subheader("3️⃣ Amino Acid Sequence")
st.code(protein_seq, language="text")
st.subheader("4️⃣ AI-Generated Explanation")
st.markdown(explanation)
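To make the transcription and translation steps from the Biopython component concrete, here is a minimal, dependency-free sketch of the same idea. It is not the project's transcribe_translate_dna, which relies on Biopython (whose Seq objects provide transcribe() and translate()). The names CODON_TABLE, transcribe, and translate are illustrative only, and the sketch assumes the input is the coding strand, read in frame from its first base.

```python
from itertools import product

# Standard genetic code, built from the conventional U/C/A/G ordering.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna_seq):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna_seq.upper().replace("T", "U")


def translate(rna_seq):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    # Step through complete codons only; trailing 1-2 bases are ignored.
    for i in range(0, len(rna_seq) - len(rna_seq) % 3, 3):
        amino_acid = CODON_TABLE[rna_seq[i:i + 3]]
        if amino_acid == "*":  # stop codon reached
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    rna = transcribe(dna)
    print(rna)
    print(translate(rna))
```

Stopping at the first stop codon mirrors what a ribosome does. A real gene record may also contain untranslated regions or introns, which this sketch does not handle.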
I've also created a short tutorial video walking through the application and its features (https://youtu.be/2qi5UPiiS1Q). I'm always open to feedback and would love to hear any suggestions for future bioinformatics projects that combine biology, AI, and interactivity. Thanks for reading!
Here is a link to the code: https://github.com/PranavMunigala/CentralDogmaExplorer.git