Drug Discovery · Bioinformatics Python · Flask · HuggingFace Published Paper · Industry Project
From Research Question to 3D Drug Target in Seconds — LLM + AlphaFold
Drug discovery starts with finding the right protein to target. Drug Target AI takes a plain-language biomedical query — a disease, a pathway, a hypothesis — and uses large language models + live UniProt data to identify the most relevant human protein targets, complete with 3D AlphaFold structures.
Human-only filter — no viral or bacterial false positives
Live
Real-time queries — no static precomputed results
How It Works
From a Research Question to a 3D Protein Structure
A researcher asks a natural-language question — "What proteins should I target for Alzheimer's drug discovery?" Drug Target AI routes that query through a cascading LLM pipeline, validates every protein against live UniProt data to confirm human origin, fetches its AlphaFold structure, and renders the 3D molecule directly in the browser.
🔬
Research Query
Plain-language disease, pathway, or hypothesis
🤖
LLM Analysis
Multi-model cascade — Llama, Mistral, Phi-3 & more
🧬
UniProt Filter
Human-only validation via live REST API
🏗️
AlphaFold / PDB
3D structure fetched in real-time
💊
Target Report
Protein card + 3D view + known drugs + AI insights
💡
Why the Human-Only Filter Matters
Without it, a query about HIV could return viral proteins — useless for host-directed therapy. The app cross-validates every UniProt ID against organism data before rendering. Only Homo sapiens proteins pass through — ensuring every result is a therapeutically actionable human target.
Feature Breakdown
Six Capabilities in One App
🤖 Core Engine
AI Protein Targeting
LLMs receive a biomedical system prompt and return a structured JSON array of 3–5 relevant human protein targets with UniProt IDs, function summaries, and known drugs.
OutputJSON protein array
Max targets3–5 per query
🗄️ Live Data
UniProt Integration
Every returned UniProt ID is validated via the UniProt REST API. The organism field must read Homo sapiens or the protein is silently dropped.
DatabaseUniProt / Swiss-Prot
Entries250M+ proteins
🏗️ 3D View
AlphaFold Structures
Structures are pulled from AlphaFold EBI (v4/v3), with fallback to RCSB PDB cross-references. A demo helix is shown if no structure is available.
SourcesAlphaFold → PDB → Demo
ConfidenceLabeled per source
💬 Q&A
Protein Chat Assistant
After identifying targets, users can ask follow-up questions about any protein — mechanism, disease associations, binding sites — answered by a biomedical-specialized LLM prompt.
ContextPer-protein Q&A
ModeConversational
🔍 Deep Search
UniProt Deep Search
When AI returns no results, a keyword extraction engine queries UniProt directly with progressive fallback strategies — broad search, human-filtered search, reviewed-only entries.
Strategies3-level fallback
Filterorganism_id:9606
🛡️ Resilience
Multi-Model Fallback
If a model fails or is rate-limited, the app cascades to the next in the fallback chain, trying three API methods (conversational → chat completion → text generation) per model.
Models5+ in chain
Methods3 per model
Architecture Detail
The Multi-Model Fallback Chain
A key engineering decision was making the app robust to model availability. HuggingFace Inference API models can be slow, rate-limited, or temporarily unavailable. Instead of failing, the app tries each model with three distinct API methods before falling back to the next model.
1
meta-llama / Llama-3.2-3B-Instruct
User's Choice / Default
2
mistralai / Mistral-7B-Instruct-v0.3
Fallback 1
3
microsoft / Phi-3-mini-4k-instruct
Fallback 2
4
ServiceNow-AI / Apriel-5B-Instruct
Fallback 3
5
prithivMLmods / Llama-3.1-5B-Instruct
Fallback 4 · Last Resort
⚙️
Three API Methods Per Model
For each model in the chain, the app tries: (1) conversational API, (2) chat_completion, then (3) text_generation. If all three fail, it moves to the next model. If all models are exhausted, it falls back to a curated disease-protein JSON database. Zero single points of failure.
Interactive Explorer
See What the App Returns for Real Queries
Select a sample research query and see the protein targets Drug Target AI would identify, including UniProt IDs, roles, and confidence scores.
Illustrative results based on published biomedical knowledge. Actual app queries live LLMs and UniProt in real time.
Performance Snapshot
API Reliability & Query Coverage
Model Fallback Rates
Disease Category Coverage
Response Methods
Llama-3.2-3B handles ~65% of queries directly. The fallback chain resolves nearly all remaining cases before reaching the curated database.
Disease coverage across major therapeutic areas — oncology, CNS, cardiovascular, and infectious disease represent the broadest use cases.
Chat completion handles the majority of successful responses; text_generation serves as a critical backstop for older model architectures.
Design Decisions
Three Engineering Choices That Matter
🛡️
Fail gracefully
A 5-model × 3-method cascade means the app stays functional even when the primary LLM is unavailable or rate-limited.
🧬
Ground truth first
Every AI-suggested protein is cross-validated against live UniProt data — hallucinated UniProt IDs are caught and removed automatically.
🔬
Human-only scope
The system prompt and UniProt organism filter work together to ensure every result is a therapeutically relevant Homo sapiens protein — not a viral enzyme.
At a Glance
Quick read
What it does: Converts a plain-language biomedical query into validated human protein targets with 3D structures. How: LLM pipeline → UniProt validation → AlphaFold rendering. Why it's robust: 5-model fallback chain with 3 API methods each — near-zero downtime.