LLM Drug Target Discovery: AlphaFold, Hugging Face, UniProt, Protein AI Pipeline

How It Works

From a Research Question to a 3D Protein Structure

A researcher asks a natural-language question — "What proteins should I target for Alzheimer's drug discovery?" Drug Target AI routes that query through a cascading LLM pipeline, validates every protein against live UniProt data to confirm human origin, fetches its AlphaFold structure, and renders the 3D molecule directly in the browser.

🔬

Research Query

Plain-language disease, pathway, or hypothesis

🤖

LLM Analysis

Multi-model cascade — Llama, Mistral, Phi-3 & more

🧬

UniProt Filter

Human-only validation via live REST API

🏗️

AlphaFold / PDB

3D structure fetched in real-time

💊

Target Report

Protein card + 3D view + known drugs + AI insights

Feature Breakdown

Six Capabilities in One App

🤖 Core Engine

AI Protein Targeting

LLMs receive a biomedical system prompt and return a structured JSON array of 3–5 relevant human protein targets with UniProt IDs, function summaries, and known drugs.

OutputJSON protein array

Max targets3–5 per query

🗄️ Live Data

UniProt Integration

Every returned UniProt ID is validated via the UniProt REST API. The organism field must read Homo sapiens or the protein is silently dropped.

DatabaseUniProt / Swiss-Prot

Entries250M+ proteins

🏗️ 3D View

AlphaFold Structures

Structures are pulled from AlphaFold EBI (v4/v3), with fallback to RCSB PDB cross-references. A demo helix is shown if no structure is available.

SourcesAlphaFold → PDB → Demo

ConfidenceLabeled per source

💬 Q&A

Protein Chat Assistant

After identifying targets, users can ask follow-up questions about any protein — mechanism, disease associations, binding sites — answered by a biomedical-specialized LLM prompt.

ContextPer-protein Q&A

ModeConversational

🔍 Deep Search

UniProt Deep Search

When AI returns no results, a keyword extraction engine queries UniProt directly with progressive fallback strategies — broad search, human-filtered search, reviewed-only entries.

Strategies3-level fallback

Filterorganism_id:9606

🛡️ Resilience

Multi-Model Fallback

If a model fails or is rate-limited, the app cascades to the next in the fallback chain, trying three API methods (conversational → chat completion → text generation) per model.

Models5+ in chain

Methods3 per model

Architecture Detail

The Multi-Model Fallback Chain

A key engineering decision was making the app robust to model availability. HuggingFace Inference API models can be slow, rate-limited, or temporarily unavailable. Instead of failing, the app tries each model with three distinct API methods before falling back to the next model.

meta-llama / Llama-3.2-3B-Instruct

User's Choice / Default

mistralai / Mistral-7B-Instruct-v0.3

Fallback 1

microsoft / Phi-3-mini-4k-instruct

Fallback 2

ServiceNow-AI / Apriel-5B-Instruct

Fallback 3

prithivMLmods / Llama-3.1-5B-Instruct

Fallback 4 · Last Resort

Interactive Explorer

See What the App Returns for Real Queries

Select a sample research query and see the protein targets Drug Target AI would identify, including UniProt IDs, roles, and confidence scores.

Illustrative results based on published biomedical knowledge. Actual app queries live LLMs and UniProt in real time.

Performance Snapshot

API Reliability & Query Coverage

Model Fallback Rates

Disease Category Coverage

Response Methods

Llama-3.2-3B handles ~65% of queries directly. The fallback chain resolves nearly all remaining cases before reaching the curated database.

Disease coverage across major therapeutic areas — oncology, CNS, cardiovascular, and infectious disease represent the broadest use cases.

Chat completion handles the majority of successful responses; text_generation serves as a critical backstop for older model architectures.

Design Decisions

Three Engineering Choices That Matter

🛡️

Fail gracefully

A 5-model × 3-method cascade means the app stays functional even when the primary LLM is unavailable or rate-limited.

🧬

Ground truth first

Every AI-suggested protein is cross-validated against live UniProt data — hallucinated UniProt IDs are caught and removed automatically.

🔬

Human-only scope

The system prompt and UniProt organism filter work together to ensure every result is a therapeutically relevant Homo sapiens protein — not a viral enzyme.

From Research Question to 3D Drug Target in Seconds — LLM + AlphaFold

From a Research Question to a 3D Protein Structure

Why the Human-Only Filter Matters

Six Capabilities in One App

The Multi-Model Fallback Chain

Three API Methods Per Model

See What the App Returns for Real Queries

API Reliability & Query Coverage

Three Engineering Choices That Matter

At a Glance

Try It Live

Project Info

Tech Stack

Related Work