ESM-2 × Bayesian Optimisation × ProteinMPNN × REINFORCE RL
| Module | Purpose | Key API | Status |
|---|---|---|---|
| `src/embeddings.py` | ESM-2 feature extraction, lazy load, batched inference, mean pooling | `ESM2Embedder.transform(seqs)` | ✅ Tested |
| `src/predictor.py` | MLP surrogate (LayerNorm + Dropout), AdamW training, evaluate with Pearson / Spearman | `PredictorTrainer.fit()`, `.evaluate()` | ✅ Tested |
| `src/bayes_opt.py` | Gaussian Process + qLogEI, PCA dimensionality reduction, BoTorch integration | `BayesianOptimizer.run(n_iter)` | ✅ Tested |
| `src/protein_mpnn.py` | k-NN Cα graph builder, MessagePassingLayer, cross-entropy training | `ProteinMPNNTrainer.train_demo()` | ✅ Tested |
| `src/rl_reinforce.py` | LSTM policy, REINFORCE update, multi-objective reward, teacher-forcing grad | `REINFORCETrainer.run(episodes)` | ✅ Tested |
| `src/data_prep.py` | Synthetic demo data generator + ProteinGym CSV loader | `make_demo_data(n, seq_len)` | ✅ Tested |
| `run_pipeline.py` | CLI entry point, orchestrates all modules | `--mode all/bo/rl/mpnn` | ✅ Tested |
| `demo_notebook.ipynb` | Interview live demo: step-by-step walkthrough with inline plots | Jupyter Notebook | ✅ Ready |
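
To make the "k-NN Cα graph builder" entry concrete, here is a minimal, dependency-free sketch of the idea: each residue is linked to its `k` nearest residues by Cα–Cα distance. The function name `knn_edges` and the toy coordinates are hypothetical and are not taken from `src/protein_mpnn.py`, which may build its graph differently (for example with tensors and batched distance matrices).

```python
import math

def knn_edges(ca_coords, k):
    """Return directed edges (i, j) from each residue i to its k nearest residues.

    ca_coords: list of (x, y, z) Cα positions, one per residue.
    Hypothetical helper for illustration; not the repository's actual API.
    """
    n = len(ca_coords)
    edges = []
    for i in range(n):
        # Distance from residue i to every other residue (self excluded).
        dists = [(math.dist(ca_coords[i], ca_coords[j]), j) for j in range(n) if j != i]
        dists.sort()  # nearest first; ties are broken by residue index
        edges.extend((i, j) for _, j in dists[:k])
    return edges

# Toy backbone (assumed data): four Cα atoms on a line, 3.8 Å apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
edges = knn_edges(coords, k=2)
```

The edges are directed, so residue `i` can list `j` as a neighbour without the reverse holding; a message-passing layer would then aggregate over each residue's outgoing neighbour list.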
```bash
# Install dependencies (~2 min)
pip install -r requirements.txt

# Full pipeline with real ESM-2 embeddings (downloads ~30 MB on first run)
python run_pipeline.py --mode all

# Individual modes
python run_pipeline.py --mode bo --epochs 100 --bo-iters 20
python run_pipeline.py --mode rl --rl-episodes 50
python run_pipeline.py --mode mpnn

# Interactive live demo (Jupyter)
jupyter notebook demo_notebook.ipynb
```
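
The flags used in the commands above could be parsed along the following lines. This is a sketch only: the flag names come from the commands, while the default values, the `build_parser` and `stages_for` helpers, and the stage order for `--mode all` are assumptions and may not match `run_pipeline.py`.

```python
import argparse

def build_parser():
    """Build a parser for the flags shown in the quick-start commands."""
    parser = argparse.ArgumentParser(description="Pipeline CLI (illustrative sketch)")
    parser.add_argument("--mode", choices=["all", "bo", "rl", "mpnn"], default="all")
    # Default values below are assumptions, not the repository's real defaults.
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--bo-iters", type=int, default=20)
    parser.add_argument("--rl-episodes", type=int, default=50)
    return parser

def stages_for(mode):
    """Expand a mode into the stages to run; the order for 'all' is assumed."""
    return ["bo", "rl", "mpnn"] if mode == "all" else [mode]

# Equivalent to: python run_pipeline.py --mode bo --epochs 100 --bo-iters 20
args = build_parser().parse_args(["--mode", "bo", "--epochs", "100", "--bo-iters", "20"])
stages = stages_for(args.mode)
```

Note that `argparse` converts dashes in option names to underscores, so `--bo-iters` is read as `args.bo_iters` and `--rl-episodes` as `args.rl_episodes`.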