Contributed Talks

Contributed Talk I

Improving Fragment-Based Deep Molecular Generative Models

Presented by: Panukorn Taleongpong, Brooks Paige

Deep molecular generative models have shown promising results and opened a new route for drug discovery. Their ability to explore the molecular space, estimated to contain 10^60 molecules, far exceeds that of traditional methods based on virtual screening of existing databases. We introduce a novel fragmentation algorithm particularly suitable for use in deep generative models. In contrast to existing fragmentation algorithms, our procedure sequentially breaks a molecule along BRICS bonds in such a manner that the linearization of fragments is directly invertible, so the original molecule is guaranteed to be reconstructible from the fragment sequence. This makes it appropriate for use in deep generative models trained with sequential models as likelihoods. We compare with previous fragment-based SMILES VAE methods and observe that our approach significantly enhances coverage of the molecular space and outperforms them on distribution-learning benchmarks.

Contributed Talk II

FlowBack: A Flow-matching Approach for Generative Backmapping of Macromolecules

Presented by: Mike Jones

Coarse-grained models have become ubiquitous in biomolecular modeling tasks aimed at studying slow dynamical processes such as protein folding and DNA hybridization. Although these models considerably accelerate sampling, it remains challenging to recover an ensemble of all-atom structures corresponding to coarse-grained simulations. In this work, we introduce a generative approach called FlowBack that uses a flow-matching objective to map samples from a coarse-grained prior distribution to an all-atom data distribution. We construct our prior distribution to be amenable to any coarse-grained map and any type of macromolecule, and we find that generated structures are more robust and contain fewer steric clashes than those generated by previous approaches. We train a protein-specific model on structures from the Protein Data Bank (PDB), which achieves state-of-the-art results on bond quality and clash score. Furthermore, we train a model on DNA-protein data which achieves excellent reconstruction and generative capabilities on complexes from the PDB as well as on coarse-grained simulations of DNA-protein binding.
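
For readers unfamiliar with flow matching, the following is a minimal sketch of the general idea, not FlowBack's implementation. It assumes the common straight-line probability path between a prior sample and a data sample; the abstract does not state which path FlowBack uses. The Gaussian prior centred on a bead, the coordinates, and all function names are hypothetical and chosen only for illustration.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight-line path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predicted_velocity, x0, x1):
    """Mean squared error between a predicted velocity and the target velocity."""
    u = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_velocity, u)) / len(u)

def euler_integrate(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy example with a single atom (all values are illustrative assumptions).
rng = random.Random(0)
bead = [1.0, 2.0, 3.0]                        # hypothetical coarse-grained bead position
x0 = [c + rng.gauss(0.0, 0.1) for c in bead]  # assumed prior: Gaussian noise around the bead
x1 = [1.2, 1.9, 3.1]                          # hypothetical all-atom coordinate

# A trained network would predict the velocity at interpolate(x0, x1, t);
# here the exact target stands in for a perfect model.
u = target_velocity(x0, x1)
generated = euler_integrate(lambda x, t: u, x0, steps=10)
```

In training, a network would be fit to minimise `flow_matching_loss` at randomly drawn times `t` along `interpolate(x0, x1, t)`. At generation time, integrating the learned velocity field from a prior sample yields an all-atom sample. With the exact target velocity used above, `generated` matches `x1` up to floating-point error.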