May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

An Intelligent Data Analysis System for Biological Macromolecule Crystallography

May 9, 2023, 5:30 PM
Chesapeake Meeting Room (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral presentation, Track 5 - Sustainable and Collaborative Software Engineering


Speaker: Sun, Hao-Kai (IHEP, CAS)


With the construction and operation of fourth-generation light sources such as the European Synchrotron Radiation Facility Extremely Brilliant Source (ESRF-EBS), the Advanced Photon Source Upgrade (APS-U), the Advanced Light Source Upgrade (ALS-U), and the High Energy Photon Source (HEPS), several advanced biological macromolecule crystallography (MX) beamlines have been or will be built, and huge amounts of raw experimental data will accumulate. These beamlines are also equipped with high-resolution hybrid pixel array detectors, and the resulting large-scale, high-quality data pose stringent challenges to traditional manual or semi-automatic processing procedures. In this report, we introduce a user-friendly, AI-empowered data analysis system for MX that pipelines the analysis steps automatically. It consists of four modules:
(1) a data reduction module based on a boosted decision tree (BDT), which intelligently selects suitable tools or algorithms for reducing X-ray diffraction images (TIFF/HDF5 files) to reflection files (MTZ);
(2) a structure prediction module that either queries databases or runs real-time AlphaFold/OpenFold prediction, i.e. goes from FASTA sequences to Protein Data Bank (PDB) files;
(3) a model auto-building module with two branches: a high-accuracy but time-consuming branch, and a fast branch that sacrifices some accuracy;
(4) a deep-learning-based structure refinement module.
The system works in two modes. The first is real-time (online) analysis, which runs automatically in the background by monitoring the user's experimental data folder and applying default processing parameters. The second is batch mode, in which users first configure the analysis procedures in a GUI and then process multiple datasets concurrently for better performance. All integrated tools and algorithms are designed as plugins and can easily be substituted.
The system was initially developed for HEPS. It aims to be automatic, intelligent, and highly efficient, and it will be released as open-source software for academic research.
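The abstract gives no code, so the following is only a minimal sketch, under stated assumptions, of how a plugin registry and the two operating modes might fit together. Every name here is hypothetical and not the authors' API: the stage and plugin names, the context keys (`images`, `fasta`, `mtz`, `model`, ...), the file suffixes, and the functions `register`, `run_pipeline`, `run_batch`, `poll_once` and `watch`. The placeholder plugins only record file names and stand in for the real tools (the BDT-driven reducer, AlphaFold/OpenFold, the model builders, the deep-learning refiner). The data flow between stages is also an assumption.

```python
from __future__ import annotations

import os
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical plugin signature: read a shared context dict, return new keys.
Plugin = Callable[[dict], dict]

# Stage order follows the four modules listed in the abstract.
STAGES = ["data_reduction", "structure_prediction", "model_building", "refinement"]

# Hypothetical registry: stage name -> {plugin name -> callable}.
REGISTRY: Dict[str, Dict[str, Plugin]] = {stage: {} for stage in STAGES}


def register(stage: str, name: str) -> Callable[[Plugin], Plugin]:
    """Register a plugin under a stage so that tools can be swapped by name."""
    def wrap(fn: Plugin) -> Plugin:
        REGISTRY[stage][name] = fn
        return fn
    return wrap


# Placeholder plugins: real ones would call external programs; these only
# record derived file names so the data flow is visible.
@register("data_reduction", "default")
def reduce_images(ctx: dict) -> dict:
    return {"mtz": ctx["images"] + ".mtz"}


@register("structure_prediction", "default")
def predict_structure(ctx: dict) -> dict:
    # Skipped (no new keys) when no sequence was supplied.
    return {"predicted_pdb": ctx["fasta"] + ".pdb"} if "fasta" in ctx else {}


@register("model_building", "fast")
def build_fast(ctx: dict) -> dict:
    return {"model": "fast:" + ctx["mtz"]}


@register("model_building", "accurate")
def build_accurate(ctx: dict) -> dict:
    return {"model": "accurate:" + ctx["mtz"]}


@register("refinement", "default")
def refine(ctx: dict) -> dict:
    return {"refined": ctx["model"] + ":refined"}


# Default plugin per stage, used whenever no explicit choice is given.
DEFAULTS = {"data_reduction": "default", "structure_prediction": "default",
            "model_building": "fast", "refinement": "default"}


def run_pipeline(images: str, fasta: Optional[str] = None,
                 choice: Optional[Dict[str, str]] = None) -> dict:
    """Run all stages in order, using the chosen plugin per stage (defaults otherwise)."""
    selected = {**DEFAULTS, **(choice or {})}
    ctx: dict = {"images": images}
    if fasta is not None:
        ctx["fasta"] = fasta
    for stage in STAGES:
        ctx.update(REGISTRY[stage][selected[stage]](ctx))
    return ctx


def run_batch(paths: Iterable[str], choice: Optional[Dict[str, str]] = None,
              workers: int = 4) -> List[dict]:
    """Batch mode: process several datasets concurrently with one configuration."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input paths.
        return list(pool.map(lambda p: run_pipeline(p, choice=choice), paths))


def poll_once(folder: str, seen: Set[str]) -> List[dict]:
    """Real-time mode, one pass: run new diffraction files with default parameters."""
    results = []
    for name in sorted(os.listdir(folder)):
        if name.endswith((".h5", ".hdf5", ".tif", ".tiff")) and name not in seen:
            seen.add(name)
            results.append(run_pipeline(os.path.join(folder, name)))
    return results


def watch(folder: str, interval: float = 5.0, max_polls: int = 3) -> None:
    """Poll the folder a fixed number of times, printing each refined result."""
    seen: Set[str] = set()
    for _ in range(max_polls):
        for result in poll_once(folder, seen):
            print(result["refined"])
        time.sleep(interval)
```

For example, `run_pipeline("a.h5", choice={"model_building": "accurate"})` swaps the fast model-building branch for the accurate one without touching any other stage. Threads are used in `run_batch` on the assumption that each plugin mostly waits on external programs; a production watcher would more likely rely on filesystem notifications than on polling.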

Consider for long presentation: Yes

Primary authors

Sun, Hao-Kai (IHEP, CAS)
Hu, Yu (IHEP, CAS)
Liu, Jianli (Institute of High Energy Physics, CAS)
Wang, Lei (Institute of High Energy Physics, CAS)
Fu, Shiyuan (Institute of High Energy Physics, CAS)
Liu, Rui (Institute of High Energy Physics, CAS)
Wang, Shuang (Institute of High Energy Physics, CAS)
Qi, Fazhi (Institute of High Energy Physics, CAS)


Prof. Ding, Wei (Institute of Physics, CAS) Mr Zhang, Xin (Hongkong University; Institute of Physics, CAS) HU, QINGBAO (IHEP) Zhao, Haifeng ( Institute of High Energy Physics, CAS)
