26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS (CHEP2023)

Start: 2023-05-08T08:00:00-04:00
End: 2023-05-12T16:00:00-04:00
Location: Norfolk Waterside Marriott

US/Eastern timezone

Conference Secretariat

chep2023-secretariat@jlab.org

An Intelligent Data Analysis System for Biological Macromolecule Crystallography

May 9, 2023, 5:30 PM

15m

Chesapeake Meeting Room (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510

Oral, Track 5 - Sustainable and Collaborative Software Engineering

Sun, Hao-Kai (IHEP, CAS)

With the construction and operation of fourth-generation light sources such as the European Synchrotron Radiation Facility Extremely Brilliant Source (ESRF-EBS), the Advanced Photon Source Upgrade (APS-U), the Advanced Light Source Upgrade (ALS-U), and the High Energy Photon Source (HEPS), several advanced biological macromolecule crystallography (MX) beamlines have been or will be built, and a huge amount of raw experimental data will be accumulated. In addition, these beamlines are equipped with high-resolution hybrid pixel array detectors, and the resulting large-scale, high-quality data pose stringent challenges to traditional manual or semi-automatic processing procedures. In this report, we introduce a user-friendly, AI-empowered, auto-pipelining data analysis system for MX. It consists of four modules: (1) a boosted decision tree (BDT) based module that intelligently selects suitable tools or algorithms for data reduction, i.e. from X-ray diffraction images (TIFF/HDF5 files) to reference reflection files (MTZ); (2) a structure prediction module using database querying or real-time AlphaFold/OpenFold prediction, i.e. from FASTA sequences to Protein Data Bank (PDB) files; (3) a model auto-building module composed of two branches, one that favors high accuracy but is time-consuming and one that is fast at the cost of some accuracy; (4) a structure refinement module based on deep learning. The system works in two modes. The first is real-time (online) analysis, which runs automatically in the background by monitoring the user's experimental data folder and applying default processing parameters. The second, usually called batch mode, lets users first configure the analysis procedures in a GUI and then process multiple datasets concurrently for better performance. All the integrated tools and algorithms are designed as plugins and can be conveniently substituted.
This data analysis system is based on, and was initially developed for, HEPS. It aims to be automatic, intelligent, and highly efficient software, and it will be open-sourced for academic research.
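
The plugin-based, four-stage design described in the abstract could be pictured roughly as follows. This is only an illustrative sketch, not the authors' implementation: the class `Pipeline`, the function `run_batch`, the stage names, and every plugin below are hypothetical stand-ins, and the stub plugins only add placeholder entries to a job dictionary instead of calling real crystallography tools.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage names mirroring the four modules in the abstract.
STAGES = ["reduction", "prediction", "building", "refinement"]

# A plugin takes a job description and returns an updated copy of it.
Plugin = Callable[[dict], dict]


@dataclass
class Pipeline:
    """Registry of interchangeable plugins, one active plugin per stage."""

    registry: Dict[str, Dict[str, Plugin]] = field(
        default_factory=lambda: {stage: {} for stage in STAGES}
    )
    selected: Dict[str, str] = field(default_factory=dict)

    def register(self, stage: str, name: str, plugin: Plugin) -> None:
        # Raises KeyError for an unknown stage name.
        self.registry[stage][name] = plugin
        # The first plugin registered for a stage becomes its default.
        self.selected.setdefault(stage, name)

    def use(self, stage: str, name: str) -> None:
        # Substitute the active plugin for one stage.
        if name not in self.registry[stage]:
            raise KeyError(f"no plugin {name!r} for stage {stage!r}")
        self.selected[stage] = name

    def run(self, job: dict) -> dict:
        # Run the stages in order without modifying the caller's job dict.
        result = dict(job)
        history: List[str] = []
        for stage in STAGES:
            name = self.selected[stage]
            result = self.registry[stage][name](result)
            history.append(f"{stage}:{name}")
        result["history"] = history
        return result


def run_batch(pipeline: Pipeline, jobs: List[dict], workers: int = 4) -> List[dict]:
    # Batch mode: process several datasets concurrently, preserving order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pipeline.run, jobs))


def make_demo_pipeline() -> Pipeline:
    # Stub plugins (assumptions): each only records a placeholder result.
    p = Pipeline()
    p.register("reduction", "stub_reduce", lambda j: {**j, "mtz": "reflections.mtz"})
    p.register("prediction", "stub_predict", lambda j: {**j, "pdb": "model.pdb"})
    p.register("building", "accurate", lambda j: {**j, "built": "accurate"})
    p.register("building", "fast", lambda j: {**j, "built": "fast"})
    p.register("refinement", "stub_refine", lambda j: {**j, "refined": True})
    return p
```

In this sketch, calling `make_demo_pipeline().run({"images": "scan.h5", "fasta": "seq.fasta"})` walks one dataset through all four stages with the default plugins, `use("building", "fast")` swaps in the faster auto-building branch, and `run_batch` stands in for batch mode. The online mode would additionally need a watcher on the experimental data folder, which is omitted here.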

Consider for long presentation: Yes

Sun, Hao-Kai (IHEP, CAS); Hu, Yu (IHEP, CAS); Liu, Jianli (Institute of High Energy Physics, CAS); Wang, Lei (Institute of High Energy Physics, CAS); Fu, Shiyuan (Institute of High Energy Physics, CAS); Liu, Rui (Institute of High Energy Physics, CAS); Wang, Shuang (Institute of High Energy Physics, CAS); Qi, Fazhi (Institute of High Energy Physics, CAS)

Prof. Ding, Wei (Institute of Physics, CAS); Mr Zhang, Xin (Hongkong University; Institute of Physics, CAS); Hu, Qingbao (IHEP); Zhao, Haifeng (Institute of High Energy Physics, CAS)

Intelligent_Data_AS4BMX_HKS.pdf
