In the HEP community, the prediction of Data Popularity is a topic that has been studied for many years. Nonetheless, as data storage challenges grow, especially in the HL-LHC era, we are still in need of better predictive models to answer the question of whether particular data should be kept, replicated, or deleted.
Caching has proved to be a convenient technique that partially automates storage management and sidesteps some of these questions. While even simple caching algorithms such as LRU already bring benefits, we show that incorporating knowledge about future access patterns can greatly improve cache performance.
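For reference, the LRU policy mentioned above can be sketched in a few lines: the cache evicts the least recently used file once it reaches capacity. This is a minimal illustration, not the implementation evaluated in the paper; the `LRUCache` class and file identifiers are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used file when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # insertion order tracks recency of use
        self.hits = 0
        self.misses = 0

    def access(self, file_id):
        if file_id in self.store:
            self.hits += 1
            self.store.move_to_end(file_id)  # mark as most recently used
        else:
            self.misses += 1
            self.store[file_id] = True
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
for f in ["a", "b", "a", "c", "b"]:
    cache.access(f)
# accesses "a","b" miss; "a" hits; "c" misses and evicts "b"; "b" misses again
```

Replaying the short trace above gives 1 hit and 4 misses, illustrating how a tight capacity hurts even a recency-aware policy.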
In this paper, we study data popularity at the file level, where the special relation between files belonging to the same dataset can be exploited in addition to the standard attributes. We start by analyzing individual features and relating them to the target variable: the reuse distance of the files. We then turn to Machine Learning algorithms, such as Random Forest, which is well suited to Big Data: it can be parallelized, and it is more lightweight and easier to interpret than Deep Neural Networks. Finally, we compare the results with standard cache retention algorithms and with the theoretical optimum.
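To make the target variable concrete, one common definition of reuse distance is the number of distinct other files accessed between two successive accesses to the same file. The sketch below, under that assumed definition (the paper may use a variant, e.g. time until next access), computes it from an access trace; the function name and trace are illustrative.

```python
def reuse_distances(trace):
    """For each access, return the number of distinct other files accessed
    since the previous access to the same file (None for a first access)."""
    last_seen = {}  # file id -> index of its most recent access
    out = []
    for i, f in enumerate(trace):
        if f in last_seen:
            between = set(trace[last_seen[f] + 1 : i])  # files accessed in between
            between.discard(f)
            out.append(len(between))
        else:
            out.append(None)  # no previous access, distance undefined
        last_seen[f] = i
    return out

print(reuse_distances(["a", "b", "a", "c", "b", "a"]))
# → [None, None, 1, None, 2, 2]
```

Files with small predicted reuse distances are the ones worth keeping in the cache, which is what links this target variable to the retention decisions discussed above.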
Consider for long presentation: Yes