Indico is back online after maintenance on Tuesday, April 30, 2024.
Please visit Jefferson Lab Event Policies and Guidance before planning your next event: https://www.jlab.org/conference_planning.

May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Data popularity for the Cache Eviction Algorithms using Random Forests

May 11, 2023, 2:15 PM
15m
Norfolk Ballroom III-V (Norfolk Waterside Marriott)

Norfolk Ballroom III-V

Norfolk Waterside Marriott

235 East Main Street Norfolk, VA 23510
Oral Track 1 - Data and Metadata Organization, Management and Access Track 1 - Data and Metadata Organization, Management and Access

Speaker

Chuchuk, Olga (CERN, University of Cote d'Azur)

Description

In the HEP community, the prediction of Data Popularity is a topic that has been approached for many years. Nonetheless, while facing increasing data storage challenges, especially in the HL-LHC era, we are still in need for better predictive models to answer the questions of whether particular data should be kept, replicated, or deleted.

The usage of caches proved to be a convenient technique that partially automates storage management and seems to eliminate some of these questions. While on one hand, we can benefit even from simple caching algorithms like LRU, on the other hand, we show that incorporation of the knowledge about the future access patterns can greatly improve the cache performance.

In this paper, we study the data popularity on the file level, where the special relation between files belonging to the same dataset could be used in addition to the standard attributes. We start by analyzing separate features and try to find the relation with the target variable: the reuse distance of the files. After, we turn to Machine Learning algorithms, such as Random Forest, which is well suited to work with Big Data: it can be parallelized, is more lightweight and easier to interpret than Deep Neural Networks. Finally, we compare the results with standard cache retention algorithms and with the theoretical optimum.

Consider for long presentation Yes

Primary authors

Chuchuk, Olga (CERN, University of Cote d'Azur) Dr Schulz, Markus (CERN)

Presentation materials

Peer reviewing

Paper