






Today’s growth in data is enormous, and it is coming from a vast number of sources, such as industrial Internet of Things (IoT) devices, medical imaging systems, synthetic data generators for AI model training, and big science instruments like CERN’s Large Hadron Collider (in particle physics) and the Vera C. Rubin Observatory (in astronomy). Current scientific experiments can already generate tens of terabytes of data per day, while future ones will push the scale to hundreds of terabytes per day.
Although the data explosion is nothing new, managing these growing data volumes with current technologies poses challenges that require new approaches. For example, storage media such as tape and hard disk drives are approaching their physical limits in data density.
At the same time, the variety of data sets is expanding as more types of data are produced. Innovative approaches and technologies are necessary, not only to manage vast amounts of data properly, but also to combine data originating from different domains, such as scientific disciplines, industries, and societal domains. Combining data in this way helps those who manage data to uncover the key insights contained in their data sets.
Besides the volume of data, its complexity also presents challenges. One example is the multiple roles that organisations, such as research organisations, play in processing large data sets: not only as a producer and a user of data, but also as an actor that combines, enriches, co-creates, and aggregates large data sets for and with a variety of other actors. Data management principles and tools help to unlock the value in data. Data management covers the systematic process of handling data throughout its lifecycle: collecting, organising, analysing, sharing, and preserving data while ensuring its integrity, accessibility, and security. AI has already shown great promise in this area, where AI-driven automation can minimise manual effort. Beyond current standard data storage solutions, there is a growing demand for data-aware and content-aware solutions, both for managing data and for offering new insights into it.
The importance of data management and data preservation has recently become clearer, in light of developments in data sovereignty, data ownership and security, and open science. These developments are decisive for the way researchers and research organisations cooperate internationally. One example is the research community’s recent effort to preserve large climate data sets stored in the US by saving them on EU-based servers, so that the data remain freely available to the international climate research community. Besides this data repatriation in the scientific community, national governments and organisations in the EU are also aware of the need to take stronger measures to secure data ownership, for example by relocating data held in cloud services from (big tech) servers in the US to servers in Europe.
New data management practices and technologies are on the horizon and are being developed to tackle current and future data challenges.