eJournals International Colloquium Tribology 24/1

International Colloquium Tribology
ict
expert verlag Tübingen
131
2024
241

The Data Science Frontier in Tribology

Nick Garabedian
Ilia Bagov
Malte Flachmann
Nuoyao Ye
Floriane Bresser
Christian Greiner
Milosz Meller
ict2410033
24th International Colloquium Tribology - January 2024

The Data Science Frontier in Tribology

Nick Garabedian 1*, Ilia Bagov 1, Malte Flachmann 1, Nuoyao Ye 1, Miłosz Meller 2, Floriane Bresser 1, Christian Greiner 1

1 Karlsruhe Institute of Technology, Institute for Applied Materials, Germany
2 Helmholtz-Zentrum Hereon, Institute of Membrane Research, Germany
* Corresponding author: Nikolay.Garabedian@kit.edu

1. Introduction

Tribodigitalization stands as a pivotal process in tribology, representing a transformative journey towards harnessing the full potential of tribological data sets. This process is aimed at developing nuanced and efficient solutions for unraveling the complexities associated with friction and wear. In the ever-evolving landscape of scientific exploration, achieving comprehensive digitalization in tribology demands a shift in the way we conduct research and development. Central to that shift is the responsibility we have to take when we produce, share, store, re-use, or analyze data. Establishing global guidelines for what “good” data management means starts with embracing the FAIR data principles, which require research data to be Findable, Accessible, Interoperable, and Reusable.

So far, most tribological research data is not shared. The majority of datasets used in tribology publications are only “available upon request”; our statistics on data sharing in tribology will be shown during the presentation at the 24th International Colloquium Tribology. Not sharing research data limits the speed at which our field can innovate, as it precludes quick and efficient collaboration and makes access to data depend on direct interpersonal relationships. It also means that machine-learning algorithms cannot be employed in a big-data setting in tribology, as most data resides behind email requests.
The reasons why tribologists keep data non-open are numerous. First, in terms of research culture, it is not widely expected that a publication shares its raw data by default. Second, even if a scientist is willing to share her or his data, the technical solutions for doing so might be more confusing than helpful, as there is no widely agreed framework for this within tribology. Lastly, since questions of data sharing usually become important much later than the point of data collection, essential metadata is impossible to retrieve after the fact, and annotating existing data according to available schemata would be prohibitively time-consuming. That is why it is ideal to have data FAIR by design, from the moment it is produced until it is published.

To deal with all of these issues, we have designed one possible workflow for producing FAIR data directly from the lab. We have developed software solutions which assist tribologists in their scientific workflows and make it easy to share the entire set of raw and processed FAIR data with a few clicks. Importantly, the proposed framework integrates with frameworks that other research domains have designed for themselves and utilize.

2. Methods

Amidst the complex tapestry of tribological research, intuitive solutions need to illuminate the path toward seamless integration of traditional tribological methods and cutting-edge data science techniques. In our framework, we rely on a knowledge manager which we designed with the goal of creating the schemata necessary for the annotation of FAIR tribological data. VocPopuli [1] is a tool designed to facilitate the collaborative creation of FAIR controlled vocabularies that are meticulously tailored to the unique specifications of individual laboratories. These controlled vocabularies serve as the bedrock upon which FAIR data collection within electronic lab notebooks is structured.
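As a rough illustration of how a hierarchical controlled vocabulary can structure data entry, the following minimal Python sketch models terms with a SKOS-style "broader" link and checks a lab record against them. It is not VocPopuli's data model or API; the `Term` class, the helper functions, the example terms, and the record values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """One vocabulary entry; `broader` mirrors the SKOS idea of a parent concept."""
    label: str
    definition: str
    broader: Optional[str] = None  # label of the parent term, None for a top-level term


# Hypothetical mini-vocabulary; these terms are illustrative, not taken from [3].
VOCABULARY = {
    t.label: t
    for t in [
        Term("Experiment", "A planned tribological test."),
        Term("Sliding Test", "A test with relative sliding motion.", broader="Experiment"),
        Term("Normal Load", "Force pressing the bodies together, in N.", broader="Sliding Test"),
        Term("Sliding Velocity", "Relative speed of the bodies, in mm/s.", broader="Sliding Test"),
    ]
}


def path_to_root(label: str, vocabulary: dict) -> list:
    """Return the chain of labels from `label` up to its top-level term."""
    path = []
    current = label
    while current is not None:
        if current in path:
            raise ValueError(f"cycle in hierarchy at {current!r}")
        path.append(current)
        current = vocabulary[current].broader
    return path


def unknown_keys(record: dict, vocabulary: dict) -> list:
    """List the keys of a metadata record that are not vocabulary terms."""
    return sorted(key for key in record if key not in vocabulary)


# Hypothetical lab record; "Humidty" is a deliberate typo that the check should flag.
record = {"Normal Load": 2.0, "Sliding Velocity": 0.5, "Humidty": 50}
print(path_to_root("Normal Load", VOCABULARY))
print(unknown_keys(record, VOCABULARY))
```

Checking record keys against the vocabulary at entry time is one simple way to keep metadata consistent across experiments; a real deployment would more likely rely on the vocabulary's published identifiers than on plain text labels.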
Through curation and refinement of these controlled vocabularies, tribologists create a knowledge base with rich, dynamic, and interoperable metadata. This structured approach not only ensures the consistency and harmonization (not standardization) of experiments but also serves as a catalyst for elevating the quality and reliability of research outcomes.

Additionally, we rely on Kadi4Mat [2] to store and share our data internally. For the input of data into Kadi4Mat, we designed a tablet-based application which enables lab scientists to enter their data next to any manual process in the lab, thus removing the need for a paper lab journal. We have also designed integrations with LabVIEW and MATLAB, which let tribologists preserve their usual workflows while taking care of the collection of FAIR data and metadata. Finally, we have recently designed a reporting tool which lets users explore the trends in their datasets based on filters derived from the VocPopuli knowledge base.

3. Results

The current outcomes of this work are a FAIR SKOS controlled vocabulary [3] and a FAIR dataset produced in three Master’s thesis projects [4]. The vocabulary contains 1,067 hierarchically organized terms, while the dataset has the following characteristics:

• 151,045 RDF triples
• 51 experimental series, 542 individual events
• 89 lab equipment descriptions
• 108 experimental object descriptions
• 412.1 GB in total size

The experiments test the reciprocating sliding of a 10-mm single-crystal sapphire sphere against a polycrystalline copper base body (average grain size ~45 µm). The range of normal loads is 0-4.5 N, the sliding velocity is always 0.5 mm/s, and the experiments are performed in an ambient atmosphere at 50% RH.

4. Outlook

As we have already published our first results, we are now expanding the coverage of our FAIR data framework.
We aim to have a continuous flow of published tribological data, and we are looking forward to having that data reused by other scientists. At the same time, we are looking for external FAIR datasets which we can integrate into our own research, so that we can start to see the scalability which big data approaches promise.

References

[1] I. Bagov, C. Greiner, and N. Garabedian, “Collaborative Metadata Definition using Controlled Vocabularies, and Ontologies,” Res. Ideas Outcomes, vol. 8, 2022.
[2] N. Brandt et al., “Managing FAIR tribological data using Kadi4Mat,” Data, vol. 7, no. 2, p. 15, Jan. 2022.
[3] I. Bagov et al., “Vocabulary of Materials Tribology Lab at KIT.” 10-May-2023.
[4] M. Flachmann, J. Biesinger, M. Gorenflo, I. Bagov, C. Greiner, and N. Garabedian, “Copper Tribology FAIR Data Experiments - Sapphire Counterbody.” 11-May-2023.