eJournals International Colloquium Tribology 23/1

International Colloquium Tribology
ict
expert verlag Tübingen
125
2022
231

Artificial Intelligence in Tribology: Design of new dispersants using artificial intelligence tools

Nuria E. Campillo
Pablo Talavante
Ignacio Ponzoni
Axel J. Soto
María J. Martínez
Roí Naveiro
Ramón Gómez-Arrayas
Mario Franco
Shin-Ho Kim Lee
Pablo Mauleón
Guillermo Revilla-Lopez
Marco Bernabei
ict2310423
23rd International Colloquium Tribology - January 2022

Artificial Intelligence in Tribology: Design of new dispersants using artificial intelligence tools

Nuria E. Campillo
ICMAT (CSIC). Nicolás Cabrera, nº 13-15. Campus de Cantoblanco, UAM. 28049, Madrid, Spain.
CoFounder of AItenea Biotech.
CIB Margarita Salas (CSIC). Ramiro de Maeztu, 9. 28740, Madrid, Spain.
Corresponding author: nuria.campillo@csic.es

Pablo Talavante
AItenea Biotech. Parque Científico de Madrid. Ciudad Univer. de Cantoblanco. Calle Faraday, 7. 28049, Madrid, Spain.

Ignacio Ponzoni
Institute for Computer Science and Engineering (UNS-CONICET), Bahía Blanca, Argentina.
Department of Computer Science and Engineering, Universidad Nacional del Sur, Bahía Blanca, Argentina.

Axel J. Soto
Institute for Computer Science and Engineering (UNS-CONICET), Bahía Blanca, Argentina.
Department of Computer Science and Engineering, Universidad Nacional del Sur, Bahía Blanca, Argentina.

María J. Martínez
ISISTAN (CONICET - UNCPBA). Campus Universitario - Paraje Arroyo Seco, Tandil, Argentina.

Roí Naveiro
ICMAT (CSIC). Nicolás Cabrera, nº 13-15. Campus de Cantoblanco, UAM. 28049, Madrid, Spain.

Ramón Gómez-Arrayas
Dep. of Organic Chemistry and Institute for Advanced Research in Chemical Sciences, UAM. 28049, Madrid, Spain.
CoFounder of AItenea Biotech.

Mario Franco
Dep. of Organic Chemistry and Institute for Advanced Research in Chemical Sciences, UAM. 28049, Madrid, Spain.

Shin-Ho Kim Lee
AItenea Biotech. Parque Científico de Madrid. Ciudad Univer. de Cantoblanco. Calle Faraday, 7. 28049, Madrid, Spain.

Pablo Mauleón
Dep. of Organic Chemistry and Institute for Advanced Research in Chemical Sciences, UAM. 28049, Madrid, Spain.

Guillermo Revilla-Lopez
Repsol Technology Lab DC Tech. & Corporate Venturing, Agustín de Betancourt s/n, 28935 Móstoles, Madrid, Spain.

Marco Bernabei
Repsol Technology Lab DC Tech. & Corporate Venturing, Agustín de Betancourt s/n, 28935 Móstoles, Madrid, Spain.

1. Introduction

Dispersants are the main additives used in oils and lubricants to help keep engines clean and free of deposits. These polymeric, surfactant-like molecules are characterized by at least one hydrophobic, oil-soluble 'tail' polymer backbone component, often polyisobutylene (PIB), and at least one hydrophilic, polar 'head' unit that adsorbs onto the carbon deposit precursors (mainly sludge and soot particles). Efficient dispersant design requires tailoring the nature of the chemical interactions to meet the performance characteristics of a particular engine, for which a number of parameters need to be fine-tuned. Despite the knowledge available, the chemistry used to produce the dispersants in use today remains limited. Dispersants are typically designed through trial and error coupled with chemical intuition, a process that is expensive and time-consuming.

By contrast, artificial intelligence (AI) has the potential to guide the design of next-generation materials, saving both money and time. Herein we describe an AI framework for dispersant design and optimization. Two complementary strategies were developed using unsupervised and supervised learning, dimensionality reduction methods and data visualization approaches. This framework predicts performance properties of new dispersants as part of virtual screening (VS) strategies to identify the most promising candidates.

2. Computational Modelling Framework

A dataset of 83 PIB derivatives was collected from the literature, and a wide range of molecular descriptors was computed for each chemical structure using the Mordred library. In addition, SMILES embeddings were computed using a Transformer-based model.
Then, two AI models were trained, providing different approaches to ranking candidate compounds during virtual screening. The best candidate is therefore selected by consensus between the two.

2.1 Model based on structural similarity distances

The first step was a feature selection procedure to identify a reduced set of molecular descriptors statistically related to the target property. This procedure was carried out by analyzing the outputs of several feature selection techniques using VIDEAN [1]. The selected descriptors are used to study the structural similarity between the compounds in the database and to visually project those molecules onto two-dimensional spaces using, for example, t-SNE [2].

Subsequently, to rank the candidates during the VS stage, the k-nearest-neighbors method is used in the original representation space to identify the projected regions where each candidate is located, and to infer from their locations which of them are the most promising chemical structures. Alongside the two-dimensional projections, a pairwise analysis of the relationship between the selected descriptors is shown by means of a scatter plot matrix. The pairwise analysis enables a better interpretation of the low-dimensional projection in terms of the structural similarity defined by the descriptors. Finally, for each candidate, a numerical estimate of its target property value and of its closeness to the chemical space represented by the database molecules is provided.

2.2 Supervised model

To identify the single best probabilistic model, comparisons between different models, hyperparameter tuning and feature selection were carried out. Mean absolute error (MAE), estimated via cross-validation, was used as the performance metric. The features considered came from both molecular descriptors and SMILES embeddings.
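The model-comparison step can be illustrated with a minimal sketch of k-fold cross-validated MAE. This is not the authors' implementation: a simple k-nearest-neighbours regressor stands in for the probabilistic models that were actually compared, the feature vectors and property values are made-up toy numbers, and the names `knn_predict` and `cv_mae` are hypothetical.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Stand-in model: mean target of the k training points closest to x."""
    # Euclidean distance in the original feature space.
    dists = sorted((math.dist(row, x), target) for row, target in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cv_mae(X, y, k_neighbors=3, n_folds=5, seed=0):
    """Mean absolute error over all held-out points in k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # fixed seed keeps the folds reproducible
    folds = [idx[i::n_folds] for i in range(n_folds)]
    abs_errors = []
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        for i in fold:
            pred = knn_predict(train_X, train_y, X[i], k_neighbors)
            abs_errors.append(abs(pred - y[i]))
    return sum(abs_errors) / len(abs_errors)


# Toy data (assumption): two hypothetical features per molecule and a
# hypothetical property value; none of these numbers come from the paper.
X = [[0.1 * i, 0.05 * i * i] for i in range(12)]
y = [1.0 + 0.5 * i for i in range(12)]

# Compare hyperparameter settings by their cross-validated MAE (lower is better).
for k in (1, 3, 5):
    print(f"k={k}: CV MAE = {cv_mae(X, y, k_neighbors=k, n_folds=4):.3f}")
```

The same loop can be used to compare feature subsets or model families: each configuration is scored on data it was not fitted to, and the one with the lowest cross-validated MAE is retained.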
The best resulting model was BART [3] with 14 features, half of them molecular descriptors and the other half SMILES embeddings. Variable selection was performed using the procedure described by Bleich et al. [4]. Given a new candidate molecule, the model produces samples from the posterior predictive distribution of its dispersancy. These samples are then used to compute different metrics to evaluate the candidate. In particular, the expected improvement with respect to the best available molecule is computed to identify which candidates to select for further development. Other helpful metrics are the mean predictive dispersancy, the predictive standard deviation and the probability of improvement.

Finally, to gain some interpretability, the model is used to compute partial dependence plots. These illustrate how each of the features affects dispersancy, on average. Additionally, to give some interpretation to the covariates coming from SMILES embeddings, the molecular descriptors most highly correlated with each of the embedding-based variables are computed.

3. Conclusion

The use of AI is having a growing impact on the design of new molecular compounds. Although it does not replace traditional wet-lab experimentation, it is playing a key role in accelerating the discovery and design of new materials such as dispersants. In this work, we have illustrated how unsupervised and supervised learning can be successfully combined for virtual screening in the design of new dispersants. We also concluded that visual analytics strategies help chemical experts work with the outputs produced by the machine learning models, contributing to the interpretability of the results. In brief, our AI methodology gives materials designers useful insights beyond the limits of a classical Edisonian approach to materials discovery.

References

[1] Martínez, M.J., Ponzoni, I., Díaz, M. et al. "Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods". J Cheminform 7, 39, 2015.
[2] van der Maaten, L. & Hinton, G. "Visualizing Data Using t-SNE". J Mach Learn Res 9: 2579-2605, 2008.
[3] Chipman, H., George, E. & McCulloch, R. "BART: Bayesian additive regression trees". The Annals of Applied Statistics, 4(1), 266-298, 2010.
[4] Bleich, J., Kapelner, A., Jensen, S. & George, E. "Variable selection inference for Bayesian additive regression trees". arXiv:1310.4887v1, 2013.
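Illustrative sketch for Section 2.2. The candidate metrics described there (mean predictive dispersancy, predictive standard deviation, expected improvement and probability of improvement) can all be estimated directly from posterior predictive samples. The code below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that higher dispersancy is better, the sample values are invented toy numbers, and `candidate_metrics`, `best_observed` and the candidate names are hypothetical.

```python
import statistics


def candidate_metrics(samples, best_so_far):
    """Summarise posterior predictive draws for one candidate molecule.

    Assumes a higher property value is better, so an "improvement" is the
    amount by which a draw exceeds the best value observed so far.
    """
    improvements = [max(s - best_so_far, 0.0) for s in samples]
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.stdev(samples),
        # Monte Carlo estimate of E[max(Y - best, 0)].
        "expected_improvement": statistics.fmean(improvements),
        # Fraction of draws that beat the best observed value.
        "probability_of_improvement": sum(s > best_so_far for s in samples) / len(samples),
    }


# Toy posterior predictive draws (assumption): in practice these would be
# produced by the fitted model, with many more samples per candidate.
best_observed = 70.0
draws = {
    "cand_A": [68.0, 71.0, 73.0, 69.0, 72.0],
    "cand_B": [60.0, 80.0, 62.0, 79.0, 61.0],
    "cand_C": [65.0, 66.0, 67.0, 66.0, 65.0],
}

# Rank candidates by expected improvement, largest first.
ranking = sorted(
    draws,
    key=lambda name: candidate_metrics(draws[name], best_observed)["expected_improvement"],
    reverse=True,
)

for name in ranking:
    m = candidate_metrics(draws[name], best_observed)
    print(name, {key: round(value, 3) for key, value in m.items()})
```

Expected improvement rewards both a high predicted mean and a wide predictive spread, so an uncertain candidate with some chance of a large gain can outrank a confidently mediocre one; reporting the probability of improvement alongside it helps to separate those two cases.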