Bridging First-principles Science Models and Machine Learning

Lessons Learnt from Chemical Engineering 

INTRODUCTION 

In chemical engineering, the complexity of physicochemical systems often leaves knowledge gaps in the models that describe them. Traditional science-based models rely on first principles but can struggle with intricate processes. Machine learning (ML) models, meanwhile, excel at prediction but may lack scientific consistency. A hybrid approach that integrates science-guided models with machine learning offers a promising solution. The article “A hybrid science-guided machine learning approach for modeling chemical processes: A review” by Niket Sharma and Y. A. Liu, published in AIChE Journal (Vol. 68(5), 2022), provides an overview of hybrid approaches. Although at first sight the work focuses on industrial chemical processes, it gives valuable insights into several configurations that combine science-guided and ML models.

WHAT IS A HYBRID SCIENCE-GUIDED MACHINE LEARNING (SGML) APPROACH? 

SGML combines scientific principles (captured by science-based models) and data-driven models to enhance predictive accuracy and scientific consistency. The integration can work in two ways: 

  1. Machine Learning Complementing Science – ML models fill gaps in science-based models that are based on first principles, offering better predictions.
  2. Science Complementing Machine Learning – Science-based models refine ML models, ensuring predictions remain grounded in reality.

 

KEY FEATURES OF SGML MODELS

Direct Hybrid Models 

Direct hybrid modeling integrates science-based and ML models in different configurations to leverage their combined strengths. 

  • Parallel Configuration: Both science-based and ML models independently generate predictions, which are then combined to enhance accuracy. For example, a physical model might predict a reaction rate from known kinetics while a neural network predicts the residual that the physical model misses, and the two outputs are added together.
  • Series Configuration: One model augments the input or output of the other. For instance, a physics-based model might infer missing features of the data, which are then fed into an ML model that provides more precise predictions. 
  • Series-Parallel or Combined Configuration: These combine the benefits of both approaches, using ML to fine-tune parameters for physical models while simultaneously addressing residual errors. 

Example application. In polymer production, hybrid models combine kinetic equations with neural networks to optimize batch processes, leading to better extrapolation and reduced experimental costs. 
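
The parallel configuration can be illustrated with a minimal sketch on synthetic data. Everything specific here is an assumption made for illustration and not taken from the reviewed article: the Arrhenius constants, the additive discrepancy term that the science model does not know about, and the polynomial least-squares fit that stands in for a neural network. The science model predicts a rate, the data-driven part is fitted to the residual, and the hybrid prediction is the sum of the two.

```python
import numpy as np

R_GAS = 8.314  # J/(mol K)

def science_model(temp_k, conc):
    """First-principles part: Arrhenius rate constant, first order in concentration.
    The constants are illustrative assumptions."""
    k = 1.0e6 * np.exp(-5.0e4 / (R_GAS * temp_k))
    return k * conc

def features(temp_k, conc):
    """Inputs for the data-driven part (polynomial terms stand in for a neural network)."""
    t_scaled = (temp_k - 350.0) / 30.0
    return np.column_stack([np.ones_like(conc), conc, conc**2, t_scaled])

def make_data(n, rng):
    """Synthetic 'plant' data: science model plus an effect it does not capture, plus noise."""
    temp_k = rng.uniform(320.0, 380.0, size=n)
    conc = rng.uniform(0.1, 2.0, size=n)
    rate = science_model(temp_k, conc) + 0.02 * conc**2 + rng.normal(0.0, 1e-3, size=n)
    return temp_k, conc, rate

rng = np.random.default_rng(0)
t_train, c_train, y_train = make_data(200, rng)
t_test, c_test, y_test = make_data(100, rng)

# Fit the data-driven part to what the science model leaves unexplained.
residual = y_train - science_model(t_train, c_train)
coef, *_ = np.linalg.lstsq(features(t_train, c_train), residual, rcond=None)

def hybrid_model(temp_k, conc):
    """Parallel hybrid: science prediction plus learned residual correction."""
    return science_model(temp_k, conc) + features(temp_k, conc) @ coef

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

rmse_science = rmse(science_model(t_test, c_test), y_test)
rmse_hybrid = rmse(hybrid_model(t_test, c_test), y_test)
print(f"science-only RMSE: {rmse_science:.4f}, hybrid RMSE: {rmse_hybrid:.4f}")
```

On this synthetic data the hybrid error on the held-out set should be clearly lower than the science-only error, because the residual model recovers the missing term. Real processes would need a more flexible learner and careful validation outside the training range.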

Inverse Modeling 

Inverse modeling works backward from desired outcomes to infer the required inputs, addressing challenges like finding the optimal operating conditions. 

  • This approach is especially beneficial in industries like pharmaceuticals, where quality targets such as drug composition must be achieved by adjusting process variables. 
  • By integrating ML with first-principles models, inverse modeling enhances reliability and reduces computational costs. 

Example application. In polymer manufacturing, inverse models predict the operating conditions needed to achieve specific polymer grades, using melt index and density as quality targets. 
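
As a rough sketch of the idea, inverse modeling can be framed as searching the operating window for inputs whose predicted quality matches the targets. The forward model below is entirely hypothetical (its functional form, coefficients, input names, and ranges are assumptions for illustration), and a plain grid search stands in for whatever optimizer or learned inverse map a real application would use.

```python
import numpy as np

def forward_model(h2_ratio, comonomer_ratio):
    """Hypothetical forward model: operating conditions -> (melt index, density).
    In practice this role would be played by a validated hybrid model."""
    melt_index = np.exp(-1.0 + 8.0 * h2_ratio) * (1.0 + 2.0 * comonomer_ratio)
    density = 0.965 - 0.30 * comonomer_ratio
    return melt_index, density

def invert(target_mi, target_density, n_grid=401):
    """Grid search over an assumed operating window for the best-matching inputs."""
    h2 = np.linspace(0.0, 0.5, n_grid)
    com = np.linspace(0.0, 0.1, n_grid)
    h2_grid, com_grid = np.meshgrid(h2, com, indexing="ij")
    mi, rho = forward_model(h2_grid, com_grid)
    # Scale each error by an assumed tolerance so both targets carry comparable weight.
    loss = ((mi - target_mi) / 0.1) ** 2 + ((rho - target_density) / 0.001) ** 2
    i, j = np.unravel_index(np.argmin(loss), loss.shape)
    return float(h2_grid[i, j]), float(com_grid[i, j])

h2_opt, com_opt = invert(target_mi=5.0, target_density=0.950)
mi_opt, rho_opt = forward_model(h2_opt, com_opt)
print(f"h2 ratio={h2_opt:.4f}, comonomer ratio={com_opt:.4f}")
print(f"predicted melt index={mi_opt:.3f}, density={rho_opt:.4f}")
```

Grid search is only practical for a handful of inputs. With more process variables, a gradient-based optimizer or an ML model trained to approximate the inverse mapping would be the more likely choice.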

Reduced-Order Models (ROMs) 

Reduced-order models simplify complex systems into low-dimensional representations, making them computationally efficient while retaining predictive accuracy. 

  • ROMs use techniques like dimensionality reduction or ML-based surrogates to approximate behaviors of full-scale models. 
  • They are particularly useful for simulating complex scenarios, such as fluid dynamics or reactor operations, where detailed computations would otherwise be extremely demanding or even infeasible.

Example application. In high-density polyethylene production, ROMs combine process simulations with ML to predict properties like melt index, enabling real-time optimization and scaling. 
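
One common way to build a ROM, shown here as a minimal sketch, is proper orthogonal decomposition (POD): collect snapshots of the full model, extract a few dominant modes with a singular value decomposition, and fit a cheap surrogate for the mode coefficients. The "full model" below is a made-up analytic profile standing in for an expensive simulation, and the energy threshold and polynomial degree are assumptions chosen for illustration.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200)  # positions along a hypothetical reactor

def full_model(p):
    """Stand-in for an expensive simulation: a profile that depends on parameter p."""
    return np.exp(-p * x) * np.sin(np.pi * x)

# 1. Collect snapshots of the full model over the parameter range.
params = np.linspace(0.5, 3.0, 30)
snapshots = np.column_stack([full_model(p) for p in params])  # shape (200, 30)

# 2. Dimensionality reduction: keep the modes holding almost all of the variance.
mean = snapshots.mean(axis=1, keepdims=True)
u, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.999999) + 1)
basis = u[:, :r]

# 3. Surrogate: a polynomial in p for each reduced coordinate.
coeffs = basis.T @ (snapshots - mean)  # shape (r, 30)
poly = [np.polyfit(params, coeffs[k], deg=5) for k in range(r)]

def rom(p):
    """Reduced-order prediction: evaluate r polynomials, then expand in the basis."""
    a = np.array([np.polyval(c, p) for c in poly])
    return mean[:, 0] + basis @ a

p_new = 1.7  # a parameter value not among the training snapshots
reference = full_model(p_new)
rel_err = float(np.linalg.norm(rom(p_new) - reference) / np.linalg.norm(reference))
print(f"modes kept: {r} of {len(params)}, relative error at p={p_new}: {rel_err:.2e}")
```

The online cost is a few polynomial evaluations and one small matrix-vector product, regardless of how expensive the original simulation was. The price is that the ROM is only trustworthy inside the parameter range covered by the snapshots.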

CHALLENGES & OPPORTUNITIES OF SGML MODELS

Challenges 

  1. Dependence on Accurate Scientific Models. Hybrid models rely on first-principles foundations, and inaccuracies in these models can lead to flawed predictions. Ensuring robust scientific assumptions is critical.
  2. Complex Integration. Combining diverse datasets and aligning them with scientific models can be time-consuming and requires domain-specific expertise.
  3. High Computational Costs. Techniques like inverse modeling and uncertainty quantification often demand significant computational resources, particularly for large-scale industrial processes.
  4. Interdisciplinary Expertise Gap. SGML modeling requires knowledge in both machine learning and domain science. This dual expertise is not yet widespread, slowing adoption.
  5. Uncertainty and Interpretability. While SGML models can quantify uncertainties, conveying these uncertainties effectively to stakeholders remains challenging. Similarly, interpreting hybrid model recommendations for actionable use requires tailored tools.

Opportunities 

  1. Enhanced Extrapolation. Because they are anchored in first principles, SGML models can predict beyond tested operating conditions more reliably than purely data-driven models, which makes them valuable for process development, scale-up, and optimization.
  2. Real-Time Monitoring and Control. Hybrid models enable continuous process monitoring and dynamic optimization, addressing issues like model-plant mismatches on the fly.
  3. Faster Material and Product Development. SGML accelerates innovation by predicting conditions for new materials or optimized processes, reducing dependency on experimental trials.
  4. Improved Reliability through Uncertainty Quantification. By providing error estimates and prediction intervals, SGML improves confidence in critical decisions, such as optimizing reaction conditions or equipment maintenance schedules.
  5. Integration with Smart Manufacturing. The approach aligns well with Industry 4.0, creating opportunities for smarter, more autonomous manufacturing systems powered by AI and data analytics.
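
The uncertainty quantification mentioned among the opportunities above can take many forms. One simple option, sketched here on synthetic data, is a bootstrap: refit the data-driven part on resampled data and read an interval off the spread of predictions. The linear model, noise level, and number of resamples are assumptions for illustration. Note that this interval reflects uncertainty in the fitted model only and does not include measurement noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic measurements; in a hybrid model these could be residuals of a science model.
x = rng.uniform(0.0, 1.0, size=80)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.2, size=80)

def fit_predict(x_fit, y_fit, x_new):
    """Fit a straight line and predict at x_new (stand-in for any data-driven model)."""
    slope, intercept = np.polyfit(x_fit, y_fit, deg=1)
    return slope * x_new + intercept

x_new = 0.5
n_boot = 500
boot = np.empty(n_boot)
for b in range(n_boot):
    # Resample the data with replacement and refit.
    idx = rng.integers(0, x.size, size=x.size)
    boot[b] = fit_predict(x[idx], y[idx], x_new)

point = fit_predict(x, y, x_new)
lower, upper = np.percentile(boot, [2.5, 97.5])
print(f"prediction at x={x_new}: {point:.3f}, 95% bootstrap interval: [{lower:.3f}, {upper:.3f}]")
```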

SGML MODELS AND AI-DAPT

One of the main objectives of the AI-DAPT project is to deliver novel methods for AI pipelines that reconcile science-guided models and AI/data-driven approaches, and coordinate appropriate tasks for training, explainability, evaluation, execution and optimization for those SGML models. We envision the following two key contributions: 

  • A hybrid AI model engineering engine to support the investigation of loosely or tightly coupled scenarios for combining science-guided and AI models, tailored to the needs of the demonstrator use cases: (a) Health ‘Personalised medicine based on non-invasive Glucose monitoring’, where science-guided models exist to analyse optical microvascular signals, (b) Robotics & Cognitive Ergonomics ‘Human-centered automation’, where science-guided models can assist the analysis of body-signal information by ML models, (c) Energy ‘Cross-vector Residential DR through Smart Heating’, where science-guided models exist to analyse energy consumption patterns, and (d) Manufacturing ‘Predictive Maintenance of Production Assets’, where science-guided models can improve predictive maintenance tasks assisted by ML models.
  • A set of hybrid explainable AI techniques to support post-hoc explainability, targeted at feature relevance explanations, local explanations, representative examples (e.g. deletion diagnostics, counterfactuals), and, more generally, model simplification, in order to explain the hybrid AI models and their results at various levels.
