Revolutionizing Chemical Product Discovery with Generative AI (GenAI) and Retrieval-Augmented Generation (RAG)

25 / Mar / 2025 by Priya Yadav

Introduction

The chemical industry is an important part of the economy: it produces and supplies raw materials for a wide range of products and applications. Thousands of products are on the market, each with its own composition, technical properties, and usage requirements. Selecting the right product is therefore complicated and time-consuming.

Keyword-based search combined with manual product selection does not give precise results, because of the intricate technical details, complex terminology, and specific requirements of each user. Product selection therefore demands extensive domain knowledge paired with brute-force filtering.

This blog describes an AI-driven search and recommendation system based on Intent Extraction, Generative AI (GenAI), and an Agentic Retrieval-Augmented Generation (RAG) method. It improves product discovery by focusing on the user intent and key dimensions extracted from user queries. Unlike traditional search engines, which only perform keyword matching and retrieval, this system processes natural-language queries to identify relevant products and then refines the results with a Judge LLM.

The Limitations of Traditional Search Systems

The chemical industry's product range is very broad, and each product has its own composition, technical requirements, and specific uses. Manual selection and traditional keyword search suggest products inaccurately because of several key challenges:

  • Ambiguous and Unstructured Queries: Users tend to phrase queries in their own words, which keyword matching often misinterprets, leading to erroneous results.
  • Overwhelming Data Volume: The sheer number of available chemical mixtures and their specifications makes manual comparison unmanageable.
  • Lack of Contextual Understanding: Traditional search engines use exact phrases and keywords to find information, preventing them from understanding user intent and other important product factors.
  • Scattered and Unstructured Information: Product information is spread out across various sources such as databases, PDFs, and technical sheets, which makes finding information difficult.

For the reasons listed above, searching for products is ineffective, slow, and labor-intensive, because a great deal of domain knowledge is required to go through and compare possible solutions. There is a clear need for an AI-enabled solution that can interpret the user's intent and understand the technical details in order to return relevant chemical products.

GenAI and RAG: A Smarter Approach to Product Discovery

To address these challenges, the system uses an AI-driven approach built on GenAI and an Agentic RAG architecture. It starts by understanding the user's intent: four product dimensions (Nature, Application, Performance, and Properties) are extracted from the user's query. The retriever then fetches the products that match those dimensions. This ensures that results are not just relevant but specific to the user's needs.
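As a rough illustration of what the extracted dimensions might look like once structured, here is a minimal sketch. The class name `ProductDimensions`, the field types, and the example query are assumptions made for illustration; the source only names the four dimensions and does not describe their representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProductDimensions:
    """The four dimensions named in the text. Field types are assumptions."""
    nature: Optional[str] = None
    application: Optional[str] = None
    performance: List[str] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)

    def filled(self) -> List[str]:
        """Names of the dimensions the query actually specified."""
        return [name for name, value in vars(self).items() if value]


# Hypothetical extraction result for the query:
# "water-based epoxy resin for floor coatings with low viscosity"
dims = ProductDimensions(
    nature="water-based epoxy resin",
    application="floor coatings",
    properties={"viscosity": "low"},
)
print(dims.filled())  # ['nature', 'application', 'properties']
```

Knowing which dimensions are filled lets later steps decide, for example, whether enough detail is present to run a narrow search or whether a broader answer is the safer default.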

How It Works: The Power of Agentic RAG and Judge LLM

 

Architecture Diagram of GenAI Powered solution for Chemical Product Discovery

The system uses a multi-step approach to ensure that product search is both efficient and reliable.

Pre-Processing

BMX-Based Search:

Product data is first organized into key-value pairs. Once the information is structured, BMX indexes are created. These indexes are like a high-speed lookup table, allowing us to perform fast similarity searches.
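To make the idea of indexing key-value product data concrete, here is a small sketch of a lexical index. It uses plain BM25 scoring as a stand-in, because the source does not show how its BMX indexes are built; the product records, the `to_document` helper, and the `BM25Index` class are all hypothetical.

```python
import math
from collections import Counter


def to_document(product: dict) -> list:
    """Flatten key-value pairs into lowercase tokens, e.g. 'nature epoxy'."""
    text = " ".join(f"{key} {value}" for key, value in product.items())
    return text.lower().split()


class BM25Index:
    """Plain BM25 over tokenized documents (a stand-in, not BMX itself)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.avg_len = sum(len(d) for d in docs) / len(docs)
        self.term_freqs = [Counter(d) for d in docs]
        doc_freq = Counter(term for d in docs for term in set(d))
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - df + 0.5) / (df + 0.5))
            for term, df in doc_freq.items()
        }

    def score(self, tokens, i):
        tf = self.term_freqs[i]
        norm = self.k1 * (1 - self.b + self.b * len(self.docs[i]) / self.avg_len)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokens
            if t in tf
        )

    def search(self, query, top_k=3):
        """Return indexes of the best-scoring documents, skipping zero scores."""
        tokens = query.lower().split()
        scored = [(self.score(tokens, i), i) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: -pair[0])
        return [i for s, i in scored[:top_k] if s > 0]


# Hypothetical product records
products = [
    {"name": "Resin A", "nature": "epoxy", "application": "floor coating"},
    {"name": "Solvent B", "nature": "acetone", "application": "cleaning"},
    {"name": "Resin C", "nature": "polyester", "application": "marine laminate"},
]
index = BM25Index([to_document(p) for p in products])
hits = index.search("epoxy floor")
print([products[i]["name"] for i in hits])  # ['Resin A']
```

Keeping the keys in the flattened text means a query such as "application cleaning" can match on both the attribute name and its value.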

Semantic Search:

For product data:

1. We start by cleaning the data, removing any inconsistencies and extraneous details.
2. Next, we split the cleaned data into smaller, context-rich segments using recursive chunking.
3. Each segment is then converted into a numerical vector with the text-embedding-3-large model, and these vectors are stored in a vector database for quick retrieval.
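The recursive chunking in step 2 can be sketched as follows. This is a simplified, character-based version written for illustration; the function name, the separator order, and the `max_chars` limit are assumptions, and the embedding and storage steps are left out because they depend on external services.

```python
def recursive_chunk(text, max_chars=200, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator first, and recurse into finer
    separators only for pieces that are still longer than max_chars."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            # Keep packing neighbouring pieces into the same chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            chunks.extend(recursive_chunk(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "Viscosity is low. " * 30
chunks = recursive_chunk(sample, max_chars=100)
print(len(chunks), max(len(c) for c in chunks))
```

Splitting on paragraph and sentence boundaries before falling back to spaces keeps related statements together, which is what makes the segments "context-rich". Note that this sketch drops the separator at chunk boundaries.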

For solution data:

1. The process is similar: clean the data, break it into manageable chunks, generate embeddings, and store everything in the vector database.

Handling User Queries:

When you submit a query, the system kicks into gear by first figuring out whether your question is about a product or a solution.

Product Queries:

  1. Previous queries are checked to maintain a smooth conversation and context.
  2. Then, specifics are analyzed. If the query includes numbers, we apply a ±5% tolerance to account for slight variations. Vague terms like “low” or “high” are mapped to more structured values.
  3. Key product attributes are then extracted, such as nature, application, performance, and properties. If the “nature” of the product is mentioned, three semantically similar sub-queries are generated to broaden coverage.
  4. Finally, multiple searches are run (semantic, MMR, BMX, and BM25), and a Judge LLM reviews the results to check that they match the query. If crucial details are missing, the system defaults to a broader answer.
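Step 2 above can be illustrated with a short sketch. The ±5% tolerance comes from the text; everything else here is an assumption, including the function names, the `VAGUE_RANGES` table, and its threshold values, which in practice would be defined per property by domain experts.

```python
import re

# Hypothetical mapping of vague terms to numeric ranges (illustrative values).
VAGUE_RANGES = {
    "viscosity": {"low": (0.0, 500.0), "high": (5000.0, float("inf"))},
}


def numeric_range(value, tolerance=0.05):
    """Return the (min, max) interval within ±tolerance of value."""
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


def constraint_for(prop, text):
    """Turn text such as 'around 1000 cP' or 'low viscosity' into a range."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match:
        return numeric_range(float(match.group()))
    lowered = text.lower()
    for term, bounds in VAGUE_RANGES.get(prop, {}).items():
        # Whole-word match so that 'low' does not fire on 'flow'.
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            return bounds
    return None


print(constraint_for("viscosity", "around 1000 cP"))
print(constraint_for("viscosity", "low viscosity"))  # (0.0, 500.0)
```

Expressing both numeric and vague requirements as ranges means the retriever can apply a single kind of filter regardless of how the user phrased the requirement.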

Solution Queries:

  1. For questions related to solutions, the process is streamlined. The system simply uses a semantic search retriever to fetch the most relevant, context-rich results from the vector database.
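A semantic search retriever of this kind ranks stored chunks by how close their vectors are to the query vector. The sketch below uses cosine similarity over toy three-dimensional vectors in place of real embeddings and an in-memory list in place of a vector database; the data and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """store holds (chunk_text, vector) pairs; return the k closest texts."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy vectors standing in for real embeddings
store = [
    ("Corrosion protection for steel pipelines", [0.9, 0.1, 0.0]),
    ("Defoaming in wastewater treatment", [0.0, 0.2, 0.9]),
    ("Anti-rust primers for marine structures", [0.8, 0.3, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], store, k=2)
print(results)
```

With these toy vectors, the two corrosion-related chunks rank above the wastewater chunk, even though none of them shares exact keywords with a query.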

Answer Generation:

In the final step, the system returns the most relevant products.

  • For product queries, the results from the different sub-queries are merged into a single answer, which is then passed to the Judge LLM.
  • For solution queries, the semantic search provides a relevant response.
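The source does not say how results from the different sub-queries and retrievers are combined. One common option is reciprocal rank fusion, sketched below as an assumption rather than the system's actual method. The product IDs are made up, and `is_relevant` is a placeholder for the Judge LLM call.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of product IDs. Items ranked high in many lists win."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, product_id in enumerate(ranking, start=1):
            scores[product_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def judge_filter(candidates, is_relevant):
    """Keep the candidates the judge accepts, preserving their order."""
    return [c for c in candidates if is_relevant(c)]


rankings = [
    ["P1", "P2", "P3"],  # e.g. semantic search
    ["P2", "P1", "P4"],  # e.g. BM25
    ["P2", "P3", "P1"],  # e.g. BMX
]
merged = reciprocal_rank_fusion(rankings)
final = judge_filter(merged, is_relevant=lambda pid: pid != "P4")
print(merged)  # ['P2', 'P1', 'P3', 'P4']
print(final)   # ['P2', 'P1', 'P3']
```

Rank-based fusion avoids having to normalize scores that come from different retrievers, since lexical and vector scores are not on a comparable scale.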

This approach ensures that the user’s query is handled with the appropriate method and the responses are accurate and relevant.

Real-World Applications: A Game Changer for Chemical Sectors

The chemical industry is beginning to see the benefits of this AI-powered product discovery system. Intent-based retrieval and context-aware recommendations reduce the time spent searching for products, which supports faster, better-informed decisions and helps users find products that match their specific requirements.

Conclusion: A Smarter, Faster, and More Efficient Chemical Product Search

The future of chemical product search is bright with the integration of Generative AI and Agentic Retrieval-Augmented Generation. This solution not only increases the accuracy of product recommendations but also streamlines decision-making processes for industries that rely on complex chemical products. With continued advancements in AI, this approach will continue to revolutionize the way people search for and select chemical products, making the process smarter, faster, and more efficient.

As the chemical industry continues to evolve, embracing AI-driven solutions will be key to staying competitive and driving innovation in product discovery.
