Building a GenAI-RAG Microservice with Observability and CI/CD

Table of Contents
Problem Statement
Introduction
Understanding the Concepts
  Microservices Architecture
  RAG (Retrieval-Augmented Generation)
  Observability in Microservices
  CI/CD Pipeline
Tools and Technologies Overview
  Google Kubernetes Engine (GKE)
  OpenTelemetry
  Langchain
  Jenkins
  PGVector
  Terraform
  Ansible
  Docker Compose
  Argo CD
Project Overview
  Architecture and Workflow
  Implementation Details
Deployment Steps
Final Thoughts
  Future Enhancements
  Conclusion
Problem Statement
In the AI era, people want quick, efficient ways to digest information without spending hours reading lengthy blogs, articles, or documentation. To address this need, we developed the "GENAI-URL-DIGEST" service. Users submit URLs of blogs, articles, or documentation and ask questions about them. The service returns concise answers that cite the provided URLs as sources, so users get the information they need quickly and accurately.
Introduction
In this blog, we'll explore the "GENAI-URL-DIGEST" project, a simple yet powerful GENAI-RAG microservice deployed on Google Kubernetes Engine (GKE). This project integrates observability through OpenTelemetry and sets up a continuous integration (CI) pipeline using Jenkins and Docker, configured via Ansible. We'll break down the core concepts, tools, and technologies used, providing a step-by-step guide to help you understand and replicate the process.
Understanding the Concepts
Microservices Architecture
Microservices are a way of designing software applications as a collection of loosely coupled, independently deployable services. Each service represents a small piece of the larger application, typically responsible for a specific function, such as user authentication, data processing, or in this case, AI-powered content generation.
RAG (Retrieval-Augmented Generation)
RAG stands for Retrieval-Augmented Generation, a technique that first retrieves the passages most relevant to a question from a knowledge source and then passes them to a generative model as context, so that answers are grounded in the retrieved text and are more accurate and relevant. In this project, we use Langchain to build a microservice that applies this technique.
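The retrieve-then-generate flow can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's Langchain code: the bag-of-words `embed` function stands in for a real embedding model, the `chunks` data and example.com URLs are made up, and the final prompt would be sent to an LLM rather than printed.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, k=2):
    """Retrieval step: return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, embed(chunk["text"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    """Augmentation step: put the retrieved text and its sources in the prompt."""
    context = "\n".join(
        f"[{i}] {chunk['text']} (source: {chunk['source']})"
        for i, chunk in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the context below "
        "and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical chunks, as if extracted from user-submitted URLs.
chunks = [
    {"source": "https://example.com/k8s",
     "text": "Kubernetes schedules containers across a cluster of nodes."},
    {"source": "https://example.com/pg",
     "text": "PostgreSQL is a relational database with extensions."},
    {"source": "https://example.com/otel",
     "text": "OpenTelemetry collects traces, metrics, and logs."},
]

if __name__ == "__main__":
    question = "What does OpenTelemetry collect?"
    # Generation step (not shown): send this prompt to the language model.
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

Because the prompt carries each chunk's source URL, the model can cite where an answer came from, which is the behavior the service aims for.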
Observability in Microservices
Observability is the ability to infer the internal state of a system by examining its outputs. In microservices, this is crucial for understanding system performance, identifying issues, and ensuring reliability. In this project, we use OpenTelemetry, which provides observability by collecting traces, metrics, and logs from distributed systems.
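What lets a trace span several microservices is context propagation: each request carries a trace ID that every service reuses. The sketch below illustrates the idea with the W3C Trace Context `traceparent` header, which is OpenTelemetry's default propagation format. The helper names are hypothetical, and in a real service the OpenTelemetry SDK generates and parses this header for you.

```python
import re
import secrets

# traceparent format: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent():
    """Start a new trace: fresh 16-byte trace ID and 8-byte span ID."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled


def child_traceparent(parent_header):
    """Continue an incoming trace: keep the trace ID, mint a new span ID.

    Falls back to a brand-new trace if the incoming header is malformed.
    """
    match = TRACEPARENT_RE.match(parent_header)
    if match is None:
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    frontend_header = new_traceparent()                   # frontend starts the trace
    backend_header = child_traceparent(frontend_header)   # backend continues it
    print(frontend_header)
    print(backend_header)
```

Both headers share the same trace ID, which is how a tracing backend can stitch the frontend and backend spans into a single request timeline.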
CI/CD Pipeline
Continuous Integration (CI) and Continuous Deployment (CD) are software development practices that let teams integrate and deploy code changes frequently and reliably. In this project, Jenkins, together with Docker and Ansible, is used to set up a CI/CD pipeline for both the frontend and backend microservices.
Tools and Technologies Overview
Google Kubernetes Engine (GKE): A managed Kubernetes service offered by Google Cloud for deploying containerized applications. It simplifies the deployment, management, and scaling of applications in a Kubernetes environment.
OpenTelemetry: A set of APIs, libraries, agents, and instrumentation to provide observability for cloud-native software. It is used to collect traces, metrics, and logs from distributed systems to monitor application performance.
Langchain: A framework for building applications with large language models (LLMs). It enables the integration of retrieval-based and generation-based models for more accurate AI responses.
Jenkins: An open-source automation server widely used for CI/CD. Jenkins automates the building, testing, and deployment of applications, ensuring that code changes are integrated and delivered efficiently.
PGVector: A PostgreSQL extension that enables vector similarity search. It is used to store and search embeddings in PostgreSQL databases, allowing for efficient retrieval of relevant data.
Terraform: An infrastructure as code (IaC) tool used to automate cloud infrastructure. Terraform allows for the provisioning and management of cloud resources using a declarative configuration language.
Ansible: A configuration management tool used to set up Jenkins and Docker on a GCE VM instance. Ansible automates the setup and management of infrastructure, ensuring consistency and reducing manual effort.
Docker Compose: A tool for defining and running multi-container applications. It enables the management of multiple services in a containerized environment, streamlining development and deployment processes.
ArgoCD: A declarative, GitOps continuous delivery tool for Kubernetes. ArgoCD automates the deployment of applications by monitoring changes in a Git repository and synchronizing them with the Kubernetes cluster.
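To make the PGVector entry above more concrete, here is a small pure-Python sketch of the kind of nearest-neighbor lookup a vector similarity search performs. It uses cosine distance, which is what pgvector's `<=>` operator computes. The `rows` data is invented for illustration, and the SQL in the comment uses a hypothetical table and column name.

```python
import heapq
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, rows, k=2):
    """Return the k rows whose embeddings are closest to the query vector.

    Roughly what this SQL asks the database to do (hypothetical schema):
        SELECT id FROM chunks ORDER BY embedding <=> :query LIMIT :k;
    """
    return heapq.nsmallest(
        k, rows, key=lambda row: cosine_distance(query, row["embedding"])
    )


# Made-up three-dimensional embeddings; real ones have hundreds of dimensions.
rows = [
    {"id": 1, "embedding": [1.0, 0.0, 0.0]},
    {"id": 2, "embedding": [0.9, 0.1, 0.0]},
    {"id": 3, "embedding": [0.0, 1.0, 0.0]},
]

if __name__ == "__main__":
    print([row["id"] for row in nearest([1.0, 0.0, 0.0], rows, k=2)])
```

The sketch scans every row, which is fine for a handful of vectors. A database extension earns its keep by doing the same ranking next to the data and, optionally, with an index.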
Project Overview
Architecture and Workflow
The architecture consists of several microservices, each performing specific tasks within the "GENAI-URL-DIGEST" project:
Frontend: Handles UI rendering and user authentication, utilizing distributed session management to avoid sticky sessions and ensure reliable authentication.
Redis: Deployed as a service to enable distributed session management, which is crucial for maintaining consistent user sessions across multiple instances.
Backend: Uses Langchain to process the user's question and generate AI-powered answers. It interacts with the database and other microservices to fulfill user queries.
OpenTelemetry Operator: Manages the OpenTelemetry Collector in the cluster, which receives traces from the services and exports them, enabling real-time monitoring and troubleshooting.
Postgres: Enabled with the PGVector extension to store vector embeddings generated by the backend. This allows for efficient vector search operations within the database.
Zipkin: Visualizes the traces exported by the OpenTelemetry Collector, providing a clear view of the system's performance and behavior.
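The reason the Frontend and Redis entries above go together can be shown with a short sketch. When session data lives in a shared store instead of in one process's memory, any frontend replica can serve any request, so no sticky sessions are needed. The classes below are hypothetical, and the in-memory dictionary merely stands in for Redis.

```python
import secrets
import time


class SharedSessionStore:
    """In-memory stand-in for Redis: session ID -> (expiry time, data)."""

    def __init__(self, ttl_seconds=1800):
        self._data = {}
        self._ttl = ttl_seconds

    def create(self, user):
        """Store a new session and return its unguessable ID."""
        session_id = secrets.token_urlsafe(32)
        self._data[session_id] = (time.time() + self._ttl, {"user": user})
        return session_id

    def get(self, session_id):
        """Return session data, or None if the ID is unknown or expired."""
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if time.time() >= expires_at:
            del self._data[session_id]
            return None
        return data


class FrontendInstance:
    """One frontend replica; it keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        return self.store.create(user)

    def whoami(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None


if __name__ == "__main__":
    store = SharedSessionStore()
    pod_a = FrontendInstance("frontend-a", store)
    pod_b = FrontendInstance("frontend-b", store)
    session_id = pod_a.login("alice")   # login handled by one replica
    print(pod_b.whoami(session_id))     # next request lands on another replica
```

If each replica kept sessions in its own memory instead, the second request would find no session and the user would appear logged out.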
[Figure: Project architecture]

Implementation Details
Infrastructure & DevOps Tools:
Terraform: Automates the provisioning of GKE clusters, VPC networks, and Compute Engine VMs.
Ansible: Installs and configures Jenkins and Docker on a Compute Engine VM, turning it into the Jenkins master node.
Docker: Runs Jenkins worker agents in containers, which simplifies job management and makes it easy to add agents as the build load grows.
Google Kubernetes Engine: Hosts the microservices as Kubernetes deployments, ensuring high availability and scalability.
CI/CD Pipeline: We have implemented CI/CD pipelines for both the frontend and backend microservices, ensuring that changes are automatically built, tested, and deployed. Jenkins manages the pipeline, with Docker used to containerize the applications and ArgoCD handling the deployment to the Kubernetes cluster.
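How Jenkins hands a freshly built image over to ArgoCD is not spelled out here. One common GitOps pattern, assumed for this sketch, is that the CI job writes the new image tag into a Kubernetes manifest in Git, and ArgoCD then syncs that change to the cluster. The function name, image name, and manifest fragment below are all hypothetical.

```python
import re


def set_image_tag(manifest, image, new_tag):
    """Return the manifest text with the tag of `image` replaced by `new_tag`.

    Only lines of the form `image: <image>:<tag>` are touched; a ValueError
    is raised if the image does not appear in the manifest at all.
    """
    pattern = re.compile(
        rf"^([ \t]*(?:-[ \t]+)?image:[ \t]*{re.escape(image)}):\S+[ \t]*$",
        re.MULTILINE,
    )
    updated, count = pattern.subn(lambda m: f"{m.group(1)}:{new_tag}", manifest)
    if count == 0:
        raise ValueError(f"image {image!r} not found in manifest")
    return updated


# Trimmed, hypothetical fragment of a Deployment manifest.
MANIFEST = """\
spec:
  template:
    spec:
      containers:
        - name: backend
          image: example/backend:1.0.0
"""

if __name__ == "__main__":
    # A CI job would commit and push this result; the GitOps tool does the rest.
    print(set_image_tag(MANIFEST, "example/backend", "build-42"))
```

Keeping the deployed version in Git means every rollout is a reviewable commit, and rolling back amounts to reverting it.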
Application Implementation:
Frontend: Developed using HTML templates rendered by Flask, the frontend handles user interactions and supports distributed session management for user authentication.
Backend: Flask exposes the backend logic as an API that the frontend calls. Langchain powers the AI-driven responses to the user's question, while PGVector in Postgres stores and searches the vector embeddings.
OpenTelemetry SDK: Integrated into both the frontend and backend to collect traces and export them to an OpenTelemetry collector endpoint.
Deployment Steps
To deploy the "GENAI-URL-DIGEST" project, please refer to the detailed setup steps provided in the project's GitHub repository. The README includes step-by-step guidance for both local and cloud deployment, ensuring that you can get the project up and running smoothly.
Final Thoughts
Future Enhancements
Support for Additional File Formats: Enhance the service to process various document types such as PDFs, Word documents, or images with text (using OCR), broadening its utility.
Improved User Interface: Develop a more user-friendly frontend, possibly including drag-and-drop functionality for easier URL and document uploads, as well as an interactive dashboard for querying.
Multi-Language Support: Integrate language translation models to allow queries and document processing in multiple languages, making the service more accessible to a global audience.
Conclusion
The "GENAI-URL-DIGEST" project demonstrates how modern tools and techniques can be used to build an intelligent, AI-powered microservice with robust observability and CI/CD capabilities. By following this guide, you can replicate and build upon this project, applying these principles to your own DevOps and AI endeavors.
