Building a Medical Chatbot with Pinecone, LLaMA 2, and Flask

June 19, 2024

In the rapidly evolving field of healthcare technology, the integration of artificial intelligence (AI) into medical applications has become a significant breakthrough. One of the most promising developments in this domain is the medical chatbot: an intelligent assistant that can give users immediate responses to medical inquiries. Recently, I built a medical chatbot using Pinecone for vector database management, LLaMA 2 for natural language processing, and Flask for the front-end interface. Here’s a detailed look at how I did it, with the original project owner’s permission and a thorough understanding of the code.

You can find the full source code in my repository: Chatbot-using-Llama2

Selecting the Right Tools

To build an efficient and reliable medical chatbot, choosing the right technologies is crucial. Here’s a breakdown of the tools I used:

Pinecone

Pinecone is a vector database designed to handle large-scale machine learning applications. It enables efficient similarity search and helps in managing embeddings generated from text inputs, which is critical for a chatbot that needs to understand and retrieve relevant medical information quickly.

LLaMA 2

Developed by Meta, LLaMA 2 is an advanced language model that excels in understanding and generating human-like text. Its ability to comprehend complex queries and provide coherent responses makes it ideal for a medical chatbot that interacts with users in natural language.

Flask

Flask is a lightweight web framework for Python, perfect for developing the front-end of the chatbot. It allows for quick development and easy integration with other components of the system.

Setting Up Pinecone

The first step in building the chatbot was setting up Pinecone. This involved creating a Pinecone index to store and manage the embeddings. These embeddings represent the semantic meaning of the medical queries and responses, allowing for efficient retrieval and matching.

The following credentials are required to access the Pinecone vector database; I keep them in a .env file:

PINECONE_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
INDEX_NAME=medical-chatbot
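
If the index doesn’t exist yet, it can be created programmatically. Here is a minimal sketch using the Pinecone Python client, assuming a serverless index and a 384-dimensional embedding model such as sentence-transformers/all-MiniLM-L6-v2 (the cloud, region, and dimension are assumptions; the dimension must match whatever download_hugging_face_embeddings returns):

import os
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# 384 matches sentence-transformers/all-MiniLM-L6-v2; adjust for your embedding model
if "medical-chatbot" not in pc.list_indexes().names():
    pc.create_index(
        name="medical-chatbot",
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )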

store_index.py

from src.helper import load_pdf, text_split, download_hugging_face_embeddings
from dotenv import load_dotenv
import os
from langchain_pinecone import PineconeVectorStore

load_dotenv()

index_name = os.environ.get('INDEX_NAME')

try:
    # Get the current working directory (where the script is executed)
    current_working_directory = os.getcwd()

    data_directory = os.path.join(current_working_directory, "data")

    extracted_data = load_pdf(data_directory)
    print("PDFs loaded successfully.")
except FileNotFoundError as e:
    # Without the source PDFs there is nothing to index, so stop here
    raise SystemExit(e)

text_chunks = text_split(extracted_data)
embeddings = download_hugging_face_embeddings()


# Create an embedding for each text chunk and store it in the Pinecone index
docsearch = PineconeVectorStore.from_texts(
    [t.page_content for t in text_chunks],
    embeddings,
    index_name=index_name,
)
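
The helpers imported from src/helper.py aren’t shown in this post. For context, here is a plausible sketch of what they might look like, assuming LangChain’s DirectoryLoader, RecursiveCharacterTextSplitter, and the sentence-transformers/all-MiniLM-L6-v2 embedding model (the chunk sizes and model name are assumptions, not taken from the repository):

# src/helper.py (sketch)
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings

def load_pdf(data):
    # Load every PDF in the given directory
    loader = DirectoryLoader(data, glob="*.pdf", loader_cls=PyPDFLoader)
    return loader.load()

def text_split(extracted_data):
    # Split documents into overlapping chunks small enough to embed
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
    return text_splitter.split_documents(extracted_data)

def download_hugging_face_embeddings():
    # 384-dimensional embeddings; must match the Pinecone index dimension
    return HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")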

Integrating LLaMA 2 with Retrieval-Augmented Generation (RAG)

Instead of fine-tuning the LLaMA 2 model, I used Retrieval-Augmented Generation (RAG). RAG retrieves the document chunks most relevant to a query from the vector store and passes them to the model as context, which keeps responses accurate and grounded in the source material without any additional training.

app.py

from flask import Flask, render_template, jsonify, request
from langchain_pinecone import PineconeVectorStore
from src.helper import download_hugging_face_embeddings
from langchain.prompts import PromptTemplate
from langchain.llms import CTransformers
from langchain.chains import RetrievalQA
from dotenv import load_dotenv
from src.prompt import *
import os

app = Flask(__name__)

load_dotenv()

index_name = os.environ.get('INDEX_NAME')

embeddings = download_hugging_face_embeddings()


# Load the existing Pinecone index
docsearch = PineconeVectorStore.from_existing_index(index_name, embeddings)


PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

chain_type_kwargs = {"prompt": PROMPT}

# Load the quantized LLaMA 2 7B chat model via CTransformers
llm = CTransformers(
    model="model/llama-2-7b-chat.ggmlv3.q4_0.bin",
    model_type="llama",
    config={"max_new_tokens": 512, "temperature": 0.8},
)


# Stuff the top-2 retrieved chunks into the prompt as context
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=docsearch.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs=chain_type_kwargs,
)
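
Note that app.py pulls prompt_template from src/prompt.py via the wildcard import, and expects the quantized GGML weights (available, for example, from TheBloke/Llama-2-7B-Chat-GGML on Hugging Face) to sit in the model/ directory. The prompt itself isn’t shown in this post; a minimal sketch of what src/prompt.py could contain (the wording is illustrative, not from the repository):

# src/prompt.py (sketch)
prompt_template = """
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know; don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else.
Helpful answer:
"""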

Developing the Flask Front-End

With the backend set up, I turned to Flask to create the user interface. Flask’s simplicity and flexibility made it easy to develop a web-based interface where users can interact with the chatbot.

app.py

@app.route("/")
def index():
    return render_template('chat.html')



@app.route("/chat", methods=["GET", "POST"])
def chat():
    msg = request.form["msg"]
    input = msg
    print(input)
    result=qa({"query": input})
    print("Response : ", result["result"])
    return str(result["result"])



if __name__ == '__main__':
    app.run()
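
Once the server is running (python app.py), the /chat endpoint can be exercised without the browser UI. Here is a quick smoke test using the requests library (the question is just an illustrative example):

import requests

# Post a message the same way the chat form does
resp = requests.post(
    "http://127.0.0.1:5000/chat",
    data={"msg": "What are the common symptoms of anemia?"},
)
print(resp.text)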

(Screenshot: the medical chatbot web interface)
