Are you interested in building an open-source chat app that lets users ask questions about PDFs? Look no further: this guide walks you through developing an open-source chat PDF app with the OpenAI API.
GitHub code: https://github.com/whyashthakker/chatpdf
UI (no code) playground: https://aichatpdf.streamlit.app/
Step 1: Start the App
First, import the libraries the app needs: streamlit for the web interface, PyPDF2 for reading PDFs, langchain for splitting text, creating embeddings, storing them in a FAISS vector store, and running the question-answering chain, pickle for caching the vector store on disk, dotenv for loading environment variables, and os for file-system checks.
```python
import streamlit as st
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.callbacks import get_openai_callback
import pickle
from dotenv import load_dotenv
import os
```
Step 2: Define Sidebar
Streamlit offers a sidebar for instructions or extra input. Here, the sidebar shows the app's title and a short description of the kinds of questions users can ask.
```python
with st.sidebar:
    st.write("## Talk to PDFs")
    st.markdown("### Ask questions about Compliance, Accounting, and anything else.")
```
Step 3: Define Main Function
In the main function, you first create a header for the application. Then you add a text input where users enter their OpenAI API key; type="password" masks the characters as they are typed.
```python
def main():
    st.header("OPEN SOURCE CHAT PDF APP.")
    OPENAI_API_KEY = st.text_input(
        "Enter your OpenAI API key", type="password"
    )
```
Step 4: Provide Open Source Details
Next, tell users that the app is open source and that their API key is not stored; it is only sent to OpenAI's API, which needs it to create embeddings and answers.
```python
    st.write(
        "Your API key is not stored and is only sent to OpenAI's API. The entire code is open source, see: https://github.com/whyashthakker/chatpdf, drop a ⭐️"
    )
```
Step 5: Load Environment Variables and Accept File Upload
Then, load environment variables (from a local .env file, if one exists) and add a file uploader that accepts only PDF files.
```python
    load_dotenv()
    pdf = st.file_uploader("Upload a PDF file", type="pdf")
```
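The code calls load_dotenv() but then takes the key only from the text box, so a key kept in a .env file is never used. A minimal sketch of one way to combine the two, assuming the .env file defines a variable named OPENAI_API_KEY; resolve_api_key is a hypothetical helper, not part of the original app:

```python
import os
from typing import Mapping, Optional


def resolve_api_key(user_input: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer the key typed into the app; otherwise fall back to the environment.

    Assumption: load_dotenv() has already copied .env entries into os.environ,
    and the variable is named OPENAI_API_KEY (hypothetical, not from the app).
    """
    key = user_input.strip()
    if key:
        return key
    # Treat a missing or empty variable the same way: no key available
    return env.get("OPENAI_API_KEY") or None
```

In main(), this would be called after load_dotenv(), for example OPENAI_API_KEY = resolve_api_key(OPENAI_API_KEY), so the typed key still wins when both are present.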
Step 6: Read the PDF
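If a PDF is uploaded, the app reads the file, extracts the text of every page, and splits it into chunks of roughly 1,000 characters that overlap by about 200 characters.
```python
    if pdf is not None:
        pdf_reader = PdfReader(pdf)
        st.write(pdf.name)

        # Concatenate the text of every page
        text = ""
        for page in pdf_reader.pages:
            text += page.extract_text()

        # length_function=len measures chunk size in characters
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000, chunk_overlap=200, length_function=len
        )
        chunks = text_splitter.split_text(text=text)
```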
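To see what chunk_size=1000 and chunk_overlap=200 mean, here is a simplified fixed-window splitter. It is not LangChain's algorithm: RecursiveCharacterTextSplitter first tries to break on paragraph, line, and word separators, so its chunks vary in length. This sketch (split_fixed is a hypothetical name) only shows the size and overlap arithmetic: each chunk starts chunk_size - chunk_overlap characters after the previous one.

```python
from typing import List


def split_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = split_fixed(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

The overlap costs some duplicated tokens, but it reduces the chance that the sentence answering a question is cut in half between two chunks.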
Step 7: Create VectorStore
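The app turns the chunks into a searchable vector store: OpenAIEmbeddings converts each chunk into an embedding vector, and FAISS.from_texts indexes those vectors. Because embedding every chunk means calling the OpenAI API, the store is only built when no cached copy exists (Step 8). Here the app derives the cache name from the PDF's file name by dropping the four-character ".pdf" extension.
```python
        # Cache name for this PDF, e.g. "report.pdf" -> "report"
        store_name = pdf.name[:-4]
```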
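Conceptually, FAISS.from_texts stores one vector per chunk, and similarity_search (Step 10) returns the k stored chunks whose vectors are nearest to the query's vector. The sketch below mimics that with the standard library only. The embed function is a hypothetical stand-in (hashed word counts scaled to unit length) for OpenAIEmbeddings, whose learned vectors capture meaning far better; ToyVectorStore is likewise hypothetical. Like LangChain's default FAISS index, it ranks by Euclidean (L2) distance.

```python
import math
import re
import zlib
from typing import List, Tuple

DIM = 64  # length of the toy vectors (arbitrary; real OpenAI embeddings are much longer)


def embed(text: str) -> List[float]:
    """Hypothetical stand-in for OpenAIEmbeddings: a hashed bag of words, scaled to unit length."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in str hash
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ToyVectorStore:
    """Exhaustive nearest-neighbour search, similar in spirit to a flat FAISS L2 index."""

    def __init__(self) -> None:
        self.texts: List[str] = []
        self.vectors: List[List[float]] = []

    @classmethod
    def from_texts(cls, texts: List[str]) -> "ToyVectorStore":
        store = cls()
        for text in texts:
            store.texts.append(text)
            store.vectors.append(embed(text))
        return store

    def similarity_search(self, query: str, k: int = 3) -> List[str]:
        q = embed(query)
        scored: List[Tuple[float, str]] = []
        for vec, text in zip(self.vectors, self.texts):
            # Euclidean (L2) distance between the query and this chunk
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, vec)))
            scored.append((dist, text))
        scored.sort(key=lambda pair: pair[0])  # smallest distance = most similar
        return [text for _, text in scored[:k]]


chunks = [
    "Invoices must be approved by the finance team.",
    "The office closes at six on Fridays.",
    "Passwords rotate every ninety days.",
]
store = ToyVectorStore.from_texts(chunks)
print(store.similarity_search("Which team approves invoices?", k=1))
```

A flat FAISS index also compares the query against every stored vector; it is simply much faster because the distance computations run in optimized native code.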
Step 8: Load or Create VectorStore
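The app checks whether a vector store for this PDF was already saved as a .pkl file. If so, it loads it with pickle instead of embedding the text again; if not, it creates the embeddings and the FAISS store, then pickles the store for the next run. Because pickle.load can execute arbitrary code, only load files your own app has written. LangChain's FAISS class also provides save_local and load_local for persisting a store.
```python
        if os.path.exists(f"{store_name}.pkl"):
            # Reuse the vector store saved on an earlier run
            with open(f"{store_name}.pkl", "rb") as f:
                VectorStore = pickle.load(f)
        else:
            # Embed the chunks via the OpenAI API, index them, and cache the result
            embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
            VectorStore = FAISS.from_texts(chunks, embedding=embeddings)
            with open(f"{store_name}.pkl", "wb") as f:
                pickle.dump(VectorStore, f)
```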
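The load-or-create pattern in this step can be factored into a small helper. This is a sketch, not the original code: cache_path and load_or_build are hypothetical names, and cache_path uses os.path.splitext instead of pdf.name[:-4], so a file named report.PDF or one with no extension still gets a sensible cache name.

```python
import os
import pickle
from typing import Any, Callable


def cache_path(pdf_name: str, cache_dir: str = ".") -> str:
    """Derive the cache file path from the uploaded file's name, e.g. report.pdf -> ./report.pkl."""
    stem, _ext = os.path.splitext(os.path.basename(pdf_name))
    return os.path.join(cache_dir, f"{stem}.pkl")


def load_or_build(path: str, build: Callable[[], Any]) -> Any:
    """Return the object pickled at path, or call build(), pickle its result, and return it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)  # only safe for files this app wrote itself
    obj = build()
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return obj
```

In the app, the if/else above would become something like VectorStore = load_or_build(cache_path(pdf.name), lambda: FAISS.from_texts(chunks, embedding=OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY))); the lambda delays the paid embedding call until a cache miss. Keying the cache by file name means two different PDFs with the same name share one entry; hashing the uploaded bytes (for example hashlib.sha256(pdf.getvalue()).hexdigest()) would avoid that.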
Step 9: Accept User Questions
The app provides an input box where users can ask questions about the uploaded PDF.
```python
        query = st.text_input("Let's ask questions :)")
        st.write(query)
```
Step 10: Generate Response
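If the user has entered a question, the app retrieves the three chunks most similar to it, passes them to gpt-3.5-turbo together with the question, and displays the answer, which max_tokens=100 caps at 100 tokens. get_openai_callback records the tokens used and their estimated cost. Note that gpt-3.5-turbo is a chat model; recent LangChain versions expect it to be wrapped in ChatOpenAI rather than the completion-style OpenAI class.
```python
        if query:
            # Retrieve the 3 chunks most similar to the question
            docs = VectorStore.similarity_search(query=query, k=3)
            llm = OpenAI(model_name="gpt-3.5-turbo", max_tokens=100, openai_api_key=OPENAI_API_KEY)
            # "stuff" places all retrieved chunks into a single prompt
            chain = load_qa_chain(llm=llm, chain_type="stuff")
            with get_openai_callback() as cb:
                response = chain.run(input_documents=docs, question=query)
                print(cb)  # token usage and estimated cost, printed to the terminal
            st.write(response)
```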
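With chain_type="stuff", load_qa_chain places all retrieved documents into one prompt and makes a single LLM call (other chain types such as "map_reduce" call the model once per document). The sketch below shows the idea with plain strings; TEMPLATE and stuff_prompt are hypothetical, and the wording is a simplification, not LangChain's exact prompt. It also shows why k and chunk_size matter: the prompt grows with roughly k × chunk_size characters, and it has to fit in the model's context window together with the max_tokens reserved for the answer.

```python
from typing import List

# Hypothetical, simplified template; LangChain's real "stuff" QA prompt differs in wording.
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n"
    "If you don't know the answer, say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def stuff_prompt(docs: List[str], question: str) -> str:
    """Concatenate every retrieved chunk into one prompt (one LLM call in total)."""
    context = "\n\n".join(docs)
    # format() only fills the placeholders in TEMPLATE; braces inside docs are kept literally
    return TEMPLATE.format(context=context, question=question)


retrieved = [
    "Invoices are approved by the finance team.",
    "Refunds are issued within 5 business days.",
]
print(stuff_prompt(retrieved, "Who approves invoices?"))
```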
Step 11: Run the App
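Finally, call main() when the script is executed. Save the file (for example as app.py) and start the app from a terminal with streamlit run app.py.
```python
if __name__ == "__main__":
    main()
```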
And there you have it: a step-by-step guide to building an open-source chat PDF app with OpenAI's API. Users upload a PDF, the app retrieves the passages most relevant to each question, and an OpenAI model answers using those passages.