Yash Thakker

Step-by-Step Guide: Building an Open Source Chat PDF App using Langchain

Want to build an open-source chat app that lets users ask questions about PDFs? This guide walks you through building one with Streamlit, LangChain, and the OpenAI API.



UI (no code) playground: https://aichatpdf.streamlit.app/


Step 1: Start the App


First, import the necessary libraries: streamlit for the web interface, PyPDF2 for reading PDFs, langchain for text splitting, embeddings, the FAISS vector store, and the question-answering chain, pickle for caching the vector store on disk, dotenv for loading environment variables, and os for file-system checks.


import streamlit as st
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.callbacks import get_openai_callback
import pickle
from dotenv import load_dotenv
import os


Step 2: Define Sidebar


Streamlit offers a sidebar for secondary content such as instructions. Here, the sidebar shows the app's title and a one-line description. The field for the OpenAI API key is added in the main function in Step 3, not in the sidebar.


with st.sidebar:
    st.write("## Talk to PDFs")
    st.markdown("### Ask questions about Compliance, Accounting, and anything else.")


Step 3: Define Main Function


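Inside main(), first create a header, then add a password-style text input where users enter their OpenAI API key. The snippets in Steps 4 through 10 also belong inside main(), since they rely on variables defined there; they are shown here without the extra indentation.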

def main():
    st.header("OPEN SOURCE CHAT PDF APP.")
    OPENAI_API_KEY = st.text_input(
        "Enter your OPEN AI API key", type="password"
    )


Step 4: Provide Open Source Details


Next, you provide information about the open-source nature of the app, assuring users that their API key won't be stored or sent anywhere.

st.write(
    "Your API key is not stored or sent anywhere, entire code is open source, see this: https://github.com/whyashthakker/chatpdf, drop a ⭐️"
)


Step 5: Load Environment Variables and Accept File Upload


Then, you load the environment variables and provide a file uploader for PDF files.

load_dotenv()
pdf = st.file_uploader("Upload a PDF file", type="pdf")
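
load_dotenv() reads a .env file into the process environment, although the snippets in this guide only take the key from the text box. One way the two sources could be combined is sketched below. The helper name resolve_api_key and the variable name OPENAI_API_KEY are assumptions for illustration, not part of the original app.

```python
import os


def resolve_api_key(typed_key: str, env=os.environ) -> str:
    """Prefer the key typed into the UI; otherwise fall back to the environment.

    Hypothetical helper: the original app only uses the typed key.
    """
    key = (typed_key or "").strip() or env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise ValueError("No OpenAI API key provided")
    return key


# The env mapping is passed explicitly here so the example does not depend on your machine.
print(resolve_api_key("", env={"OPENAI_API_KEY": "sk-from-env"}))
```

In the app, the value returned by st.text_input would be passed as typed_key, and env would be left at its default so that variables loaded by load_dotenv() are picked up.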


Step 6: Read the PDF


If a PDF is uploaded, the app reads the file, extracts the text, and splits it into chunks.

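if pdf is not None:
    pdf_reader = PdfReader(pdf)
    st.write(pdf.name)
    # Concatenate the text of every page; "or ''" guards against pages with no extractable text
    text = ""
    for page in pdf_reader.pages:
        text += page.extract_text() or ""
    # Split into chunks of about 1000 characters, each sharing 200 characters with its neighbor
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200, length_function=len)
    chunks = text_splitter.split_text(text=text)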
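
To make chunk_size and chunk_overlap concrete, here is a simplified sliding-window sketch. It is not how RecursiveCharacterTextSplitter works internally (that class first tries to split on separators such as paragraph breaks, newlines, and spaces); the function name split_with_overlap is hypothetical and only illustrates what the two parameters mean.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows; each window repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))  # 2500 characters of dummy text
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.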


Step 7: Create VectorStore


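The app uses OpenAIEmbeddings to embed each chunk of text from the PDF and FAISS to index the resulting vectors. The file name without its .pdf extension becomes the store name used for caching in Step 8. Note that Step 8 wraps this same FAISS.from_texts call in a load-or-create check, so if you keep both snippets as written, the document is embedded in this step even when a cached store exists.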

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
VectorStore = FAISS.from_texts(chunks, embedding=embeddings)
store_name = pdf.name[:-4]
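
Conceptually, a vector store maps each chunk to a vector and later returns the chunks whose vectors are closest to the query's vector. The toy sketch below shows that idea using only the standard library. It substitutes word counts for OpenAI embeddings and a brute-force cosine score for a FAISS index, so the names embed, cosine, and top_k, as well as the scoring method, are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real embedding is a dense float vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose vectors score highest against the query vector."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


toy_chunks = [
    "Invoices must be paid within thirty days.",
    "The office is closed on public holidays.",
    "Late invoices incur a two percent penalty.",
]
best = top_k("When are invoices due?", toy_chunks, k=2)
print(best)
```

Word counts only match exact words, whereas learned embeddings can also match paraphrases; that difference is the main reason the app calls an embedding model instead.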


Step 8: Load or Create VectorStore


The app checks if a vector store already exists for the uploaded PDF. If it does, it loads the vector store; if not, it creates a new one.

# Reuse a previously pickled vector store for this file name if one exists
if os.path.exists(f"{store_name}.pkl"):
    with open(f"{store_name}.pkl", "rb") as f:
        VectorStore = pickle.load(f)
else:
    # Otherwise embed the chunks, build the store, and cache it for next time
    embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
    VectorStore = FAISS.from_texts(chunks, embedding=embeddings)
    with open(f"{store_name}.pkl", "wb") as f:
        pickle.dump(VectorStore, f)
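
The load-or-create pattern used here can be isolated into a small helper. The sketch below is a generic version using a plain dictionary in place of the vector store; load_or_create and expensive_build are hypothetical names, and the temporary directory is only there so the example cleans up after itself.

```python
import os
import pickle
import tempfile


def load_or_create(path, build):
    """Return the object pickled at `path`, or call build(), pickle the result, and return it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    value = build()
    with open(path, "wb") as f:
        pickle.dump(value, f)
    return value


calls = []


def expensive_build():
    calls.append(1)  # record each time the slow path actually runs
    return {"vectors": [0.1, 0.2, 0.3]}


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "report.pkl")
    first = load_or_create(cache_path, expensive_build)   # builds and writes the cache
    second = load_or_create(cache_path, expensive_build)  # reads the cache, no rebuild

print(len(calls))
```

Two properties of this pattern are worth keeping in mind. The cache is keyed by file name only, so two different PDFs with the same name would share one store. And pickle.load can execute arbitrary code, so it should only be used on files the app itself wrote.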


Step 9: Accept User Questions


The app provides an input box where users can ask questions about the uploaded PDF.

query = st.text_input("Let's ask questions :)")
st.write(query)


Step 10: Generate Response


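If the user enters a query, the app retrieves the three chunks most similar to it (k=3), passes them to the language model together with the question, and displays the answer, which max_tokens limits to 100 tokens. The get_openai_callback context manager records token usage for the call in cb, although this snippet does not display it.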

if query:
    docs = VectorStore.similarity_search(query=query, k=3)
    llm = OpenAI(model_name="gpt-3.5-turbo", max_tokens=100, openai_api_key=OPENAI_API_KEY)
    chain = load_qa_chain(llm=llm, chain_type="stuff")
    with get_openai_callback() as cb:
        response = chain.run(input_documents=docs, question=query)
        st.write(response)
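
The "stuff" chain type places all retrieved chunks into a single prompt alongside the question. The sketch below illustrates that idea with plain string formatting. The function name build_stuff_prompt and the prompt wording are assumptions for illustration; they are not LangChain's actual template.

```python
def build_stuff_prompt(question: str, docs: list[str]) -> str:
    """Join every retrieved chunk into one context block, then append the question."""
    context = "\n\n".join(docs)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "When are invoices due?",
    [
        "Invoices must be paid within thirty days.",
        "Late invoices incur a two percent penalty.",
    ],
)
print(prompt)
```

Because every chunk goes into one prompt, the retrieved text must fit in the model's context window. With k=3 and chunks of about 1000 characters, that is roughly 3000 characters of context plus the question.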


Step 11: Run the App


Finally, call main() under the standard entry-point guard. Save the script and launch it with the streamlit run command followed by the script's filename.

if __name__ == "__main__":
    main()

And there you have it: a step-by-step guide to building an open-source chat PDF app with LangChain and OpenAI's API. The app lets users upload a PDF and ask questions about it, answering each question from the chunks of the document most similar to the query.
