Introduction: Large language models (LLMs) give applications powerful natural language processing capabilities, and they become more useful still when combined with other sources of computation or knowledge. This article walks through a short script built on LangChain, a library designed to combine LLMs with tools such as document loaders and vector stores. Specifically, it shows how to use ChatGPT on your own text files to answer questions about their contents, which is also the basis for a chatbot over your own data.

Source: https://github.com/techleadhd/chatgpt-retrieval

Step-by-step guide:

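  1. Installation and configuration: Start by installing the LangChain library by running "pip install langchain" or "conda install langchain -c conda-forge" in your terminal. The script also relies on the "openai", "chromadb" and "tiktoken" packages (and "unstructured" if you use "DirectoryLoader"), so install those as well, for example with "pip install openai chromadb tiktoken".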
  2. Importing libraries: Import the required libraries into your Python script. These are the "os", "sys" and "openai" modules and several classes from the LangChain library: "RetrievalQA", "ChatOpenAI", "DirectoryLoader", "TextLoader", "OpenAIEmbeddings", "VectorstoreIndexCreator" and "Chroma". The import paths in the script match the LangChain release it was written for; newer releases have moved some of these modules.
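  3. Setting the API key: The script assigns your OpenAI API key to the environment variable "OPENAI_API_KEY". It reads the key from "constants.APIKEY", so either create a "constants.py" file next to the script that defines "APIKEY", or replace "constants.APIKEY" in the script with your actual API key.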
  4. Loading data: Define the source of your data by creating a "TextLoader" object for a single text file or a "DirectoryLoader" object for a directory with multiple files. The script uses "TextLoader" with the file "data.txt" by default; change the file name or switch to the commented-out "DirectoryLoader" line and specify your folder.
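  5. Creating the index: Create an index with "VectorstoreIndexCreator" by calling its "from_loaders" method with a list containing the loader. To cache the index on disk and reuse it on later runs, set "PERSIST" to True; the script then stores the index in a directory named "persist" and loads it from there whenever that directory exists. Otherwise, leave "PERSIST" set to False and the index is rebuilt on every run.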
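  6. Initializing the ChatGPT-LangChain chain: Create a RetrievalQA chain with "RetrievalQA.from_chain_type", using "ChatOpenAI" as the language model (for example, model="gpt-3.5-turbo"). Pass the vector store of the previously created index as the retriever by calling index.vectorstore.as_retriever(search_kwargs={"k": 1}). With k set to 1, only the single most similar chunk of your data is retrieved and handed to the model for each query.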
  7. Running queries: Pass your question to the script as a command-line argument. For example: python script.py "What is the capital of France?". The script prints ChatGPT's answer, based on the text retrieved from your data.
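
To make the role of the "k" setting in step 6 more concrete, here is a minimal sketch of top-k retrieval in plain Python. It is not how LangChain or Chroma work internally: the real script compares OpenAI embeddings, while this toy version compares simple word counts. The function names and the sample chunks are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query, chunks, k=1):
    """Return the k chunks most similar to the query, best match first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine_similarity(query_vec, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Hypothetical chunks standing in for pieces of data.txt.
chunks = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "Python is a programming language.",
]

# With k=1 only the best-matching chunk would be passed on to the model.
print(top_k_chunks("What is the capital of France?", chunks, k=1))
```

A larger k gives the model more context to draw on, at the cost of a longer prompt for each query.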

Conclusion: By following the steps above and using the script below, you can combine ChatGPT and LangChain to answer questions about your own text files. The LangChain library offers further functionality and integrations for other applications of large language models in your projects. Refer to the LangChain documentation for more details and advanced usage.

ChatGPT script with LangChain integration

import os
import sys

import openai
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import Chroma

import constants

os.environ["OPENAI_API_KEY"] = constants.APIKEY

# Set to True to cache the index on disk and reuse it (for repeated queries on the same data)
PERSIST = False

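# The question is taken from the first command-line argument.
if len(sys.argv) < 2:
  sys.exit('Usage: python script.py "your question"')
query = sys.argv[1]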

if PERSIST and os.path.exists("persist"):
  print("Reusing index...\n")
  vectorstore = Chroma(persist_directory="persist", embedding_function=OpenAIEmbeddings())
  from langchain.indexes.vectorstore import VectorStoreIndexWrapper
  index = VectorStoreIndexWrapper(vectorstore=vectorstore)
else:
  loader = TextLoader('data.txt')
  # To index a whole folder instead, use DirectoryLoader (it can also load other file types such as PDFs).
  # loader = DirectoryLoader(".", glob="*.txt")
  if PERSIST:
    index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory":"persist"}).from_loaders([loader])
  else:
    index = VectorstoreIndexCreator().from_loaders([loader])

chain = RetrievalQA.from_chain_type(
  llm=ChatOpenAI(model="gpt-3.5-turbo"),
  retriever=index.vectorstore.as_retriever(search_kwargs={"k": 1}),
)
print(chain.run(query))
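
The caching logic in the script can be easier to follow in isolation. The sketch below mirrors only the branching on "PERSIST" and the "persist" directory and does not call LangChain or OpenAI. The function name "index_action" and the labels it returns are hypothetical, chosen for illustration.

```python
import os
import tempfile


def index_action(persist, persist_dir):
    """Describe which branch the script would take for the given settings."""
    if persist and os.path.exists(persist_dir):
        return "reuse"           # load the saved index; documents are not re-embedded
    if persist:
        return "build_and_save"  # embed the data and write the index to persist_dir
    return "build"               # embed the data in memory on every run


with tempfile.TemporaryDirectory() as tmp:
    persist_dir = os.path.join(tmp, "persist")
    print(index_action(False, persist_dir))  # build
    print(index_action(True, persist_dir))   # build_and_save
    os.mkdir(persist_dir)                    # simulate an index saved by an earlier run
    print(index_action(True, persist_dir))   # reuse
    print(index_action(False, persist_dir))  # build
```

Note that with "PERSIST" set to True, an existing "persist" directory is reused even if the data has changed since it was written, so delete the directory when you want the index rebuilt.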