Extract Information From YouTube Videos Using GPT

Raghunaathan
8 min read · May 4, 2023


Skip the Mindless Video-Watching and Instead Focus on Extracting the Information You Need Using Your Personal YouTube Assistant

Image generated using BlueWillow AI

Raise your hand if you’ve ever found yourself watching YouTube videos at 1.5X or 2X, waiting for the good stuff to come. Or how about dragging the timeline back and forth like a detective trying to find the exact moment you need? Let’s face it, we’ve all been there! Today, we’re going to try to build our way out of this. As part of this, we’ll explore LlamaHub and use its YouTube loader. Then, we’ll build, test and tune our YouTube assistant. However, as with any technology, there are potential drawbacks to this approach. We’ll discuss these drawbacks and share some ideas on how to customise our application to improve its efficiency and overcome these shortcomings. So, sit back, relax, and let’s enjoy the process of building our YouTube assistant!

Before we dive into building our application, I wanted to introduce you all to another popular language model that I came across. While we’re all familiar with the famous GPT models from OpenAI (at least the ChatGPT application), did you know that there are other powerful language models available out there? One such model is Llama, developed by Meta. Let’s get to know Llama.

Llama is not your average language model! Developed by Meta’s Fundamental AI Research (FAIR) team, this collection of foundational language models ranges from 7 billion to 65 billion parameters and has been trained on trillions of tokens from publicly available datasets. Notably, the LLaMA-13B (13 billion parameter) model outperforms OpenAI’s GPT-3 (175 billion parameters) despite being over ten times smaller, while the LLaMA-65B model is right up there with the best models, Chinchilla-70B and PaLM-540B. And the best part? All Llama models are available to the research community for free. Combine that with the fact that they require far less computing power and resources, and Llama becomes an ideal tool for building natural language applications.

Llama models have been put to the test on a variety of benchmark datasets for natural language processing tasks, and they’ve performed impressively across the board. From the CommonsenseQA dataset for commonsense reasoning, to the STS-B dataset for textual similarity, and the SQuAD dataset for question answering, Llama models have proven their dominance over GPT-3 time and time again.

At the moment, access to the Llama model weights is granted on a request basis by Meta’s AI team. I’ll be sure to share my findings and provide tips for building natural language applications with Llama. If you’re interested in trying out the Llama models yourself, head over to this link and submit a request form. The form is linked under the tips section after the overview.

If you don’t want to wait for this, you can start with the Dalai repository and install your models using a simple command: npx dalai llama install [model]. Replace [model] with 7B, 13B, 30B or 65B depending on the hardware you have access to. Some issues have been reported while installing the 13B and 30B models, but the 7B model is well tested and works fine. There are also other applications built on top of Llama; some popular ones can be seen here.

Alright now, let’s dive back into building our awesome YouTube querying application! Firstly, we are going to explore LlamaHub, a repository of data loaders developed and maintained by the community.

LlamaHub - Explore Endless, Versatile Data

LlamaHub is your one-stop-shop for data loaders and readers. It’s a simple and easy-to-use library that connects large language models to a wide range of knowledge sources. Whether you’re building an index or finding different tools for an agent, LlamaHub’s general-purpose utilities make it a breeze to load data into LlamaIndex and LangChain. And with the ability to use multiple loaders together, you can easily create the perfect index for your needs. Plus, with just one line of code, you can download a loader from LlamaIndex and get started right away!

LlamaHub’s loaders are versatile and flexible, allowing you to choose from a range of options or customize them to suit your needs. Whether you need to extract information from documents, digital workspace apps, note-taking apps, email, or messaging apps, LlamaHub has you covered. For example, there are loaders to parse Google Docs, SQL databases, PDF files, PowerPoints, Asana, Notion, Slack, Obsidian, and many more. You can even combine multiple loaders with ease! So let your creativity run wild and see how LlamaHub can transform your workflow.

To start using these loaders, visit the LlamaHub website and search for your tool. The loaders are named after the applications they parse, making it easier to find yours. Once you find your loader, you will see how the data should be passed to it, along with a code snippet showing how to use it. The general steps involved are:

  1. Use the download_loader method of LlamaIndex to initialize the loader.
  2. Create an instance of the loader.
  3. Use load_data to parse your data.

These loaders typically leverage existing libraries. For instance, “youtube_transcript,” the loader we will be using, employs the “youtube-transcript-api”. If you wish to comprehend the source code for debugging purposes or to enhance the loader’s functionality, you may select the “View on GitHub” link to investigate the source code. The class name and folder name will always be the same as what you enter in the download_loader. You can download the loader and locate this class in the “base.py” file of the loader. You can then modify the load_data functionality.

If you are not satisfied with the existing loaders you can build one too. Follow the steps in llama-hub to contribute your loader.

If you’re an enthusiastic coder looking to personalize your application, we can explore the “youtube-transcript-api”. However, if you’re keen to dive straight into building the application without any modifications, feel free to skip the section below.

youtube-transcript-api

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles, and does not require a headless browser like other Selenium-based solutions do!

You can refer to the documentation and installation instructions on the project’s pip page.
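One practical detail: the raw API expects a bare video ID rather than the full URL that the loader accepts. A small stdlib-only helper can bridge the two (this is my own sketch; the function name is not part of the library):

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    """Return the video ID from common YouTube URL shapes."""
    parsed = urlparse(url)
    # Short links: https://youtu.be/<id>
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    # Standard links: https://www.youtube.com/watch?v=<id>
    params = parse_qs(parsed.query)
    if "v" in params:
        return params["v"][0]
    raise ValueError(f"No video ID found in {url!r}")

print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # → dQw4w9WgXcQ
```

The same ID then works for every method the API exposes.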

By default, the “youtube_loader” uses the find_transcript method to locate manually created transcripts, but we can bypass this behavior by using the find_generated_transcript method to access automatically generated transcripts. This is especially useful when an English transcript is available, as GPT models have a vast English vocabulary. However, if we want to build a custom GPT model for a different language, we can use the translate method, which leverages YouTube’s ability to translate any transcript into a specified language. The quality of the translation, of course, depends on the quality of the available transcript.
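To make that fallback chain concrete, here is a rough sketch built on the library’s list_transcripts, find_transcript, find_generated_transcript and translate methods. The function name is my own, and the import is deferred so the helper can be defined without the package installed:

```python
def fetch_transcript(video_id, language="en"):
    """Prefer a transcript in `language`; fall back to a generated one,
    then to translating whatever transcript YouTube offers."""
    # Deferred import: requires `pip install youtube-transcript-api`
    from youtube_transcript_api import YouTubeTranscriptApi

    transcripts = YouTubeTranscriptApi.list_transcripts(video_id)
    try:
        transcript = transcripts.find_transcript([language])
    except Exception:
        try:
            transcript = transcripts.find_generated_transcript([language])
        except Exception:
            # Last resort: translate the first available transcript
            transcript = next(iter(transcripts)).translate(language)
    return transcript.fetch()
```

Treat this as a starting point; the exact exception types (e.g. NoTranscriptFound) are worth catching specifically in real code.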

Youtube Transcript Loader

The youtube_loader uses the “youtube-transcript-api” Python package to retrieve text transcripts of YouTube videos. It then parses the data into a format easier to load into llama-index.
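Under the hood, that parsing step amounts to little more than joining the API’s snippet dictionaries (each with "text", "start" and "duration" keys) into one string before wrapping it in a Document. A rough stdlib-only approximation, with a made-up sample transcript:

```python
def snippets_to_text(snippets):
    """Collapse youtube-transcript-api snippets into one transcript string."""
    return " ".join(s["text"].replace("\n", " ") for s in snippets)

sample = [
    {"text": "Hello and welcome", "start": 0.0, "duration": 1.5},
    {"text": "to the channel", "start": 1.5, "duration": 1.2},
]
print(snippets_to_text(sample))  # → Hello and welcome to the channel
```

Knowing this makes it easier to modify the loader later, for example to keep timestamps alongside the text.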

Now that we have obtained the lowdown, let’s start developing the code! To get guidance on setting up an OpenAI key, installing the recommended libraries, and gaining a more thorough comprehension of the “llama_index,” I suggest checking out my previous blog post.

Let’s start by designing the function that will obtain the YouTube URL, extract the transcript, and transform it into the necessary Document format using the BaseDocument schema of the “llama_index”.

def prepare_transcript(url):
    YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")
    loader = YoutubeTranscriptReader()
    documents = loader.load_data(ytlinks=[url])
    return documents

The steps involved in this function are:

  1. Initialize the “YoutubeTranscriptReader” loader.
  2. Pass the required list of YouTube URLs to parse the transcripts into “BaseDocuments”.
  3. Return the list of “BaseDocuments”.

Next, we must get the index ready by using the GPTSimpleVectorIndex on the list of documents returned. We have two options: directly use the GPTSimpleVectorIndex to prepare the index, or use a “prompt_helper” and “llm_predictor” to provide context to the index. The outcome of each option will vary, so it’s important to test both before deciding which to use. At times, the GPTSimpleVectorIndex performs adequately as is.

response generated using GPTSimpleVectorIndex without providing any context
response generated using GPTSimpleVectorIndex after using ServiceContext

As evident from the above images, the model necessitates a certain level of manual tuning and testing before being utilized. In my case, I used a video entitled “World’s Best Android Camera vs iPhone!” to assess this model, which exemplifies the challenge of context in this scenario.

I will demonstrate both approaches: utilizing the GPTSimpleVectorIndex as is, and utilizing it by establishing a context for indexing.

# The simple way
def prepareindex(documents):
    ytindex = GPTSimpleVectorIndex.from_documents(documents)
    return ytindex

# This involves tuning the parameters to get the best possible outcome
def prepareindex(documents):
    max_input_size = 10000
    num_outputs = 2000
    max_chunk_overlap = 10
    chunk_size_limit = 100
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=num_outputs))
    context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    ytindex = GPTSimpleVectorIndex.from_documents(documents, service_context=context)
    return ytindex

Now, we will construct the final code snippet to query our prepared index and respond to the requested prompt. Supply a prompt appropriate to your requirements to set up the context for your queries.

# Setup a context for your prompt depending on your use case to get the best
# possible results
def chat(documents):
    chat_index = prepareindex(documents)
    while True:
        query = input('What do you need from the video/s? ')
        response = chat_index.query(f"Analyze the transcript and answer the query. query : {query}")
        print(f"Response : {response}")

We will put together the functions and start using the application.

if __name__ == '__main__':
    os.environ['OPENAI_API_KEY'] = 'input your OpenAI API key'
    documents = prepare_transcript('paste your YouTube video URL here')
    chat(documents)

Limitations of This Application

While this application works admirably for English videos, since transcripts are readily available for most of them, finding transcripts for videos in other languages may be challenging. Here’s a helpful tip for dealing with videos in other languages: try the “languages” parameter and verify whether an English transcript is available. If a transcript is available only in a different language, we can always ask the GPT model to answer in English. However, a more efficient approach would be to translate the transcripts before indexing them, to reduce token usage.

To improve this application, we can optimize the source code located in youtube_transcript/base.py by incorporating text-translation functionality and exception handling around the “get_transcript” method.
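The exception-handling side of that change can be prototyped outside the loader first. The sketch below (all names are my own) accepts any fetch function, such as a wrapper around get_transcript, and collects failures instead of letting one unavailable video abort the whole batch:

```python
def safe_load(video_ids, fetch):
    """Fetch each video's transcript, recording failures instead of raising."""
    results, failures = {}, []
    for vid in video_ids:
        try:
            results[vid] = fetch(vid)
        except Exception as exc:
            failures.append((vid, str(exc)))
    return results, failures

# Usage with a stand-in fetcher to simulate one failing video:
def fake_fetch(vid):
    if vid == "bad":
        raise RuntimeError("TranscriptsDisabled")
    return f"transcript of {vid}"

ok, bad = safe_load(["a", "bad", "b"], fake_fetch)
print(ok)   # → {'a': 'transcript of a', 'b': 'transcript of b'}
print(bad)  # → [('bad', 'TranscriptsDisabled')]
```

The same pattern drops into the loader’s load_data loop, so one video with disabled transcripts no longer breaks a multi-video index.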

In conclusion, we have successfully constructed a YouTube assistant utilizing GPT. We’ve thoroughly comprehended the components involved in this application, as well as the limitations of the process. Additionally, we’ve explored ways to optimize the process, including modifying the source code and incorporating exception handling. With this application, we can efficiently extract YouTube video transcripts and rapidly retrieve answers to our queries. So, get experimenting and exploring the possibilities that GPT and Llama Hub have to offer!

Useful Resources

LlamaHub - GitHub repository containing the source code for all loaders

Llama - Llama documentation maintained by Hugging Face community

Vicuna - a locally hosted chatbot claimed to reach 90% of ChatGPT’s quality
