# SubQuestionQueryEngine

Often, we encounter queries that span multiple documents.

In this notebook, we address complex queries that extend over several documents by breaking them down into simpler sub-queries and generating answers with the `SubQuestionQueryEngine`.

### Installation

In [None]:
!pip install llama-index
!pip install llama-index-llms-anthropic
!pip install llama-index-embeddings-huggingface

### Setup API Key

In [1]:
import os
os.environ['ANTHROPIC_API_KEY'] = 'YOUR ANTHROPIC API KEY'
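
Hardcoding the key in a cell makes it easy to leak when the notebook is shared. A minimal sketch of a guard that reads the key from an environment-like mapping and fails early if it is missing or still the placeholder; `require_api_key` and `PLACEHOLDER` are hypothetical names introduced here, not part of any library.

```python
import os

# The placeholder string used in the setup cell above.
PLACEHOLDER = "YOUR ANTHROPIC API KEY"


def require_api_key(env, name="ANTHROPIC_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or still the placeholder."""
    key = env.get(name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return key
```

Calling `require_api_key(os.environ)` at the top of the notebook would surface a missing key before any request is sent.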

### Setup LLM and Embedding model

We will use Anthropic's latest released `Claude-3 Opus` LLM.

In [2]:
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

In [3]:
llm = Anthropic(temperature=0.0, model='claude-3-opus-20240229')
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

In [4]:
from llama_index.core import Settings
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512
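
`Settings.chunk_size` caps how large each piece of a document is before it is embedded. As a rough illustration of the idea only, the sketch below splits text into fixed-size windows of whitespace-separated words. This is a simplification: LlamaIndex's own splitter counts tokens and respects sentence boundaries, and `chunk_words` is a hypothetical helper.

```python
def chunk_words(text, chunk_size):
    """Split `text` into consecutive chunks of at most `chunk_size` whitespace-separated words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


print(chunk_words("a b c d e", 2))  # ['a b', 'c d', 'e']
```

Smaller chunks give more precise retrieval hits but less context per hit, which is the trade-off the setting controls.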

### Setup logging

In [5]:
# NOTE: This is ONLY necessary in a Jupyter notebook.
# Details: Jupyter runs an event loop behind the scenes.
#          Starting another event loop to make async queries would nest the two,
#          which asyncio normally does not allow; nest_asyncio patches asyncio to permit it.
import nest_asyncio

nest_asyncio.apply()

import logging
import sys

# Set up the root logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # Set logger level to INFO

# Clear out any existing handlers
logger.handlers = []

# Set up the StreamHandler to output to sys.stdout (Colab's output)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)  # Set handler level to INFO

# Add the handler to the logger
logger.addHandler(handler)

from IPython.display import display, HTML
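
The nested event-loop restriction mentioned in the comment above can be reproduced with the standard library alone. The sketch below, meant to be run as a plain script rather than in the notebook, starts a second loop from inside a running one and captures the resulting error; `inner` and `outer` are illustrative names.

```python
import asyncio


async def inner():
    return 42


async def outer():
    # We are already inside a running loop here, so starting another one fails.
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

Jupyter puts every cell in the position of `outer`, which is why the notebook applies `nest_asyncio` before making async queries.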

### Download Data

We will use the Uber and Lyft 2021 10-K SEC filings.

In [6]:
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/uber_2021.pdf' -O './uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf' -O './lyft_2021.pdf'

--2024-03-08 07:07:32--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/uber_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880483 (1.8M) [application/octet-stream]
Saving to: ‘./uber_2021.pdf’


2024-03-08 07:07:32 (87.4 MB/s) - ‘./uber_2021.pdf’ saved [1880483/1880483]

--2024-03-08 07:07:33--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1440303 (1.4M) [application/octet-stream]
Sa

### Load Data

In [7]:
from llama_index.core import SimpleDirectoryReader
lyft_docs = SimpleDirectoryReader(input_files=["lyft_2021.pdf"]).load_data()
uber_docs = SimpleDirectoryReader(input_files=["uber_2021.pdf"]).load_data()

In [8]:
print(f'Loaded Lyft 10-K with {len(lyft_docs)} pages')
print(f'Loaded Uber 10-K with {len(uber_docs)} pages')

Loaded Lyft 10-K with 238 pages
Loaded Uber 10-K with 307 pages


### Index Data

In [9]:
from llama_index.core import VectorStoreIndex
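# Only the first 100 pages of each filing are indexed here; drop the slice to index the full documents.
lyft_index = VectorStoreIndex.from_documents(lyft_docs[:100])
uber_index = VectorStoreIndex.from_documents(uber_docs[:100])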

### Create Query Engines

In [10]:
lyft_engine = lyft_index.as_query_engine(similarity_top_k=5)

In [11]:
uber_engine = uber_index.as_query_engine(similarity_top_k=5)
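
`similarity_top_k=5` asks each engine to retrieve the five chunks whose embeddings are most similar to the query embedding. A toy sketch of that ranking step, assuming cosine similarity and made-up two-dimensional vectors; the helper names and the vectors are illustrative and are not taken from LlamaIndex or from the filings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k):
    """Return the names of the `k` chunks most similar to `query_vec`, best first."""
    ranked = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Made-up embeddings, for illustration only.
toy_chunks = {
    "revenue": [0.9, 0.1],
    "risk factors": [0.1, 0.9],
    "cash flow": [0.7, 0.3],
}
print(top_k([1.0, 0.0], toy_chunks, k=2))  # ['revenue', 'cash flow']
```

A larger `k` passes more context to the LLM at the cost of longer prompts.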


### Querying

In [12]:
response = await lyft_engine.aquery('What is the revenue of Lyft in 2021? Answer in millions with page reference')
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"


In [13]:
response = await uber_engine.aquery('What is the revenue of Uber in 2021? Answer in millions, with page reference')
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"


### Create Tools

In [14]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine

query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(name='lyft_10k', description='Provides information about Lyft financials for year 2021')
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(name='uber_10k', description='Provides information about Uber financials for year 2021')
    ),
]

### Create `SubQuestionQueryEngine`

In [15]:
sub_question_query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
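
Conceptually, the engine splits a question into sub-questions, sends each one to the tool named for it, and then combines the answers. The sketch below imitates only the routing and concurrent-answering step with stub tools. The sub-questions are hardcoded rather than generated by an LLM, there is no final synthesis step, and every name in it is hypothetical rather than LlamaIndex's implementation.

```python
import asyncio


async def answer_sub_question(tools, tool_name, question):
    """Route one sub-question to the tool registered under `tool_name`."""
    answer = await tools[tool_name](question)
    return tool_name, question, answer


async def run_sub_questions(tools, sub_questions):
    """Answer all (tool_name, question) pairs concurrently, preserving input order."""
    tasks = [answer_sub_question(tools, name, q) for name, q in sub_questions]
    return await asyncio.gather(*tasks)


# Stub tools standing in for the real query engines.
async def fake_lyft(question):
    return f"(lyft stub) {question}"


async def fake_uber(question):
    return f"(uber stub) {question}"


toy_tools = {"lyft_10k": fake_lyft, "uber_10k": fake_uber}
toy_sub_questions = [
    ("uber_10k", "What was Uber's revenue in 2021?"),
    ("lyft_10k", "What was Lyft's revenue in 2021?"),
]

results = asyncio.run(run_sub_questions(toy_tools, toy_sub_questions))
for tool_name, question, answer in results:
    print(f"[{tool_name}] Q: {question}\n[{tool_name}] A: {answer}")
```

The tool `name` and `description` in `ToolMetadata` are what let the real engine decide which tool each sub-question goes to, so they are worth writing carefully.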

### Querying

In [17]:
response = await sub_question_query_engine.aquery('Compare revenue growth of Uber and Lyft from 2020 to 2021')

HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Generated 4 sub questions.
[uber_10k] Q: What was Uber's revenue in 2020?
[uber_10k] Q: What was Uber's revenue in 2021?
[lyft_10k] Q: What was Lyft's revenue in 2020?
[lyft_10k] Q: What was Lyft's revenue in 2021?
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
[lyft_10k] A: According to Lyft's consolidated statements of operations data, Lyft's total revenue in 2020 was $2,364,681,000.
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
[uber_10k] A: According to Uber's consolidated statements of operations, Uber's revenue in 2021 was $17,455 million.
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
[uber_10k] A: According to Uber's consolidated statements of o

In [18]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
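
The comparison the engine is asked to make comes down to a year-over-year growth percentage for each company. A minimal sketch of that arithmetic, using placeholder numbers rather than figures from the filings; `growth_pct` is a hypothetical helper.

```python
def growth_pct(previous, current):
    """Percentage change from `previous` to `current`."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


# Placeholder values, not taken from the 10-K filings.
print(growth_pct(100.0, 150.0))  # 50.0
```

Recomputing the percentages from the sub-question answers is a cheap way to check the figures in the synthesized response.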

In [19]:
response = await sub_question_query_engine.aquery('Compare the investments made by Uber and Lyft')

HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Generated 4 sub questions.
[uber_10k] Q: What investments did Uber make in 2021
[uber_10k] Q: What was the total amount invested by Uber in 2021
[lyft_10k] Q: What investments did Lyft make in 2021
[lyft_10k] Q: What was the total amount invested by Lyft in 2021
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
[uber_10k] A: Based on the context provided, in 2021 Uber invested:

- $2.3 billion in acquisition of businesses, net of cash acquired
- $1.1 billion in purchases of marketable securities  
- $982 million in purchases of non-marketable equity securities
- $297 million in purchases of notes receivable
- $298 million in purchases of property and equipment

So in total, Uber invested approximately $5.0 billion in 2021 across business acquisitions, marketable 

In [20]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
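
The "approximately $5.0 billion" total in the `[uber_10k]` answer above can be checked by adding up the line items it lists. The sketch below does that with the figures exactly as quoted in the output; since the two largest items are already rounded to the nearest $0.1 billion, the sum is itself approximate.

```python
# Line items quoted in the [uber_10k] answer above, in millions of USD.
uber_2021_investing_outflows = {
    "acquisition of businesses, net of cash acquired": 2300,
    "purchases of marketable securities": 1100,
    "purchases of non-marketable equity securities": 982,
    "purchases of notes receivable": 297,
    "purchases of property and equipment": 298,
}

total_millions = sum(uber_2021_investing_outflows.values())
print(f"Total: ${total_millions:,} million (~${total_millions / 1000:.1f} billion)")
# Total: $4,977 million (~$5.0 billion)
```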