Set pdf pages range to ask
lexiconlp opened this issue · comments
Is there a way to query only one page, or a range of pages, of a PDF?
Right now:
from pathlib import Path
from google import genai
import os
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
client = genai.Client(api_key=GEMINI_API_KEY)
pdf_file = Path("my-local-file.pdf")
sample_file = client.files.upload(file=pdf_file)
prompt = "Ask this question only for 1 page: YOUR QUESTION"
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[sample_file, prompt],
)
print(response.text)
But if my PDF has 2000 pages, this spends too many tokens. Is there a way to set the page, or range of pages, to ask about?
Something like:
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[sample_file, prompt],
    pdf_pages=(1, 1),  # proposed parameter, not an existing one
)
Unfortunately, looking at only a particular page or page range is not supported. Can you try creating a new file that contains only the desired portion of the original file?
> Can you try creating a new file that contains only the desired portion of the original file?
Seems like something that could be pretty easy using pypdf. Something like this seems to work:
from pypdf import PdfReader, PdfWriter

def create_new_pdf_from_range(input_pdf_path, output_pdf_path, start_page, end_page):
    """
    Creates a new PDF from a range of pages in an existing PDF.

    Args:
        input_pdf_path: Path to the input PDF file.
        output_pdf_path: Path to the output PDF file.
        start_page: The starting page number (inclusive, 0-indexed).
        end_page: The ending page number (exclusive, 0-indexed).
    """
    reader = PdfReader(input_pdf_path)
    writer = PdfWriter()
    for page_num in range(start_page, end_page):
        page = reader.pages[page_num]
        writer.add_page(page)
    with open(output_pdf_path, "wb") as output_pdf:
        writer.write(output_pdf)

# Example usage:
input_pdf = "test.pdf"  # Replace with your input PDF file path
output_pdf = "output.pdf"  # Replace with your desired output PDF file path
start_page_index = 2  # Start page index (0 for the first page)
end_page_index = 3  # End page index (exclusive, so this will be up to page 2)
create_new_pdf_from_range(input_pdf, output_pdf, start_page_index, end_page_index)
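The function above takes 0-indexed, end-exclusive indices, while PDF viewers usually number pages from 1. A small converter can bridge the two. This is only a sketch: `page_range_to_indices` is a hypothetical name for illustration, not part of pypdf or google-genai.

```python
def page_range_to_indices(first_page: int, last_page: int, total_pages: int) -> tuple[int, int]:
    """Convert a 1-based, inclusive page range to 0-based (start, end), end exclusive.

    Hypothetical helper for illustration; not part of pypdf or google-genai.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError(f"invalid page range: {first_page}-{last_page}")
    if last_page > total_pages:
        raise ValueError(f"last page {last_page} exceeds document length {total_pages}")
    # Shifting the start down by one makes it 0-based; the inclusive 1-based
    # end is numerically equal to the exclusive 0-based end.
    return first_page - 1, last_page


# Page 3 only, in a 2000-page document: gives the same indices as the example above.
start, end = page_range_to_indices(3, 3, total_pages=2000)
print(start, end)  # prints "2 3"
```

With pypdf, `total_pages` would be `len(reader.pages)`, and the returned pair can be passed as `start_page` and `end_page` to `create_new_pdf_from_range`.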
Note that client.files.upload can also take an open file object as input, so you can skip writing the file to disk entirely.
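A minimal sketch of that in-memory variant follows. The names `pdf_range_to_buffer` and `ask_about_pages` are hypothetical, and passing `mime_type` through `config` is an assumption: a file object has no filename from which the type could be inferred, so it is given explicitly here.

```python
import io


def pdf_range_to_buffer(input_pdf_path, start_page, end_page):
    """Copy pages [start_page, end_page) (0-indexed) into an in-memory PDF."""
    # Imported here so this module still loads when pypdf is not installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_pdf_path)
    writer = PdfWriter()
    for page_num in range(start_page, end_page):
        writer.add_page(reader.pages[page_num])

    buffer = io.BytesIO()
    writer.write(buffer)
    buffer.seek(0)  # rewind so the upload reads from the first byte
    return buffer


def ask_about_pages(client, input_pdf_path, start_page, end_page, prompt,
                    model="gemini-2.0-flash"):
    """Upload only the selected pages and ask the model about them."""
    buffer = pdf_range_to_buffer(input_pdf_path, start_page, end_page)
    # Assumption: the MIME type must be stated explicitly for a file object.
    uploaded = client.files.upload(
        file=buffer, config={"mime_type": "application/pdf"}
    )
    response = client.models.generate_content(
        model=model, contents=[uploaded, prompt]
    )
    return response.text
```

Here `client` is the `genai.Client` from the original snippet, so a call would look like `ask_about_pages(client, "my-local-file.pdf", 0, 1, "YOUR QUESTION")` to ask about the first page only.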
@lexiconlp Does that solve your problem?
I'd thought about doing that too, @MarkDaoust, but I thought there might be a built-in option.
Thanks anyway, @MarkDaoust :). I'm closing this issue.