J-F-Liu / lopdf

A Rust library for PDF document manipulation.

Crash in `document.extract_text`

cooperll opened this issue

I'm seeing an intermittent crash when iterating through a list of PDFs and reading each one's contents with this code:

use std::path::Path;

use lopdf::Document;

pub fn read_pdf_contents(path: &Path) -> Result<String, Box<dyn std::error::Error>> {
  match Document::load(path) {
    Ok(document) => {
      // get_pages() returns a map keyed by 1-based page number.
      let pages = document.get_pages();
      let mut texts = Vec::new();

      for page_number in pages.keys() {
        // Extract one page at a time; a page that fails to extract
        // contributes an empty string instead of aborting the whole read.
        let text = document.extract_text(&[*page_number]);
        texts.push(text.unwrap_or_default());
      }

      let full_text = texts.join("\n");
      Ok(full_text)
    }
    Err(err) => {
      Err(format!("Error reading PDF contents: {}", err).into())
    }
  }
}

This is the crash that I see, which happens during execution of document.extract_text:

MyApp(23287,0x16f64b000) malloc: tiny_free_list_remove_ptr: Internal invariant broken (next ptr of prev): ptr=0x138e14f80, prev_next=0x60000000138e14f
MyApp(23287,0x16f64b000) malloc: *** set a breakpoint in malloc_error_break to debug

Has anyone seen this before? This has been quite a challenging bug to fix.

Can you provide a PDF to reproduce this bug?
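
One way to find a PDF that reproduces this might be to log each path before extraction starts, so that the last path printed before the malloc abort points at the likely culprit. The sketch below is only an illustration: `extract_with_trace` is a hypothetical helper, not part of lopdf, and it takes the extraction function as a closure so it makes no assumptions about lopdf itself. Because the crash is intermittent and looks like heap corruption, the last file logged may not be the real cause, so treat the result as a starting point.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: runs `extract` on every path and logs the path
/// to stderr *before* each call. Returns the paths whose extraction
/// returned an `Err`. A hard abort (such as a malloc invariant failure)
/// cannot be caught here; the last "extracting:" line identifies the
/// file that was being processed when the process died.
pub fn extract_with_trace<F>(paths: &[PathBuf], mut extract: F) -> Vec<PathBuf>
where
    F: FnMut(&Path) -> Result<String, Box<dyn std::error::Error>>,
{
    let mut failed = Vec::new();

    for path in paths {
        // Rust does not buffer stderr, so this line is written out
        // before the extraction call begins.
        eprintln!("extracting: {}", path.display());

        match extract(path) {
            Ok(text) => eprintln!("  ok: {} bytes of text", text.len()),
            Err(err) => {
                eprintln!("  error: {}", err);
                failed.push(path.clone());
            }
        }
    }

    failed
}
```

With the function from the original post, this could be called as `extract_with_trace(&paths, |p| read_pdf_contents(p))`, where `paths` is the list of PDFs being iterated.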