Crash in `document.extract_text`
cooperll opened this issue
cooper commented
I'm seeing an intermittent crash when iterating through a list of PDFs and reading each one's contents with this code:
pub fn read_pdf_contents(path: &Path) -> Result<String, Box<dyn std::error::Error>> {
    match Document::load(path) {
        Ok(document) => {
            let pages = document.get_pages();
            let mut texts = Vec::new();
            for (i, _) in pages.iter().enumerate() {
                let page_number = (i + 1) as u32;
                let text = document.extract_text(&[page_number]);
                texts.push(text.unwrap_or_default());
            }
            let full_text = texts.join("\n");
            Ok(full_text)
        }
        Err(err) => {
            Err(format!("Error reading PDF contents: {}", err).into())
        }
    }
}
This is the crash that I see, which happens during execution of `document.extract_text`:
MyApp(23287,0x16f64b000) malloc: tiny_free_list_remove_ptr: Internal invariant broken (next ptr of prev): ptr=0x138e14f80, prev_next=0x60000000138e14f
MyApp(23287,0x16f64b000) malloc: *** set a breakpoint in malloc_error_break to debug
Has anyone seen this before? This has been quite a challenging bug to fix.
Arif Driessen commented
Can you provide a PDF to reproduce this bug?