J-F-Liu / lopdf

A Rust library for PDF document manipulation.

Not all decoded elements seem to be returned when reading a PDF generated by Ghostscript 9.27

BXHlixiaodong opened this issue

Hi all.

I have a PDF generated by Ghostscript on Debian buster. I tried to parse its page contents:

let mut doc = Document::load("./test04.pdf")?;
let page_content_ids: Vec<ObjectId> = doc
    .page_iter()
    .flat_map(|page_id| doc.get_page_contents(page_id))
    .collect();

for id in page_content_ids.into_iter() {
    let stream = doc.get_object_mut(id)?.as_stream_mut()?;
    stream.decompress();

    let content = stream.decode_content()?;

    content
        .operations
        .iter()
        .for_each(|op| println!("{:?}", op)); // print every decoded operation
    stream.set_content(content.encode()?); // write the unmodified content back into the stream
    stream.compress()?;
}
doc.save("./test04.output.pdf")?; // save as a separate PDF file

Here is test04.pdf.

I found no Tj operator in the stdout output, and test04.output.pdf is missing some elements, so it is not the same as test04.pdf.
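Since Tj is only one of the four text-showing operators in PDF (the others are TJ, ' and "), a narrower check would be to count any of them among the printed operations. Below is a minimal sketch of that check. It works on plain operator-name strings rather than on lopdf values, and the helper names `is_text_showing` and `count_text_showing` are hypothetical, not part of lopdf:

```rust
/// Returns true if `operator` is one of the four PDF text-showing
/// operators: Tj, TJ, ' (single quote) or " (double quote).
fn is_text_showing(operator: &str) -> bool {
    matches!(operator, "Tj" | "TJ" | "'" | "\"")
}

/// Counts how many operator names in `operators` show text.
/// Hypothetical helper: the names would come from the decoded operations.
fn count_text_showing(operators: &[&str]) -> usize {
    operators.iter().filter(|&&op| is_text_showing(op)).count()
}
```

With the loop above, the operator names could presumably be taken from each operation's `operator` field. A count of zero would suggest that the text operations never reach the decoded content at all, rather than being written with a different operator.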

Does anyone know how to fix this, or should I use other methods based on Stream or Object?
Thanks.