Not all decoded elements are returned when reading a PDF generated by Ghostscript 9.27
BXHlixiaodong commented
Hi all.
I have a PDF generated by Ghostscript on Debian Buster. I tried to parse its page contents:
    use lopdf::{Document, ObjectId};

    let mut doc = Document::load("./test04.pdf")?;
    // Collect the content stream ids of every page up front.
    let page_content_ids: Vec<ObjectId> = doc
        .page_iter()
        .flat_map(|page_id| doc.get_page_contents(page_id))
        .collect();
    for id in page_content_ids.into_iter() {
        let stream = doc.get_object_mut(id)?.as_stream_mut()?;
        stream.decompress();
        let content = stream.decode_content()?;
        content
            .operations
            .iter()
            .for_each(|op| println!("{:?}", op)); // print every operator
        stream.set_content(content.encode()?); // written back into the stream unchanged
        stream.compress()?;
    }
    doc.save("./test04.output.pdf")?; // saved as a separate PDF file
Here is test04.pdf.
I found no Tj operator in stdout, and test04.output.pdf has lost some elements: it is not the same as test04.pdf.
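To narrow this down, I could tally the operator names instead of reading the raw dump, since the PDF specification also defines TJ, ' and " as text-showing operators besides Tj. Below is a minimal sketch using only the standard library. The helper names tally_operators and text_showing_total are my own, and the sample operator names are hypothetical, not taken from test04.pdf. With lopdf, I assume the names would come from the operator field of each decoded operation.

```rust
use std::collections::BTreeMap;

/// Operators that show text according to the PDF specification.
const TEXT_SHOWING: [&str; 4] = ["Tj", "TJ", "'", "\""];

/// Count how often each operator name occurs.
fn tally_operators<'a>(names: impl IntoIterator<Item = &'a str>) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for name in names {
        *counts.entry(name).or_insert(0) += 1;
    }
    counts
}

/// Sum the counts of all text-showing operators in a tally.
fn text_showing_total(counts: &BTreeMap<&str, usize>) -> usize {
    TEXT_SHOWING.iter().filter_map(|op| counts.get(op)).sum()
}

fn main() {
    // Hypothetical operator names standing in for the decoded operations.
    let names = ["BT", "Tf", "Td", "TJ", "TJ", "ET"];
    let counts = tally_operators(names.iter().copied());
    println!("{:?}", counts);
    println!("text-showing operators: {}", text_showing_total(&counts));
}
```

If the total is zero for a page that visibly contains text, the text is probably not drawn in that page's own content stream.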
Does anyone know how to fix this? Or should I use other methods based on Stream or Object?
Thanks.