J-F-Liu / lopdf

A Rust library for PDF document manipulation.

PDFs protected from Adobe Acrobat online get corrupted when saved with lopdf

shantanugoel opened this issue

Any file password-protected with Adobe Acrobat online fails to open in any PDF viewer after being saved with lopdf, even when the only operation is opening the file and saving it again.
The issue may affect any file generated by Acrobat, but I don't have a paid account, so I can only test files generated with their free protection offering.
test_protected.pdf
test_protected_lopdfsaved.pdf
test_orig.pdf

Attaching files:

  • test_orig.pdf - Original file
  • test_orig_protected.pdf - acrobat generated
  • test_orig_protected_lopdfsaved.pdf - saved by lopdf

The last file is corrupt. Password is "aaaaaa".
The files save correctly with other libraries and utilities, with or without decrypting them (I tried pdf-rs, qpdf, pdfium, etc.).

One relevant thing I found while debugging: these files use V 4, R 4 (AES-128), which lopdf doesn't currently support for decryption. I added that support in my local branch of lopdf, but the corruption happens whether or not the file is decrypted first.
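For context on what V 4 / R 4 decryption involves: in the AESV2 scheme, each encrypted string or stream starts with a 16-byte initialization vector, followed by AES-128 CBC ciphertext that is padded to whole blocks. The sketch below only shows that layout check, using the standard library. The function name `split_aesv2_payload` is made up for illustration and is not part of lopdf; the actual decryption step (key derivation plus AES) is left out.

```rust
/// Split an AESV2-encrypted string or stream into (IV, ciphertext).
/// Hypothetical helper for illustration; not a lopdf API.
/// Returns None if the data cannot be well-formed AES-CBC output.
fn split_aesv2_payload(data: &[u8]) -> Option<(&[u8], &[u8])> {
    const BLOCK: usize = 16;
    // Expect the IV plus at least one padded ciphertext block,
    // and a total length that is a whole number of blocks.
    if data.len() < 2 * BLOCK || data.len() % BLOCK != 0 {
        return None;
    }
    Some(data.split_at(BLOCK))
}

fn main() {
    let payload = [0u8; 48];
    let (iv, ciphertext) = split_aesv2_payload(&payload).expect("well-formed payload");
    println!("iv: {} bytes, ciphertext: {} bytes", iv.len(), ciphertext.len());
    // A 20-byte payload is not block-aligned, so it is rejected.
    assert!(split_aesv2_payload(&[0u8; 20]).is_none());
}
```

The strict length check follows the layout described in the PDF specification; a reader that needs to tolerate malformed files might choose to relax it.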

Hi @J-F-Liu, I'm facing a similar issue. The file appears to use AES-128 encryption but is not password protected. I'm hoping you can advise on what crypto capabilities lopdf has. pdf-rs seems to have some capabilities in this regard, but those crates are not really dependable.

Here's the debug output from calling get_encrypted on the Document struct:
<</Filter /Standard/Length 128/V 4/R 4/U <17b478e015e10c0d80cd94e92a11e9fc00000000000000000000000000000000>/O <47049aec225d12e5d35a369667756393aa48a808d49426128e08c55c657c3395>/P -4/StmF /StdCF/StrF /StdCF/EncryptMetadata false/CF <</StdCF <</Length 16/CFM /AESV2/AuthEvent /DocOpen>>>>>>

Here's the full debug output from the Document struct:
Document { version: "1.7", trailer: <</Root 18 0 R/Info 15 0 R/Encrypt 22 0 R/ID [<0bac900a9f44a170bd291323ebf2008b> <0bac900a9f44a170bd291323ebf2008b>]/Type /XRef/Size 26>>, reference_table: Xref { cross_reference_type: CrossReferenceStream, entries: {1: Compressed { container: 24, index: 0 }, 2: Normal { offset: 15, generation: 0 }, 4: Normal { offset: 53998, generation: 0 }, 6: Normal { offset: 54094, generation: 0 }, 7: Compressed { container: 24, index: 1 }, 9: Compressed { container: 24, index: 2 }, 10: Compressed { container: 24, index: 3 }, 11: Normal { offset: 61520, generation: 0 }, 13: Normal { offset: 61619, generation: 0 }, 14: Compressed { container: 24, index: 4 }, 15: Compressed { container: 24, index: 5 }, 16: Compressed { container: 24, index: 6 }, 17: Compressed { container: 24, index: 7 }, 18: Normal { offset: 68134, generation: 0 }, 22: Normal { offset: 68196, generation: 0 }, 23: Normal { offset: 68492, generation: 0 }, 24: Normal { offset: 69480, generation: 0 }, 25: Normal { offset: 70486, generation: 0 }}, size: 26 }, objects: {(2, 0): <</Subtype /Image/Width 656/Height 225/BitsPerComponent 8/ColorSpace /DeviceRGB/Filter /DCTDecode/Length 53840>>stream...endstream, (4, 0): <</Type /Page/Parent 1 0 R/Contents 6 0 R/Resources 7 0 R/MediaBox [0 0 612 792]>>, (6, 0): <</Length 7376>>stream...endstream, (11, 0): <</Type /Page/Parent 1 0 R/Contents 13 0 R/Resources 14 0 R/MediaBox [0 0 612 792]>>, (13, 0): <</Length 6464>>stream...endstream, (18, 0): <</Type /Catalog/Pages 1 0 R/Metadata 23 0 R>>, (22, 0): <</Filter /Standard/Length 128/V 4/R 4/U <17b478e015e10c0d80cd94e92a11e9fc00000000000000000000000000000000>/O <47049aec225d12e5d35a369667756393aa48a808d49426128e08c55c657c3395>/P -4/StmF /StdCF/StrF /StdCF/EncryptMetadata false/CF <</StdCF <</Length 16/CFM /AESV2/AuthEvent /DocOpen>>>>>>, (23, 0): <</Type /Metadata/Subtype /XML/Filter /Crypt/Length 899>>stream...endstream, (24, 0): <</Type /ObjStm/N 8/First 54/Length 0>>stream...endstream, (25, 
0): <</Root 18 0 R/Info 15 0 R/Encrypt 22 0 R/ID [<0bac900a9f44a170bd291323ebf2008b> <0bac900a9f44a170bd291323ebf2008b>]/Type /XRef/W [1 4 2]/Filter /FlateDecode/Index [0 26]/Size 26/Length 108>>stream...endstream}, max_id: 25, max_bookmark_id: 0, bookmarks: [], bookmark_table: {}, xref_start: 70486 }
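To read the encryption dictionary above: /V 4 means the cipher is chosen by crypt filters, /StmF and /StrF both name /StdCF, and that filter's /CFM is /AESV2, i.e. AES-128 in CBC mode. A minimal sketch of that mapping follows. The `Cipher` enum and `classify` function are hypothetical names for illustration, not lopdf types, and the sketch assumes the same filter is used for strings and streams (as it is in this file).

```rust
#[derive(Debug, PartialEq)]
enum Cipher {
    Rc4 { key_bits: u32 },
    Aes128,
    Aes256,
    Unknown,
}

/// Map /V, the top-level /Length (in bits), and the crypt filter's /CFM
/// to the cipher a reader would need. Hypothetical helper, not a lopdf API.
fn classify(v: i64, length_bits: u32, cfm: Option<&str>) -> Cipher {
    match (v, cfm) {
        (1, _) => Cipher::Rc4 { key_bits: 40 },          // V 1: 40-bit RC4
        (2, _) => Cipher::Rc4 { key_bits: length_bits }, // V 2: RC4, longer keys
        (4, Some("V2")) => Cipher::Rc4 { key_bits: length_bits },
        (4, Some("AESV2")) => Cipher::Aes128,            // the case in this issue
        (5, Some("AESV3")) => Cipher::Aes256,
        _ => Cipher::Unknown,
    }
}

fn main() {
    // Values taken from the dictionary above: /V 4, /Length 128, /CFM /AESV2.
    println!("{:?}", classify(4, 128, Some("AESV2"))); // prints "Aes128"
}
```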

@shantanugoel I've used the mupdf crate (a safe wrapper around mupdf) to create a function that cleans and removes encryption from these troublesome files. I'll just drop the code with full crate paths in case you would be open to a workaround until lopdf can handle this encryption.

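// Re-save a PDF through mupdf with encryption removed and return the new bytes.
// Note: failures panic via unwrap(), even though the signature returns io::Result.
fn mu_parse(path: &str) -> std::io::Result<Vec<u8>> {
    let doc = mupdf::pdf::document::PdfDocument::open(path).unwrap();
    let mut buffer: Vec<u8> = Vec::new();

    // Write without encryption, with mupdf's clean, sanitize, and pretty options on.
    let mut options = mupdf::pdf::document::PdfWriteOptions::default();
    options
        .set_encryption(mupdf::pdf::Encryption::None)
        .set_clean(true)
        .set_sanitize(true)
        .set_pretty(true);
    doc.write_to_with_options(&mut buffer, options).unwrap();

    Ok(buffer)
}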

This lets me handle lopdf encryption errors by passing the file to the function above and feeding its output back to lopdf.
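The fallback flow described here could be wired up roughly as follows. This is a sketch under assumptions: `load_with_fallback` is a hypothetical helper, and the parser and repair step are passed in as closures so the example runs with only the standard library. In real use, the parser might be `lopdf::Document::load_mem` and the repair closure might capture the file path and call `mu_parse`, with both errors mapped to a common type.

```rust
/// Try `parse` on the original bytes; if it fails, run `repair` once
/// and parse the repaired bytes instead. Hypothetical helper for illustration.
/// The first parse error is discarded when the fallback is taken.
fn load_with_fallback<T, E>(
    bytes: &[u8],
    parse: impl Fn(&[u8]) -> Result<T, E>,
    repair: impl FnOnce() -> Result<Vec<u8>, E>,
) -> Result<T, E> {
    match parse(bytes) {
        Ok(doc) => Ok(doc),
        Err(_) => {
            let cleaned = repair()?;
            parse(&cleaned)
        }
    }
}

fn main() {
    // Stand-ins: the "parser" rejects input starting with "ENC" (pretend: encrypted)
    // and otherwise returns the byte count; the "repair" step returns clean bytes.
    let parse = |b: &[u8]| -> Result<usize, String> {
        if b.starts_with(b"ENC") {
            Err("unsupported encryption".to_string())
        } else {
            Ok(b.len())
        }
    };
    let repair = || -> Result<Vec<u8>, String> { Ok(b"%PDF-1.7".to_vec()) };

    let parsed = load_with_fallback(b"ENC%PDF-1.7", parse, repair);
    println!("{:?}", parsed); // prints "Ok(8)"
}
```

Running the repair step only on failure keeps the common path free of the extra mupdf round trip.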