`.replace_text` does not work as intended.

Question

jymchng opened this issue a year ago

L105    pdf.replace_text(2, "jdoe123@mycompany.net", "hello WORLD").unwrap();
L106    dbg!(pdf.extract_text(&[2]).unwrap());

Logs

[src\redact.rs:106] pdf.extract_text(&[2]).unwrap() = "For example, john.doe@example.com, jdoe123@mycompany.net, \nalice_123+test@gmail.co.uk, and jane\n-\ndoe@my\n-\nuniversity.edu all match this pattern, \nand are therefore considered valid email addresses.\n \n \n"

Apparently, directly replacing text in a page doesn't work?

Jim Chng commented a year ago

#217

Jim Chng · Answer 1 · Mon Mar 27 2023 12:11:08 GMT+0800 (China Standard Time)

@J-F-Liu Hi J-F-Liu, I have been thinking about the `replace_text` method, which returns a `Result<()>`. That return type implies a contract between the caller and the callee: if `replace_text` does replace the text in the PDF, it returns `Ok(())`; otherwise it returns an `Err` variant.

In this function, particularly on line 138, it seems that nothing happens when the operator is not one of the two the function knows how to handle ("Tf" or "Tj"): the match arm `_ => {}` evaluates to an empty block. Would it be better to return an `Err`, so that the caller knows the function did not do what it promises because it cannot parse any other kind of operator?
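
To make the suggestion concrete, here is a minimal sketch of the contract I have in mind. It is not lopdf's code: `Operation`, `ReplaceError`, and this free-standing `replace_text` are hypothetical stand-ins, and I am assuming that a replacement only happens when a whole "Tj" operand equals the search text. The point is only that the call reports an error when nothing was replaced, and lists the operators it skipped, instead of falling through `_ => {}` and returning `Ok(())`.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for one content-stream operation.
/// This is NOT lopdf's type; the operand is modelled as plain text.
#[derive(Debug, Clone, PartialEq)]
struct Operation {
    operator: String,
    operand: String,
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum ReplaceError {
    /// Nothing was replaced; carries the operators that were skipped.
    NotFound { skipped_operators: Vec<String> },
}

impl fmt::Display for ReplaceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReplaceError::NotFound { skipped_operators } => write!(
                f,
                "text not replaced; skipped operators: {:?}",
                skipped_operators
            ),
        }
    }
}

impl std::error::Error for ReplaceError {}

/// Replaces `text` with `other` in every "Tj" operand that equals `text`.
/// Returns the number of replacements, or an error if there were none.
fn replace_text(
    ops: &mut [Operation],
    text: &str,
    other: &str,
) -> Result<usize, ReplaceError> {
    let mut replaced = 0;
    let mut skipped = Vec::new();
    for op in ops.iter_mut() {
        match op.operator.as_str() {
            // Font selection carries no text to replace.
            "Tf" => {}
            "Tj" => {
                if op.operand == text {
                    op.operand = other.to_string();
                    replaced += 1;
                }
            }
            // Instead of silently ignoring the operator, remember it.
            unhandled => skipped.push(unhandled.to_string()),
        }
    }
    if replaced == 0 {
        Err(ReplaceError::NotFound { skipped_operators: skipped })
    } else {
        Ok(replaced)
    }
}
```

With something like this, a page whose text is emitted through an operator other than "Tj" would yield `Err(NotFound { .. })`, and the `.unwrap()` in my snippet above would have failed loudly instead of leaving the page unchanged.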

Junfeng Liu · Answer 2 · Mon Mar 27 2023 20:06:14 GMT+0800 (China Standard Time)

Yes, text processing is not implemented completely.

Jim Chng · Answer 3 · Sun Mar 31 2024 18:54:20 GMT+0800 (China Standard Time)

@J-F-Liu Hi Liu, do you think this issue can be fixed?