Only checkboxes with value "Yes" are detected as `value: true` in JSON export
simosdev opened this issue · comments
Relevant pdfcpu code
pdfcpu/pkg/pdfcpu/form/export.go
Line 277 in dfaa588
pdfcpu/pkg/pdfcpu/form/form.go
Line 390 in dfaa588
Pseudo call graphs:
export.go: ExportFormJSON -> ExportForm -> exportPageFields -> exportBtn -> extractCheckbox
form.go: FormFields -> collectFields -> collectPageFields -> collectPageField -> collectBtn
Noticed the checkbox value problem when using pdfcpu as a Go library with the `ExportFormJSON` function. Checkboxes that were checked in the PDF form came out with `value: false` in the exported JSON. The cause is a customized export value in the PDF. In Adobe Acrobat Pro, for example, this is set under "Check Box Properties" - "Options" - "Export value". PDF forms created in a different language may have a different default value here; at least translated values of "Yes" showed up in some of the PDFs we were processing.
Tried to bypass this issue by iterating over the checkbox fields manually via the `FormFields` function, but the code path used there has the same issue of checking for a hardcoded "Yes" value.
Patched around this issue by checking for custom values in pdfcpu code, but this is not really a general fix for the problem.
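For illustration, a more general check could treat any state name other than `Off` as checked, since `Off` is the name PDF uses for the unchecked state while the name of the checked state (the export value) is chosen by the form author. The following is only a minimal sketch under that assumption: `isChecked` is a hypothetical helper, not part of pdfcpu's API, and it assumes the caller has already read the checkbox state name from the field.

```go
package main

import "fmt"

// isChecked reports whether a checkbox state name represents a checked box.
// Hypothetical helper, not pdfcpu code. Assumption: state is the name read
// from the field with any leading "/" removed. An empty name or "Off" is
// treated as unchecked; every other name counts as checked.
func isChecked(state string) bool {
	return state != "" && state != "Off"
}

func main() {
	// Export values taken from the test form, plus the unchecked cases.
	for _, s := range []string{"Yes", "true", "custom val", "Off", ""} {
		fmt.Printf("%q -> %v\n", s, isChecked(s))
	}
}
```

Comparing against `Off` instead of `Yes` avoids having to know every custom or translated export value in advance.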
Can you share a test file for analysis?
Thank you!
Here you go.
All checkboxes are checked, so pdfcpu should report every value as true. Only fields `a` and `e` result in a true value from pdfcpu, as they have the export value "Yes".
Noticed another potential issue with the test form. The first checkbox `a`, with export value "Yes", is detected as a radio button group by pdfcpu. All fields were created as checkboxes in Adobe Acrobat Pro. I'm not very experienced with the software, so there might be something I overlooked, but field `e`, with the default export value "Yes", is correctly detected as a checkbox.
Tested with the latest pdfcpu release, v0.7.0.
pdfcpu form list checkbox-test-form-filled.pdf
output:
Pg L Field │ Id │ Name │ Value │ Options
━━━━━━━━━━━━━━━┿━━━━━┿━━━━━━┿━━━━━━━┿━━━━━━━━
1 RadioBGr. │ 10 │ a │ Yes │ Yes
CheckBox │ 53 │ b │ │
CheckBox │ 54 │ c │ │
CheckBox │ 55 │ d │ │
CheckBox │ 56 │ e │ Yes │
For comparison, here is a custom export created with the Python library PyMuPDF.
The values here are just strings. The value of field `c` does not display correctly, but we have noticed issues with non-ASCII characters like `ä` in checkbox export values in other PDF libraries as well.
| page | field_name | type | flags | flags_str | value |
|--------|--------------|----------|---------|-------------|-------------|
| 1 | a | CheckBox | 0 | | Yes |
| 1 | b | CheckBox | 0 | | true |
| 1 | c | CheckBox | 0 | | Kyll� |
| 1 | d | CheckBox | 0 | | custom val |
| 1 | e | CheckBox | 0 | | Yes |
pdfcpu form export checkbox-test-form-filled.pdf
out.json:
{
  "header": {
    "source": "checkbox-test-form-filled.pdf",
    "version": "pdfcpu v0.7.0 dev",
    "creation": "2024-03-07 06:14:51 EET",
    "creator": "Adobe Acrobat Pro (64-bit) 23.8.20533",
    "producer": "Adobe Acrobat Pro (64-bit) 23.8.20533"
  },
  "forms": [
    {
      "checkbox": [
        {
          "pages": [
            1
          ],
          "id": "53",
          "name": "b",
          "default": false,
          "value": false,
          "locked": false
        },
        {
          "pages": [
            1
          ],
          "id": "54",
          "name": "c",
          "default": false,
          "value": false,
          "locked": false
        },
        {
          "pages": [
            1
          ],
          "id": "55",
          "name": "d",
          "default": false,
          "value": false,
          "locked": false
        },
        {
          "pages": [
            1
          ],
          "id": "56",
          "name": "e",
          "default": false,
          "value": true,
          "locked": false
        }
      ],
      "radiobuttongroup": [
        {
          "pages": [
            1
          ],
          "id": "10",
          "name": "a",
          "options": [
            "Yes"
          ],
          "value": "Yes",
          "locked": false
        }
      ]
    }
  ]
}
Thanks, I'll take a look.