pdfcpu / pdfcpu

A PDF processor written in Go.

Home Page: http://pdfcpu.io/

Only checkboxes with value "Yes" are detected as `value: true` in JSON export

simosdev opened this issue

Relevant pdfcpu code

Pseudo call graphs:

export.go

ExportFormJSON
  ExportForm
    exportPageFields
      exportBtn
        extractCheckbox

form.go

FormFields
  collectFields
    collectPageFields
      collectPageField
        collectBtn

I noticed the checkbox value problem when using pdfcpu as a Go library with the ExportFormJSON function. Checkboxes that were checked in the PDF form came out with value: false in the exported JSON. The cause is a customized export value in the PDF. In Adobe Acrobat Pro, for example, this is set under "Check Box Properties" - "Options" - "Export value". PDF forms created in other languages may have a different default value here; we saw at least translated values of "Yes" in some of the PDFs we were processing.

I tried to bypass this by iterating over the checkbox fields manually via the FormFields function, but that code path has the same problem: it checks for a hardcoded "Yes" value.

I patched around the issue by checking for our custom values in the pdfcpu code, but that is not a general fix for the problem.
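A more general check might look like the sketch below. It assumes the checkbox's current state is available as a plain string (the name stored in the field's /V or /AS entry) and relies on the PDF convention that the unchecked state is named "Off", while the checked state can carry any export value. The helper name isChecked is hypothetical and is not part of pdfcpu.

```go
package main

import "fmt"

// isChecked reports whether a checkbox state name means "checked".
// Hypothetical helper, not part of pdfcpu: it assumes the state string
// comes from the field's /V (or /AS) entry, where "Off" is the
// conventional unchecked state and any other name is an export value.
func isChecked(state string) bool {
	// An empty state means no value is set; treat it as unchecked.
	return state != "" && state != "Off"
}

func main() {
	// Export values taken from the test form in this issue, plus the off states.
	states := []string{"Yes", "true", "Kyllä", "custom val", "Off", ""}
	for _, s := range states {
		fmt.Printf("%q -> %v\n", s, isChecked(s))
	}
}
```

Comparing against "Off" instead of "Yes" avoids keeping a list of localized or custom export values.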

Can you share a test file for analysis?
Thank you!

checkbox-test-form-filled.pdf

Here you go.
All checkboxes are checked, so pdfcpu should report true for every field. Only fields a and e are detected as checked, because they have the export value "Yes".

I noticed another potential issue with the test form. The first checkbox, a, with export value "Yes", is detected by pdfcpu as a radio button group. All fields were created as checkboxes in Adobe Acrobat Pro. I am not very experienced with the software, so I may have overlooked something, but field e, with the default export value "Yes", is correctly detected as a checkbox.

Tested with the latest pdfcpu release, v0.7.0.

pdfcpu form list checkbox-test-form-filled.pdf output:

Pg L Field     │ Id  │ Name │ Value │ Options
━━━━━━━━━━━━━━━┿━━━━━┿━━━━━━┿━━━━━━━┿━━━━━━━━
1    RadioBGr. │ 10  │ a    │ Yes   │ Yes
     CheckBox  │ 53  │ b    │       │
     CheckBox  │ 54  │ c    │       │
     CheckBox  │ 55  │ d    │       │
     CheckBox  │ 56  │ e    │ Yes   │

For comparison, here is a custom export created with the Python library PyMuPDF.
The values here are plain strings. The value of field c is not shown correctly, but we have seen problems with non-ASCII characters such as ä in checkbox export values in other PDF libraries as well.

|   page | field_name   | type     |   flags | flags_str   |  value      |
|--------|--------------|----------|---------|-------------|-------------|
|      1 | a            | CheckBox |       0 |             | Yes         |
|      1 | b            | CheckBox |       0 |             | true        |
|      1 | c            | CheckBox |       0 |             | Kyll�      |
|      1 | d            | CheckBox |       0 |             | custom val  |
|      1 | e            | CheckBox |       0 |             | Yes         |

pdfcpu form export checkbox-test-form-filled.pdf
out.json:

{
	"header": {
		"source": "checkbox-test-form-filled.pdf",
		"version": "pdfcpu v0.7.0 dev",
		"creation": "2024-03-07 06:14:51 EET",
		"creator": "Adobe Acrobat Pro (64-bit) 23.8.20533",
		"producer": "Adobe Acrobat Pro (64-bit) 23.8.20533"
	},
	"forms": [
		{
			"checkbox": [
				{
					"pages": [
						1
					],
					"id": "53",
					"name": "b",
					"default": false,
					"value": false,
					"locked": false
				},
				{
					"pages": [
						1
					],
					"id": "54",
					"name": "c",
					"default": false,
					"value": false,
					"locked": false
				},
				{
					"pages": [
						1
					],
					"id": "55",
					"name": "d",
					"default": false,
					"value": false,
					"locked": false
				},
				{
					"pages": [
						1
					],
					"id": "56",
					"name": "e",
					"default": false,
					"value": true,
					"locked": false
				}
			],
			"radiobuttongroup": [
				{
					"pages": [
						1
					],
					"id": "10",
					"name": "a",
					"options": [
						"Yes"
					],
					"value": "Yes",
					"locked": false
				}
			]
		}
	]
}
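To check an export like this from Go, a minimal sketch could decode only the keys shown above and list the checkboxes reported as unchecked. The struct and function names are hypothetical, and the layout is inferred from this output alone, not from pdfcpu's own types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkBox mirrors the subset of keys seen in the export above.
// Hypothetical type, not pdfcpu's own.
type checkBox struct {
	ID    string `json:"id"`
	Name  string `json:"name"`
	Value bool   `json:"value"`
}

// formExport covers only the "forms" -> "checkbox" path of the export.
type formExport struct {
	Forms []struct {
		CheckBoxes []checkBox `json:"checkbox"`
	} `json:"forms"`
}

// uncheckedNames returns the names of all checkboxes exported with value false.
func uncheckedNames(data []byte) ([]string, error) {
	var e formExport
	if err := json.Unmarshal(data, &e); err != nil {
		return nil, err
	}
	var names []string
	for _, f := range e.Forms {
		for _, cb := range f.CheckBoxes {
			if !cb.Value {
				names = append(names, cb.Name)
			}
		}
	}
	return names, nil
}

func main() {
	// Shortened sample with two of the fields from the export above.
	sample := []byte(`{"forms":[{"checkbox":[
		{"id":"53","name":"b","value":false},
		{"id":"56","name":"e","value":true}]}]}`)
	names, err := uncheckedNames(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Run against the full out.json, this would list b, c and d, the three fields whose export value is not "Yes".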

Thanks, I'll take a look.