pdfcpu / pdfcpu

A PDF processor written in Go.

Home Page: http://pdfcpu.io/

xRefTable failed: pdfcpu: can't find last xref section

tobwen opened this issue

requested details

  • issue is based on dfaa588
  • Debian 12.5 on AMD64

summary of the bug

Running pdfcpu validate test.pdf fails with the error "Read: xRefTable failed: pdfcpu: can't find last xref section".

reproduce me

Other PDF tools verify that this PDF is valid.

%PDF-1.1
%µ¶

1 0 obj
<</Type/Catalog/Pages 2 0 R>>
endobj

2 0 obj
<</Type/Pages/Count 1/Kids[3 0 R]/MediaBox[0 0 3 3]>>
endobj

3 0 obj
<</Type/Page/Parent 2 0 R>>
endobj

xref
0 4
0000000000 65535 f 
0000000016 00000 n 
0000000062 00000 n 
0000000132 00000 n 

trailer
<</Size 4/Root 1 0 R>>
startxref
176
%%EOF

long log

$ pdfcpu validate -vv test.pdf
validating(mode=relaxed) test.pdf ...
 READ: 2024/04/19 21:41:53 Read: begin
 INFO: 2024/04/19 21:41:53 PDF Version 1.5 conforming reader
 READ: 2024/04/19 21:41:53 readXRefTable: begin
Fatal: pdfcpu: can't find last xref section
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.offsetLastXRefSection
        /Users/horstrutter/Documents/go/pdfcpu/pkg/pdfcpu/read.go:169
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.readXRefTable
        /Users/horstrutter/Documents/go/pdfcpu/pkg/pdfcpu/read.go:1616
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.ReadWithContext
        /Users/horstrutter/Documents/go/pdfcpu/pkg/pdfcpu/read.go:99
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.Read
        /Users/horstrutter/Documents/go/pdfcpu/pkg/pdfcpu/read.go:74
github.com/pdfcpu/pdfcpu/pkg/api.ReadContext
        /Users/horstrutter/Documents/go/pdfcpu/pkg/api/api.go:74
github.com/pdfcpu/pdfcpu/pkg/api.Validate
        /Users/horstrutter/Documents/go/pdfcpu/pkg/api/validate.go:43
github.com/pdfcpu/pdfcpu/pkg/api.ValidateFile
        /Users/horstrutter/Documents/go/pdfcpu/pkg/api/validate.go:91
github.com/pdfcpu/pdfcpu/pkg/api.ValidateFiles
        /Users/horstrutter/Documents/go/pdfcpu/pkg/api/validate.go:110
github.com/pdfcpu/pdfcpu/pkg/cli.Validate
        /Users/horstrutter/Documents/go/pdfcpu/pkg/cli/cli.go:27
github.com/pdfcpu/pdfcpu/pkg/cli.Process
        /Users/horstrutter/Documents/go/pdfcpu/pkg/cli/process.go:35
main.process
        /Users/horstrutter/Documents/go/pdfcpu/cmd/pdfcpu/process.go:150
main.processValidateCommand
        /Users/horstrutter/Documents/go/pdfcpu/cmd/pdfcpu/process.go:207
main.commandMap.process
        /Users/horstrutter/Documents/go/pdfcpu/cmd/pdfcpu/cmd.go:143
main.main
        /Users/horstrutter/Documents/go/pdfcpu/cmd/pdfcpu/main.go:56
runtime.main
        /usr/local/go/src/runtime/proc.go:271
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1695
Read: xRefTable failed
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.ReadWithContext
        /Users/horstrutter/Documents/go/pdfcpu/pkg/pdfcpu/read.go:100
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.Read
        /Users/horstrutter/Documents/go/pdfcpu/pkg/pdfcpu/read.go:74
github.com/pdfcpu/pdfcpu/pkg/api.ReadContext
        /Users/horstrutter/Documents/go/pdfcpu/pkg/api/api.go:74
github.com/pdfcpu/pdfcpu/pkg/api.Validate
        /Users/horstrutter/Documents/go/pdfcpu/pkg/api/validate.go:43
github.com/pdfcpu/pdfcpu/pkg/api.ValidateFile
        /Users/horstrutter/Documents/go/pdfcpu/pkg/api/validate.go:91
github.com/pdfcpu/pdfcpu/pkg/api.ValidateFiles
        /Users/horstrutter/Documents/go/pdfcpu/pkg/api/validate.go:110
github.com/pdfcpu/pdfcpu/pkg/cli.Validate
        /Users/horstrutter/Documents/go/pdfcpu/pkg/cli/cli.go:27
github.com/pdfcpu/pdfcpu/pkg/cli.Process
        /Users/horstrutter/Documents/go/pdfcpu/pkg/cli/process.go:35
main.process
        /Users/horstrutter/Documents/go/pdfcpu/cmd/pdfcpu/process.go:150
main.processValidateCommand
        /Users/horstrutter/Documents/go/pdfcpu/cmd/pdfcpu/process.go:207
main.commandMap.process
        /Users/horstrutter/Documents/go/pdfcpu/cmd/pdfcpu/cmd.go:143
main.main
        /Users/horstrutter/Documents/go/pdfcpu/cmd/pdfcpu/main.go:56
runtime.main
        /usr/local/go/src/runtime/proc.go:271
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1695

Hell... I figured it out. The PDF needs to have a minimum size of 512 bytes to be recognized by pdfcpu!

%PDF-1.0
%äöüß

1 0 obj
<<
    /Type
    /Catalog
    /Pages 2 0 R
>>
endobj

2 0 obj
<<
    /Type
    /Pages
    /Count 1
    /Kids [ 3 0 R ]
    /MediaBox [ 0 0 5 5 ]
>>
endobj

3 0 obj
<<
    /Type
    /Page
    /Parent 2 0 R
>>
endobj

xref
0 4
0000000000 65535 f 
0000000016 00000 n 
0000000078 00000 n 
0000000180 00000 n 

trailer
<<
    /Root 1 0 R
    /Size 4
>>

%We need padding to reach a minimum filesize of 512 bytes.
%We need padding to reach a minimum filesize of 512 bytes.

startxref
240
%%EOF

Can you do me a favor and share this file for analysis?
Thank you!

> Can you do me a favor and share this file for analysis?

Hm? I did. Just copy the code into a file and save it as PDF.

However, the error message should be reworded. The problem was not my PDF, but that pdfcpu requires a minimum file size of 512 bytes. As far as I know, the PDF format itself has no such requirement.

under512byte.pdf

Ah, I can upload it by drag'n'drop. Here it is.

Nope that's not really how it works.
Many times the issue is how the file was saved, specific offsets and such.
A real file needs to be the starting point for analysis.

> Nope that's not really how it works.

Interesting view. This is the first time I've heard that someone prefers a binary file.

> Many times the issue is how the file was saved, specific offsets and such.

I've been working with handcrafted PDFs for many years and have never run into such issues. But okay...

There are a lot of things that can go wrong just during the initial process of detecting xref sections, xref streams and such, because PDF writers mess up. That's why.
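As a rough illustration of why the exact bytes on disk matter, the following Go sketch checks whether a startxref offset really lands on the xref keyword. The helper pointsAtXRef is a hypothetical name, not part of pdfcpu, and the sketch only covers classic xref tables. Re-saving pasted text with different line endings or a different character encoding shifts every byte offset, so a check like this can pass on the original file and fail on a copy.

```go
package main

import (
	"bytes"
	"fmt"
)

// pointsAtXRef reports whether offset lands exactly on the "xref"
// keyword of a classic cross-reference table. Files that use xref
// streams (PDF 1.5 and later) point at an object header instead
// and are not covered by this sketch.
func pointsAtXRef(data []byte, offset int64) bool {
	if offset < 0 || offset >= int64(len(data)) {
		return false
	}
	return bytes.HasPrefix(data[offset:], []byte("xref"))
}

func main() {
	// Build a tiny file and compute the xref offset from the actual bytes.
	body := "%PDF-1.1\n1 0 obj\n<<>>\nendobj\n"
	xrefOffset := int64(len(body))
	pdf := []byte(body + "xref\n0 1\n0000000000 65535 f \ntrailer\n<</Size 1>>\nstartxref\n" +
		fmt.Sprint(xrefOffset) + "\n%%EOF\n")

	fmt.Println(pointsAtXRef(pdf, xrefOffset))   // offset matches the bytes
	fmt.Println(pointsAtXRef(pdf, xrefOffset+1)) // offset shifted by one byte
}
```

The first call reports true and the second false, which is the kind of one-byte discrepancy that only shows up when the real file is inspected.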

This is fixed with the latest commit!

> This is fixed with the latest commit!

Thank you! I'll test it later. The error totally confused me because I couldn't find anything saying that a PDF file has to be at least 512 bytes, and all other interpreters could handle it :-)

Well, thx for discovering this!