ethanhwang1024 / pdf-parser

A PDF parser that can extract paragraphs, tables, and pictures

pdf-parser

A Java extractor for paragraphs, tables, and images in PDF files. It can extract all three at once and generate HTML or JSON files.

For tables:


For images:

Borderless tables and multi-column PDFs are not supported yet.
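A limitation like this is common in extractors that locate tables from the ruling lines drawn on the page. Whether pdf-parser works this way is an assumption, not something this README states. The sketch below is not code from this repository: `RuledGridSketch`, `Cell`, `cells`, and `dedupe` are hypothetical names. It shows how a cell grid could be derived from the positions of vertical and horizontal rules, and why a table with no drawn borders yields no grid at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from pdf-parser.
final class RuledGridSketch {

    /** One table cell as a rectangle in page coordinates. */
    static final class Cell {
        final int row, col;
        final double x0, y0, x1, y1;

        Cell(int row, int col, double x0, double y0, double x1, double y1) {
            this.row = row; this.col = col;
            this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
        }
    }

    /**
     * Builds the grid implied by ruling lines.
     * xs holds the x positions of vertical rules, ys the y positions of
     * horizontal rules. Positions within tol of each other count as one rule.
     */
    static List<Cell> cells(double[] xs, double[] ys, double tol) {
        double[] cols = dedupe(xs, tol);
        double[] rows = dedupe(ys, tol);
        List<Cell> out = new ArrayList<>();
        // Each pair of neighbouring rules bounds one row or one column.
        for (int r = 0; r + 1 < rows.length; r++) {
            for (int c = 0; c + 1 < cols.length; c++) {
                out.add(new Cell(r, c, cols[c], rows[r], cols[c + 1], rows[r + 1]));
            }
        }
        return out;
    }

    /** Sorts the positions and drops any that lie within tol of the last kept one. */
    static double[] dedupe(double[] values, double tol) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        List<Double> kept = new ArrayList<>();
        for (double v : sorted) {
            if (kept.isEmpty() || v - kept.get(kept.size() - 1) > tol) {
                kept.add(v);
            }
        }
        double[] result = new double[kept.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = kept.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three distinct vertical rules (150 and 150.4 merge) and three horizontal rules.
        double[] xs = {50, 150, 150.4, 300};
        double[] ys = {700, 680, 660};
        System.out.println(cells(xs, ys, 1.0).size() + " cells");   // prints "4 cells"

        // A borderless table contributes no rules, so no cells are found.
        System.out.println(cells(new double[0], new double[0], 1.0).size() + " cells"); // prints "0 cells"
    }
}
```

Handling borderless tables would need a different signal, such as the alignment of text blocks and the whitespace gaps between them. That is a harder problem, and multi-column page layouts complicate it further.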

About

License: Apache License 2.0


Languages

Java 100.0%