bitextor/pdf-extract
PDF parser and converter to HTML
Stargazers: 82
Watchers: 17
Issues: 51
Forks: 14
bitextor/pdf-extract Issues
Install on mac
Updated
5 months ago
incompatible version protobuf
Updated
a year ago
Comments count
1
protobuf does not contain autogen.sh
Updated
a year ago
Comments count
1
Install for Win10
Updated
2 years ago
Comments count
1
Paracrawl Sentence Join tool does not exist
Closed
3 years ago
Comments count
1
Sentence join fails when using a batch file
Closed
3 years ago
Comments count
5
Over 2M of trash files produced while crawling
Closed
4 years ago
Comments count
4
Use stdin for pdftohtml
Closed
4 years ago
Comments count
5
java.lang.Exception: This binary file contains trie with quantization and array-compressed pointers.
Closed
4 years ago
Comments count
9
Show warning if "sentencejoin_model" path or a used file is missing
Closed
4 years ago
Comments count
1
Installation instruction not working
Closed
4 years ago
Comments count
1
Bad redirection of kenlm stderr
Closed
4 years ago
Comments count
2
Spurious warnings about sentenceJoin models
Closed
4 years ago
catch and throw again anti-pattern
Closed
4 years ago
Race condition collecting output
Closed
4 years ago
Deadlock on stderr from pdftohtml
Closed
4 years ago
Deadlock if SentenceJoin writes to stderr
Closed
4 years ago
Comments count
1
Error handling
Closed
4 years ago
mostHeight set but not read
Closed
4 years ago
Semantics of LOADING state
Closed
4 years ago
Sorting to get the maximum?
Closed
4 years ago
Why can't bold have tags inside it?
Closed
4 years ago
Comments count
2
fontweight set but not read
Closed
4 years ago
Link pattern does not allow nested tags
Closed
4 years ago
missing script and model as specified in config file
Closed
4 years ago
Comments count
1
pdf-extract timeout option
Closed
4 years ago
Comments count
3
Run on CSD3
Closed
4 years ago
Comments count
3
Branch poppler-rewrite does mark all sentences as lang="en" if protobuf not found
Closed
4 years ago
Comments count
3
Make dependency installation optional in poppler-rewrite setup.sh
Closed
4 years ago
Comments count
3
PDFExtract.json should be passed as an argument
Closed
4 years ago
Comments count
1
Exception in thread "main" java.lang.UnsatisfiedLinkError: /tmp/native-forcld3-350533629840224/libforcld3.so: libprotobuf.so.9: cannot open shared object file: No such file or directory
Closed
5 years ago
Comments count
9
Branch poppler-rewrite does not extract any text
Closed
5 years ago
Comments count
4
Document new dependencies
Closed
5 years ago
Comments count
1
Sentence rejoining
Closed
5 years ago
Comments count
3
pdf-extract in warc2htmlwarc uses >1 processor
Closed
5 years ago
Comments count
5
stuck warcs
Closed
5 years ago
Comments count
8
Exception always
Closed
5 years ago
Comments count
3
common.print use
Closed
5 years ago
Comments count
1
Convert pdf to html fail.: null
Closed
5 years ago
Comments count
2
Use git properly
Closed
5 years ago
Comments count
3
Hash table of floats increasing by 0.5?
Closed
5 years ago
Comments count
1
Are you using a HashMap on sequential integers keys?
Closed
5 years ago
Comments count
2
Indentation
Closed
5 years ago
Comments count
1
ArrayList appears to exist solely to have its minimum taken
Closed
5 years ago
Comments count
1
How many times do I have to parse the same number from a string?
Closed
5 years ago
Comments count
1
com.java package
Closed
5 years ago
Comments count
1
Hand-crafted spin loops for thread completion?
Closed
5 years ago
Comments count
1
Is catching an exception to rethrow really adding value here?
Closed
5 years ago
Comments count
1
Pass PDF document for extraction in RAM
Closed
5 years ago
Comments count
16
Regression -- mangled text
Closed
5 years ago
Comments count
5