natestemen / mathdb

Convert PDFs to a data format we can use

Shearyar opened this issue.

Data mining a PDF after converting it to LaTeX is difficult. Attached is the result.

Sorry, I'm not seeing any attached result?

I really don't think we should be aiming to go from PDF to LaTeX either. That's a very complicated mapping in a lot of ways, and it's definitely not unique.

I think we should first try just going PDF -> TXT, pulling the data out of the plain text, and seeing how messy it is. It might even be good enough, without all that extra work!

https://github.com/danigm/poppler looks like it might be a good option.
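As a rough sketch of the PDF -> TXT route, assuming the `pdftotext` command-line tool from poppler-utils is installed and on `PATH` (whether the linked repository ships that tool is an assumption here), a first pass could look like the following. The function names `pdf_to_text` and `split_pages` are hypothetical, chosen for illustration. `-layout` asks `pdftotext` to preserve the physical page layout, and `-` as the output file writes to stdout.

```python
import subprocess
import sys
from typing import List


def pdf_to_text(pdf_path: str) -> str:
    """Extract plain text from a PDF using poppler's `pdftotext` CLI.

    Assumes `pdftotext` (poppler-utils) is on PATH. `-layout` keeps the
    physical layout of each page; the trailing "-" sends output to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout


def split_pages(text: str) -> List[str]:
    """Split pdftotext output into a list of pages.

    pdftotext ends each page with a form feed (\\f) unless -nopgbrk is
    passed, so splitting on it recovers the pages. The empty string left
    after the final form feed is dropped.
    """
    pages = text.split("\f")
    if pages and pages[-1] == "":
        pages.pop()
    return pages


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python extract.py some_paper.pdf
    for number, page in enumerate(split_pages(pdf_to_text(sys.argv[1])), start=1):
        print(f"--- page {number} ---")
        print(page)
```

Printing page by page like this is a cheap way to eyeball how messy the raw text is before deciding whether anything heavier is needed.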

@Shearyar any luck with this?