jdumas / autobib

Automatic bibtex generation from a file list, auto formatting, etc.

This is very nice! And a very small issue.

yangyi02 opened this issue

I recently found this repo and tried it. Very nice! It converts my PDF papers into a BibTeX file, which could save me a lot of time when writing papers. Thanks for the work.

One small issue you may want to consider: the BibTeX format is a little different from Google Scholar's original version.

If you go to Google Scholar, search for a paper, say "High accuracy optical flow estimation based on a theory for warping", and click the "Import into BibTeX" link, you will see:

@Article{brox2004high,
title={High accuracy optical flow estimation based on a theory for warping},
author={Brox, Thomas and Bruhn, Andr{\'e}s and Papenberg, Nils and Weickert, Joachim},
journal={Computer Vision-ECCV 2004},
pages={25--36},
year={2004},
publisher={Springer}
}

If I use autobib.py to generate it, the result looks like this:

@Article{Brox:2004:HAO,
author = {Brox, Thomas and Bruhn, Andrés and Papenberg, Nils and Weickert, Joachim},
eprint = {https://scholar.google.comhttp://www.weizmann.ac.il/mathusers/vision/courses/2006_2/papers/optic_flow_multigrid/brox_eccv04_of.pdf},
file = {:High Accuracy Optical Flow Estimation Based on a Theory for Warping.pdf:PDF},
journal = {Computer Vision-ECCV 2004},
link = {http://www.springerlink.com/index/87a4ckjqm92lp3j9.pdf},
pages = {25--36},
publisher = {Springer},
title = {High Accuracy Optical Flow Estimation Based on a Theory for Warping},
year = {2004}
}

You can see the difference in the first line: Google Scholar uses the key "brox2004high", while autobib uses "Brox:2004:HAO". This is not a big problem, but do you think the key could also be formatted in Google Scholar's style? Or is this intentional?

Hi @yangyi02. Glad you like this program. It is very much in an alpha stage at this point, and I plan to do a fair amount of code refactoring and API changes to make it easy for everyone to use.

You raised several points that need commenting:

  1. With the option autobib -cg, the BibTeX file is first fetched from Crossref, and then unmatched entries are searched via Google Scholar. I found results from Google Scholar to be very inaccurate in general, so I prefer querying Crossref first (Crossref is an organization that registers DOIs). Results from Crossref are similar to what you can find on the doi2bib website. Note that right now the search via Crossref is somewhat broken (see #170 for more details), so I need to fix that first. Searching via Google Scholar is also trickier, because you need to wait a random amount of time before each query; otherwise Google might block you.
  2. The BibTeX key is generated consistently from each record by the function gen_bibkey() in nomenclature.py. You can modify this function to format the key to your liking. Eventually I would like to introduce some formatting options to let users define keys however they like.
  3. There is still some weird stuff happening with accents sometimes. I haven't had the time to investigate the issue properly, and right now the workaround is to hardcode all your weird character substitutions in the file nomenclature.py (you can have a look at the existing rules). Again, this is something I plan to work on in the future.
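The hardcoded-substitution workaround can be pictured as a lookup table from accented characters to BibTeX escapes, which would turn autobib's "Andrés" into Google Scholar's "Andr{\'e}s". This is a minimal Python sketch: ACCENT_RULES and escape_accents are hypothetical names, the table covers only a handful of characters, and it does not reproduce the format of the existing rules in nomenclature.py. Normalizing to NFC first is an added assumption; it folds a decomposed "e" plus combining accent (as some PDF text extractors produce) into a single "é" so the table lookup still matches.

```python
import unicodedata

# Hypothetical substitution table: accented character -> BibTeX escape.
ACCENT_RULES = {
    "é": r"{\'e}",
    "è": r"{\`e}",
    "ü": r'{\"u}',
    "ö": r'{\"o}',
    "ç": r"{\c{c}}",
    "ñ": r"{\~n}",
}


def escape_accents(text: str, rules: dict[str, str] = ACCENT_RULES) -> str:
    """Replace each accented character with its BibTeX escape; others pass through."""
    # Compose base letter + combining mark into one code point so the lookup matches.
    composed = unicodedata.normalize("NFC", text)
    return "".join(rules.get(ch, ch) for ch in composed)


print(escape_accents("Bruhn, Andrés"))  # Bruhn, Andr{\'e}s
```

Any character missing from the table is left untouched, so extending the table is the only way this sketch handles new accents.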

Awesome! Glad to hear about the future plans, I will keep following your work.

And thanks for explaining these details. I will take a look at nomenclature.py.

I also found that Crossref is somewhat broken, so I only used Google Scholar in my first tests. I hope this can be fixed soon.

Again, very nice work. I will recommend it to my colleagues when they need it. Thanks.

The major task on my to-do list is indeed to fix the Crossref search. Once this is done, I hope to spend more time fixing the minor issues I described, but it may be a few months before things change. So I hope you will be able to use it as is in the meantime (apart from the problem with Crossref, it is pretty much usable; I referenced more than 700 papers with it during my PhD).
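The Crossref-first strategy behind autobib -cg (query Crossref for everything, then retry only the misses on Google Scholar with a random wait before each query) can be sketched in Python as follows. crossref_lookup and scholar_lookup are hypothetical callables standing in for the real queries (no network code is shown), and the 5 to 15 second wait range is an assumed value, not one taken from autobib.

```python
import random
import time
from typing import Callable, Optional

# Hypothetical lookup: takes a paper title, returns a BibTeX string or None if unmatched.
Lookup = Callable[[str], Optional[str]]


def fetch_entries(
    titles: list[str],
    crossref_lookup: Lookup,
    scholar_lookup: Lookup,
    min_wait: float = 5.0,   # assumed lower bound of the random delay, in seconds
    max_wait: float = 15.0,  # assumed upper bound of the random delay, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Optional[str]]:
    """Query Crossref for every title first, then retry only the misses on Google Scholar."""
    results: dict[str, Optional[str]] = {t: crossref_lookup(t) for t in titles}
    unmatched = [t for t, entry in results.items() if entry is None]
    for title in unmatched:
        # Random pause before each Scholar query to reduce the chance of being blocked.
        sleep(random.uniform(min_wait, max_wait))
        results[title] = scholar_lookup(title)
    return results
```

Passing sleep as a parameter keeps the function testable: a no-op or a recorder can replace time.sleep so that tests skip the real delays.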

Of course, I will give it a try. And if I find bugs or make modifications, I will let you know. :)