cpitclaudel / biblio.el

Browse and import bibliographic references from CrossRef, DBLP, HAL, arXiv, Dissemin, and doi.org from Emacs

hook for doi-insert-bibtex (or rather, hook for auto-editing a bib entry)?

jowens opened this issue

I'd like to write a function that edits a bib entry when it's called with a keystroke (and maybe it runs automatically when I run doi-insert-bibtex). Does biblio support this?

I'll have to learn enough bibtex to make this work, but definitely I would like to replace something like month = {Aug} with month=aug, and I'd also probably like to delete a url field if that field just duplicated the info in the doi field. Tips appreciated!

Does biblio support this?

Yup, definitely. You can extend what biblio-insert-bibtex does using add-function on biblio-cleanup-bibtex-function

I'll have to learn enough bibtex to make this work, but definitely I would like to replace something like month = {Aug} with month=aug

Sounds quite reasonable. Bibtex-mode is very flexible, so I wouldn't be surprised if it had a way to do that already. Look into bibtex-entry-format, which is set to biblio--bibtex-entry-format in biblio-cleanup-bibtex-function

Hi Clément, I would super appreciate a little advice here. I hope that ~5 minutes from you will save me a boatload of time, since I've been spinning my wheels a bunch over the last few days trying to figure it out.

I am relatively familiar with Lisp as a language, but all the stuff atop that used for getting actual work done in Emacs has been a challenge. I am trying to trace the call stack, but what I've found (M-x trace-function) is only vaguely helpful since it only shows individual calls and not the stack. What I want instead is "I call M-x doi-insert-bibtex and see the entire call stack that results from that". One of the challenges has been that some of these calls are not exactly human-readable:

(biblio-url-retrieve "http://doi.org/10.1145/3350755.3400231" #[257 "p\303\304\"\216\301\206�\305 \210\30612�\307!\211\203#�\310\302\"\210\300!\202,�\311 \210e`|\210\300 \2620\2028�\312\313\"\262)\207" [#[256 "\301\300\"\207" [#[257 "\211\203�\301!\207\302\300\301\"\207" ["10.1145/3350755.3400231" #[257 "\301\300\"\207" [#<buffer compbio.bib> biblio-doi--insert] 4 "

I have also had trouble reproducing the data that comes back from (I think) crosscite, because a plain URL gives me back an "OK". I think I have to also send an "Accept" header, but don't know how to do that in a web browser, and Python + requests has not been fruitful so far (but if that's the right approach please let me know).

May I trouble you to answer what I hope are a few simple questions?

  • What is the function that returns the bibliographic data from the online source and how can I actually see that data? (In a debugger, in another buffer, etc.)
  • Can you describe "add-function"? Is this "advice", which I am to understand modifies an existing function? (A pointer to someplace that does this would be super helpful.) Is this a "hook", which I am to understand is called at a particular time (I have a bunch in my init file)? But I don't know how to do either with arguments.
  • Am I correct in saying biblio--bibtex-entry-format is a list of transformations and I can somehow add to those if I define a new transformation? I am having trouble grokking the relationship between biblio--bibtex-entry-format and biblio-cleanup-bibtex-function (which I expected would be a function but is instead a variable) and biblio--cleanup-bibtex.

The simplest thing to do is probably a two-liner that says "here is how you add a new transformation that, say, adds a new bib field to every bib returned by doi-insert-bibtex, or deletes a field, or changes the page numbers from a--b to b--a".

Sorry I'm slow! Your work makes my life better, thanks.

Sorry I'm slow!

No worries :) The code is pretty complicated because it uses callbacks. Answers below:

What is the function that returns the bibliographic data from the online source and how can I actually see that data? (In a debugger, in another buffer, etc.)

In a lambda in biblio-doi-insert-bibtex:

(defun biblio-doi-insert-bibtex (doi)
  "Insert BibTeX entry matching DOI."
  (interactive "MDOI: ")
  (let ((target-buffer (current-buffer)))
    (biblio-doi-forward-bibtex
     (biblio-cleanup-doi doi)
     (lambda (result)
       (biblio-doi--insert
        (biblio-format-bibtex result biblio-bibtex-use-autokey)
        target-buffer)))))

You can add a print inside this function to see the result. Notice in particular the call to biblio-format-bibtex.

Can you describe "add-function"? Is this "advice", which I am to understand modifies an existing function? (A pointer to someplace that does this would be super helpful.) Is this a "hook", which I am to understand is called at a particular time (I have a bunch in my init file)? But I don't know how to do either with arguments.

In the past Emacs had "advice" (functions tacked onto other functions) and "hooks" (lists of functions, either called one by one to produce multiple effects or called one by one until one returns non-nil to compute a value).

Now Emacs is moving towards "function variables": instead of a hook there is a single function, and you use advice (through add-function) to attach additional functionality to it.
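
To make the "function variable plus advice" idea concrete, here is a small analogy in Python rather than Emacs Lisp. It is only a sketch of the concept, not how add-function is implemented: every name in it (default_cleanup, cleanup_function, add_after_advice, drop_url_lines) is made up for illustration and does not exist in biblio.el.

```python
from typing import Callable


def default_cleanup(entry: str) -> str:
    """Hypothetical default cleanup: trim surrounding whitespace."""
    return entry.strip()


# The "function variable": callers always invoke whatever it currently holds.
cleanup_function: Callable[[str], str] = default_cleanup


def add_after_advice(original: Callable[[str], str],
                     extra: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function that runs ORIGINAL, then passes its result to EXTRA."""
    def advised(entry: str) -> str:
        return extra(original(entry))
    return advised


def drop_url_lines(entry: str) -> str:
    """Hypothetical extra step: delete lines that start with a url field."""
    return "\n".join(line for line in entry.splitlines()
                     if not line.strip().lower().startswith("url"))


# Attach the extra behaviour without editing default_cleanup itself.
cleanup_function = add_after_advice(cleanup_function, drop_url_lines)
```

Code that calls cleanup_function never needs to know whether extra steps were attached, which is the appeal of this style over editing the default function directly.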

Am I correct in saying biblio--bibtex-entry-format is a list of transformations and I can somehow add to those if I define a new transformation? I am having trouble grokking the relationship between biblio--bibtex-entry-format and biblio-cleanup-bibtex-function (which I expected would be a function but is instead a variable) and biblio--cleanup-bibtex.

  • biblio-format-bibtex is responsible for the cleanup. You don't have to touch it, I think. It sets up a temporary bibtex buffer and then calls biblio-cleanup-bibtex-function
    • biblio-cleanup-bibtex-function does the real cleanup work. It can be set to any function you like. Its default value is biblio--cleanup-bibtex.
  • biblio--cleanup-bibtex uses biblio--bibtex-entry-format internally; it's a lightweight wrapper around bibtex-clean-entry.
  • biblio--bibtex-entry-format overrides bibtex-entry-format within biblio--cleanup-bibtex.

The simplest thing to do is probably a two-liner that says "here is how you add a new transformation that, say, adds a new bib field to every bib returned by doi-insert-bibtex, or deletes a field, or changes the page numbers from a--b to b--a".

A bit more than two lines, but I've done that just now in the README: https://github.com/cpitclaudel/biblio.el#adding-custom-bibtex-filters . It would be much appreciated if you could give each of these a try, since I haven't had much time to test them.

First, this is awesome! Second, can you tell me just a little more about what you're doing under the hood?

What I thought you were doing was getting some sort of structured data from the publisher/Crossref (e.g., https://api.crossref.org/v1/works/10.1145/3448016.3452841) and then formatting it yourself. But now I am thinking what you are doing is getting Bibtex straight from the source.

If it's the latter, then where I'm having difficulty is reproducing exactly what you're requesting from the publisher. Like, what is the URL that I can visit to see what your input is? I am thinking at this point that you are just visiting doi.org/XXX.YYY but that you're doing something alongside that, and I'm guessing it's perhaps adding the Accept: header and it returns bibtex? Specifically, I'd like to know where I go if I do doi-insert-bibtex, and I think that's just a doi.org address perhaps with additional headers (and I think that resolves to Crossref).

The proximate reason I'm asking is that I want to know what the input is for the functions I'd like to write. If it's only bibtex, then I'm expecting I can't do as much as if it's, say, the JSON linked above. (For instance, many bibtools return only a "title" and not a "subtitle" from Crossref, whereas the actual publication title is "Title: Subtitle". If I only get to see the returned bibtex, I will never see the subtitle, and can't write a fix to put it in.)

And if I know that references come through Crossref, I can complain to them if they're the ones that actually write the bibtex.

The above is mentally inconsistent, so thanks in advance for clearing it up for me.

https://crosscite.org/citeproc/format?doi=10.1145/3350755.3400231&style=bibtex&lang=en-US gives me back the two-character, not particularly helpful message OK.

Each of your four new functions in the readme works as advertised. Thank you for those great examples!

Each of your four new functions in the readme works as advertised.

Sweet, thanks!

Second, can you tell me just a little more about what you're doing under the hood?

Yep, certainly.

There are two places in the code in which we want to download a bibtex entry: when we insert a search result, and when we call doi-insert-bibtex directly.

The former uses a backend-specific method to fetch the bibtex; each backend defines a forward-bibtex function which is called with a search result. The latter tries two sources: the DOI's publisher website, and crosscite if that doesn't work.

All methods, once they get a bibtex string, then clean it up and insert it, which ends up calling biblio-cleanup-bibtex-function.

IIRC we're discussing the biblio-doi-insert-bibtex case here, so let me dive into that one (and correct me if I'm misremembering, I admit I didn't scroll back up to read ^^). To understand what's going on you have to know that the code uses continuation-passing style for asynchronicity; that is to say, most functions also take a callback (another function) to run once the download task completes.

  • Start in biblio-doi-insert-bibtex
  • It calls biblio-doi-forward-bibtex with a callback (lambda (result) … that cleans and inserts the downloaded bibtex.
  • biblio-doi-forward-bibtex calls biblio-doi--forward-bibtex-dx, which makes a first attempt at downloading from dx.doi.org. Note the new callback (lambda (result) (if result (funcall forward-to result …:
    • if dx.doi responds with some bibtex code it just calls the final callback
    • otherwise it tries again with biblio-doi--forward-bibtex-crosscite
  • biblio-doi--forward-bibtex-dx and biblio-doi--forward-bibtex-crosscite both just try to fetch a URL (computed with biblio-doi--dx-url and biblio-doi--crosscite-url, respectively), with a special header, set by biblio-doi--set-mime-accept, indicating that they expect bibtex.
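
For readers more at home in Python, the callback chain in the list above can be sketched roughly as follows. This is an illustration of the continuation-passing pattern with a fallback, not a translation of biblio.el: forward_bibtex, failing_source and working_source are hypothetical names, and the stand-in sources call their callbacks immediately instead of downloading anything asynchronously.

```python
from typing import Callable, List, Optional

Callback = Callable[[Optional[str]], None]
Fetcher = Callable[[str, Callback], None]


def forward_bibtex(doi: str, primary: Fetcher, fallback: Fetcher,
                   forward_to: Callback) -> None:
    """Try PRIMARY; if it yields nothing, try FALLBACK with the final callback."""
    def on_primary(result: Optional[str]) -> None:
        if result:
            forward_to(result)          # first source answered: done
        else:
            fallback(doi, forward_to)   # otherwise hand over to the second source
    primary(doi, on_primary)


# Hypothetical stand-ins for the two downloaders.
def failing_source(doi: str, callback: Callback) -> None:
    callback(None)


def working_source(doi: str, callback: Callback) -> None:
    callback("@misc{example, doi={%s}}" % doi)


# The final callback just collects the entry; the real one cleans and inserts it.
inserted: List[Optional[str]] = []
forward_bibtex("10.1000/example", failing_source, working_source, inserted.append)
```

Because every step receives "what to do next" as an argument, no function ever returns the bibtex directly, which is also why tracing the call stack from the top-level command is not very informative.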

We can see this in action by setting url-debug to t:

http -> Contacting host: doi.org:443
http -> Request is: 
GET /10.1145/1159890.806466 HTTP/1.1
MIME-Version: 1.0
Connection: keep-alive
Extension: Security/Digest Security/SSL
Host: doi.org
Accept-encoding: gzip
Accept: text/bibliography;style=bibtex, application/x-bibtex
User-Agent: URL/Emacs Emacs/28.2.50 (X11; x86_64-pc-linux-gnu)
http -> Parsed HTTP headers: class=3 status=302
http -> Contacting host: api.crossref.org:443
http -> Request is: 
GET /v1/works/10.1145%2F1159890.806466/transform HTTP/1.1
MIME-Version: 1.0
Connection: keep-alive
Extension: Security/Digest Security/SSL
Host: api.crossref.org
Accept-encoding: gzip
Accept: text/bibliography;style=bibtex, application/x-bibtex
User-Agent: URL/Emacs Emacs/28.2.50 (X11; x86_64-pc-linux-gnu)


http -> Activating callback in buffer ( *http api.crossref.org:443*)

… so here we see that dx.doi redirected us to crossref. I don't have an example at hand right now where dx.doi fails (maybe it always redirects to crossref now and didn't use to?)

I'm just pasting this here for posterity (if someone wants to get the same response in Python):

$ python
Python 3.11.1 (main, Jan 12 2023, 08:19:11) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> response = requests.get(
... "https://doi.org/10.1145/1159890.806466",
... headers={'Accept': 'text/bibliography;style=bibtex, application/x-bibtex'})
>>> response.text
' @article{1981, title={EMACS the extensible, customizable self-documenting display editor}, volume={2}, ISSN={0737-819X}, url={http://dx.doi.org/10.1145/1159890.806466}, DOI={10.1145/1159890.806466}, number={1â\x80\x932}, journal={ACM SIGOA Newsletter}, publisher={Association for Computing Machinery (ACM)}, author={Stallman, Richard M.}, year={1981}, month={Apr}, pages={147â\x80\x93156} }\n'
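
Continuing in Python, here is a rough sketch of the kind of post-processing asked about at the top of this issue (month={Apr} to month=apr, and dropping a url field that merely repeats the DOI), applied to a shortened copy of the string above. The helper names are made up for illustration, the regular expressions assume flat fields without nested braces, and the `â\x80\x93` sequences are assumed to be UTF-8 bytes that were decoded as Latin-1.

```python
import re


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were decoded as Latin-1; leave other text alone."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def lowercase_month(entry: str) -> str:
    """Rewrite a braced three-letter month such as month={Apr} as month=apr."""
    return re.sub(r"\bmonth\s*=\s*\{([A-Za-z]{3})\}",
                  lambda m: "month=" + m.group(1).lower(), entry)


def drop_redundant_url(entry: str) -> str:
    """Delete the url field when its value ends with the entry's own DOI."""
    doi = re.search(r"\bdoi\s*=\s*\{([^}]*)\}", entry, flags=re.IGNORECASE)
    if doi is None:
        return entry
    pattern = r"\s*\burl\s*=\s*\{[^}]*" + re.escape(doi.group(1)) + r"\},?"
    return re.sub(pattern, "", entry, flags=re.IGNORECASE)


# Shortened copy of the response text shown above.
raw = ("@article{1981, url={http://dx.doi.org/10.1145/1159890.806466}, "
       "DOI={10.1145/1159890.806466}, number={1\xe2\x80\x932}, month={Apr}, "
       "pages={147\xe2\x80\x93156} }")

cleaned = drop_redundant_url(lowercase_month(fix_mojibake(raw)))
```

Setting response.encoding = "utf-8" before reading response.text would probably avoid the need for fix_mojibake in the first place. A real BibTeX parser would be more robust than these regular expressions.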

@cpitclaudel Thank you for such a thoughtful and detailed response. I will continue my learning!

My pleasure! I will close this issue for now, but feel free to reopen it if you run into issues! (And if you write cool biblio.el extensions, please do post them here!)