New Feature: internal links to tables and figures and headers


GeraldLoeffler opened this issue 11 years ago · comments

It's currently possible to include internal links to sections. I'd like to propose a similar feature for links to figures/images and tables.

It may make sense to provide this feature only if the figure/image or table that is being linked to has a caption. In that case Pandoc can today automatically generate a number for the figure or table and include it in the caption, e.g. "Figure 15".

At the most basic, the text of the link would be provided by the user, as is currently the case for links to sections.

Of course it would be very convenient if the automatically generated number for the figure or table would also be used for the text of the link, e.g. "as can be seen in Figure 15, blah", where "Figure 15" would be the internal link whose text is auto-generated from the figure it points to.

Shaun Jackman commented 9 years ago

👍

Maxim K · Answer 1 · Fri May 03 2013 21:37:41 GMT+0800 (China Standard Time)

That would be lovely indeed. In academic writing it is quite often necessary, and while automatic numbering of figures and tables is nice, it really should be linked to what is in the text.

Jakob Voß · Answer 2 · Wed May 22 2013 00:53:58 GMT+0800 (China Standard Time)

One could use the figure caption as link target, similar to links to captions:

![la lune](lalune.jpg "Voyage to the moon")

...is shown in figure [la lune]...

And/or without automatic generation of link text:

...is shown in [the figure](#la-lune)...

See also issue #615 on automatic numbering of figures and tables in HTML output.

Hinrich B. Winther · Answer 3 · Tue Jul 23 2013 20:47:27 GMT+0800 (China Standard Time)

I concur. However, @nichtich's suggestion breaks the current syntax. Maybe a less intrusive approach would be a syntax like:

![Voyage to the moon](lalune.jpg){la lune}

It would be great to be able to reference figures. As @nichtich said: it is nearly a requirement in academic writing.

John MacFarlane · Answer 4 · Tue Jul 23 2013 22:21:07 GMT+0800 (China Standard Time)

A more consistent format would be

![Voyage to the moon](lalune.jpg){#lalune}

See the current attribute format for headers.

Hinrich B. Winther · Answer 5 · Sat Jul 27 2013 19:00:26 GMT+0800 (China Standard Time)

Indeed, that is a more consistent format.

About the implementation:
I see two major ways to implement this feature:

- Emulate something like the LaTeX figure environment and output the figure as an image with plain text underneath. This is very much how figures are handled now in the docx format, except that you put "Figure 1:" at the beginning. This would be the most portable way and should be fairly easy to implement in all format writers. However, pandoc then has to keep track of the references itself for cross-referencing.
- Implement it the "proper" way in the corresponding format writer. Sticking with the docx example: add a caption to the image and then cross-reference it in the text.

Can anybody (@jgm ?) make an educated guess on how much work either of the solutions will be?

Aaron O'Leary · Answer 6 · Tue Sep 10 2013 19:43:40 GMT+0800 (China Standard Time)

I agree - this is essential for academic writing. I wish I knew Haskell!

The current way around this, in the mailing list discussion, is functional but clumsy.

Would this mean using \autoref in the latex? Then from markdown input:

...is shown in [the figure](#la-lune)...

you would get the latex output:

...is shown in \autoref{la-lune}...
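
For illustration, here is a minimal Python sketch of that rewrite, applied to raw Markdown text rather than inside pandoc. The function name `links_to_autoref` and the regular expression are assumptions made for this example, not anything pandoc provides, and the sketch deliberately throws away the user's link text, as in the example above.

```python
import re

# Matches an inline link whose target is an internal anchor: [text](#id).
# The negative lookbehind skips images such as ![caption](#id).
INTERNAL_LINK = re.compile(r"(?<!!)\[([^\]]*)\]\(#([^)\s]+)\)")


def links_to_autoref(markdown: str) -> str:
    """Replace every [text](#id) with \\autoref{id}, dropping the link text."""
    return INTERNAL_LINK.sub(lambda m: "\\autoref{" + m.group(2) + "}", markdown)


print(links_to_autoref("...is shown in [the figure](#la-lune)..."))
```

Links to external URLs are left alone, so only same-document anchors are turned into references.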

Aaron O'Leary · Answer 7 · Tue Sep 10 2013 23:12:48 GMT+0800 (China Standard Time)

sort of relevant pr: #509

Mailing list discussion:

Figure specific
- How to reference a figure in pandoc markdown?
- Referring to figures (and other 'objects')
More general

Giuseppe C · Answer 8 · Thu Sep 19 2013 22:45:42 GMT+0800 (China Standard Time)

![Voyage to the moon](lalune.jpg){#lalune}

I just tried to write something like

Some text

![Bla blah](pic.png)   {#something}

Some other text

I was surprised that did not work. It showed the image without caption, and a raw "{#something}" afterwards.

I assumed curly braces were for assigning attributes to anything... :D

CFCF · Answer 9 · Sun Nov 17 2013 00:31:06 GMT+0800 (China Standard Time)

A workaround with numbered example lists is added to #904

For my purposes, this method works well with docx.

Oliver Dew · Answer 10 · Sat Mar 29 2014 18:52:48 GMT+0800 (China Standard Time)

I agree that being able to reference figures is essential to academic writing. The workarounds linked to above aren't really satisfactory, in my opinion.

![Voyage to the moon](lalune.jpg){#lalune} would be perfect

Sarah Brofeldt · Answer 11 · Fri Apr 25 2014 04:52:21 GMT+0800 (China Standard Time)

Similar syntaxes would be very interesting for equations, too. In fact, why not adopt a completely general syntax? It would be especially nice if it could carry over to LaTeX bits, once you have to bail out and use say \begin{align} and friends.

Frederik Elwert · Answer 12 · Fri Apr 25 2014 11:33:46 GMT+0800 (China Standard Time)

I have sympathy for the numbered example list approach, mainly for two reasons: Firstly, what we want are not really links but references, and secondly, the use case for numbered example lists is already close to, e.g., numbered equations. The example from the docs is close to a typical use case for figure references:

(@good)  This is a good example.

As (@good) illustrates, ...

This mechanism can already be used for figure references, as CFCF pointed out:

![Figure (@primitive_hut): The primitive hut](Illustrations\primitive_hut.png)

As can be seen in Figure (@primitive_hut), huts may be primitive.

# Index of Figures

(@primitive_hut) *Primitive hut* from the frontispiece of Marc-Antoine Laugier’s 1755 second edition of *Essay on Architecture*, illustration by Charles-Dominique-Joseph-Eisen.

However, there are a few drawbacks:

- You currently need an index of figures, since example lists require the (@id) to be at the beginning of a line at least once.
- You have to add the Figure (@id): bit to the caption manually.
- This breaks LaTeX/PDF output, since LaTeX adds a “Figure” prefix itself.

Thus, a proper referencing scheme would need a bit of additional thought. In particular, PDF and HTML output should work alike, probably by pandoc adding the Figure: bit to HTML output while leaving it to LaTeX in the PDF case. Additionally, this should also work for referencing numbered sections, as in see chapter (@mychapter).

Bartosz Telenczuk · Answer 13 · Sat Apr 26 2014 06:30:51 GMT+0800 (China Standard Time)

Your workaround works as suggested, but I had to remove the parentheses when referencing the label, otherwise they were rendered in the output. After this modification my example looks like this:

Figure @figure is about being in time

![Figure @figure: Cubes](cubes.png)

(@figure) Figure 1

To remove the automatic numbering in LaTeX (Figure 1:, etc.) you can add to the template:

\usepackage[labelformat=empty]{caption}

After rendering to pdf this produces the following output:

Johan van der Knijff · Answer 14 · Thu May 08 2014 23:35:08 GMT+0800 (China Standard Time)

Just came across this issue as well and ended up here. I'm also really in favor of support for the syntax suggested by @jgm above:

![Voyage to the moon](lalune.jpg){#lalune}

Especially since this is the standard way of dealing with this in PHP Markdown Extra:

http://michelf.ca/projects/php-markdown/extra/#spe-attr

mangecoeur · Answer 15 · Thu Jul 10 2014 22:46:59 GMT+0800 (China Standard Time)

Have there been any developments on this? It also seems to me that @jgm's suggestion

![Voyage to the moon](lalune.jpg){#lalune}

is the most consistent internally and with other tools. What would need to happen for this to be implemented?

Edward Abraham · Answer 16 · Wed Jul 23 2014 09:34:52 GMT+0800 (China Standard Time)

I wanted to add my support for this addition to the syntax. When trying to replicate papers using markdown for the scholmd project, this is the feature that stands out as most needed in Pandoc. In short, this can be addressed through the general use of {#lalune} for labelling elements, and of @lalune for referencing the number of the corresponding element. The syntax (@) may be used to number elements that are otherwise unnumbered.

A general syntax for labels, {#lalune}, associated with the preceding element would allow any element to be labelled (paragraphs, equations, tables, etc.). By associating the label with an element in the abstract syntax tree, the properties of the element would be available when the reference is made, so references can be numbered appropriately. This syntax is already used in one context in Pandoc (section heading labels), and is used by PHP Markdown Extra. For elements that don't have numbers, such as equations, the syntax (@) may be used (from the example_lists extension). So an equation would be numbered and labelled as $$ F = G{m_1 m_2 \over r^2}$$ (@) {#gravity}. (An alternative could be to use the example_lists extension style and number and label it in one go as $$ F = G{m_1 m_2 \over r^2}$$ (@gravity). There are clearly some details and edge cases to be thought through here.)

When the document is rendered, Pandoc would associate a number with each labelled element, based on its type and its position in the document. This logic would need to be carried out in Pandoc, so that it is available to the full range of back-end writers (including HTML). The philosophy would be similar to pandoc-citeproc, which carries out its own formatting of citations rather than delegating to writers that support this approach (such as bibtex for LaTeX). An option would be to have this behaviour depend on the backend, so that in LaTeX it inserts \label and \ref commands, but elsewhere it inserts calculated numbers if referencing is not supported by the backend. This has the advantage that it will work easily in contexts where only a fragment of the document is rendered. If pandoc is calculating the numbering, a syntax would be needed for specifying the start numbers in a fragment that isn't being compiled in standalone mode.

Labelled elements may be linked to, with the @ symbol being used to indicate the reference. So
a trip to [the moon](@lalune) would be an anchor link to the element labelled {#lalune}. In this case the text is rendered as a trip to the moon.

The syntax The moon is illustrated in Figure @lalune may be used to insert the number of the referenced element, as well as a link to that element, with the text rendered as The moon is illustrated in Figure 1. This follows the syntax used for referencing numbered lists with the example_lists extension.

A further syntax could be to use square brackets [@lalune] to insert the type and number of the element that is referenced, similar to the behaviour of latex's \autoref command. So, the moon is illustrated in [@lalune] would be rendered as the moon is illustrated in Figure 1 (including a link to the anchor). To implement this feature would require some localisation or customisation capability, so that the word used to describe the element could be specified. In its simplest, this customisation could be put in the YAML header, with for example figure_label: Fig. if the style required a shortened label. The syntax for the reference, [@lalune], is the same as is used by the pandoc-citeproc library, so it would be overloading that usage to implement a self-citation. Pandoc would have the information on the context that is needed to either format it as a citation, or as a reference, assuming that there was no collision between the labels and the citation keys.
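
As a rough sketch of the numbering logic proposed above, assuming the labelled elements and their types have already been collected in document order: each type gets its own counter, a bare @label becomes the number, and [@label] becomes a type name plus number. The names `number_labels`, `resolve_refs` and the `names` dictionary (standing in for a setting like figure_label) are hypothetical, and the sketch works on plain text rather than on pandoc's AST.

```python
import re
from collections import defaultdict


def number_labels(elements):
    """Map each label to (type, number), counting separately per element type.

    `elements` is a list of (type, label) pairs in document order.
    """
    counters = defaultdict(int)
    table = {}
    for kind, label in elements:
        counters[kind] += 1
        table[label] = (kind, counters[kind])
    return table


def resolve_refs(text, table, names):
    """Expand [@label] to '<name> <n>' and bare @label to '<n>'.

    Unknown labels are left untouched, so citation keys pass through.
    """
    def bracketed(m):
        label = m.group(1)
        if label not in table:
            return m.group(0)
        kind, n = table[label]
        return f"{names.get(kind, kind)} {n}"

    def bare(m):
        label = m.group(1)
        return str(table[label][1]) if label in table else m.group(0)

    text = re.sub(r"\[@([\w-]+(?::[\w-]+)*)\]", bracketed, text)
    # Skip '@' preceded by a word character, '[' or '(' (emails, citations, link targets).
    return re.sub(r"(?<![\w\[(])@([\w-]+(?::[\w-]+)*)", bare, text)


table = number_labels([("figure", "lalune"), ("equation", "gravity"), ("figure", "earth")])
names = {"figure": "Fig.", "equation": "Eq."}
print(resolve_refs("See [@earth], Figure @lalune and [@smith2010].", table, names))
```

Leaving unknown keys alone is one simple way to coexist with citations, under the assumption stated above that labels and citation keys do not collide.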

Maxim K · Answer 17 · Wed Jul 23 2014 14:07:54 GMT+0800 (China Standard Time)

@edwardabraham It must be pointed out that the syntax [@lalune] is already used in pandoc for bibliographical citations.

Tim T.Y. Lin · Answer 18 · Wed Jul 23 2014 15:44:30 GMT+0800 (China Standard Time)

@kovla @edwardabraham I don't see why #lalune couldn't be used also as a reference to the defined symbol. With this scheme [the moon](#lalune) could be a normal text link to the figure, while [#lalune] could do the numbered reference thing as mentioned. In fact, I have a custom build of Pandoc that does exactly this.

Edward Abraham · Answer 19 · Thu Jul 24 2014 13:19:20 GMT+0800 (China Standard Time)

@kovla The idea was to deliberately overload the [@lalune] syntax that is used for citations. The reason being that references to another part of the document are similar to citations (in essence they are self-citations). This has the benefit of avoiding introducing additional syntax. During processing the filter would identify which element the label was attached to, and use that information to appropriately format the text that is inserted into the document.

@evitaerc I prefer using the @ symbol, as it extends functionality that is already used by example_lists. Are you able to structure your pandoc build so that it may be implemented as a filter?

Note that this extension, [@lalune], is a convenience and is not necessary, provided that the numbers can be accessed through the @lalune method.

Tim T.Y. Lin · Answer 20 · Thu Jul 24 2014 13:40:55 GMT+0800 (China Standard Time)

@edwardabraham I tried the @ approach as well, but internal feedback in our lab showed that people get confused by what is a citation and what is an internal reference even when editing. The conclusion was that the mental model of keeping # for internal refs and using @ for external refs is the simplest to grok.

In fact, no one out of ten or so people has used example_lists (we are mostly writing extended abstracts and journal papers in the field of physics/engineering/applied math). When encountering a "list of scenarios" situation, the content was so static that people simply used literal numbers without issue.

Unfortunately the internal reference mechanism required heavy modification of the Markdown reader (additional state must be kept during the parsing process) and a custom AST, so I can't conceive of a filter implementation in the near future.

Jaremy Creechley · Answer 21 · Fri Jul 25 2014 00:29:08 GMT+0800 (China Standard Time)

Personally, I find the # symbol and the {#label} syntax easier to understand and use. In my mind citations and internal references follow very distinct "mental models". Many academic papers use distinct numbering for figures, tables, and equations, but the proposed syntaxes don't appear to have a way to support distinct numberings by type. That would be an important design criterion (I only got to skim the comments, so hopefully it's not a redundant suggestion).
@edwardabraham You mentioned scholmd. Is it currently just a repository of ideas or have they implemented any of the academic markdown features?
@evitaerc Great work! Is it possible for you to propose submitting the changes to the pandoc project, or alternatively to create a GitHub fork to allow others to experiment?

mangecoeur · Answer 22 · Fri Jul 25 2014 00:36:04 GMT+0800 (China Standard Time)

+1 for use of # symbol for internal references. But it's really important that the references can distinguish between equations, figures, and tables to have distinct numbering sequences.

There are two approaches to my mind:

- Make the "thing referenced" explicit in the tag, for instance using namespaces like #eqn.maxwells, #fig.hockeystick. Pandoc would have to track the objects in each namespace and format the references appropriately.
- Depend on pandoc's parser to know what type of thing is referenced and handle it appropriately. So if you tag an image and then use a # reference, pandoc automatically treats it as a "fig" reference; if you embed a LaTeX formula it becomes an equation reference, etc. This would be cool but I suspect it would be a) complex and b) fragile: you get issues, for instance, if someone wants to embed an image for a formula.
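
A small sketch of the first (namespace) approach, where the type comes from the tag itself and not from the parser. The function name `number_by_namespace` is hypothetical, and the ids are assumed to be written as namespace.name as in the examples above.

```python
from collections import defaultdict


def number_by_namespace(ids):
    """Number ids like 'fig.hockeystick' with one counter per namespace prefix.

    Returns {id: (namespace, number)}. An id without a '.' falls into the
    empty namespace; a repeated id keeps the number it was given first.
    """
    counters = defaultdict(int)
    numbered = {}
    for ident in ids:
        if ident in numbered:
            continue
        namespace = ident.split(".", 1)[0] if "." in ident else ""
        counters[namespace] += 1
        numbered[ident] = (namespace, counters[namespace])
    return numbered


print(number_by_namespace(["eqn.maxwells", "fig.hockeystick", "fig.trend", "eqn.euler"]))
```

Because the namespace is read straight from the id, an image that really holds a formula can still be tagged #eqn.something, which sidesteps the fragility mentioned for the second approach.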

Benct Philip Jonsson · Answer 23 · Fri Jul 25 2014 03:02:15 GMT+0800 (China Standard Time)

I agree very much that internal references, citations and references to numbered examples are different, and @ is already too overloaded by being used for both the latter. The problem with #reference which I can see is that it might get confused with a level one header since atx headers don't require a space after the hashmarks as far as I know. I think {#anchor} and [#reference] would be good because then the id could be any valid HTML id including LaTeX-y things like {#img:la-lune}. As for doing different things with different anchors that is probably best left to filters.

Note that you could already do something like <span id="img:la-lune">![Voyage to the Moon](lalune.jpg)</span> and then [Voyage to the Moon](#img:la-lune) and get anchors/labels and links which work in both HTML and LaTeX. If you don't want hyperlinks in your LaTeX, numbered images in your HTML etc. that can be done with filters, e.g. replacing links with URLs like the one in my example with a span containing the link text plus a raw LaTeX string \ref{img:la-lune}.
(See
http://johnmacfarlane.net/pandoc/try/?text=%3Cspan+id%3D%22img%3Afoo-bar%22%3E!%5BA+bar+frequented+by+foos%5D(foo-bar)%3C%2Fspan%3E%0A%0A%5BThe+foo+bar%5D(%23img%3Afoo-bar).&from=markdown&to=latex)

It would be nice to have a less ugly syntax, but note that you would need to turn references into links when generating HTML, while it is rather trivial to have a filter do the opposite when generating LaTeX. Note also that you could abuse the link title, having the filter leave links with a title alone so that you get a hyperlink. It would even be easy to use spans with certain classes and/or attributes in the source and have one filter which turns them into references when generating LaTeX and one which turns them into links when generating HTML. I'm going on holiday tomorrow but I would be happy to write those filters when I get back! :-)

Gabor Szarnyas · Answer 24 · Mon Aug 11 2014 04:38:15 GMT+0800 (China Standard Time)

My goal is to produce HTML and PDF outputs from the same Markdown file, with the PDF containing references that can be printed (e.g. "See figure 1") . I found a cumbersome workaround inspired by @bpj's idea. Note that it does not work with pandoc 1.12.2.1 found in the Ubuntu APT repository, so I installed 1.12.4.2 from Cabal instead.

The following Markdown code:

<span id="pic.jpg"></span>

![A bar frequented by foos](pic.jpg)

[The foo bar](#pic.jpg).

Produces the following HTML code:

<p><span id="pic.jpg"></span></p>
<div class="figure">
<img src="pic.jpg" alt="A bar frequented by foos" /><p class="caption">A bar frequented by foos</p>
</div>
<p><a href="#pic.jpg">The foo bar</a>.</p>

This works reasonably well: the empty paragraph is not displayed so the link will navigate you to the image.

The generated LaTeX code is the following:

\label{pic.jpg}{}

\begin{figure}[htbp]
\centering
\includegraphics{pic.jpg}
\caption{A bar frequented by foos}
\end{figure}

\hyperref[pic.jpg]{The foo bar}.

The generated \label is of no use. Instead, we should add the label after the caption has been inserted. To do this, we save the filename of the current figure to a variable (\currentfigure) by redefining the \includegraphics command. We then redefine the \caption command to insert the caption and add the label from the variable. We also have to redefine the \hyperref command to \autoref.
To achieve this, we edit the LaTeX template file's preamble:

\let\oldincludegraphics\includegraphics
\renewcommand*{\includegraphics}[1]{\oldincludegraphics{#1}\def\currentfigure{#1}}
\let\oldcaption\caption
\renewcommand*{\caption}[1]{\oldcaption{#1}\label{\currentfigure}}
\renewcommand*{\hyperref}[2][\ar]{%
  \def\ar{#2}
  #2 (\autoref{#1})}

In the final PDF document, the caption and the reference look like this. "Figure 1" is also a hyperlink.

Figure 1: A bar frequented by foos

The foo bar (Figure 1).

While I think this workaround can be used in practice, it would be nice to have a syntax for inserting cross references in a simpler and less error-prone way.

Benct Philip Jonsson · Answer 25 · Mon Aug 11 2014 08:34:29 GMT+0800 (China Standard Time)

Note that the following works identically without the need to (re)define any LaTeX commands and without generating an 'empty' paragraph (including the fact that at least in my PDF reader the link jumps to the caption rather than to the top of the image):

<div id="fig:lalune">
![A voyage to the moon\label{fig:lalune}](lalune.jpg)

</div>

[The voyage to the moon](#fig:lalune).

It is slightly less elegant in that you have to specify the id/label twice, and slightly more elegant in that you avoid the empty span element and the resulting empty paragraph.

Note that the blank line inside the div is necessary in order to make pandoc see the div contents as a paragraph, and thus to get the image inside a figure environment. With the blank line the resulting LaTeX is like this:

\begin{figure}[htbp]
\centering
\includegraphics{lalune.jpg}
\caption{A voyage to the moon\label{fig:lalune}}
\end{figure}

but without it it is just like this:

\includegraphics{lalune.jpg}
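
Since the id/label has to be written twice and the blank line is easy to forget, the idiom can be generated. A minimal Python sketch, where the helper name `anchored_figure` is made up for this example:

```python
def anchored_figure(label: str, caption: str, path: str) -> str:
    """Return the div-wrapped figure idiom with the id/label written twice.

    The blank line before </div> is kept on purpose: without it pandoc does
    not see the div contents as a paragraph, so no figure environment results.
    """
    return (
        f'<div id="{label}">\n'
        f"![{caption}\\label{{{label}}}]({path})\n"
        "\n"
        "</div>\n"
    )


print(anchored_figure("fig:lalune", "A voyage to the moon", "lalune.jpg"))
```

The printed block is the same Markdown as in the example above, so it can be pasted into the source as is.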


Gabor Szarnyas · Answer 26 · Mon Aug 11 2014 16:02:54 GMT+0800 (China Standard Time)

@bpj, thanks for the quick reply.

(including the fact that at least in my PDF reader the link jumps to the caption rather than to the top of the image)

I looked at this issue and found that it can be fixed easily by adding \usepackage{caption} to the template (see http://tex.stackexchange.com/questions/27096/href-to-an-image-label-how-to-jump-to-the-image-instead-of-the-caption-below-t for details).

It is slightly less elegant in that you have to specify the id/label twice, and slightly more elegant in that you avoid the empty span element and the resulting empty paragraph.

I agree, I also prefer your solution.

Note that the blank line inside the div is necessary in order to make pandoc see the div contents as a paragraph, and thus to get the image inside a figure environment.

Wow. I experimented with using a div element but couldn't get it working. Adding the extra newline did the trick.

Benct Philip Jonsson · Answer 27 · Mon Aug 11 2014 20:26:44 GMT+0800 (China Standard Time)

On 2014-08-11 10:03, Gábor Szárnyas wrote:

@bpj, thanks for the quick reply.

@szarnyasg, you are welcome; sleeplessness has its advantages! ;)

(including the fact that at least in my PDF reader the link jumps to the caption rather than to the top of the image)

I looked at this issue and found that it can be fixed easily by adding \usepackage{caption} to the template (see http://tex.stackexchange.com/questions/27096/href-to-an-image-label-how-to-jump-to-the-image-instead-of-the-caption-below-t for details).

It does indeed! If a similar trick were possible in HTML one could just wrap the caption text in a span:

![<span id="fig:lalune">A voyage to the Moon</span>](lalune.jpg)

In practice you don't even see that you end up at a figure caption, though.

It is slightly less elegant in that you have to specify the id/label twice, and slightly more elegant in that you avoid the empty span element and the resulting empty paragraph.

I agree, I also prefer your solution.

Not needing to hack LaTeX is always preferable! :)

Note that the blank line inside the div is necessary in order to make pandoc see the div contents as a paragraph, and thus to get the image inside a figure environment.

Wow. I experimented with using a div element but couldn't get it working. Adding the extra newline did the trick.

See here and here for the explanation!

BTW for those who use Vim with the UltiSnips plugin I made a snippet for using this idiom which endeavors to (optionally) reduce typing duplication to a minimum:

# # ANCHORED FIGURE IN SOURCE FOR BOTH HTML AND LATEX
# 
#     <div id="fig:ID/LABEL/NAME">
#     ![CAPTION\label{fig:ID/LABEL/NAME}](ID/LABEL/NAME.jpg)
# 
#     </div>
# 
# ## Tabs
#
# Tab     Description                  Default
# ------  ---------------------------  -------
# $1      the entire id/label[^a]
# $2      id/label prefix              fig[^b]
# $3      id/label unique part
# $4      caption text
# $5      filename minus extension     $3
# $6      extension                    jpg[^c]
#
# [^a]: You should normally just skip this tab or the 'magic' with $5 will be lost!
# [^b]: The following `:` is automatically removed if you delete the prefix.
# [^c]: The separating dot is automatically removed if you delete the extension
#       -- for use with the --default-image-extension option!
#
# NOTE: The blank line inside the div is necessary to make pandoc
#   see a paragraph and thus place the image inside a figure environment!
#
snippet figdiv "Anchored figure in HTML and LaTeX" b
<div id="${1:${2:fig}${2/.+/:/}${3:ID/LABEL/NAME}}">
![${4:CAPTION}\\label\{$1}](${5:$3}${6/.+/./}${6:jpg})

</div>

$0
endsnippet


Ivo Jimenez · Answer 28 · Tue Aug 26 2014 06:47:47 GMT+0800 (China Standard Time)

Quick clarification regarding the div method: redefining \hyperref is still needed in order to include a (Figure #).

Sarah Brofeldt · Answer 29 · Tue Sep 02 2014 22:18:41 GMT+0800 (China Standard Time)

@bpj

I'm sorry but I had a bit of trouble following the last bit of your conversation with @szarnyasg. Does it mean that you found a way to produce correct (LaTeX) references to figures, images and tables?

Gabor Szarnyas · Answer 30 · Tue Sep 02 2014 23:34:58 GMT+0800 (China Standard Time)

@srhb: I managed to get it working, although it's a bit cumbersome. I created a Hungarian thesis template which is available here:

https://github.com/FTSRG/thesis-template-markdown
- chapter2.md includes the figure
- chapter4.md references it
- the default.latex template redefines the hyperref command, see https://github.com/FTSRG/thesis-template-markdown/blob/master/default.latex#L142
- http://docs.inf.mit.bme.hu/thesis-template-markdown/thesis.pdf (on page 8, "(2.1. ábra)" is the reference for the figure)

It requires Pandoc 1.12.4.2+.

HTH,
Gabor

Sarah Brofeldt · Answer 31 · Tue Sep 02 2014 23:36:15 GMT+0800 (China Standard Time)

@szarnyasg Thank you kindly!

Tony Vashevko · Answer 32 · Tue Sep 16 2014 11:29:58 GMT+0800 (China Standard Time)

I spent a bit of time trying to sort through this issue with a pandoc filter instead of by redefining \hyperref in default.latex.

TL;DR: compile the script below and use it as a pandoc --filter when converting to latex. Use the div trick to get html internal linking to work.

https://gist.github.com/balachia/d836f8829aec61cb4b54#file-pandoc-internalref-hs

Pandoc doesn't make \ref's anywhere when writing out latex so instead you have to inject them by using the pandoc format's RawInline type and using some kind of pattern matching to figure out where to do it. Right now I'm pattern matching on any internal link that starts with "#fig:" or "#tab:" and I'm wiping out whatever text the user specifies for the text words in favor of the \ref text. So you get the equivalent of:

Figure words -> Figure \ref*{fig:lefig}

That said, it's probably possible to pattern match on a better pattern. With some thinking, it might be possible to avoid the div trick too, by pulling out images in divs. Not sure about that one yet, though.
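
The pattern matching described above can be sketched in Python against pandoc's JSON AST. This is not the Haskell filter from the gist: it is an illustration that assumes the JSON layout used by recent pandoc versions, where a Link carries [attr, inlines, [url, title]] (the 1.12-era layout differs), and the names `rewrite_link` and `walk` are hypothetical.

```python
PREFIXES = ("#fig:", "#tab:")


def rewrite_link(node):
    """Turn an internal Link to a figure or table into a raw LaTeX \\ref*.

    Assumes the Link layout [attr, inlines, [url, title]]; returns the node
    unchanged when it is not such a link. The link text is discarded.
    """
    if isinstance(node, dict) and node.get("t") == "Link":
        url = node["c"][2][0]
        if url.startswith(PREFIXES):
            return {"t": "RawInline", "c": ["latex", "\\ref*{" + url[1:] + "}"]}
    return node


def walk(tree):
    """Recursively apply rewrite_link to every node of a JSON-decoded AST."""
    if isinstance(tree, list):
        return [walk(item) for item in tree]
    if isinstance(tree, dict):
        replaced = rewrite_link(tree)
        if replaced is not tree:
            return replaced
        return {key: walk(value) for key, value in tree.items()}
    return tree
```

To run something like this as a real filter, one would read the JSON document from stdin, pass it through `walk`, and write the result to stdout.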

Tony Vashevko · Answer 33 · Wed Sep 17 2014 06:14:24 GMT+0800 (China Standard Time)

I've been trying to port one of my papers from LyX to markdown, so this was a good project to procrastinate with. Right now I've written up a filter (pandoc-internalref) that tries to implement the
![caption](image){#fig:reference}
syntax, and it seems to be doing a good job, at least for HTML and LaTeX.

The haskell source for it is at https://github.com/balachia/pandoc-filters.

I'd love to get some feedback on how that works out.

Benct Philip Jonsson · Answer 34 · Thu Sep 18 2014 01:24:41 GMT+0800 (China Standard Time)

I had the idea to abuse the image title to hold the id/label info: [caption](path "title {#id .class}"), mostly because it was easy to 'parse' with Perl, but this certainly looks nicer, if it's robust enough.

Aaron O'Leary · Answer 35 · Sun Sep 28 2014 03:40:04 GMT+0800 (China Standard Time)

I've written a python filter that:

- allows arbitrary attributes on images (![caption](filename){#id .class key=value})
- converts these images to labeled figures in latex output
- overwrites all internal links with \autoref{} in latex output
- automatically numbers figures and links to them in html / markdown output
- overwrites internal links to figures, e.g. see [](#fig:lalune) -> see [Figure 1](#fig:lalune)
- does the same for sections

The code is at https://github.com/aaren/pandoc-reference-filter

It is pretty customisable (just look at the source) and avoids redefining latex commands. '\autoref' could easily be replaced by '\ref'. Currently I'm doing the markdown links like [](#fig:lalune), but it might be that there is a more natural syntax. I've considered [#fig:lalune] as a replacement.

Kurt Pfeifle · Answer 36 · Sun Sep 28 2014 04:45:18 GMT+0800 (China Standard Time)

Thanks aaren,

your filter compiled without a problem. I wanted to play with it, but your guidelines about how to prepare the markdown source are not clear to me.

The current syntax of Pandoc's markdown for images is this

So I tried to add your "new" syntax somehow, but it does not really "fit" into the current syntax:

{#mypic}

...plus some more variations

with an attempt to link to the image from a different place in the document:

See this picture for more...

Could you please give some additional pointers?

Aaron O'Leary · Answer 37 · Sun Sep 28 2014 04:47:58 GMT+0800 (China Standard Time)

Regarding syntax: if we went with the [#fig:lalune] syntax, the collision with the existing syntax would be with implicit reference links that begin with a #. For example,

Here is an implicit reference link: [#1]

Here is a link to a figure: [#lalune]

[#1]: example.com

![caption](lalune.png){#lalune}

This would only be a problem when the reference is multiply defined, therefore I suggest the following definition:

An implicit internal reference link is an implicit reference link that begins with a '#' and does not have a link definition and does have a linked object (figure, section, whatever).

This could be implemented with a filter: undefined reference links are passed through the markdown reader as a string. We would just have to walk the tree and find strings that match [#ref] then link them up.
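
For illustration, the rule above could be sketched in Python roughly as follows. This is a simplified model and not the code of any filter in this thread: the document is reduced to a list of plain words standing in for string nodes, and the names `link_internal_refs`, `link_definitions` and `targets` are made up for the example.

```python
import re

# Matches a bare "[#label]" word, which is how an undefined reference
# link would look after passing through the reader as a string.
REF = re.compile(r"^\[#([^\]\s]+)\]$")


def link_internal_refs(words, link_definitions, targets):
    """Turn "[#label]" words into (text, href) pairs when the rule applies.

    words            -- list of plain strings (a stand-in for string nodes)
    link_definitions -- set of labels with an explicit "[#x]: url" definition
    targets          -- dict mapping ids of figures/sections to display text
    """
    out = []
    for word in words:
        m = REF.match(word)
        label = m.group(1) if m else None
        if label and ("#" + label) not in link_definitions and label in targets:
            # No link definition, but a linked object exists: make a link.
            out.append((targets[label], "#" + label))
        else:
            # Ordinary text, a defined reference link, or an unknown label.
            out.append(word)
    return out
```

With `link_definitions = {"#1"}` and `targets = {"lalune": "Figure 1"}`, only `[#lalune]` is turned into a link; `[#1]` and any label without a target are left untouched.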

Aaron O'Leary · Answer 38 · Sun Sep 28 2014 04:54:12 GMT+0800 (China Standard Time)

@KurtPfeifle maybe better to keep this list clean - can you start an issue on my repo? I need to see your input file, the pandoc command you are using and the output it is giving you.

the syntax is ![caption](filename){#id} with the image in a separate paragraph.

Kurt Pfeifle · Answer 39 · Sun Sep 28 2014 05:44:51 GMT+0800 (China Standard Time)

On Sat, Sep 27, 2014 at 10:54 PM, aaren notifications@github.com wrote:

@KurtPfeifle https://github.com/KurtPfeifle maybe better to keep this
list clean - can you start an issue on my repo?

Sorry for the "spam" then. It happened inadvertently -- I just replied to
a mail that arrived in my inbox, without realizing it could be regarded as
polluting a "list".

Aaron O'Leary · Answer 40 · Sun Sep 28 2014 06:03:49 GMT+0800 (China Standard Time)

@KurtPfeifle no problem. I just meant that discussion about whether my filter does what you expect it to should happen on the repo for that filter. I'm happy to help; I'm just conscious that this issue is getting pretty long and it would be good to avoid too much tangential back and forth.

Start an issue on my repo and I'll see if I can fix your problem.

Kurt Pfeifle · Answer 41 · Sun Sep 28 2014 06:32:11 GMT+0800 (China Standard Time)

Funnily -- when I now click on the link pointing to your repo
'pandoc-reference-filter' I end up in a different repo than the one where I
previously retrieved this Haskell source code from:

{-
Pandoc filter that cleans up internal references to figures and tables.
It's an attempt to deal with: #813
Compile with:

    ghc --make pandoc-internalref.hs

and use in pandoc with

    --filter [PATH]/pandoc-internalref
-}
module Main where
import Text.Pandoc.JSON
import Data.List

main = toJSONFilter makeref

makeref :: Inline -> [Inline]
makeref (Link txt ('#':ident, x))
    | Just subident <- stripPrefix "fig:" ident = reflink
    | Just subident <- stripPrefix "tab:" ident = reflink
    where reflink = [Link [RawInline (Format "latex") ("\\ref*{" ++ ident ++ "}")] ("#" ++ ident, x)]
makeref x = [x]

Dunno what changed meanwhile...

In any case, the filter in the new location isn't Haskell, but Python (as
your post said), and the filter code which is written in Haskell from
a different repo (https://github.com/aaren/pandoc-filters) has 52 lines of
code (not 23) -- but it works :-)

So given that your original outline of how to use the extended syntax (item
"1." in your list) seems to have been incorrect and is replaced by

![caption](filename){#id}

now, I'm happy for the time being :-)

Aaron O'Leary · Answer 42 · Sun Sep 28 2014 06:47:53 GMT+0800 (China Standard Time)

Ok, I think you confused me with @balachia! We have both proposed filters that do similar things recently. To clarify:

https://github.com/aaren/pandoc-filters - my fork of a haskell filter discussed earlier in the thread
https://github.com/balachia/pandoc-filters - the original haskell filter
https://github.com/aaren/pandoc-reference-filter - my more recent python filter

I edited item 1 in my list because github wasn't rendering the non-backticked code as I wanted. This probably didn't show up on email.

Matthew Pickering · Answer 43 · Sun Sep 28 2014 07:02:44 GMT+0800 (China Standard Time)

Feel free to add links to your filters to the wiki page. I'm sure other users will find them useful.

Aaron O'Leary · Answer 44 · Thu Dec 11 2014 09:53:25 GMT+0800 (China Standard Time)

@mb21 whilst you're trying to address #261, I'd suggest a slight alteration to the latex writer that would go some way to solving this issue.

Currently, if you define an image alone in a paragraph it is turned into a figure in latex, i.e.

![blah](fig)

becomes

\begin{figure}[htbp]
\centering
\includegraphics{fig}
\caption{blah}
\end{figure}

What I suggest is passing the id of the attr through to the label of the figure, i.e.

![blah](fig){#thefigure}

becomes

\begin{figure}[htbp]
\centering
\includegraphics{fig}
\caption{blah}
\label{thefigure}
\end{figure}

Whilst this wouldn't solve this issue it would be a big step in the right direction. I understand if you'd like to avoid scope creep though - it's great that anyone is putting Attr on Image at all! Thank you :)
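
To make the suggestion concrete, here is a rough Python sketch of the output I have in mind. It is not the actual LaTeX writer (which is written in Haskell); `latex_figure` and its parameters are hypothetical names used only for this example.

```python
def latex_figure(caption, path, identifier=""):
    """Build a LaTeX figure; add a label line only when an identifier is given."""
    lines = [
        r"\begin{figure}[htbp]",
        r"\centering",
        r"\includegraphics{%s}" % path,
        r"\caption{%s}" % caption,
    ]
    if identifier:
        # Pass the id of the image attributes through as the figure label.
        lines.append(r"\label{%s}" % identifier)
    lines.append(r"\end{figure}")
    return "\n".join(lines)
```

Calling `latex_figure("blah", "fig", "thefigure")` gives the second LaTeX block above, and leaving out the identifier gives the first.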

Mauro Bieg · Answer 45 · Thu Dec 11 2014 21:50:03 GMT+0800 (China Standard Time)

@aaren Good idea. Should ![blah](fig){#thefigure} go to \label{thefigure} or \label{fig:thefigure}? For consistency with HTML-ids probably the former, although there is a LaTeX convention to use the latter. Since you cannot reference it with [link](#thefigure) anyhow (which gets converted to \thefigure[blah]{link} instead of \ref{thefigure}), this issue will still be only half-solved, but if you're targeting only LaTeX you can just write \ref{thefigure}.

Aaron O'Leary · Answer 46 · Thu Dec 11 2014 23:58:35 GMT+0800 (China Standard Time)

@mb21 definitely the former. Prepending fig: is a convention, not a rule.

Yes it only half solves this issue but it avoids the current figure-labeling workarounds which aren't very graceful (e.g. putting the \label in the caption).

Matthew Pickering · Answer 47 · Fri Dec 12 2014 00:49:19 GMT+0800 (China Standard Time)

@mb21 See how the attributes are currently handled for divs and spans.. I can't remember the precise command we use there.

Tamas Nagy · Answer 48 · Fri Dec 12 2014 05:53:17 GMT+0800 (China Standard Time)

Most of the discussion here is centered around labeling single figures, but what about figures containing separate panels, all of which may be referenced independently? It's been my experience that most scientific articles have figures like Fig. 1A, 1B, ... 1N and they must be referenced independently of each other, but displayed in the same panel. Currently, I believe I can hack a version using the latex subfigure package and \label, but this would not work with either the HTML or the DOCX output. This would be a more complex case to handle, but no less important IMO.

Aaron O'Leary · Answer 49 · Fri Dec 12 2014 06:55:03 GMT+0800 (China Standard Time)

@tlnagy I think that comes into a separate issue - having a Figure block element that allows you to have subfigures. Have a look at what @timtylin is doing with scholdoc.

Currently, having a single image in a paragraph become a figure is a bit of a hack. It may be possible for this to coexist in the future with a multi panel schema, but we'd need to decide what that is. There might already be an existing issue for this but I'm not sure.

Mauro Bieg · Answer 50 · Sun Dec 14 2014 19:34:35 GMT+0800 (China Standard Time)

(Yes, I went without the fig: prefix, the headers had no prefix either.)

So now you'll be able to add manual internal references to figures like:

![](image.png){#myFigure}

As can be seen in [Figure 1](#myFigure)...

(The LaTeX writer actually already converts [text](#link) to \hyperref[link]{text} and not \href{link}{text}, therefore the above works.)

To generate the reference text (i.e. "Figure 1") automatically, this really should be done in a filter so as to make it consistent across all output formats, as not all output formats (e.g. HTML) have an \autoref command (this is similar to how bibliography entries are handled by the pandoc-citeproc filter instead of LaTeX). I like Aaron's idea of overloading the empty link syntax for that ([](#link)), since I cannot imagine ever actually wanting an empty link.

Aaron O'Leary · Answer 51 · Sun Dec 14 2014 20:22:51 GMT+0800 (China Standard Time)

@mb21 that's great! do you have a PR we can try out?

I'm using \autoref in my filter but am tempted to use \ref instead. I've also got a generic output format that will just put in Figure 1 or whatever.

I started out using [](#ref) as the in-text link, for the reasons that you give. Then I used [#ref], because it's less typing and internal references are a bit like implicit reference links anyway. Now I'm using #ref, because it's even less typing and it parallels citations. Scholdoc uses either of the last two.

In the latter two cases, links are only made if there is something defined with the link on it. You can still write e.g. #1, as long as there isn't something with 1 as a label.

Me and @timtylin had a discussion about [#ref] vs. #ref and \autoref vs \ref on timtylin/issues/3.

Finally, note that the characters in latex labels are limited. I'm not sure if there is an equivalent restriction in html.

Mauro Bieg · Answer 52 · Sun Dec 14 2014 21:06:30 GMT+0800 (China Standard Time)

@aaren Yes, this is the pull request, as you can see from the discussion there it's not quite finalized yet though.

I don't have a strong opinion on [#ref] vs #ref vs [](#ref). However the last one would enable the filter to work on every input format that can have empty links, although I guess you can write #ref in plain text everywhere as well.

Haven't had time yet to look closely at your filter, but sounds great!

About the limited characters, there was already an existing toLabel function that is used by header ids.

John MacFarlane · Answer 53 · Mon Dec 15 2014 00:05:17 GMT+0800 (China Standard Time)

+++ mb21 [Dec 14 14 03:34 ]:

I like aaron's idea of overloading the empty link syntax for that ([](#link)), since I cannot imagine ever actually wanting an empty link.

Note that gitit overloads empty links as wikilinks.

Benct Philip Jonsson · Answer 54 · Mon Dec 15 2014 00:39:59 GMT+0800 (China Standard Time)

I have recently written a filter which overloads empty link texts in yet
another way: it alters the AST so that the plain writer produces what looks
like perldoc POD markup, with the convention that you can put strings in
POD link syntax in the link title to get a perldoc link. An empty link
text is automatically expanded to "the Foo::Bar module" or something like
that, depending on whether the title is prefixed with "pod:" (for internal
links), "cpan:", "perldoc:", or "man:" (for manpage links). (A companion
filter will expand a zero as the URL text into a link to the appropriate
web site.) I can well imagine still other ways to overload empty link
texts, so I very much think such overloading should be left to filters.

scaramouche1 · Answer 55 · Wed Jan 07 2015 01:54:20 GMT+0800 (China Standard Time)

The lack of internal cross-references is the major stumbling block for Pandoc's "world domination" in academia. This thread has made great progress at discussing the problem and providing solutions for cross-referencing figures.

But the problem of cross references is more general, and thus it makes sense to try to solve it in a more general way. For instance, mathematical documents cross-reference theorems and equations, social science cross reference hypotheses, philosophy cross references examples, and most disciplines cross reference section numbers. For Pandoc to be really helpful to all these disciplines, a general solution must be devised.

I propose a slight modification to the # and @ notations to allow for all types of cross-references. Here's the informal specification:

Anchors (#) and references (@) can optionally include a type descriptor. For instance, #fig:cat, #eq:force, #sec:intro, etc. The descriptor is what's in between '#' and ':'. Example uses:

Referencing a figure

![This is a cat](cat.png) {#fig:cat}

As seen in Figure @fig:cat.

Referencing an equation

$$F = ma$$ {#eq:force}

As seen in Equation @eq:force.

Referencing a hypothesis

Hypothesis {#hyp:temp}: Temperature increases in summer.

As mentioned in Hypothesis @hyp:temp.

This notation also serves to cite theorems, proofs, etc. (@thm:, @proof:).

Referencing a section number

# Introduction {#sec:intro}

As mentioned in Section @sec:intro.

This is in addition to the implicit referencing of headers already in place.

How to deal with possible clash with bibliographic references

The proposed notation has a small clash with the notation used to cite bibliographic references. I propose that references that include a ":" are first searched for inside the document, and only if there's no internal match are they looked up in the bibliographic system. Alternatively, ':' could be forbidden in bibliographic references.
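
As a rough illustration of that lookup order, here is a minimal Python sketch. The function name `resolve` and the two dictionaries are assumptions made up for the example; nothing like this exists in Pandoc today.

```python
def resolve(key, anchors, bibliography):
    """Classify an "@key" using the proposed lookup order.

    anchors      -- dict of anchors defined in the document, e.g. {"fig:cat": 1}
    bibliography -- dict of citation keys, e.g. {"smith2010": "Smith (2010)"}
    """
    # Keys with a ":" are looked up inside the document first.
    if ":" in key and key in anchors:
        return ("cross-reference", anchors[key])
    # Only when there is no internal match is the bibliography consulted.
    if key in bibliography:
        return ("citation", bibliography[key])
    return ("unresolved", None)
```

A citation key that happens to contain a ":" still resolves as a citation as long as no anchor of the same name exists in the document.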

John MacFarlane · Answer 56 · Wed Jan 07 2015 02:32:07 GMT+0800 (China Standard Time)

I like the idea of separate numbering sequences for
different kinds of things. And using a prefix with a colon
for that is fairly sensible.

The use of @ is problematic, as @ is already used for
citations and for numbered examples in pandoc. (Though
perhaps this mechanism could replace the current mechanism
for numbered examples: {#ex:foo}, @ex:foo?)

One thing that might be missing is control over the
numbering schemes used. For example, you might want figures
to number with a prefix by chapter or section: e.g., in
chapter 2, figures are numbered 2.1, 2.2, etc. It would be
nice to be able to control that somehow.

Another question is how this would be implemented. Would
the pandoc AST contain label and reference nodes? Or would
the Markdown reader simply convert these to hyperlinked
numbers (simpler)?

Aaron O'Leary · Answer 57 · Wed Jan 07 2015 02:40:14 GMT+0800 (China Standard Time)

@scaramouche1 why not use # instead of @? i.e.

![This is a cat](cat.png) {#fig:cat}

As seen in Figure #fig:cat.

This way you avoid the conflict with citations. I find it easier to use as well because of the cognitive separation between cross references and citations.

Thinking even further forward (much much further!), you could end up doing something like

As seen in Figure `@somepaper#fig:cat`

to reference a figure in another document.

Aaron O'Leary · Answer 58 · Wed Jan 07 2015 02:50:04 GMT+0800 (China Standard Time)

@jgm I could imagine more advanced cross referencing being implemented by an external filter (as for citations with pandoc-citeproc). Configuration of the numbering etc. would then be done through metadata.

I think the AST implementation depends on how we conceive of an internal reference - is a hyperlink sufficient to describe it? I'm not sure on this.

scaramouche1 · Answer 59 · Wed Jan 07 2015 04:13:29 GMT+0800 (China Standard Time)

@jgm, @aaren: I don't know enough about the internals of Pandoc to be of much help with implementation ideas. Here, I just include how this could be translated into LaTeX.

When exported to LaTeX, 'fig:', 'eq:', and 'sec:' references can be translated in the standard way (\label for the anchor and \ref for the reference), and all other reference types can use plain counters (1, 2, 3, ...). In LaTeX these per-type, plain counters can be implemented as follows (here's an example for 'hyp:'):

%in preamble: create counter and anchor for 'hyp'
\newcounter{ctrhyp}
\newcommand{\anchorhyp}[1]{\refstepcounter{ctrhyp}\arabic{ctrhyp}\label{#1}}

\begin{document}
Hypothesis \anchorhyp{hyp:temp}: Temperature increases in summer.

As seen in hypothesis \ref{hyp:temp}.
\end{document}

Making this work with other output formats probably requires programming a "counter" module in Pandoc that can deal with a few counting schemes. As a starting point, rather than making the schemes user-configurable, it may be better to copy the LaTeX defaults (i.e., sections use nested counters, and everything else use plain counters). In the future, alternative counting schemes could be specified in YAML.
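
To give an idea of what such a "counter" module might look like, here is a small Python sketch under the assumptions stated above (nested counters for sections, plain counters for everything else). The class and method names are invented for illustration and say nothing about how Pandoc would actually implement it.

```python
from collections import defaultdict


class Counters:
    """Nested counter for sections, plain per-type counters otherwise."""

    def __init__(self):
        self.section = []              # e.g. [2, 1] means section 2.1
        self.plain = defaultdict(int)  # e.g. {"fig": 3, "hyp": 1}

    def next_section(self, level):
        """Advance the section counter at the given 1-based heading level."""
        # Pad with zeros if a level was skipped, then drop deeper levels.
        while len(self.section) < level:
            self.section.append(0)
        del self.section[level:]
        self.section[level - 1] += 1
        return ".".join(str(n) for n in self.section)

    def next_plain(self, kind):
        """Advance and return the plain counter for one anchor type."""
        self.plain[kind] += 1
        return str(self.plain[kind])
```

Each anchor type ("hyp", "fig", ...) gets its own sequence starting at 1, while a new top-level section resets the numbering of its subsections.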

scaramouche1 · Answer 60 · Wed Jan 07 2015 05:30:43 GMT+0800 (China Standard Time)

@jgm: For what it's worth, I believe it is OK to replace the current example mechanism (@) with the proposed, more general cross-referencing functionality. The few documents that use (@) would need to be updated to use '@ex:', but a huge number of use cases could be allowed without changing Pandoc's syntax substantially. This functionality would make Pandoc the best way to work on academic papers in any discipline.

@aaren: I believe it is better to use '@' than '#' for references: a cross-reference is a citation to a part of a document. Thus, cross-references and citations are conceptually similar---they are pointers to an object---and could use the same '@' symbol. Using '#' would also be OK, but at the expense of a slight increase in cognitive load (because '#' is currently used to denote anchors, not references).

mangecoeur · Answer 61 · Mon Jan 12 2015 06:43:53 GMT+0800 (China Standard Time)

It seems to me that including the references in the AST would be more powerful and future proof, especially wrt pdf/latex conversion - you would be able to tweak the output to work with the styling system. I think control over the numbering scheme would have to be through the template, though I'm not sure how you could define rules in a flexible but still reasonably simple way.

Mauro Bieg · Answer 62 · Sat Jan 17 2015 20:54:47 GMT+0800 (China Standard Time)

Another question is how this would be implemented. Would
the pandoc AST contain label and reference nodes? Or would
the Markdown reader simply convert these to hyperlinked
numbers (simpler)?

A third option would be to have a filter (similar to the citeproc-filter) make the conversion to hyperlinked numbers. The filter would be enabled by default for the markdown reader. That way, the pandoc-types wouldn't need to be changed but the feature wouldn't be restricted to markdown input. @jgm Are there any downsides to this way? Or do you not want the three-stage process (reader -> filters -> writer) to become the norm when not using custom filters?

references in the AST would be more powerful and future proof, especially wrt pdf/latex conversion - you would be able to tweak the output to work with the styling system.

@mangecoeur, could you elaborate on that? I don't see what you're getting at.

how we conceive of an internal reference - is a hyperlink sufficient to describe it?

Can anyone come up with an example use case where a hyperlink isn't enough? I for one, cannot.

For the case where the target is inline text (and not a block like a figure, table etc.), I would prefer the syntax to be the often-discussed span-syntax:

[Hypothesis]{#hyp:temp}: Temperature increases in summer.

instead of:

Hypothesis {#hyp:temp}: Temperature increases in summer.

While a tad more cumbersome to write, it makes conceptually much more sense to me, since the attribute is then on a span element instead of floating around in nowhere. This would generate the HTML:

<span id="hyp:temp">Hypothesis 1</span>: Temperature increases in summer.

I agree that it makes sense to generalize the example list numbering, so you'd have fig:, eq:, hyp: etc. and also ex:. You would reference an example like: As @ex:good illustrates, ... The trickier part is how to write the example list itself. We could of course stick with the current syntax, but an arguably more consistent syntax would need attributes on list items, something like:

- This is a good example. {#ex:good}
- This is a bad example. {#ex:bad}

As @ex:good illustrates, ...

However attributes on list items are really hard because how do you know whether the attributes are on the last list item, the entire list, or even on the last paragraph of the last list item? A possible solution is to simply look out for span tags (that have an id) in example lists:

(@) This is a good [example]{#ex:good}. Do it like that.
(@) This is a bad [example]{#ex:bad}. Do not do it like that.

As @ex:good illustrates, ...

Or similarly:

- This is a good [example]{#ex:good}. Do it like that.
- This is a bad [example]{#ex:bad}. Do not do it like that.
{.example-list}

As @ex:good illustrates, ...

John MacFarlane · Answer 63 · Sun Jan 18 2015 02:11:46 GMT+0800 (China Standard Time)

+++ mb21 [Jan 17 15 04:54 ]:

Another question is how this would be implemented. Would
the pandoc AST contain label and reference nodes? Or would
the Markdown reader simply convert these to hyperlinked
numbers (simpler)?

A third option would be to have a filter (similar to the citeproc-filter) make the conversion to hyperlinked numbers. The filter would be enabled by default for the markdown reader. That way, the pandoc-types wouldn't need to be changed but the feature wouldn't be restricted to markdown input. @jgm Are there any downsides to this way? Or do you not want the three-stage process (reader -> filters -> writer) to become the norm when not using custom filters?

There's a performance downside, I imagine. One would have to measure the time it takes to walk the tree and do this kind of transformation vs the time for parsing itself. If it is much smaller, then it may not matter so much.

For the case where the target is inline text (and not a block like a figure, table etc.), I would prefer the syntax to be the often-discussed span-syntax:

[Hypothesis]{#hyp:temp}: Temperature increases in summer.

instead of:

Hypothesis {#hyp:temp}: Temperature increases in summer.

While a tad more cumbersome to write, it makes conceptually much more sense to me, since the attribute is then on a span element instead of floating around in nowhere. This would generate the HTML:

<span id="hyp:temp">Hypothesis 1</span>: Temperature increases in summer.

I believe this is the recommended best practice for inserting targets in HTML -- putting ids on real elements instead of adding anchors.

I agree that it makes sense to generalize the example list numbering, so you'd have fig:, eq:, hyp: etc. and also ex:. You would reference an example like: As @ex:good illustrates, ... The trickier part is how to write the example list itself. We could of course stick with the current syntax, but an arguably more consistent syntax would need attributes on list items, something like:
- This is a good example. {#ex:good}
- This is a bad example. {#ex:bad}

As @ex:good illustrates, ...

This seems a bit confusing to me, since it looks like a bullet list. So, I'd probably prefer

(@ex:good) This is a good example.

That makes it clearer from the text itself that you're referring to a numbered list item.

scaramouche1 · Answer 64 · Mon Jan 19 2015 04:09:47 GMT+0800 (China Standard Time)

I am glad that a consensus is starting to form around the syntax of cross-references.

In this post I try to formalize a little bit more the syntax by discussing boundary cases and parsing and rendering details. My aim is to help the implementors, by making sure beforehand that the syntax is useful in a broad range of situations and its semantics are unambiguous.

Anchors and references

Anchors label an object in a document. References point to an anchor.

Anchors

Anchors have the following syntax:

{#type:[descriptor]}

The descriptor part is optional. These are examples of valid anchors:

{#ex:good}, {#ex:}, {#hyp:temp}, {#eq:force}, {#eq:}, {#sec:intro}

Anchors without a descriptor (such as {#ex:} or {#eq:} above) are useful for cases in which one will not refer back to an anchor, but wants the anchor to show up in the document. For instance, an author may want to have several equations or examples numbered, even if these won't be cross-referenced. In such cases, the author would prefer not to waste time creating unique labels.

Types and descriptors can only contain alphanumeric characters plus "_" and "-" (i.e., [A-Za-z0-9_-]+).

It is suggested that authors use mnemonic types (e.g., "sec" for sections, "eq" for equations), but this is not mandatory.

References

The syntax of references is similar to the syntax of anchors, except that: (i) the curly braces are optional, (ii) references include an "@" instead of a "#", and (iii) the descriptor part is not optional. Thus, their syntax is:

@type:descriptor       or       {@type:descriptor}

Examples of valid references are:

@ex:good, @hyp:temp, @eq:force, @sec:intro
{@ex:good}, {@hyp:temp}, {@eq:force}, {@sec:intro}

Examples of invalid references are:

@ex:, {@eq:}

An invalid reference is rendered in a document in an easy to notice way (e.g., "???").

What distinguishes a document cross-reference from a bibliographic reference is that document cross-references must include a ":".
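
The anchor and reference syntax described so far can be written down as two regular expressions. The following Python sketch assumes only the rules stated in this post; the helper names `parse_anchor` and `parse_reference` are made up for illustration.

```python
import re

NAME = r"[A-Za-z0-9_-]+"

# Anchor: braces required, descriptor optional, e.g. {#eq:force} or {#eq:}
ANCHOR = re.compile(r"\{#(%s):(%s)?\}" % (NAME, NAME))

# Reference: braces optional, descriptor required, e.g. @eq:force or {@eq:force}
REFERENCE = re.compile(r"\{@(%s):(%s)\}|@(%s):(%s)" % (NAME, NAME, NAME, NAME))


def parse_anchor(text):
    """Return (type, descriptor or None) for a valid anchor, else None."""
    m = ANCHOR.fullmatch(text)
    return (m.group(1), m.group(2)) if m else None


def parse_reference(text):
    """Return (type, descriptor) for a valid reference, else None."""
    m = REFERENCE.fullmatch(text)
    if not m:
        return None
    # Groups 1-2 belong to the braced form, groups 3-4 to the bare form.
    return (m.group(1) or m.group(3), m.group(2) or m.group(4))
```

Because a ":" is mandatory, a plain citation key such as `@smith2010` does not parse as a cross-reference.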

Standard and special anchors

It is customary to render some anchors in special ways. For instance, equation numbers are put in parentheses and right justified, while section numbers appear before the section heading.

Pandoc would detect these special anchors depending on the context where the anchor appears. For instance if an anchor follows an equation, this anchor will be deemed "special" and it will be formatted accordingly.

The special anchors are the following: equations, headings, and figures. Examples of these anchors:

$$ F = ma $$ {#eq:force}
$$ x = 1 $$ {#eq:}
# Introduction {#sec:intro}
![This is a cat](cat.png) {#fig:cat}

Special anchors are rendered according to style-specific rules (initially these rules could match LaTeX defaults, but in the future they could be user-configurable; see initial ideas on how to configure this at the end of the document). The previous examples could be rendered as follows:

       F = ma           (1)
       x = 1            (2)
1 Introduction
       [IMAGE]
Figure 1: This is a cat.

Any anchor that is not special is a "standard" anchor. Examples:

Hypothesis {#hyp:temp}: temperature increases in summer.
Proof {#proof:equal}: 1 = 1
({#ex:good}) This is a good example.
({#ex:}) This is a numbered, but not referenceable example.

Standard anchors are rendered as auto-increasing counters. Each anchor type is associated to its own counter (which starts at 1). For instance, the previous examples would be rendered as follows:

Hypothesis 1: temperature increases in summer.
Proof 1: 1 = 1
(1) This is a good example.
(2) This is a numbered, but not referenceable example.

Rendering references

References to defined anchors are rendered as the number of the corresponding anchor.

For instance, the following references

As seen in Equation @eq:force
As described in Section @sec:intro
As observed in Figure @fig:cat
As suggested in Hypothesis @hyp:temp
As demonstrated in proof @proof:equal
As shown in example @ex:good

Would be rendered as:

As seen in Equation 1
As described in Section 1
As observed in Figure 1
As suggested in Hypothesis 1
As demonstrated in proof 1
As shown in example 1

Note that the rendered reference does not include any extra text apart from the number of the reference. That is, "@fig:cat" renders simply as "1", not as "Figure 1" or "Figure (1)". This is because more automation conflicts with some referencing needs. For instance, from time to time authors may need to write things such as:

As shown in figures 1--5.
As suggested in H1.
As seen in Equation (1).

Using the current notation, these can be accomplished in the following ways:

As shown in figures {@fig:cat}--{@fig:dog}.
As suggested in H{@hyp:temp}.
As seen in Equation (@eq:force).
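
To check that these rules are unambiguous, here is a minimal Python sketch of the rendering of standard anchors and references. It handles only standard anchors (special anchors such as equations and headings need context that this sketch ignores), and the function name `render` is an assumption made for the example.

```python
import re
from collections import defaultdict

NAME = r"[A-Za-z0-9_-]"
ANCHOR = re.compile(r"\{#(%s+):(%s*)\}" % (NAME, NAME))
REFERENCE = re.compile(r"\{@(%s+):(%s+)\}|@(%s+):(%s+)" % (NAME, NAME, NAME, NAME))


def render(text):
    """Number anchors per type, then replace references by those numbers."""
    counters = defaultdict(int)
    numbers = {}

    def number_anchor(m):
        kind, descriptor = m.group(1), m.group(2)
        counters[kind] += 1
        if descriptor:  # {#ex:} is numbered but cannot be referenced
            numbers[(kind, descriptor)] = counters[kind]
        return str(counters[kind])

    def number_reference(m):
        key = (m.group(1) or m.group(3), m.group(2) or m.group(4))
        return str(numbers.get(key, "???"))  # undefined references stand out

    # Two passes, so that a reference may precede the anchor it points to.
    return REFERENCE.sub(number_reference, ANCHOR.sub(number_anchor, text))
```

The braces matter in ranges like `{@fig:cat}--{@fig:dog}`: since "-" is allowed in descriptors, the bare form `@fig:cat--` would be read as a reference to a descriptor named `cat--`.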

References with long hyperlinks

Normally only the number of the reference is hyperlinked to the position of the anchor. For instance, "Equation @eq:force" is rendered as "Equation 1" where the "1" is a hyperlink to the corresponding equation.

If one wants the whole "Equation 1" to be hyperlinked, one can write "[Equation ]@eq:force" or "[Equation ]{@eq:force}".

Thus, text in square brackets immediately followed by a reference shares the same hyperlink as the reference.

Translating to LaTeX

LaTeX contains environments to render the special anchors. Thus:

equations should be rendered using \begin{equation}\label{eq:force} ... \end{equation}
headers should be rendered using \[sub]section{...}\label{sec:intro}
figures should be rendered using \begin{figure}\centering\includegraphics[width=\linewidth]{cat.png}\caption{This is a cat.}\label{fig:cat}\end{figure}

Standard anchors are rendered by creating a new command in the preamble. For instance, the following implements a counter for hypotheses:

%in preamble: create counter and anchor for 'hyp'
\newcounter{ctrhyp}
\newcommand{\anchorhyp}[1]{\refstepcounter{ctrhyp}\arabic{ctrhyp}\label{#1}}

%in document body: create anchor and refer back to it.
Hypothesis \anchorhyp{hyp:temp}: Temperature increases in summer.

As seen in hypothesis \ref{hyp:temp}.

References with long hyperlinks are linked using the hyperref package. For instance "As seen in [hypothesis ]{@hyp:temp}" is rendered as

As seen in \hyperref[hyp:temp]{hypothesis \ref*{hyp:temp}}.
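
As a rough illustration, a writer could emit these two LaTeX forms with a helper like the one below. The function name `latex_ref` is hypothetical; the `\hyperref[...]` and `\ref*` macros assume the hyperref package is loaded, as stated above:

```python
def latex_ref(label, text=None):
    r"""Return LaTeX for a reference; `text` is the optional bracketed prefix."""
    if text is None:
        # Plain reference: only the number, linked by hyperref if loaded.
        return r"\ref{%s}" % label
    # Long hyperlink: \ref* prints the number without a nested link.
    return r"\hyperref[%s]{%s\ref*{%s}}" % (label, text, label)

print(latex_ref("hyp:temp"))
print(latex_ref("hyp:temp", "hypothesis "))
```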

Translating to HTML

HTML does not have predefined mechanisms to deal with numbering, thus all of the numbering is done by Pandoc in a way that mimics the default LaTeX formats.

Other ideas

Here I include a couple ideas that are related to the cross-reference system.

Non-standard position of the references

In some documents, the References section does not appear at the end. For instance, many papers include appendixes after the references. One could choose a non-standard position for the references by adding the anchor "{#references:}". For instance,

# Introduction
....

# References {#references:}

# Appendix
....

Custom numbering schemes

In the future, it may be nice to be able to customize the numbering schemes used by the different reference types. This could be done from a YAML section. Here's an example of how it could be used:

format: #sec:1.A.i
format: #eq:1.1
format: #fig:1-a
format: #hyp:A
format: #ex:1

This would number things as follows:

1 Section
  1.A Subsection
      Equation 1.1
      Hypothesis A
      Hypothesis B
      Example 1
      Example 2
      1.A.i  Subsubsection
             Equation 1.2
             Figure 1-a
      1.A.ii Subsubsection
             Figure 1-b
  1.B Subsection
      Hypothesis C
2 Section
  Figure 2-c
  Example 3
  ...

Jaremy Creechley · Answer 65 · Mon Jan 19 2015 06:10:50 GMT+0800 (China Standard Time)

This has been a good conversation to follow. I agree with @scaramouche1 that I cannot currently use Markdown for technical articles beyond informal group writing and some papers for coursework. I have been developing my workflow more around Markdown and lightweight text editing, and this has been a major hindrance (the other being the inability to quickly "tag" custom attributes).

This seems a bit confusing to me, since it looks like a bullet list. So, I'd probably prefer
(@ex:good) This is a good example.
That makes it clearer from the text itself that you're referring to a numbered list item.

@jgm, were you proposing that the syntax for numbered examples and general anchors could be bridged by placing the anchor at the start of the list item? Like:

- (@ex:good) Great idea!
- (@ex:bad) Ok idea. 
- (@ex:ugly) Really bad idea.

It seems that both features could co-exist for a while with the older syntax being deprecated at some future point.

@scaramouche1, the write-up you did helped me reason about whether this syntax would be useful and generic. Thanks! It covered almost all the questions I had, except how anchors would be associated with generic blocks of text. Going from what @bpj mentioned, I am not sure how this syntax would be applied outside of the specific "standard" contexts of figures, equations, etc., or manually specifying the block [Some Text]{#ref:} .

For example, where would the anchors be attached in these cases both for internal representation and for the example output HTML?

<span id="custom-checklist">
- [  ] Task 1
- [x] Task 2
- [?] Task 3
</span> {#ch:proposed-project-tasks}

This proposal will specify ... and a lot of other text. {#desc:proposed-project-tasks}

The simplest rule might be just attaching the anchor to the last "span" or block (text, figure, etc) excepting the predefined standard rules for example lists, figures, and others. Currently I have only briefly looked at the Pandoc source and am not sure how the AST is structured to give any technical opinions.

scaramouche1 · Answer 66 · Mon Jan 19 2015 07:05:18 GMT+0800 (China Standard Time)

@elcritch: In the syntax proposed above, the anchor is associated to a rendered number. So, something close to your example could be entered as:

This is checklist {#cklist:xyz}:
 - Task 1
 - Task 2
 - Task 3

As listed in checklist @cklist:xyz ...

Which would be rendered as:

This is checklist 1:
 - Task 1
 - Task 2
 - Task 3

As listed in checklist _1_ ...

(The "_" is denoting what's hyperlinked.)

Extension: Unnumbered anchors and references

The previously proposed syntax did not account for cases in which anchors and references are not numbered. One way to extend the syntax to allow that use-case is by defining that anchors starting with "-" are unnumbered anchors (this notation is akin to the one used for year-only bibliographic citations). Thus, one could enter:

Checklist: {-#cklist:xyz}
 - Task 1
 - Task 2
 - Task 3

As listed in the [checklist]{@cklist:xyz} ...

Which would be rendered as:

Checklist:
 - Task 1
 - Task 2
 - Task 3

As listed in the _checklist_ ...

Translating to HTML and LaTeX

I believe the simplest HTML implementation is to translate unnumbered anchors as <a name="cklist:xyz"></a> and reference to them as <a href="#cklist:xyz">checklist</a>.

A LaTeX translation could use \hypertarget{cklist:xyz}{} for the anchor and \hyperlink{cklist:xyz}{checklist} for the reference.

Unnumbered references

The unnumbered notation should also apply to unnumbered references. For instance, [checklist]{-@cklist:xyz} would produce an unnumbered reference (i.e., a hyperlink) irrespective of whether the anchor was defined as {#cklist:xyz} or as {-#cklist:xyz}.

Dealing with an erroneous reference to an unnumbered anchor

A reference to an unnumbered anchor (e.g., one defined as {-#cklist:xyz}) that is not preceded by [text] should be rendered as an error (???). For instance,

Checklist: {-#cklist:xyz}
 - Task 1
 - Task 2
 - Task 3

As listed in the @cklist:xyz ...

Would be rendered as:

Checklist:
 - Task 1
 - Task 2
 - Task 3

As listed in the ??? ...
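
The rules for numbered and unnumbered anchors and references can be condensed into a small Python sketch. Everything here is illustrative: `render_reference` is a hypothetical helper, the underscores mark the hyperlinked span as in the examples above, and treating an undefined label as ??? is an extra assumption:

```python
def render_reference(label, anchors, text=None, unnumbered=False):
    """Render one reference; underscores mark the hyperlinked span.

    `anchors` maps a label to its number, or to None for an
    unnumbered anchor such as {-#cklist:xyz}.
    """
    if label not in anchors:
        return "???"  # assumption: undefined labels are errors too
    number = anchors[label]
    if unnumbered or number is None:
        # Only the author's text can be shown; without it, flag an error.
        return "_%s_" % text if text else "???"
    return "_%s%s_" % (text or "", number)

anchors = {"cklist:xyz": None, "cklist:abc": 1}
print(render_reference("cklist:xyz", anchors, text="checklist"))
print(render_reference("cklist:xyz", anchors))
print(render_reference("cklist:abc", anchors))
```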

Mauro Bieg · Answer 67 · Sun Jan 25 2015 19:23:02 GMT+0800 (China Standard Time)

Please note that anchors shouldn’t be seen as stand-alone items. The reason why anchors (or in pandoc/html parlance: ids) need to be attached to an element is that they are part of the Attr (attribute block) in pandoc’s internal data model, see e.g. the pandoc type for a header. This is analogous to HTML where the anchor/id also needs to be attached to an element, e.g. <h1 id="myAnchor">…</h1>.

I agree that @fig:force should generally only result in the number (e.g. “1”), not the string “Figure 1”. So how to avoid that only the number would be part of the hyperlink then? I see three (not mutually exclusive) ways:

1. A manual syntax, similar to the current bibliography-references: As seen in [Figure @fig:force].
2. We could also make the preceding word part of the hyperlink (if the word is in the same paragraph and there is nothing but exactly one space separating the word and the number). This should make the most common use case easy to type: As seen in Figure @fig:force -> As seen in <a href="#fig:force">Figure 1</a>.
3. Manually make a link: As seen in [Figure](#fig:force). This has the disadvantage that people would sometimes have to use @ to make a reference and sometimes a link with #, which would be very confusing. Note that I don’t like [Equation]{@eq:force} because that would clash with the proposed span-syntax and doesn’t look like a link or reference.

I tend towards (1) in combination with (2).

Mauro Bieg · Answer 68 · Sun Jan 25 2015 22:09:04 GMT+0800 (China Standard Time)

@jgm regarding the numbered example lists, I find the @ instead of the usual # at the anchor place (instead of the reference place) rather confusing. Maybe one of the following two?

(#ex:good) This is a good example.
(#ex:bad) This is a bad example.

{#ex:good} This is a good example.
{#ex:bad} This is a bad example.

scaramouche1 · Answer 69 · Mon Jan 26 2015 00:56:49 GMT+0800 (China Standard Time)

@mb21: I believe that proposal 1 is better (manual syntax similar to bibliography-references). As seen in Figure {@fig:a} would just link the number; and as seen in [Figure {@fig:a}] would link "Figure 1".

Proposal 2 is interesting, but I am not sure it would render the right thing in all cases. Would it lead to consistent and useful behavior in cases such as as seen in H{@hyp:a}, as seen in examples {@ex:a}--{@ex:z} and as seen in examples {@ex:a} through {@ex:z}?

Proposal 3 is similar to something I had proposed earlier. But now I think this behavior is too complex. Proposal 1 dominates Proposal 3.

Aaron O'Leary · Answer 70 · Fri Feb 06 2015 00:20:55 GMT+0800 (China Standard Time)

@scaramouche1 - excellent contribution, thank you :).

I am leaning more towards using @ as the prefix now (rather than #).

@mb21 I think option 1 is good (see [Figure @fig:a])

Option 2 is tempting but what about plurals? (see figures @fig:a and @fig:b)

You might have to have typing for reference objects and some grammar of referencing to get this to work fully. I can see it getting complicated unless you just went with an allowed list of words ('figure', 'section', 'table' etc.) and only linked with these. This would be nice, but maybe it is too magical? Could be a configuration option on a filter.

Option 3 I don't like.

Mauro Bieg · Answer 71 · Mon Feb 09 2015 03:31:45 GMT+0800 (China Standard Time)

Okay, so here comes my attempt at a summary of where I feel this is headed. Sorry if this thread is turning into a list of summaries.

An id attribute that contains a colon is a special anchor. If there is no colon, it will just be an id as it is today, which you can link to, but without automatic numbering. The part of the anchor before the colon (e.g. fig in # my header {#fig:example}) serves as a namespace and the numbers are incremented separately for each namespace. If the author wants to have the number generated for the anchor but doesn't need to reference it, she can write e.g. {#fig:} which will be expanded to e.g. {#fig:2} (assuming it's the second figure). (Also, the namespace may contain only alphanumerics so we could use specialchars like the dash in the future for stuff like otherpaper-fig:cat.)
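
A small Python sketch of the namespace rule, assuming anchors are visited in document order. The helper name `number_anchors` is hypothetical:

```python
from collections import defaultdict

def number_anchors(anchors):
    """Assign per-namespace numbers to anchors in document order.

    Returns a list of (expanded_id, number); number is None for plain ids.
    """
    counters = defaultdict(int)
    result = []
    for anchor in anchors:
        if ":" not in anchor:
            # No colon: an ordinary id, linkable but not numbered.
            result.append((anchor, None))
            continue
        namespace, name = anchor.split(":", 1)
        counters[namespace] += 1
        n = counters[namespace]
        # An empty name such as "fig:" is expanded to "fig:<n>".
        result.append(("%s:%s" % (namespace, name or n), n))
    return result

print(number_anchors(["sec:intro", "fig:cat", "fig:", "my-header", "hyp:temp"]))
```

Each namespace keeps its own counter, so the second figure becomes "fig:2" while the first hypothesis is still numbered 1.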

Anchors: The Markdown Reader inserts the number at an appropriate place for all elements that have anchors. For example:

Figures:
  ![myCaption](cat.png){#fig:cat}
  <div class="figure" id="fig:cat">
    <img src="cat.png"/>
    <p class="caption">Figure 1: myCaption</p>
  </div>
Headers:
  # Introduction {#sec:intro}
  <h1 id="sec:intro">1. Introduction</h1>
Inline spans:
  [Hypothesis]{#hyp:temp} is that...
  <p><span id="hyp:temp">Hypothesis 1</span> is that...</p>
Example lists (# or @ can both be used, # for consistency with the other ids, @ for backwards-compatibility):
  (#ex:good) This is a good example.
  (#ex:bad) This is a bad example.
Equations:
  $$F = ma$$ {#eq:force}

Note that Math currently has no attributes in the Pandoc data model, so I guess equation numbering would have to wait for that to be implemented. Maybe, it would even make sense to add a special Equation data constructor since the output for LaTeX and HTML would have to differ substantially to place the numbering correctly on the page (for figures and headers it should work to do the numbering all in the Markdown Reader instead of relying on LaTeX):

\begin{equation} \label{eq:force}
F = ma
\end{equation}

see \hyperref[eq:force]{equation (1)}

vs.

<p><span class="math" id="eq:force">F = ma</span> <span class="mathnr">(1)</span></p>
<p>see <a href="#eq:force">equation (1)</a></p>

References like @fig:cat generally get converted by the Markdown Reader to a number that is wrapped in a link to the anchor. However:
1. To specify a broader text range for the link, a syntax similar to the current bibliography-references can be used:
```
As seen in [Figure @fig:cat]
```
2. A possible automatism: if there are no brackets, the preceding word will be made part of the hyperlink (if the word is contained in a whitelist like table, figure etc. and there is nothing but exactly one space separating the word and the number). This should make the most common use case easy to type:
```
As seen in Figure @fig:cat.
<p>As seen in <a href="#fig:cat">Figure 1</a>.</p>
```
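
A rough Python sketch of this whitelist automatism; the word list, the `link_with_word` helper and the `numbers` table are assumptions for illustration only:

```python
import re

WHITELIST = {"figure", "table", "section", "equation"}  # assumed word list
# An optional word followed by exactly one space, then the reference.
REF = re.compile(r"(?:(\w+) )?@(\w+:[\w-]+)")

def link_with_word(text, numbers):
    """Link the number, plus the preceding word if it is whitelisted."""
    def repl(m):
        word, label = m.group(1), m.group(2)
        number = numbers[label]
        if word and word.lower() in WHITELIST:
            return '<a href="#%s">%s %s</a>' % (label, word, number)
        link = '<a href="#%s">%s</a>' % (label, number)
        return "%s %s" % (word, link) if word else link
    return REF.sub(repl, text)

numbers = {"fig:cat": 1, "fig:a": 1, "fig:b": 2}  # assumed numbering
print(link_with_word("As seen in Figure @fig:cat.", numbers))
print(link_with_word("see figures @fig:a and @fig:b", numbers))
```

With this word list the plural "figures" is not matched, so in the second call only the numbers are linked.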
Some namespaces have default numbering schemes set (e.g. sec should follow ISO 2145). However, the numbering scheme can be specified for each namespace separately in the YAML metadata. I haven't found any standardized format to describe numberings, but maybe @scaramouche1's suggestion with a few extensions can work. In the format field, the following reserved characters would be replaced with a number in the corresponding format:
```
1 -> arabic numbers: 1, 2, 3
a -> alpha lowercase: a, b, c
A -> alpha uppercase: A, B, C
i -> roman lowercase: i, ii, iii
I -> roman uppercase: I, II, III
o -> short ordinal: 1st, 2nd, 3rd
O -> long ordinal: first, second, third
```
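
To make the table concrete, here is a hedged Python sketch of how a single counter value might be formatted for each reserved character. All function names are hypothetical. Continuing after "z" with "aa" and falling back to short ordinals past "tenth" are assumptions that the proposal does not specify:

```python
ONES = ["first", "second", "third", "fourth", "fifth",
        "sixth", "seventh", "eighth", "ninth", "tenth"]

def to_alpha(n):
    """1 -> a, 26 -> z, 27 -> aa (assumed continuation)."""
    s = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        s = chr(ord("a") + r) + s
    return s

def to_roman(n):
    """Lowercase roman numeral for a positive integer."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
             (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
             (5, "v"), (4, "iv"), (1, "i")]
    s = ""
    for value, numeral in pairs:
        while n >= value:
            s += numeral
            n -= value
    return s

def to_short_ordinal(n):
    """1 -> 1st, 2 -> 2nd, 11 -> 11th, 21 -> 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "%d%s" % (n, suffix)

def format_number(n, style):
    """Format a positive integer n using one reserved character."""
    table = {
        "1": str,
        "a": to_alpha,
        "A": lambda k: to_alpha(k).upper(),
        "i": to_roman,
        "I": lambda k: to_roman(k).upper(),
        "o": to_short_ordinal,
        # Long ordinals are only spelled out up to "tenth" in this sketch.
        "O": lambda k: ONES[k - 1] if k <= len(ONES) else to_short_ordinal(k),
    }
    return table[style](n)

print([format_number(3, s) for s in "1aAiIoO"])
```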
Subheadings nested deeper than the number of reserved characters in the format would have no effect on the numbering; it would simply continue through, e.g., the sub-sub-sub-sections. YAML fields like prefix and suffix are used to generate the strings in the anchors and references, and position determines whether they are prepended or appended to the existing content of the anchor (e.g. prepended to figure captions and header titles, but appended to equations). Example YAML:
```
numbering-schemes:
  eq:
    position: after
    format: ' (1)'
  sec:
    position: before
    format: '1.1.1 '
  hyp:
    format: 'A '
  ex:
    prefix: 'Example '
    format: '1'
    suffix: ': '
```
I'm not 100% sure this format is really generic (what about right-to-left scripts?) but its simplicity for common use cases is compelling.
It seems possible to implement all this in the Markdown Reader. It should probably go behind a feature flag that is mutually exclusive with --number-sections, which already enables header numbering in the LaTeX, ConTeXt, HTML and ePUB Writers.

Aaron O'Leary · Answer 72 · Mon Feb 09 2015 05:44:13 GMT+0800 (China Standard Time)

An id attribute that contains a colon is a special anchor.

@mb21 I'm not sure about:

- forcing namespacing using the anchor
- only being able to refer to objects with a special (colon-containing) anchor

Whilst the type:tag convention is quite common, I'm not sure that we should force it on people.

I suggest that any object that can have an anchor can be referred to. If you want to link to it with a regular hyperlink, use [the object](#anchor); if you want internal-referencing, use @anchor.

The Markdown reader would scan the text for all @ references and for all anchor definitions and then associate them together, using the object that the anchor is defined on to determine the numbering scheme.

scaramouche1 · Answer 73 · Mon Feb 09 2015 06:27:36 GMT+0800 (China Standard Time)

@aaren I have a question: if the type: part is not required, how would Pandoc guess that these three items do not share the same counter (but use three different counters)?

- Hypothesis #hyp ...
- Proposition #prop ...
- (#ex) ...

@mb21 Many thanks for the summary. One addition to it: It is important that references can optionally be put inside curly brackets, as this allows referencing things like

- As predicted by H{@hyp:temp}a...
- As shown in figures {@fig:a}--{@fig:z}...

and having them rendered as:

- As predicted by H1a...
- As shown in figures 1-5...

I believe it is very important that Pandoc translates references to "proper" LaTeX. So, I don't think it is a good idea for Pandoc to include hard coded versions of the reference numbers in the LaTeX code. In HTML or docx doing so is OK, but not in LaTeX. I have two main reasons for this:

1. LaTeX has facilities for creating hyperlinks, table of contents, tables of figures, etc... All these would be rendered useless if Pandoc hard codes the cross-references.
2. One of the main use cases of Pandoc in academic writing is to write a first draft in markdown, export it to LaTeX, and add final touches in LaTeX. If the numbers are hard coded, editing in LaTeX will be very fragile and limited. This would also make it infeasible to send a Pandoc-created LaTeX file to a journal or book publisher (which uses LaTeX's cross references to create, e.g., the front matter of the book).

Aaron O'Leary · Answer 74 · Mon Feb 09 2015 06:48:53 GMT+0800 (China Standard Time)

@scaramouche1 yes good point. I'd say enclose in a span or div and use .hypothesis as a class. How would you do this in latex? Is there a hypothesis environment? Is this rendered by mathjax?

Regarding curly brackets.... I'm convinced by the use case, but not by the syntax. I think square brackets would be more consistent, but maybe there is a way with no brackets.

Regarding translation to latex.... absolutely, I think the labels should be passed through as-is so that latex can do its own thing.

Tim T.Y. Lin · Answer 75 · Mon Feb 09 2015 08:12:19 GMT+0800 (China Standard Time)

Now that I've been using Scholdoc for close to a year, I can comment on some of the issues here from experience:

I am leaning more towards using @ as the prefix now (rather than #).

I'm still not entirely convinced that numbered references should clobber the same @ syntax as citations. It is possible to run into some edge-cases where it's impossible to tell if something should be a reference or a citation. Both of these types of identifiers (ref keys and cite keys) have the same set of allowed characters in TeX. Using the : rule isn't going to unambiguously solve the issue; I'm certainly not the only one who ends up with a bunch of : in my reference database cite keys, even through no action of my own (mostly through importing colleagues' entries). In LaTeX we relied on separating these two with \ref and \cite to avoid issues with namespace pollution.

I really wouldn't have brought this up if it wasn't already possible to use # instead for references in text in an unambiguous fashion. I use this for Scholdoc and it's been working out pretty well for the past year or so. I did remember considering @ for Scholdoc but I abandoned it for reasons that I forgot, although I suspect it's similar to the above.

Regarding curly brackets.... I'm convinced by the use case, but not by the syntax. I think square brackets would be more consistent, but maybe there is a way with no brackets.

I agree that square brackets are "more markdownish". In my experience there are not really any ambiguities caused by this (unless @ is used for the syntax, in which case it again clobbers citation). My stance from the last time we talked about this hasn't changed.

LaTeX has facilities for creating hyperlinks, table of contents, tables of figures, etc... All these would be rendered useless if Pandoc hard codes the cross-references.

The most sustainable way is probably to use a new inline product type (or the much-hyped Link with Attr attached, if that is eventually real) that holds both the reference id and a candidate numbering (possibly in the Attr), and let the writer decide what to do. In Scholdoc I had the markdown reader generate the candidate numbering, mainly out of laziness on my part, but arguably this should be done in a filter so it can have the potential to do some IO (i.e., reaching into another document and grabbing references there).
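
A toy Python sketch of that idea: one inline element carries both the reference id and a candidate number, and each writer decides which to use. The `Ref` class and the two writer functions are hypothetical and do not reflect Scholdoc's or pandoc's actual types:

```python
from dataclasses import dataclass

@dataclass
class Ref:
    """Hypothetical inline element: a reference id plus a candidate number."""
    label: str
    candidate: str

def write_html(ref):
    # HTML has no native counters, so the candidate number is printed.
    return '<a href="#%s">%s</a>' % (ref.label, ref.candidate)

def write_latex(ref):
    # LaTeX resolves the number itself, so the candidate is ignored.
    return r"\ref{%s}" % ref.label

ref = Ref("fig:cat", "1")
print(write_html(ref))
print(write_latex(ref))
```

Keeping the label in the element is what lets the LaTeX output stay editable, while formats without native cross-references can still fall back to the precomputed number.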

John MacFarlane · Answer 76 · Mon Feb 09 2015 08:20:52 GMT+0800 (China Standard Time)

+++ Tim T.Y. Lin [Feb 08 15 16:12 ]:

Now that I've been using Scholdoc for close to a year, I can comment on some of the issues here from experience:

I am leaning more towards using @ as the prefix now (rather than #).

I'm still not entirely convinced that numbered references should clobber the same @ syntax as citations. It is possible to run into some edge-cases where it's impossible to tell if something should be a reference or a citation. Both of these types of identifiers (ref keys and cite keys) have the same set of allowed characters in TeX. Using the : rule isn't going to unambiguously solve the issue; I'm certainly not the only one who ends up with a bunch of : in my reference database cite keys, even through no action of my own (mostly through importing colleagues' entries). In LaTeX we relied on separating these two with \ref and \cite to avoid issues with namespace pollution.

Yes. I have long regretted the fact that @ is used for two different things in pandoc: example list labels and citations. (Not to mention the clash with the increasingly popular twitterish use for usernames.)

However, if we switched to #, we'd break backwards compatibility for example lists. That's a pretty weighty consideration. One option would be to have an extension that enables the legacy behavior.

Tim T.Y. Lin · Answer 77 · Mon Feb 09 2015 08:45:35 GMT+0800 (China Standard Time)

However, if we switched to #, we'd break backwards compatibility for example lists. That's a pretty weighty consideration. One option would be to have an extension that enables the legacy behavior.

@jgm I think it's possible to keep the old syntax for example lists though. Currently example lists (using symmetric @ syntax) and x-references work side-by-side in Scholdoc, since they use completely separate mechanisms.

We will have an additional ambiguity, if we use @ for in-text reference, with the definition of example lists. Of course this is already the case with current syntax. This isn't much of an issue now since definition of EL labels happens at the block level which takes precedence over referencing at the inline level, but if @ usage becomes more common (and using @ for inline anchor definition is somehow permitted) then obviously the rate of edge cases will increase.

Of course, if we switch to uniformly using # for defining anchors anyways, then this would affect example lists regardless of the choice of reference syntax.

(Not to mention the clash with the increasingly popular twitterish use for usernames.)

Example lists aside, I actually really like how it matches the notion of "referring to someone" for citations. My dream is to somehow be able to resolve DOIs/ISSN/PMID/ArXivID as cite keys (it's on my todo list for Scholdoc-citeproc). How cool would it be to do, e.g., @10.1190/1.234567 if it were somehow possible to unambiguously resolve the information.

I believe this also influenced my choice to use # for cross-references as well… it's like referring to a concept or a context, similar to how hashtags are currently used.

Aaron O'Leary · Answer 78 · Mon Feb 09 2015 15:38:04 GMT+0800 (China Standard Time)

@timtylin: yes, I think this is best done in a filter, similar to pandoc-citeproc now.

I'm not completely sold on @ yet; my position is more undecided (vs #). Importing other people's keys is a reason for having both, but you risk your cite keys getting clobbered in this case anyway. LaTeX does have distinct \cite and \ref, but was this for a compelling reason or is it historical? (I'm not sure)

@timtylin: going a bit off topic: I'm not sure that explicitly typing a doi is the most user friendly thing to do when referencing something, but yes it would be great to be able to refer to another doi's figures like that! Regardless, did you know that you could do this:

curl -LH "Accept: text/bibliography; style=bibtex" "http://dx.doi.org/10.1017/S0022112061000019"

Mauro Bieg · Answer 79 · Sun Feb 15 2015 22:58:43 GMT+0800 (China Standard Time)

Okay, I see the need for having native LaTeX references. This leaves us with three possibilities:

1. Add a native reference type to the pandoc data type (cleanest approach but very involved: all writers need to be adjusted accordingly and call a shared numbering module).
2. Implement the counter and number placement in a filter instead of the Markdown Reader (document conversion might take up to twice as long).
3. Do all the stuff in the Markdown Reader as discussed, then have the LaTeX Writer extract that again to write native LaTeX references (this approach is somewhat of a hack and potentially error prone).

Personally, I’m tending toward (2) as a reasonable trade-off between maintainable code and implementation effort (unless someone has time to do (1)). Conversions will be slower, but in my experience LaTeX dominates conversion time over pandoc itself anyhow.

Once again, I don’t like standalone curly brackets, because markdown has generally been following the HTML tradition of having attributes and anchors only on elements that span some text. I think the inline spans with auto-generating numbers will suffice. As posted above:

Inline spans:
  [Hypothesis]{#hyp:temp} is that...
  <p><span id="hyp:temp">Hypothesis 1</span> is that...</p>

Ben Gamari · Answer 80 · Wed Feb 18 2015 05:51:22 GMT+0800 (China Standard Time)

@mb21 I would be happy to give (1) a try after I defend in June. It seems like we have already taken on enough technical debt in the name of "it's hard to add new AST nodes".

Thomas J. Duck · Answer 81 · Sat Mar 14 2015 23:06:44 GMT+0800 (China Standard Time)

I wrote a filter to number figures and references: pandoc-fignos. The syntax follows the recommendations by @scaramouche1 on Jan. 18.

Demonstration: input demo.md and output pdf, tex, html, epub and md.

Details: The filter should work with any output format. For LaTeX the \label and \ref macros are used. For everything else the numbers are hard-coded. Caption formatting is retained. There is no linking. A filter option allows image attributes to be left in place for further processing.

scaramouche1 · Answer 82 · Mon Mar 16 2015 06:00:49 GMT+0800 (China Standard Time)

Thank you @tomduck! This seems like an excellent addition to Pandoc. I also saw you are working on pandoc-eqnos. Superb.

Thomas J. Duck · Answer 83 · Mon Mar 23 2015 09:54:46 GMT+0800 (China Standard Time)

Appreciated, @scaramouche1. And thanks for your and others' efforts in putting together a well-thought-out spec.

I have been using pandoc-fignos for figure numbering and pandoc-eqnos for equation numbering in my academic writing over the past week or so. Both have been working well. People are welcome to file issues against them if problems are uncovered. Cheers.

Nikolay Yakimov · Answer 84 · Mon Mar 23 2015 17:09:54 GMT+0800 (China Standard Time)

Sorry for shameless self-promotion, but for anyone interested¹, there's also a Haskell implementation of a similar idea², called pandoc-crossref. It includes a couple of additional features I personally find handy, like references to tables and generation of lists of figures/tables, as well as some output configurability (delimiters, etc.) through metadata.

¹ e.g., anyone bad with Python, like me
² although the syntax is slightly different to allow for automatic sequence collapsing, e.g. a reference to 1,2,3 will collapse into 1-3, like the LaTeX cleveref package (which is an option for LaTeX output, by the way)
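
For readers unfamiliar with sequence collapsing, here is a minimal Python sketch of the idea. It is not pandoc-crossref's implementation; the `collapse` helper and the choice to collapse only runs of three or more numbers are assumptions:

```python
def collapse(numbers):
    """Collapse runs of three or more consecutive integers into ranges."""
    nums = sorted(set(numbers))
    parts = []
    i = 0
    while i < len(nums):
        # Extend j to the end of the consecutive run starting at i.
        j = i
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        if j - i >= 2:
            parts.append("%d-%d" % (nums[i], nums[j]))
        else:
            parts.extend(str(n) for n in nums[i:j + 1])
        i = j + 1
    return ", ".join(parts)

print(collapse([1, 2, 3]))
print(collapse([1, 2, 5, 6, 7, 9]))
```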

mangecoeur · Answer 85 · Fri Mar 27 2015 23:34:13 GMT+0800 (China Standard Time)

@lierdakil - works great, thanks! Though it took me a few goes to realise I had to (re)install pandoc via Cabal (I've only used Python filters so far) ;) Seems to me this would make a good basis for including as a standard pandoc feature...

A. S. Budden · Answer 86 · Wed Apr 15 2015 22:12:58 GMT+0800 (China Standard Time)

@lierdakil This is excellent, thank you. It would be great to see this integrated into pandoc as standard: how hard would it be to merge it in?

Nikolay Yakimov · Answer 87 · Wed Apr 15 2015 22:57:09 GMT+0800 (China Standard Time)

@abudden at the moment, this is not possible. For this to be included in pandoc, we'd need to revise the document model to add attributes to all block elements. While that's possible, and there has been some work on it (see new-image-attributes branch), it's still too early to include in Pandoc, and it would break backwards compatibility in a major way. At the moment, due to document model limitations, pandoc-crossref relies on a hacky post-parsing solution and has a very limited syntax, so it's not something I'm confident enough to include into pandoc.

If/when block attributes are supported, we could talk about merging pandoc-crossref into mainline pandoc, but since pandoc-citeproc is a filter, I see no obvious reason to include pandoc-crossref into mainline pandoc. I plan to publish pandoc-crossref on Hackage soon, so installation should be somewhat simplified in the future.

Including some degree of support (e.g. similar to --bibliography implying pandoc-citeproc) would be fine though, I suppose. But that's a little far off, not until after 1.14 release at least.

Dmitry V. Luciv · Answer 88 · Tue Aug 18 2015 00:39:36 GMT+0800 (China Standard Time)

Moreover, it would be useful to be able to link to anything. LaTeX allows doing so: http://tex.stackexchange.com/a/4024/70953

Frederik Aust · Answer 89 · Tue Aug 18 2015 15:39:53 GMT+0800 (China Standard Time)

For what it's worth, I just wanted to say that I would greatly appreciate the addition of a syntax like the one proposed by @scaramouche1.

Hadrien Mary · Answer 90 · Fri Sep 25 2015 18:57:47 GMT+0800 (China Standard Time)

That feature would be really cool.

For now I use \label{} and \ref{} from LaTeX, and it works when I convert to PDF, but it doesn't when it comes to generating a Word file or HTML :-(

Mauro Bieg · Answer 91 · Thu Dec 03 2015 06:09:44 GMT+0800 (China Standard Time)

What do you think of adding a Figure element to the Pandoc AST instead of adding Attr to Table? There was some discussion in that direction in #673, plus a good discussion on pandoc-discuss which I just revived.

Something along the lines of Figure Attr [Block] [Block]—a figure with a caption (which can contain markdown) and containing block elements (like one or more tables, images, blockquotes, codeblocks, etc).
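
To visualise the proposed shape, here is a toy Python model (pandoc itself is written in Haskell, so this is only an illustration). The `Attr` fields mirror pandoc's identifier, classes and key-value pairs; treating the first block list as the caption and the second as the content is an assumption, and `figure_ids` is a hypothetical helper:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Attr:
    """Mirrors pandoc's Attr: identifier, classes, key-value pairs."""
    identifier: str = ""
    classes: List[str] = field(default_factory=list)
    keyvals: List[Tuple[str, str]] = field(default_factory=list)

@dataclass
class Para:
    text: str  # simplification: a real Para holds inline elements

@dataclass
class Figure:
    attr: Attr
    caption: list  # block elements; assumed to be the first [Block]
    content: list  # block elements: tables, images, code blocks, ...

def figure_ids(blocks):
    """Collect the identifiers of all figures that can be referenced."""
    return [b.attr.identifier for b in blocks
            if isinstance(b, Figure) and b.attr.identifier]

doc = [
    Para("Some text."),
    Figure(Attr("fig:cat"), [Para("This is a cat.")], [Para("![](cat.png)")]),
]
print(figure_ids(doc))
```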

Would it be good enough if you can just reference Figures, so we wouldn't have to add Attr to Table?

Deleted user · Answer 92 · Thu Dec 03 2015 06:44:01 GMT+0800 (China Standard Time)

That would cover all the uses I can think of off the top of my tired head.

John MacFarlane · Answer 93 · Thu Dec 03 2015 07:27:56 GMT+0800 (China Standard Time)

Yes, this is sensible. Figure could then be used for images as figures, instead of the current hack of treating a Para with just an image as a figure.

Mauro Bieg · Answer 94 · Fri Dec 04 2015 19:40:57 GMT+0800 (China Standard Time)

@lierdakil do you think we could make this work with:

- the proposed Figure/Float element as a block container with attributes
- the new Link attributes, which we have in master now, as reference elements

Or do we really need attributes on more block elements (table, blockquote, etc.) and/or a dedicated reference element?

Tim T.Y. Lin · Answer 95 · Fri Dec 04 2015 20:04:12 GMT+0800 (China Standard Time)

@mb21 @lierdakil The former was what Scholdoc did and I personally think this would be the way to go. It's a better way to go if we want to add future attributes that only matter in a figure context (such as placement info, pre-rendered fallback image, etc), which you can already see some of in Scholdoc's current block type (https://github.com/timtylin/scholdoc-types/blob/master/Text/Pandoc/Definition.hs#L217).

Mauro Bieg · Answer 96 · Fri Dec 04 2015 20:19:40 GMT+0800 (China Standard Time)

@timtylin thanks, good to hear you're still in favour of the Figure element.
I took a closer look at Scholdoc's Figure, anything we should learn from this for pandoc? The PreparedContent is for "pre-rendered fallback image, etc", right? I guess that's kind of out of scope for pandoc... And what about the FigureType? I was thinking of handling that as part of the attribute, for output formats that need to know this: like {#fig:my-figure} (or even {type=figure}). So we don't have to change the AST whenever there's a new figure type. So we have just Figure Attr [Block] [Block] (in case there are formats where captions can be blocks as well). What do you think?
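
To make the attribute-based idea concrete, here is a minimal sketch in Python (not pandoc's Haskell). It assumes the proposed element keeps the shape Figure Attr [Block] [Block], with the caption before the content, and that the figure type is read either from an explicit type=... pair or from an identifier prefix such as fig:. The names Figure, figure_type and PREFIX_TYPES are hypothetical, as is the "tbl" prefix; only the "fig:" prefix and type=figure appear in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical prefix table: only "fig" is mentioned in the discussion,
# "tbl" is added purely for illustration.
PREFIX_TYPES = {"fig": "figure", "tbl": "table"}


@dataclass
class Attr:
    """Mirrors pandoc's Attr: identifier, classes, key-value pairs."""
    identifier: str = ""
    classes: List[str] = field(default_factory=list)
    keyvals: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Figure:
    """Stand-in for the proposed Figure Attr [Block] [Block].

    The field order (caption first, then content) is an assumption.
    """
    attr: Attr
    caption: list = field(default_factory=list)
    content: list = field(default_factory=list)


def figure_type(fig: Figure) -> Optional[str]:
    """Derive the float type from attributes instead of an AST field."""
    # An explicit {type=...} pair takes precedence.
    explicit = dict(fig.attr.keyvals).get("type")
    if explicit is not None:
        return explicit
    # Otherwise fall back to the identifier prefix, e.g. {#fig:my-figure}.
    prefix, sep, _ = fig.attr.identifier.partition(":")
    if sep:
        return PREFIX_TYPES.get(prefix, prefix)
    return None
```

With this shape, a new float type only needs a new prefix or a new type value, and the AST itself stays unchanged.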

Nikolay Yakimov · Answer 97 · Fri Dec 04 2015 20:40:55 GMT+0800 (China Standard Time)

We already have a block container with attributes (it's called Div).
Better syntax is basically all it lacks. But I still think that
attributes on all or at least most blocks make more sense, at least when
thinking in terms of XML- and HTML-based formats.

As for reference elements, it really doesn't matter that much. I see no
immediate need for attrs on reference elements (although those might come
in handy), and we're basically free to choose whatever we like, as long as
the syntax makes sense.

Aaron O'Leary · Answer 98 · Sat Dec 05 2015 00:25:11 GMT+0800 (China Standard Time)

Having Figure Attr [Block] [Block] does feel a bit redundant when we already have Div Attr [Block]. Why not just treat the first Para (or multiple) as the caption? I suppose the Figure caption can have completely arbitrary content (Figures all the way down!), rather than just Para.
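
A rough sketch of the Div-based alternative, in Python over pandoc's JSON representation of blocks (objects with a "t" tag and "c" contents). It assumes the rule "the leading run of Para blocks is the caption"; the function name split_caption is hypothetical.

```python
from itertools import takewhile


def split_caption(blocks):
    """Split a Div's blocks into (caption, content).

    The caption is taken to be the leading run of Para blocks;
    everything from the first non-Para block onwards is content.
    """
    caption = list(takewhile(lambda b: b["t"] == "Para", blocks))
    return caption, blocks[len(caption):]


# Example: a one-paragraph caption followed by a code block.
div_blocks = [
    {"t": "Para", "c": [{"t": "Str", "c": "Caption"}]},
    {"t": "CodeBlock", "c": [["", [], []], "x = 1"]},
]
caption, content = split_caption(div_blocks)
```

Under this rule, content that is itself a Para (for instance a Para holding only an image) would be read as part of the caption. That is one ambiguity a separate caption field would avoid.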

I know there isn't agreed Div syntax yet, but I would also favour not using English words to specify the container.

If the Figure type can also contain tables then it looks more like a new Referable type than a dedicated figure container (could contain e.g. code blocks, block quotes as well). If this is the case then I'm not yet sold on the advantage cf. Div - is it worth it just to have the distinct caption field? Are there other things we haven't considered?

Another thing is that Figure logic (e.g. fancy placements) might be best handled by a filter.

I'm not sure what the best solution is here. I can see merit in both ways.

Deleted user · Answer 99 · Sat Dec 05 2015 00:36:11 GMT+0800 (China Standard Time)

@lierdakil Just weighing in quickly on the matter of references. Reference attributes would make it nice and easy to, e.g., reference the section a table is in rather than the table itself, or to support the controversial pageref. Best to make the deep changes now, so that later on it is just a matter of changing the readers/writers.
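
As an illustration of what a writer could do with such attributes, here is a small Python sketch that picks between LaTeX's \ref and \pageref. The key name reftype and the function latex_reference are hypothetical; nothing in this thread fixes a syntax for reference attributes.

```python
def latex_reference(target, keyvals=()):
    """Render an internal reference as LaTeX.

    target  -- link target such as "#tbl:results"
    keyvals -- attribute key-value pairs on the reference; the
               "reftype" key is an assumption made for this sketch
    """
    label = target.lstrip("#")
    kind = dict(keyvals).get("reftype", "number")
    if kind == "page":
        # Reference the page the anchor is on.
        return "\\pageref{" + label + "}"
    # Default: reference the anchor's number.
    return "\\ref{" + label + "}"
```

Referencing the section that contains a table would need more than this, since the writer would have to know which section each anchor sits in.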
