DCsunset / pandoc-include

A pandoc filter to allow file and header inclusion

Too strict on "File not found"

dmitryperets opened this issue

When attempting to include a file that doesn't exist, version 1.2.0 would simply print a warning:

eprint('[Warn] included file not found: ' + name)

But version 1.2.1 throws an IOError, essentially aborting pandoc:

raise IOError(f"Included file not found: {name}")

That's too harsh, in my opinion. For example, I was trying to render a file which is an "instruction" on how to use pandoc-include itself. So there would be a line there, like

!include <your_file>

And this line is now failing the whole rendering.
I believe a warning was more appropriate.

That's a good point. I think a way to solve this is to make the default behaviour emit a warning. Meanwhile, we can provide an environment variable to force an error if it's set.
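A minimal sketch of that idea, not the code from the actual commit: warn by default, and raise only when an environment variable opts in. The variable name `PANDOC_INCLUDE_STRICT` and the helper `report_missing_file` are hypothetical, chosen only for illustration.

```python
import os
import sys

# Hypothetical variable name, chosen for illustration only.
STRICT_ENV_VAR = "PANDOC_INCLUDE_STRICT"


def report_missing_file(name, environ=None):
    """Warn about a missing include; raise only when strict mode is requested."""
    environ = os.environ if environ is None else environ
    message = f"Included file not found: {name}"
    # Treat unset, empty, "0", and "false" as "not strict".
    value = environ.get(STRICT_ENV_VAR, "").strip().lower()
    strict = value not in ("", "0", "false")
    if strict:
        raise IOError(message)
    # Log to stderr: the filter's stdout carries the document back to pandoc.
    print(f"[Warn] {message}", file=sys.stderr)
    return False
```

With this shape, a document that merely mentions `!include <your_file>` still renders, while a build that wants hard failures can set the variable before calling pandoc.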

The latest commit should address this issue. You can install the latest version from git if you want to test it right away.

Hi @DCsunset,

With the latest commit, I get this error:

Traceback (most recent call last):
  File "/usr/local/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/pandoc_include/main.py", line 408, in main
    return pf.run_filter(action, doc=doc)
  File "/usr/local/lib/python3.10/dist-packages/panflute/io.py", line 227, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/panflute/io.py", line 208, in run_filters
    doc = doc.walk(action, doc=doc, stop_if=stop_if)
  File "/usr/local/lib/python3.10/dist-packages/panflute/base.py", line 264, in walk
    child = child.walk(action, doc, stop_if)
  File "/usr/local/lib/python3.10/dist-packages/panflute/base.py", line 264, in walk
    child = child.walk(action, doc, stop_if)
  File "/usr/local/lib/python3.10/dist-packages/panflute/containers.py", line 152, in walk
    ans = [(k, v) for k, v in ans if v != []]
  File "/usr/local/lib/python3.10/dist-packages/panflute/containers.py", line 152, in <listcomp>
    ans = [(k, v) for k, v in ans if v != []]
  File "/usr/local/lib/python3.10/dist-packages/panflute/containers.py", line 151, in <genexpr>
    ans = ((k, v.walk(action, doc, stop_if)) for k, v in self.items())
  File "/usr/local/lib/python3.10/dist-packages/panflute/base.py", line 264, in walk
    child = child.walk(action, doc, stop_if)
  File "/usr/local/lib/python3.10/dist-packages/panflute/containers.py", line 86, in walk
    ans = list(chain.from_iterable(ans))
  File "/usr/local/lib/python3.10/dist-packages/panflute/containers.py", line 84, in <genexpr>
    ans = ((item,) if type(item) is not list else item for item in ans)
  File "/usr/local/lib/python3.10/dist-packages/panflute/containers.py", line 82, in <genexpr>
    ans = (item.walk(action, doc, stop_if) for item in self)
  File "/usr/local/lib/python3.10/dist-packages/panflute/base.py", line 272, in walk
    altered = action(self, doc)
  File "/usr/local/lib/python3.10/dist-packages/pandoc_include/main.py", line 238, in action
    options = parseOptions(doc)
  File "/usr/local/lib/python3.10/dist-packages/pandoc_include/config.py", line 110, in parseOptions
    if options["process-path"] is None:
KeyError: 'process-path'
Error running filter pandoc-include:
Filter returned error status 1

Note that I am running it all in docker, and this is how I installed your latest fix (might have done it wrong?):

RUN pip3 install --force-reinstall git+https://github.com/DCsunset/pandoc-include.git#egg=pandoc-include

Let's ignore the problem with process-path, maybe that's something with my environment... I've moved to my local machine (no Docker), and I still get these failures:

(local_pandoc) dperets@dperets-mac wptest % cat test-include.md                                                
!include filters.md

(local_pandoc) dperets@dperets-mac wptest % cat filters.md 
### pandoc-include

!include <file-doesnt-exist>

$include <file-doesnt-exist>


```
!include <file>

$include <file>
```

Result:

(local_pandoc) dperets@dperets-mac wptest % pandoc test-include.md --filter pandoc-include -o test-include.html
[INFO] including file 'filters.md'... ok
Traceback (most recent call last):
  File "/Users/dperets/git/wptest/local_pandoc/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/pandoc_include/main.py", line 408, in main
    return pf.run_filter(action, doc=doc)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/io.py", line 227, in run_filter
    return run_filters([action], *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/io.py", line 208, in run_filters
    doc = doc.walk(action, doc=doc, stop_if=stop_if)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/base.py", line 264, in walk
    child = child.walk(action, doc, stop_if)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/containers.py", line 86, in walk
    ans = list(chain.from_iterable(ans))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/containers.py", line 84, in <genexpr>
    ans = ((item,) if type(item) is not list else item for item in ans)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/containers.py", line 82, in <genexpr>
    ans = (item.walk(action, doc, stop_if) for item in self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/base.py", line 272, in walk
    altered = action(self, doc)
              ^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/pandoc_include/main.py", line 248, in action
    includeType, name, config = is_include_line(elem)
                                ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/pandoc_include/main.py", line 87, in is_include_line
    includeType, name, config = extract_info(rawString)
                                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/pandoc_include/main.py", line 60, in extract_info
    raise ValueError(f"Unable to extract info from include line {rawString}")
ValueError: Unable to extract info from include line !include `<file-doesnt-exist>`{=html}
Error running filter pandoc-include:
Filter returned error status 1

Traceback (most recent call last):
  File "/Users/dperets/git/wptest/local_pandoc/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/pandoc_include/main.py", line 408, in main
    return pf.run_filter(action, doc=doc)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/io.py", line 227, in run_filter
    return run_filters([action], *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/io.py", line 208, in run_filters
    doc = doc.walk(action, doc=doc, stop_if=stop_if)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/base.py", line 264, in walk
    child = child.walk(action, doc, stop_if)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/containers.py", line 86, in walk
    ans = list(chain.from_iterable(ans))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/containers.py", line 84, in <genexpr>
    ans = ((item,) if type(item) is not list else item for item in ans)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/containers.py", line 82, in <genexpr>
    ans = (item.walk(action, doc, stop_if) for item in self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/base.py", line 272, in walk
    altered = action(self, doc)
              ^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/pandoc_include/main.py", line 329, in action
    new_doc = pf.convert_text(
              ^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/tools.py", line 487, in convert_text
    out = inner_convert_text(text, in_fmt, out_fmt, extra_args, pandoc_path=pandoc_path)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/tools.py", line 510, in inner_convert_text
    out = run_pandoc(text, args, pandoc_path=pandoc_path)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/panflute/tools.py", line 408, in run_pandoc
    raise IOError('')
OSError
Error running filter pandoc-include:
Filter returned error status 1

Since you can run it locally, could you try running the test in this repo? You just need to clone it and run make in the test directory. I'm not sure why it still raises an exception, but it may be an issue with the version you installed.

Also, I've fixed the previous KeyError, which could occur if the application state was corrupted for some reason.

@DCsunset I managed to run it, but I found two remaining issues with non-existing files:

First - it doesn't like the "< >". So this crashes:

!include <file-doesnt-exist>
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/pandoc_include/main.py", line 60, in extract_info
    raise ValueError(f"Unable to extract info from include line {rawString}")
ValueError: Unable to extract info from include line !include `<file-doesnt-exist>`{=html}

... but this works fine:

!include file-doesnt-exist

Second - an include inside a fenced code block still has the original issue, that is, it raises an IOError exception instead of emitting a warning:

```
!include file-doesnt-exist
```
  File "/Users/dperets/git/wptest/local_pandoc/lib/python3.11/site-packages/pandoc_include/main.py", line 379, in action
    raise IOError(f"File not found: {name}")
OSError: File not found: file-doesnt-exist

Note: pandoc-include 1.2.0 handles both these issues successfully.

The first case is caused by a failed regex match. Do you have any idea why it fails? @studerluk

I have fixed the second case in the latest commit.

Looking at the error message I assume this happens because pandoc is treating the filename placeholder as HTML (note the added {=html}):

Raw include line from error message: !include `<file-doesnt-exist>`{=html}

The regex pattern expects the include line to end right after the filename, which is no longer the case once {=html} has been appended.

Off the top of my head I see two possible solutions that might resolve the issue:

  1. Remove the line ending condition from the regex pattern.
  2. Add handling of such added tags, like {=html}, to the regex pattern for them to be ignored.
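To illustrate the second option, here is a rough sketch using a simplified stand-in pattern, not the regex that pandoc-include actually uses. The names `INCLUDE_LINE` and `parse_include` are made up for this example. The pattern keeps the end-of-line anchor but tolerates one optional raw attribute such as {=html} after the closing quote.

```python
import re

# Simplified stand-in for the include-line pattern, NOT the project's regex.
INCLUDE_LINE = re.compile(
    r"^(?P<kind>[!$])include\s+"   # "!include" or "$include"
    r"(?P<quote>[`'\"]?)"          # optional opening quote or backtick
    r"(?P<name>[^`'\"]+?)"         # the file name (or placeholder)
    r"(?P=quote)"                  # must close with the same quote character
    r"(?:\{=[^}]*\})?"             # optional raw attribute such as {=html}
    r"\s*$"                        # still anchored at the end of the line
)


def parse_include(line):
    """Return (kind, name) for an include line, or None if it does not match."""
    match = INCLUDE_LINE.match(line)
    if match is None:
        return None
    return match.group("kind"), match.group("name").strip()
```

Because the line is still anchored at its end, other trailing junk after the file name continues to be rejected, so genuine input errors remain detectable.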

In the meantime, @dmitryperets, you could try wrapping the placeholder in backticks to stop pandoc from treating it as an HTML tag: !include `<file-doesnt-exist>`

@studerluk Thanks for looking into it. I think probably we can adopt your second approach to ignore all contents in the curly braces after the quote (still able to capture other input errors).

I don't yet understand the regex well enough to make the change myself. Do you know what \9 means in the regex? You can also submit a PR to fix it if you are willing to.

@studerluk Actually, I found a better way to solve this. The extra tags are added because it uses extended Markdown syntax. For file include, we don't want them so it's better to use markdown_strict, which prevents adding such tags.

@dmitryperets The latest commit should fix all the above issues. Feel free to try it again.

\9 is a backreference to the 9th capture group of the regex, which in this case is ([\`\'\"])
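As a small illustration of how such a backreference behaves, here is a toy pattern (not the project's regex) in which the quote group is the first group, so the backreference is written \1 rather than \9. The helper name `unquote` is invented for this example.

```python
import re

# Group 1 captures the opening quote; \1 requires the same character to close.
QUOTED = re.compile(r"([`'\"])(.+?)\1")


def unquote(text):
    """Return the text between matching quotes, or None if they do not match."""
    match = QUOTED.fullmatch(text)
    return match.group(2) if match else None
```

A string opened with one quote character and closed with a different one does not match, which is the point of using a backreference instead of repeating the character class.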

@DCsunset I confirm that the latest version successfully passes all my tests. Thanks!

The fix is included in v1.3.0. Closing it now.