JuliaDocs / Documenter.jl

A documentation generator for Julia.

Home Page: https://documenter.juliadocs.org

ERROR: LoadError: PCRE compilation error: regular expression is too large

AdamWysokinski opened this issue

Hi,
I keep getting the following error:

```
[ Info: ExpandTemplates: expanding markdown templates.
ERROR: LoadError: PCRE compilation error: regular expression is too large at offset 35288
```

I was able to trace the line causing the issue:

```julia
NeuroAnalyzer.xcov(obj1::NeuroAnalyzer.NEURO, obj2::NeuroAnalyzer.NEURO; ch1::Union{Int64, Vector{Int64}, AbstractRange}=signal_channels(obj1), ch2::Union{Int64, Vector{Int64}, AbstractRange}=signal_channels(obj2), ep1::Union{Int64, Vector{Int64}, AbstractRange}=_c(nepochs(obj1)), ep2::Union{Int64, Vector{Int64}, AbstractRange}=_c(nepochs(obj2)), l::Real=1, demean::Bool=true, biased::Bool=true, method::Symbol=:sum)
```

I see nothing wrong with it. When I remove any two of the function arguments, it works fine and completes with no error.

Julia 1.10.2

Is there a stacktrace or an MWE you could put together? Not really sure which regex is blowing up, though minimally it looks like we should add some error handling somewhere.
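For illustration, the error handling mentioned here could look roughly like this sketch. `try_block_regex` is a hypothetical helper, not part of Documenter; it relies only on the fact that a failed PCRE compilation in Julia surfaces as an `ErrorException` (raised via `error`) whose message starts with "PCRE compilation error".

```julia
# Hypothetical helper (not Documenter API): compile a regex built from user
# content, and turn a PCRE compilation failure into a readable warning
# instead of an opaque LoadError.
function try_block_regex(pattern::AbstractString, file::AbstractString)
    try
        return Regex(pattern)   # Regex compiles eagerly, so errors surface here
    catch err
        # Only handle PCRE compilation errors; anything else is re-thrown.
        err isa ErrorException && startswith(err.msg, "PCRE compilation error") || rethrow()
        @warn "Could not build a regex to locate a code block; skipping line-number lookup." file pattern_length=length(pattern) exception=err
        return nothing
    end
end
```

Returning `nothing` lets a caller fall back to reporting only the file name; whether to warn or abort with a clearer message is a design choice assumed here, not something the thread settles.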

Just for extra context: that string is inside a @docs block at https://github.com/JuliaHealth/NeuroAnalyzer.jl/blob/main/docs/src/index.md

That's correct. It worked in the past; unfortunately, I can't pin down when it broke.

And here's the stacktrace:

```
[ Info: SetupBuildDirectory: setting up build directory.
[ Info: Doctest: running doctests.
[ Info: ExpandTemplates: expanding markdown templates.
ERROR: LoadError: PCRE compilation error: regular expression is too large at offset 35309
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] compile(pattern::String, options::UInt32)
    @ Base.PCRE ./pcre.jl:165
  [3] compile(regex::Regex)
    @ Base ./regex.jl:80
  [4] Regex(pattern::String, compile_options::UInt32, match_options::UInt32)
    @ Base ./regex.jl:40
  [5] Regex
    @ ./regex.jl:68 [inlined]
  [6] find_block_in_file(code::String, file::String)
    @ Documenter.Utilities ~/.julia/packages/Documenter/bFHi4/src/Utilities/Utilities.jl:24
  [7] runner(::Type{Documenter.Expanders.DocsBlocks}, x::Markdown.Code, page::Documenter.Documents.Page, doc::Documenter.Documents.Document)
    @ Documenter.Expanders ~/.julia/packages/Documenter/bFHi4/src/Expanders.jl:277
  [8] dispatch(::Type{Documenter.Expanders.ExpanderPipeline}, ::Markdown.Code, ::Vararg{Any})
    @ Documenter.Utilities.Selectors ~/.julia/packages/Documenter/bFHi4/src/Utilities/Selectors.jl:170
  [9] expand(doc::Documenter.Documents.Document)
    @ Documenter.Expanders ~/.julia/packages/Documenter/bFHi4/src/Expanders.jl:42
 [10] runner(::Type{Documenter.Builder.ExpandTemplates}, doc::Documenter.Documents.Document)
    @ Documenter.Builder ~/.julia/packages/Documenter/bFHi4/src/Builder.jl:227
 [11] dispatch(::Type{Documenter.Builder.DocumentPipeline}, x::Documenter.Documents.Document)
    @ Documenter.Utilities.Selectors ~/.julia/packages/Documenter/bFHi4/src/Utilities/Selectors.jl:170
 [12] #2
    @ ~/.julia/packages/Documenter/bFHi4/src/Documenter.jl:249 [inlined]
 [13] cd(f::Documenter.var"#2#3"{Documenter.Documents.Document}, dir::String)
    @ Base.Filesystem ./file.jl:112
 [14] #makedocs#1
    @ ~/.julia/packages/Documenter/bFHi4/src/Documenter.jl:248 [inlined]
 [15] top-level scope
    @ ~/Documents/Code/NeuroAnalyzer.jl/docs/make_md.jl:30
in expression starting at /home/eb/Documents/Code/NeuroAnalyzer.jl/docs/make_md.jl:30
```

It looks like you have a huge @docs block somewhere, which means that the logic we use to find its line numbers breaks:

```julia
rcode = "\\h*" * replace(regex_escape(code), "\\n" => "\\n\\h*")
blockidx = findfirst(Regex(rcode), content)
```

We probably should switch away from using a regex for this.
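One way to avoid the regex entirely, sketched under the assumption that matching whole lines is good enough: compare the block line by line against the file, ignoring leading spaces and tabs, which is roughly what the `\h*` prefixes in `rcode` permit. `find_block_lines` is a hypothetical name, not Documenter code.

```julia
# Hypothetical helper (not Documenter API): return the 1-based line range
# where `code` occurs in `content`, ignoring leading spaces/tabs on each line.
# No regex is compiled, so the size of the block does not matter.
function find_block_lines(code::AbstractString, content::AbstractString)
    unix(s) = replace(s, "\r\n" => "\n")               # normalize CRLF line endings
    dedent(l) = lstrip(c -> c == ' ' || c == '\t', l)  # drop leading horizontal whitespace
    needle = dedent.(split(unix(code), '\n'))
    hay = dedent.(split(unix(content), '\n'))
    n = length(needle)
    for i in 1:(length(hay) - n + 1)
        # Compare the n-line window starting at line i with the block.
        if view(hay, i:(i + n - 1)) == needle
            return i:(i + n - 1)
        end
    end
    return nothing                                     # block not found
end
```

Unlike the regex, this only matches whole lines, so a block whose first line starts mid-line would not be found. The scan does at most (file lines × block lines) string comparisons and has no pattern-size limit.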

Side note: it also looks like you're using an old Documenter version (the 0.27 branch, I suspect).

Oh, yeah, the at-docs blocks in https://github.com/JuliaHealth/NeuroAnalyzer.jl/blob/f2bba13cf8c41f76452c3fa0c5727f7eb1fe5191/docs/src/index.md?plain=1#L681 are really big. At least one of them is apparently more than 35 KiB.

As a workaround, I think if you just split the biggest ones into multiple smaller ones, it will fix the issue.
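If `index.md` is generated by a script, the split could happen there. This sketch uses a hypothetical `chunked_docs_blocks` helper that writes one `@docs` block per `chunk` entries; the default of 50 entries is an arbitrary assumption, not a known safe limit.

```julia
# Hypothetical helper: emit several small @docs blocks instead of one huge one.
function chunked_docs_blocks(entries::AbstractVector{<:AbstractString}; chunk::Int = 50)
    fence = "```"                     # kept in a variable to avoid a literal fence line
    io = IOBuffer()
    for part in Iterators.partition(entries, chunk)
        println(io, fence, "@docs")
        foreach(e -> println(io, e), part)
        println(io, fence)
        println(io)                   # blank line between blocks
    end
    return String(take!(io))
end
```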

But also, just as a suggestion, you may want to consider using at-autodocs here, with a custom filter -- I suspect maintaining those lists by hand is not pleasant.

I generate it automatically with a bash script, e.g.:

```bash
echo "\`\`\`@docs"
cat ../src/recorder/*.jl | grep ^function | sed s/"function "/"NeuroAnalyzer."/g
echo "\`\`\`"
```

I've tried using at-autodocs, but I can't set up Pages properly. How can I point it to all .jl files in the src/recorder folder (like in the example above)?

The workaround you suggested helped, thanks!

Like I was hinting at on Slack, maybe change that bash script to:

```bash
echo "\`\`\`@docs"
cat ../src/recorder/*.jl | grep ^function | sed s/"function "/"NeuroAnalyzer."/g | sed s/"(.*)"//g | sort -u
echo "\`\`\`"
```

which strips the (extremely long) argument lists. That way, you get a docstring per function, not per method. Function docstrings automatically concatenate all method docstrings, with the drawback that you can't link to a specific method docstring anymore. I usually prefer the function docstrings over individual method docstrings, but your mileage may vary. It would definitely cut down the size of your @docs block dramatically.
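For anyone who prefers to keep the generation step in Julia, here is a rough port of that pipeline. `docs_entries` is a hypothetical helper; it keeps only the text before the first `(`, which differs slightly from the `sed` expression (that one removes everything from the first `(` to the last `)`), and `chopprefix` requires Julia 1.8 or newer.

```julia
# Hypothetical port of the shell pipeline: collect top-level `function` names
# from every .jl file in `dir`, drop argument lists, and deduplicate.
function docs_entries(dir::AbstractString, modname::AbstractString)
    names = String[]
    for file in sort(filter(f -> endswith(f, ".jl"), readdir(dir)))
        for line in eachline(joinpath(dir, file))
            startswith(line, "function ") || continue  # like `grep ^function`
            sig = chopprefix(line, "function ")
            name = first(split(sig, '('))               # drop "(args...)"
            push!(names, string(modname, ".", strip(name)))
        end
    end
    return sort!(unique(names))                         # like `sort -u`
end
```

The returned names could then be written between the `@docs` fences, just as the `echo` lines do.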

Or, as suggested, use @autodocs, which keeps the individual method docstrings separate. For setting Pages correctly, it might help that the right-hand side can be arbitrary Julia code, so as long as you can express the list of .jl files you want to include as a one-liner, that should work. That "arbitrary code" feature is mentioned in the manual for @index blocks, which gives the example

```@index
Pages = map(file -> joinpath("man", file), readdir("man"))
```

That trick also applies to @autodocs and any similar Documenter-specific block.
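Applied to the `src/recorder` question, an untested sketch might look like the block below. It assumes the block is evaluated where `NeuroAnalyzer` is loaded (e.g. `using NeuroAnalyzer` in the make script), uses `pkgdir` (Julia 1.4+) so the path does not depend on the working directory, and assumes each `Pages` entry is matched against the end of the docstring's source path, so `recorder/xyz.jl` would select `src/recorder/xyz.jl`.

```@autodocs
Modules = [NeuroAnalyzer]
Pages = map(f -> joinpath("recorder", f), filter(endswith(".jl"), readdir(joinpath(pkgdir(NeuroAnalyzer), "src", "recorder"))))
```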

Thank you. The modified bash script works really well. I've tried @autodocs, but for some reason not all functions were rendered properly. I don't have time right now, but I will investigate it later and submit an issue if necessary. Meanwhile, a docstring per function is a perfect solution for my needs. Thanks again!