oblac / jodd

Jodd! Lightweight. Java. Zero dependencies. Use what you like.

Home Page: https://jodd.org

LagartoParser ArrayIndexOutOfBoundsException due to Wrong Character Reference Parsing

janArb opened this issue

Current behavior

LagartoParser crashes with java.lang.ArrayIndexOutOfBoundsException when the input contains an & character that does not start a valid character reference, e.g. "CO&CO", i.e. plain text.

Expected behavior

The parser detects that no character reference is present and treats the & character as plain text rather than as the start of a character reference.
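
To make the expected behavior concrete, here is a minimal sketch of bounds-checked character reference handling. It is not Jodd's implementation: `CharRefSketch`, `decode`, and the four-entry `NAMES` table are hypothetical, and the sketch assumes a reference must end with `;`, which is stricter than what HTML allows for some legacy names. The point is only that every index is checked against the input length before it is read, and that an `&` which does not start a known reference is emitted as-is.

```java
import java.util.Map;

public class CharRefSketch {

    // Tiny illustrative table (assumption); a real parser uses the full HTML list.
    private static final Map<String, String> NAMES =
            Map.of("amp", "&", "lt", "<", "gt", ">", "quot", "\"");

    // Decodes known named references; any other '&' is kept as plain text.
    static String decode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '&') {
                // Scan the candidate name, never reading past the end of the input.
                int end = i + 1;
                while (end < input.length()
                        && Character.isLetterOrDigit(input.charAt(end))) {
                    end++;
                }
                // Only a terminated, known name counts as a reference.
                if (end < input.length() && input.charAt(end) == ';') {
                    String replacement = NAMES.get(input.substring(i + 1, end));
                    if (replacement != null) {
                        out.append(replacement);
                        i = end + 1;
                        continue;
                    }
                }
            }
            // Not a reference: emit the character unchanged.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Jean-Pierre Vitrac, CO&CO"));
        System.out.println(decode("a &amp; b"));
    }
}
```

With this approach the sample input comes back unchanged, because the scan after `&` reaches the end of the string without finding a `;`.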

Steps to Reproduce the Problem

    try {
        new LagartoParser("Jean-Pierre Vitrac, CO&CO").parse(new EmptyTagVisitor());
    } catch (ArrayIndexOutOfBoundsException e) {
        e.printStackTrace();
    }

produces

    java.lang.ArrayIndexOutOfBoundsException: 25
        at jodd.net.HtmlDecoder.detectName(HtmlDecoder.java:217)
        at jodd.lagarto.LagartoParser._consumeCharacterReference(LagartoParser.java:228)
        at jodd.lagarto.LagartoParser.consumeCharacterReference(LagartoParser.java:211)
        at jodd.lagarto.LagartoParser$1.parse(LagartoParser.java:178)
        at jodd.lagarto.LagartoParser.parse(LagartoParser.java:135)

This issue is incorrect.
In HTML, & is a reserved character, so in your case it should be encoded as `&amp;`; see https://developer.mozilla.org/en-US/docs/Glossary/Entity.
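
As an illustration of that encoding, here is a minimal plain-Java sketch that escapes `&` as `&amp;` before the text is used as HTML. `AmpersandEscaper` and `escapeAmpersands` are hypothetical names, not part of Jodd, and the helper assumes the input is raw text that has not been encoded yet (running it on already-encoded text would double-escape it).

```java
public class AmpersandEscaper {

    // Hypothetical helper: replaces every '&' with "&amp;".
    // Assumes raw, not-yet-encoded text.
    static String escapeAmpersands(String text) {
        return text.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        // prints "Jean-Pierre Vitrac, CO&amp;CO"
        System.out.println(escapeAmpersands("Jean-Pierre Vitrac, CO&CO"));
    }
}
```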

True @slandelle! However, Lagarto collects errors during parsing and should try to continue. Thanx for reporting @janArb !

It was a subtle bug after all :)

Thanks for fixing!

No worries, @janArb, a new release will be out soon.