DEVSENSE / Phalanger

PHP 5.4 compiler for .NET/Mono frameworks. Predecessor to the open-source PeachPie project (www.peachpie.io).

Home Page: http://v4.php-compiler.net/

Implementation of 'htmlspecialchars' is not complete

linsmod opened this issue

PHP's htmlspecialchars will not re-translate any of these already-encoded entities:
"&amp;", "&quot;", "&#039;", "&lt;", "&gt;"

But HtmlSpecialCharsEncode does not implement this logic, so if the input contains any of the entities above, the output is unexpected.
For example, &amp; is translated to &amp;amp;

Found the issue when working with WordPress 4.6.1.

Also, the function should not translate the symbol & to &amp; when it is not in an attribute.
This is my personal fix:
```csharp
// Needs: using System; using System.Collections.Generic; using System.Diagnostics;
//        using System.Linq; using System.Text;
internal static string HtmlSpecialCharsEncode(string str, int index, int length, QuoteStyle quoteStyle, string charSet)
{
    if (str == null) return String.Empty;

    Debug.Assert(index + length <= str.Length);

    StringBuilder result = new StringBuilder(length);

    // quote style is anded to emulate PHP behavior (any value is allowed):
    string single_quote = (quoteStyle & QuoteStyle.SingleQuotes) != 0 ? "&#039;" : "'";
    string double_quote = (quoteStyle & QuoteStyle.DoubleQuotes) != 0 ? "&quot;" : "\"";

    // Split the requested range on '&': each piece is encoded separately and the
    // pieces are re-joined with a raw '&', so no ampersand is ever turned into "&amp;".
    var strArray = new string(str.Skip(index).Take(length).ToArray()).Split('&');
    var strList = new List<string>();
    foreach (var item in strArray)
    {
        foreach (char c in item)
        {
            // No '&' case is needed: Split('&') has already removed every ampersand.
            switch (c)
            {
                case '"':
                    result.Append(double_quote); break;
                case '\'':
                    result.Append(single_quote); break;
                case '<':
                    result.Append("&lt;"); break;
                case '>':
                    result.Append("&gt;"); break;
                default:
                    result.Append(c); break;
            }
        }
        strList.Add(result.ToString());
        result.Clear();
    }
    return string.Join("&", strList);
}
```
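A minimal Python model of the split-on-& approach above, for illustration only (it is not Phalanger code; the function name `split_join_encode` is hypothetical, and it assumes ENT_QUOTES, i.e. both quote kinds are encoded). It shows the side effect of the design: because every & is restored verbatim by the join, already-encoded entities survive, but a bare & is never escaped either. PHP's htmlspecialchars with $double_encode set to false would still turn "a & b" into "a &amp; b".

```python
def split_join_encode(s: str) -> str:
    """Model of the split-on-'&' fix: encode each piece, re-join with a raw '&'.

    Assumes ENT_QUOTES, so both single and double quotes are encoded.
    """
    replacements = {'"': "&quot;", "'": "&#039;", "<": "&lt;", ">": "&gt;"}
    pieces = s.split("&")  # every '&' disappears here ...
    encoded = ["".join(replacements.get(c, c) for c in piece) for piece in pieces]
    return "&".join(encoded)  # ... and comes back unescaped here


# Existing entities are preserved:
print(split_join_encode("c=0&amp;dir=ltr"))
# A bare ampersand is also left as-is, unlike PHP:
print(split_join_encode("a & b"))
```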

Does the behaviour of Phalanger differ from PHP?

It appears you are stating that Phalanger escapes &amp; to &amp;amp; when run through htmlspecialchars(). If that is your intended message, then this behaviour is consistent with the PHP implementation and thus is not a bug and will not be changed.

Right, if you have a small test case in PHP, please first try it with both Phalanger and legacy PHP to see whether the results differ.

Code:
```php
echo 'quote_style:'.$quote_style;
echo 'charset:'.$charset;
echo 'double_encode:'.$double_encode;
echo $string;
echo htmlspecialchars($string);
$string = @htmlspecialchars( $string, $quote_style, $charset, $double_encode );
die($string);
```

Official PHP output:
```
quote_style:3
charset:UTF-8
double_encode:
http://localhost:8000/wp-admin/load-styles.php?c=0&amp;dir=ltr&amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;ver=4.6.1
http://localhost:8000/wp-admin/load-styles.php?c=0&amp;amp;dir=ltr&amp;amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;amp;ver=4.6.1
http://localhost:8000/wp-admin/load-styles.php?c=0&amp;dir=ltr&amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;ver=4.6.1
```

Phalanger output:
```
quote_style:3
charset:UTF-8
double_encode:
http://localhost:8000/wp-admin/load-styles.php?c=0&amp;dir=ltr&amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;ver=4.6.1
http://localhost:8000/wp-admin/load-styles.php?c=0&amp;amp;dir=ltr&amp;amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;amp;ver=4.6.1
http://localhost:8000/wp-admin/load-styles.php?c=0&amp;amp;dir=ltr&amp;amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;amp;ver=4.6.1
```

When htmlspecialchars( $string ) is invoked twice, official PHP and Phalanger give the same result.
However, if the second invocation passes the extra parameters:
`htmlspecialchars( $string )`, then `htmlspecialchars( $string, $quote_style, $charset, $double_encode );`

the test results are different.

Official PHP seems to avoid producing &amp;amp;, but Phalanger does not.
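To make the difference concrete, here is a minimal Python sketch of htmlspecialchars with the $double_encode flag (Python is used only for illustration; this is neither PHP's nor Phalanger's code, and the names `htmlspecialchars_sketch` and `_ENTITY` are hypothetical). The entity check is a simplification: it accepts any `&name;`, `&#NNN;` or `&#xHH;` shape, whereas PHP also validates named entities against its entity table. With double_encode at its default (true) every & becomes &amp;, so &amp; turns into &amp;amp;; with double_encode false an existing entity is kept, which matches the last line of the official PHP output above. The empty `double_encode:` line in both outputs fits a falsy value: false, null and an empty string all echo as nothing in PHP.

```python
import re

# Simplified entity shape: &name;  &#NNN;  &#xHH;
# Assumption: real PHP additionally checks names against its entity table.
_ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")

# PHP constant values: ENT_NOQUOTES = 0, ENT_COMPAT = 2, ENT_QUOTES = 3
ENT_NOQUOTES, ENT_COMPAT, ENT_QUOTES = 0, 2, 3


def htmlspecialchars_sketch(s: str, quote_style: int = ENT_COMPAT,
                            double_encode: bool = True) -> str:
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        if c == "&":
            # Only when double_encode is false do we look for an existing entity.
            m = None if double_encode else _ENTITY.match(s, i)
            if m:
                out.append(m.group(0))  # keep the entity verbatim
                i = m.end()
                continue
            out.append("&amp;")
        elif c == '"' and quote_style & 2:
            out.append("&quot;")
        elif c == "'" and quote_style & 1:
            out.append("&#039;")
        elif c == "<":
            out.append("&lt;")
        elif c == ">":
            out.append("&gt;")
        else:
            out.append(c)
        i += 1
    return "".join(out)


url = ("http://localhost:8000/wp-admin/load-styles.php?c=0&amp;dir=ltr"
       "&amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;ver=4.6.1")
print(htmlspecialchars_sketch(url))                           # &amp; -> &amp;amp;
print(htmlspecialchars_sketch(url, ENT_QUOTES, False))        # unchanged
```

In this model, Phalanger's third output line corresponds to ignoring the flag and always taking the double_encode=true path.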

Please can you tidy your test case to separate the outputs so that we can see what is output by which part of the test. It is still not clear what you are actually stating is the problem, i.e. which circumstance causes the issue you perceive. As an example, it is not clear what value $double_encode has in your example: is it true, false, empty string, null, ...?

I know little about PHP. How can I tell whether $double_encode is true, false, an empty string, null, or something else?

I also ran into this problem when trying WordPress 4.6.1 with Phalanger.
hlizard/WpDotNet@72a0595

Turns out you're a fellow countryman! I don't know PHP either, what a coincidence.