Why is the input not proper UTF-8?
geekplux opened this issue
I get an error when I click the RSS button on my blog:
This page contains the following errors:
error on line 898 at column 35: Input is not proper UTF-8, indicate encoding !
Bytes: 0xE4 0xBB 0x8D 0xE6
How can I fix it? Thanks.
Hi there! Having the same error.
Same
error on line 80 at column 13: Input is not proper UTF-8, indicate encoding !
Bytes: 0x10 0xEF 0xBC 0x8C
error on line 411 at column 35: Input is not proper UTF-8, indicate encoding !
Bytes: 0x08 0xE6 0x9C 0x8D
When I forced the problematic bytes above to be encoded as UTF-8, everything turned out fine.
So I guess it is worth trying to force-encode them as UTF-8. However, I'm not familiar with JS.
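As a rough illustration of what the `Bytes:` lines contain, the sketch below decodes the four byte sequences quoted in this thread. `describe` is a hypothetical helper, not part of any feed generator. The browser shows only four bytes, so a trailing character may be cut off, and the sketch simply ignores such a tail.

```python
# Byte sequences copied from the "Bytes:" lines quoted in this thread.
REPORTED = [
    bytes([0xE4, 0xBB, 0x8D, 0xE6]),
    bytes([0x10, 0xEF, 0xBC, 0x8C]),
    bytes([0x08, 0xE6, 0x9C, 0x8D]),
    bytes([0x10, 0xE4, 0xB8, 0x80]),
]


def describe(raw: bytes) -> str:
    """Decode raw as UTF-8 and list the code points it contains.

    errors="ignore" drops an incomplete multi-byte sequence at the end,
    which happens when the four-byte excerpt cuts a character in half.
    """
    text = raw.decode("utf-8", errors="ignore")
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for raw in REPORTED:
    print(raw.hex(" "), "->", describe(raw))
```

In three of the four reports the first byte is a control character (0x10 or 0x08), followed by an ordinary, correctly encoded CJK character. That seems consistent with the later finding in this thread that stray control characters are involved, although it does not explain the first report.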
Same problem in my blog:
error on line 1192 at column 299: Input is not proper UTF-8, indicate encoding !
Bytes: 0x10 0xE4 0xB8 0x80
I find that it is caused by invisible characters in the posts. Removing these characters solves the problem.
This is my way to solve the problem:
error on line 1192 at column 299: Input is not proper UTF-8, indicate encoding !
Bytes: 0x10 0xE4 0xB8 0x80
1. Download `atom.xml` and open it with your favorite text editor.
2. Open `atom.xml` in Chrome and check if there are any error messages. If there are no error messages, go to step 8.
3. Note the first byte following `Bytes:` in the error message. As an example, the message above indicates that there are one or more `0x10` characters in `atom.xml`.
4. Search for the regular expression pattern `\x10` and you should find where these invisible characters are. They are probably in post contents.
5. Open the post's source file (*.md) and replace `\x10` with an empty string.
6. Regenerate `atom.xml`.
7. Go to step 1.
8. Done.
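The search step above could also be scripted. This is a minimal sketch, assuming the culprits are the control characters that XML 1.0 forbids (everything below 0x20 except tab, line feed, and carriage return). The function names are hypothetical, and the columns are counted in characters, so they may not match the column the browser reports.

```python
import re

# C0 control characters that XML 1.0 does not allow in a document:
# everything below 0x20 except tab (0x09), LF (0x0A) and CR (0x0D).
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_control_chars(text):
    """Return (line, column, hex code) for each forbidden control character.

    Lines are split on "\n" only, because str.splitlines() would also
    split on some of the control characters we are looking for.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in ILLEGAL_XML_CHARS.finditer(line):
            code = f"0x{ord(match.group()):02X}"
            hits.append((line_no, match.start() + 1, code))
    return hits


def strip_control_chars(text):
    """Return text with every forbidden control character removed."""
    return ILLEGAL_XML_CHARS.sub("", text)
```

For example, reading a downloaded feed with `open("atom.xml", encoding="utf-8").read()` and passing the result to `find_control_chars` should list the positions to inspect. The file name here is just the one used in the steps above.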
However, I still have no idea why these characters appear in my posts.
@lujjjh Thanks!
Inspired by you, I first opened atom.xml in vim, went to line 1192, and found the relevant blog post. Then I opened the *.md file in vim and found some strange characters such as ^p and ^h. After I deleted these characters, everything seems to work well.
By the way, I also have no idea why these characters appear in two of my posts.
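For anyone with many posts, the manual deletion described above might be automated along these lines. This is only a sketch under assumptions: `clean_posts` is a hypothetical helper, the posts are assumed to be UTF-8 encoded `*.md` files under a single directory, and the files are rewritten in place, so back them up first.

```python
import re
from pathlib import Path

# Control characters forbidden by XML 1.0 (tab, LF and CR are allowed).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_posts(posts_dir):
    """Strip forbidden control characters from every *.md file under posts_dir.

    Returns a dict mapping each modified file path to the number of
    characters removed. Files without such characters are left untouched.
    """
    changed = {}
    for path in sorted(Path(posts_dir).rglob("*.md")):
        # Work on bytes so that the original line endings are preserved.
        text = path.read_bytes().decode("utf-8")
        cleaned, count = CONTROL_CHARS.subn("", text)
        if count:
            path.write_bytes(cleaned.encode("utf-8"))
            changed[str(path)] = count
    return changed
```

Calling `clean_posts("path/to/posts")` (a placeholder path) and then regenerating `atom.xml` would correspond to the delete-and-regenerate steps described in this thread.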
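I just noticed that many Chinese users are running into this problem, and in most cases the error is "at column 35:".
I set up the same content on both a virtual machine and my main site. The reported error line sometimes differs between refreshes, but the column is always 35.
Looking through the generated atom.xml file, the lines reported in the errors basically all look like this:
<content type="html"><![CDATA[<p>
Column 35 falls exactly on the square bracket after CDATA.
I compared http://selfboot.cn/atom.xml and http://geekplux.com/atom.xml from the posters above, and both are like this.
Also, on my computer this problem only occurs in the Chromium browser. Switching to Chrome, or even IE, displays the feed normally.
I have not tested Firefox, so my guess is that the problem is with my browser...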