🐛 Bug: Cannot handle img
youngjuning opened this issue
Describe the bug
The `<img>` inside the `<figure>` below ends up in the output as a raw HTML tag.
HTML Input
```html
<figure><img class="lazyload inited loaded" data-src="https://i.loli.net/2020/08/13/cVomW7L9YOTw2uA.png" data-width="800" data-height="600" src="https://i.loli.net/2020/08/13/cVomW7L9YOTw2uA.png"><figcaption></figcaption></figure>
```
Generated Markdown
```html
<img class="lazyload inited loaded" data-src="https://i.loli.net/2020/08/13/cVomW7L9YOTw2uA.png" data-width="800" data-height="600" src="https://i.loli.net/2020/08/13/cVomW7L9YOTw2uA.png">
```
Expected Markdown
nothing
I fixed it in my own way: youngjuning/homebrew-juejin-spider@fa99c0b
I assume you meant the following HTML:
```html
<figure>
  <!-- "src" is empty? -->
  <img
    class="lazyload inited loaded"
    data-src="https://i.loli.net/2020/08/13/cVomW7L9YOTw2uA.png"
    data-width="800"
    data-height="600"
    src=""
  >
  <figcaption></figcaption>
</figure>
```
The "src" attribute is empty because the image is lazy-loaded.
I have considered automatically using "data-src" when "src" is empty, but there are three problems:
- "data-src" can be filled with any data; it is not guaranteed to contain the URL.
- The image URL could also be somewhere else, such as "data-lazy-url".
- Some websites put a placeholder in "src" (for example "blank.gif", the colours of the image, or a very low-resolution version of the image). The library can't tell which image URL is better without loading the images.
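If you still want an automatic fallback, one way around these problems is to make it site-specific: the caller lists which attributes may hold the real URL and which "src" values are known placeholders. The following is a minimal sketch in plain Go working on an attribute map; `lazyRule`, `resolveSrc`, and the `blank.gif` placeholder are hypothetical names for illustration, not part of html-to-markdown. In a real converter the same logic would run on each `<img>` inside a `Before` hook.

```go
package main

import (
	"fmt"
	"strings"
)

// lazyRule is a hypothetical, site-specific description of how a page
// lazy-loads its images.
type lazyRule struct {
	URLAttrs     []string // attributes that may hold the real URL, in priority order
	Placeholders []string // substrings that mark a "src" value as a placeholder
}

// isPlaceholder reports whether src is empty or matches a known placeholder.
func (r lazyRule) isPlaceholder(src string) bool {
	src = strings.TrimSpace(src)
	if src == "" {
		return true
	}
	for _, p := range r.Placeholders {
		if strings.Contains(src, p) {
			return true
		}
	}
	return false
}

// resolveSrc returns the URL to use for an <img>, given its attributes.
// A real "src" is kept; otherwise the first non-empty URL attribute wins.
// If no URL attribute is set, the original "src" is returned unchanged.
func (r lazyRule) resolveSrc(attrs map[string]string) string {
	src := attrs["src"]
	if !r.isPlaceholder(src) {
		return src
	}
	for _, name := range r.URLAttrs {
		if v := strings.TrimSpace(attrs[name]); v != "" {
			return v
		}
	}
	return src
}

func main() {
	rule := lazyRule{
		URLAttrs:     []string{"data-src", "data-lazy-url"},
		Placeholders: []string{"blank.gif"},
	}
	img := map[string]string{
		"src":      "",
		"data-src": "https://i.loli.net/2020/08/13/cVomW7L9YOTw2uA.png",
	}
	fmt.Println(rule.resolveSrc(img))
}
```

Running it prints the "data-src" URL because "src" is empty. The priority order in `URLAttrs` is exactly the site-specific knowledge mentioned above; a low-resolution preview in "src" can only be recognised this way if its URL follows a known pattern.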
@youngjuning If you know the website and its lazy-loading rules, I would recommend a hook like the following:
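```go
// The hook function is called before the rules are run. You can change the
// HTML that is passed to the "img" rule. (Needs the "strings" and
// "github.com/PuerkitoBio/goquery" imports.)
conv.Before(func(selec *goquery.Selection) {
	selec.Find("img").Each(func(i int, s *goquery.Selection) {
		// Keep a non-empty "src". Checking only for presence would skip
		// lazy-loaded images that have src="".
		if src, ok := s.Attr("src"); ok && strings.TrimSpace(src) != "" {
			return
		}
		// Only overwrite "src" if the page actually provides "data-src".
		if dataSrc, ok := s.Attr("data-src"); ok {
			s.SetAttr("src", dataSrc)
		}
	})
})
```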
But your solution works as well 👍
@JohannesKaufmann Thanks for your answer, you are so 👍