string-strip-html Getting stuck with no errors and CPU 100%
alvaro-escalante opened this issue · comments
Package name
string-strip-html
Describe the bug
When trying to process the body on this particular page:
https://www.pearlevision.ca/pv-ca/ohip-coverage-for-eye-exams
it gets stuck and does not display any errors, CPU reaches 100%
To Reproduce
import axios from 'axios'
import cheerio from 'cheerio'
import { stripHtml } from 'string-strip-html'
const { data } = await axios(encodeURI('https://www.pearlevision.ca/pv-ca/ohip-coverage-for-eye-exams'), { timeout: 10000 })
const $ = cheerio.load(data)
$('footer').remove()
$('nav').remove()
const body = $('body').text()
let clean
try {
  clean = stripHtml(body).result
} catch (error) {
  console.log(error)
}
....
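Because the call never returns and never throws, the try/catch above cannot report anything. One way to at least detect the hang is to run the stripping step in a child process with a hard timeout. The following is a minimal sketch, not part of the original report: `runWithTimeout` and `stripScript` are hypothetical names, and the child script assumes `string-strip-html` is installed and resolvable from the current working directory.

```javascript
const { spawnSync } = require('node:child_process')

// Hypothetical helper: run a Node script in a child process, feed it `input`
// on stdin, and kill it if it has not finished after `timeoutMs` milliseconds.
function runWithTimeout(script, input, timeoutMs) {
  const res = spawnSync(process.execPath, ['-e', script], {
    input,
    timeout: timeoutMs,
    encoding: 'utf8',
    maxBuffer: 64 * 1024 * 1024, // allow large pages on stdout
  })
  // On timeout, spawnSync sets an ETIMEDOUT error and kills the child with SIGTERM.
  const timedOut =
    (res.error && res.error.code === 'ETIMEDOUT') || res.signal === 'SIGTERM'
  if (timedOut) return { timedOut: true, output: null }
  if (res.error) throw res.error
  if (res.status !== 0) throw new Error(res.stderr)
  return { timedOut: false, output: res.stdout }
}

// Child script (assumption: string-strip-html is installed). It reads the
// page text from stdin, strips it, and writes the result to stdout.
const stripScript = `
  let html = ''
  process.stdin.setEncoding('utf8')
  process.stdin.on('data', (chunk) => { html += chunk })
  process.stdin.on('end', async () => {
    const { stripHtml } = await import('string-strip-html')
    process.stdout.write(stripHtml(html).result)
  })
`

// Usage with the `body` string from the reproduction above:
// const { timedOut, output } = runWithTimeout(stripScript, body, 10000)
// if (timedOut) console.log('stripHtml did not finish within 10 s')
```

This does not fix the underlying loop; it only turns a silent hang into a reportable timeout. A synchronous child process is used here for brevity, and a worker thread would be a reasonable alternative.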
Hi Alvaro! Thank you for reporting this, and sorry about the bug. I'll investigate and report back soon.
By the way, that website's source code is breaking the latest Safari source code preview renderer (Chrome manages to patch it).
It also breaks the Kangax minifier (an industry-standard minifier for web development). My preliminary hypothesis is that those JSP templating bits and/or HTML markup bugs somehow throw the algorithm off. I'll find out.
From an algorithm-improvement perspective, it's a treasure trove! I've already harvested a couple of TDD test to-dos from it.
Released v10 with a fix.