codsen / codsen

a monorepo of npm packages

string-strip-html: getting stuck with no errors and CPU at 100%

alvaro-escalante opened this issue

Package name
string-strip-html

Describe the bug
When processing the body of this particular page:
https://www.pearlevision.ca/pv-ca/ohip-coverage-for-eye-exams
stripHtml gets stuck without reporting any error, and CPU usage reaches 100%.

To Reproduce

import axios from 'axios'
import cheerio from 'cheerio'
import { stripHtml } from 'string-strip-html'

// Fetch the page (the 10 s timeout applies to the HTTP request only)
const { data } = await axios(encodeURI('https://www.pearlevision.ca/pv-ca/ohip-coverage-for-eye-exams'), { timeout: 10000 })

const $ = cheerio.load(data)

// Drop the footer and navigation before extracting the text
$('footer').remove()
$('nav').remove()

const body = $('body').text()

let clean
try {
  // Gets stuck here: nothing is thrown, so the catch block never runs
  clean = stripHtml(body).result
} catch (error) {
  console.log(error)
}
....
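
A side note on the reproduction: the try/catch cannot surface this kind of failure, because a synchronous call that never returns throws nothing, and the axios timeout only covers the HTTP request. One possible way to guard against such a hang is to run the blocking call in a worker thread and terminate it after a deadline. The sketch below is only an illustration under stated assumptions: `runWithTimeout` and `WORKER_SOURCE` are hypothetical names, not part of string-strip-html, and the looping function is a stand-in for the stuck call.

```javascript
const { Worker } = require('node:worker_threads')

// Code executed inside the worker: rebuild the function from its source text,
// run it on the input and post the result back to the parent thread.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require('node:worker_threads')
  const fn = new Function('return (' + workerData.fnSource + ')')()
  parentPort.postMessage(fn(workerData.input))
`

// Hypothetical helper: runs a self-contained function `fn` on `input` in a
// worker thread and rejects if no result arrives within `timeoutMs`.
function runWithTimeout(fn, input, timeoutMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, {
      eval: true,
      workerData: { fnSource: fn.toString(), input },
    })

    // terminate() stops the worker even if it is stuck in a synchronous loop
    const timer = setTimeout(() => {
      worker.terminate()
      reject(new Error(`timed out after ${timeoutMs} ms`))
    }, timeoutMs)

    worker.once('message', (result) => {
      clearTimeout(timer)
      worker.terminate()
      resolve(result)
    })

    worker.once('error', (err) => {
      clearTimeout(timer)
      reject(err)
    })
  })
}

// Stand-in for a call that never returns
runWithTimeout(() => { while (true) {} }, null, 500)
  .catch((error) => console.log(error.message))
```

Because the function is passed to the worker as source text, it cannot use variables or imports from the calling module. To apply this to stripHtml, the worker itself would have to load string-strip-html, and how it does so (require or import) depends on the installed version, so that part is left out here.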

Hi Alvaro! Thank you for reporting this, and sorry about the bug. I'll investigate and report back soon.

By the way, that website's source code breaks the source code preview renderer in the latest Safari (Chrome manages to patch it up).
[Screenshot, 2022-07-05 at 19:33:38]

It also breaks the Kangax minifier (an industry-standard minifier aimed at web development). My preliminary hypothesis is that the JSP templating bits and/or the HTML markup bugs on that page somehow throw the algorithm off. I'll find out.

From an algorithm-improvement perspective, it's a treasure trove! I've already harvested a couple of to-do TDD tests from it.

Released v10 with a fix.