guile2912 / boilerpipe

Automatically exported from code.google.com/p/boilerpipe

IllegalArgumentException for many web pages

GoogleCodeExporter opened this issue

With boilerpipe-1.2.0.jar, calling
ArticleExtractor.INSTANCE.getText(new java.net.URL("http://t.co/3RplOLjc"))
throws
ERROR java.lang.IllegalArgumentException: protocol = http host = null
        at de.l3s.boilerpipe.sax.HTMLFetcher.fetch (HTMLFetcher.java:33)
        at de.l3s.boilerpipe.extractors.ExtractorBase.getText (ExtractorBase.java:87)

The same exception occurs for many other URLs, e.g. http://t.co/5vuYimwn http://t.co/Dy5yolLs
http://t.co/ShWhtFjP http://nyti.ms/lQrWwp ...
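
The message reports a null host, although every URL listed above has one. All of them are shortener links, so one guess (not confirmed in this report) is that the hostless URL comes from a redirect target rather than from the original URL. Below is a minimal Java sketch of a guard under that assumption. `RedirectTargets`, `hasHost`, and `resolve` are hypothetical names and not part of boilerpipe, and the `Location` values in `main` are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical helper for illustration; not part of boilerpipe. */
final class RedirectTargets {

    /** Returns true when the URL carries a non-empty host component. */
    static boolean hasHost(URL url) {
        String host = url.getHost();
        return host != null && !host.isEmpty();
    }

    /**
     * Resolves a redirect Location value against the URL that produced it.
     * Relative values such as "/path" inherit the host of the base URL.
     * A result without a host is rejected up front with a checked exception
     * instead of being handed on to the code that opens the connection.
     */
    static URL resolve(URL base, String location) throws MalformedURLException {
        URL target = new URL(base, location);
        if (!hasHost(target)) {
            throw new MalformedURLException("redirect target has no host: " + location);
        }
        return target;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL base = new URL("http://t.co/3RplOLjc");

        // Made-up Location values: one relative, one absolute.
        System.out.println(resolve(base, "/some/path"));
        System.out.println(resolve(base, "http://example.com/article"));

        // A made-up Location value with an empty authority is rejected.
        try {
            resolve(base, "http:///nohost");
        } catch (MalformedURLException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A caller that follows redirects itself could run each Location value through `resolve` and pass only the final URL to the extractor. Whether that avoids the exception for the URLs above is untested here.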


Original issue reported on code.google.com by johann.petrak@gmail.com on 22 Aug 2014 at 3:23