AngleSharp / AngleSharp.Js

:angel: Extends AngleSharp with a .NET-based JavaScript engine.

Home Page: https://anglesharp.github.io

QuerySelectorAll gives empty list

irfan-yusanif opened this issue · comments

Page I want to scrape: https://www.olx.com.pk/items

My code:

var config = AngleSharp.Configuration.Default.WithDefaultLoader();
var document = await BrowsingContext.New(config).OpenAsync(pageLink);

var titleSelector = ".fhlkh";
var titlecells = document.QuerySelectorAll(titleSelector); //no results, empty list
var titles = titlecells.Select(m => m.GetAttribute("href"));

QuerySelectorAll() returns an empty list.
Note: The page to be scraped does not include jQuery.

Are you sure you are reporting to the right repo? You don't even include AngleSharp.Js in your configuration.

Otherwise, for the given page I see no problem. The page does not contain any element with the class fhlkh. Instead, the JS on the page starts a redirect (quite an efficient mechanism, but whatever ...), which leads to a page that contains some items (and elements with the CSS class that you are looking for).

HTH!

Sorry for not including the AngleSharp.Js part in the question.
Here is my code:

var context = BrowsingContext.New(Configuration.Default.WithJs());
var document = await context.OpenAsync(req => req.Content(pageLink));
var titleSelector = ".fhlkh";
var titlecells = document.QuerySelectorAll(titleSelector);
var titles = titlecells.Select(m => m.GetAttribute("href"));

On the given page, when I run
document.getElementsByClassName("fhlkh")[0].href it returns the href link fine. But the code above does not return the href links. Can you please help?

Unfortunately I cannot. There are several reasons why your code may not work. The top two options are:

  • Your configuration may be insufficient: you should include a full requester (not only for loading pages, but also for resources). In the sample above you are not including any requester; passing only the content is doomed to fail, as all relative addresses will be resolved wrongly.
  • AngleSharp.Js uses Jint and does not implement many of the JS parts required by modern web apps (the README says it's experimental and not production ready).
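
For the first point, a minimal sketch of a fuller configuration could look like the following. It assumes the AngleSharp and AngleSharp.Js packages are referenced, and that opening the page by its URL (rather than passing content via req.Content) is acceptable, so that relative addresses resolve against the real location. The class name ScrapeSketch is hypothetical. Whether the page's scripts actually run to completion still depends on the second point.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Io;

public static class ScrapeSketch
{
    public static async Task Main()
    {
        // URL taken from the question. The page redirects via JS, so the
        // selector may only match on the page it redirects to.
        var pageLink = "https://www.olx.com.pk/items";

        // Default loader with resource loading enabled, so external scripts
        // can be fetched; WithJs() comes from the AngleSharp.Js package.
        var config = Configuration.Default
            .WithDefaultLoader(new LoaderOptions { IsResourceLoadingEnabled = true })
            .WithJs();

        var context = BrowsingContext.New(config);

        // Open by address instead of by content so the document has a base URL.
        var document = await context.OpenAsync(pageLink);

        var titles = document.QuerySelectorAll(".fhlkh")
            .Select(m => m.GetAttribute("href"))
            .ToList();

        // May still be 0 if the scripts need APIs that Jint or AngleSharp.Js lack.
        Console.WriteLine($"Found {titles.Count} links");
    }
}
```

Scripts may still be running when OpenAsync returns, so an empty result here does not by itself prove that an API is missing.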

If you rely on this working, I recommend diving into the code, debugging the issue, and finding out why the particular page is not working. If it's a missing API, we could solve it in AngleSharp.Js; if it's a problem with Jint, a PR in their repo may be helpful.

HTH!