mwpenny / kijiji-scraper

A lightweight node.js module for retrieving and scraping ads from Kijiji

Random Error: Invalid Kijiji HTML on search results page

jahbolero opened this issue

Hi! I'm scraping Kijiji on a recurring schedule, once every 10 minutes. It sometimes works, but sometimes it throws the error in the title.
The search params are the same on every run;
the keyword used was Harley Davidson.

    // Declared outside the try block so the catch block can still return it
    let cleanList = [];
    try {
      console.log("Kijiji parse");
      const options = {
        minResults: 5000,
      };
      const params = {
        locationId: province.id,
        categoryId: 30,
        keywords: keyword,
      };
      const kijijiRes = await kijiji.search(params, options);
      // Keep only the fields needed from each ad
      const listings = kijijiRes.map((ad) => ({
        price: ad.attributes.price,
        listingName: ad.title,
        listingDescription: ad.description,
        url: ad.url,
        imgUrl: ad.image,
      }));
      // Drop ads whose price is 0
      cleanList = listings.filter((el) => el.price !== 0);
      console.log("End parse");
      return cleanList;
    } catch (err) {
      console.log(err);
      return cleanList;
    }
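
Because the failure is intermittent, a caller-side stopgap while the cause is being diagnosed might be to retry the search a few times before giving up. The following is only a sketch: `withRetries` is a hypothetical helper, not part of kijiji-scraper, and the attempt count and delay are arbitrary assumptions.

```javascript
// Hypothetical helper; NOT part of kijiji-scraper.
// Calls `fn` up to `attempts` times, waiting `delayMs` after each failure.
// Rethrows the last error if every attempt fails.
async function withRetries(fn, attempts = 3, delayMs = 1000) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      console.log(`Attempt ${i} of ${attempts} failed: ${err.message}`);
      if (i < attempts) {
        // Pause before the next attempt
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Assumed usage inside the snippet above:
//   const kijijiRes = await withRetries(() => kijiji.search(params, options));
```

Retrying only hides the symptom; the logged messages are still worth keeping so the underlying error can be reported.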

Hi there. Can you provide the error message and stack trace? What I can do is add an option to print the invalid HTML on error so it will be easier to diagnose.

Is there a line in the console output that begins with WARNING: Failed to parse search result? There should be a more detailed error after it.

Hi! Here's the full error that I get. I'm not getting the warning that you mentioned.

Error: Ad not found or invalid Kijiji HTML at URL
    at C:\Users\skandinavisk\Documents\Others\Repository\motoarbitrage\node_modules\kijiji-scraper\lib\scraper.js:99:35
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)

Your latest error message is different from the original one. The original occurs when scraping a search results page, while the latest occurs when scraping an ad page. Did you run something different than before?

In either case, I've added extra information to the error messages so we can see the URL that is tripping up the scraper.

I may also have found the issue: the scraper was erroneously throwing "Invalid Kijiji HTML on search results page" if the results page contained only promoted ads and no "regular" ads (as Kijiji calls them), or no ads at all. That could be what you were running into.
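
To illustrate the distinction described above, here is a minimal sketch of treating "no regular ads" as a valid empty result rather than as invalid HTML. This is not kijiji-scraper's actual code: the function name `extractRegularAds` and the pre-parsed `page` shape (`isResultsPage`, `regularAds`, `promotedAds`) are assumptions made purely for illustration.

```javascript
// Hypothetical sketch; NOT kijiji-scraper's real implementation.
// `page` is an assumed, already-parsed shape:
//   { isResultsPage: boolean, regularAds: Array, promotedAds: Array }
function extractRegularAds(page) {
  // Only a page that is not a search results page at all counts as invalid
  if (!page || !page.isResultsPage) {
    throw new Error("Invalid Kijiji HTML on search results page");
  }
  // A results page with only promoted ads, or with no ads, is still valid:
  // it simply yields zero regular ads
  return page.regularAds || [];
}

// A page with only promoted ads yields an empty list instead of throwing
const onlyPromoted = {
  isResultsPage: true,
  regularAds: [],
  promotedAds: [{ title: "Promoted ad" }],
};
console.log(extractRegularAds(onlyPromoted).length);
```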

Try the latest version and see if the problem is fixed. If not, please share the URL from the error message.

Closing as I was able to reproduce the issue and believe that I have fixed it. Please let me know if this is not the case.