mwpenny / kijiji-scraper

A lightweight node.js module for retrieving and scraping ads from Kijiji

Filtering results

Zalarox opened this issue

Hey!

I noticed that some category IDs are either broken or have changed (one example is ID 214, which is supposed to be two-bedroom apartments but returns listings of cars, etc.).

In any case, I'm having trouble with the filtering params. I'd appreciate it if you could point me in the right direction with them. For instance:

searchParams["attributeMap[numberbedrooms_s]"] = "[2]"

I can see that this sets the numberbedrooms attribute to 2, but I have no idea how the template works here. What's the _s for, and why are we passing an array-like structure as the value? Also, how would this work with more than one value? For example, if I wanted to filter results to 2 or 3 bedrooms, or "at least 2" bedrooms, how would I do that?

I was trying to filter real-estate ads based on "offer" or "wanted", but that doesn't seem to be working. Here's how I've been trying to do it:

searchParams["attributeMap[type_s]"] = "[OFFER]"

Based on this KijijiAd object:

{
    title: 'Grande Chambre',
    description: '...',
    date: 2019-07-29T13:45:54.000Z,
    image: 'https://...$_35.JPG',
    images: ['https://...$_57.JPG'],
    attributes: {
        forrentbyhousing: 'ownr',
        unittype: 'apartment',
        numberbedrooms: 1,
        numberbathrooms: 10,
        petsallowed: 0,
        dateavailable: 2019-08-05T00:00:00.000Z,
        areainfeet: 625,
        yard: 1,
        balcony: 1,
        smokingpermitted: 2,
        elevator: 0,
        hydro: 0,
        heat: 0,
        water: 0,
        cabletv: 1,
        internet: 1,
        landline: 0,
        numberparkingspots: 0,
        furnished: 1,
        price: 600,
        location: {
            latitude: ...,
            longitude: ...,
            mapAddress: '...QC, Canada',
            province: 'quebec',
            mapRadius: 0
        },
        type: 'OFFER',
        visits: 139
    },
    url: 'https://www.kijiji.ca/...',
    scrape: [Function],
    isScraped: [Function]
}

Also, sorting by relevancy doesn't seem to be an option. If I set the results to sort by priceAsc, a lot of strange (irrelevant) results show up.

Any help would be appreciated!

Hi there! Sorry for the late reply, and thanks for the detailed description.

I've just double checked the latest location and category IDs on Kijiji and they have not changed from what is in locations.js and categories.js. Additionally, 214 is not a valid category ID. How did you find it? As you've described, number of bedrooms is specified as a search parameter - not a category ID.

As for the attributeMap formatting - this is just the format that Kijiji uses. It is determined by them. The best way to figure out how to format your search is to perform a search on the real kijiji.ca and look at the network request for b-search.html in your browser's developer tools. Any parameter in that request can be specified in the params object passed to kijiji-scraper. After I did a search for only ads of type "OFFER", the request contained attributeMap[adType] = [OFFER], so that is what you need in your case.
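To turn a captured request into a params object, one option is to paste the b-search.html URL from the network tab into a small helper that copies every query parameter across unchanged, as described above. This is a minimal sketch assuming Node.js 10 or later (where `URL` is a global); the function name `paramsFromRequestUrl` and the example URL are made up for illustration.

```javascript
// Hypothetical helper: convert a b-search.html request URL copied from the
// browser's developer tools into a plain params object for kijiji-scraper.
function paramsFromRequestUrl(requestUrl) {
    const params = {};
    // URLSearchParams decodes escapes such as %5B and %5D back into "[" and "]"
    for (const [key, value] of new URL(requestUrl).searchParams) {
        params[key] = value;
    }
    return params;
}

// Made-up example URL for illustration, not a real captured request
const captured = paramsFromRequestUrl(
    "https://www.kijiji.ca/b-search.html?categoryId=34&attributeMap%5BadType%5D=%5BOFFER%5D"
);
console.log(captured["attributeMap[adType]"]); // prints [OFFER]
```

Every value comes back as a string (for example `categoryId: "34"`), exactly as it appeared in the request.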

I just tried sorting by priceAsc, and the results the scraper gives me are the same as those returned by kijiji.ca under the same conditions. I find that ads posted for $1 in categories like real estate, cars, etc. are usually pretty clickbait-y, so I'm not surprised if they're a little weird.

I hope I could answer all of your questions. Try your search again with updated params obtained through a real search on kijiji.ca and let me know how it goes!

214 is not a valid category ID. How did you find it? As you've described, number of bedrooms is specified as a search parameter - not a category ID.

From categories.js

REAL_ESTATE: {
    id: 34,
    APARTMENTS_AND_CONDOS_FOR_RENT: {
        id: 37,
        ONE_BEDROOM: { id: 212 },
        ONE_BEDROOM_PLUS_DEN: { id: 213 },
        TWO_BEDROOM: { id: 214 },
        THREE_BEDROOM: { id: 215 },
        FOUR_PLUS_BEDROOM: { id: 216 },
        BACHELOR_AND_STUDIO: { id: 211 }
    }
    ...
}

The best way to figure out how to format your search is to perform a search on the real kijiji.ca and look at the network request for b-search.html in your browser's developer tools.

That is incredibly helpful, thank you! I'll grab whatever parameters I require from there. I'm assuming all the parameters (whether visible on the client's end or not) are visible there? I ask this because filtering by "two-bedroom apartments" isn't really a filter on the Kijiji interface. That's why I'm trying to code something up.

I find that ads posted for $1 in categories like real estate, cars, etc. are usually pretty clickbait-y, so I'm not surprised if they're a little weird.

Indeed! I tried to combat this by setting a minimum price, but a lot of ads priced at exactly the minimum I set showed up, some of which didn't seem relevant. I understand that this must be something on the Kijiji search algorithm's end, though, so you probably can't help much there. I'll figure out a workaround!

I hope I could answer all of your questions. Try your search again with updated params obtained through a real search on kijiji.ca and let me know how it goes!

Thanks a lot! I'll post if I encounter any problems.

From categories.js

Sorry, my mistake on the category ID. Kijiji has in fact updated their category IDs and removed the categories for number of bedrooms. The script I used this morning to test for diffs was out of date. I will update categories.js. The new REAL_ESTATE category tree looks like this:

REAL_ESTATE: {
    id: 34,
    FOR_RENT: {
        id: 30349001,
        LONG_TERM_RENTALS: { id: 37 },
        SHORT_TERM_RENTALS: { id: 42 },
        ROOM_RENTALS_AND_ROOMMATES: { id: 36 },
        STORAGE_AND_PARKING_FOR_RENT: { id: 39 },
        COMMERCIAL_AND_OFFICE_SPACE_FOR_RENT: { id: 40 }
    },
    FOR_SALE: {
        id: 30353001,
        HOUSES_FOR_SALE: { id: 35 },
        CONDOS_FOR_SALE: { id: 643 },
        LAND_FOR_SALE: { id: 641 },
        COMMERCIAL_AND_OFFICE_SPACE_FOR_SALE: { id: 44 }
    },
    REAL_ESTATE_SERVICES: { id: 170 }
}

I'm assuming all the parameters (whether visible on the client's end or not) are visible there? I ask this because filtering by "two-bedroom apartments" isn't really a filter on the Kijiji interface. That's why I'm trying to code something up.

There are many options there that aren't exposed in the UI. It's a little trial and error, but if you inspect the request you'll see it sends default values for everything, so you can tell which properties exist. So are you able to do it with numberbedrooms_s? The format does appear to be attributename + '_s'. I know it has worked for me and others before. Multiple values should be written as "[2,3,4]". If it doesn't work, they may have removed the ability to do it in a search.
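Based on the format described above (the attribute name plus `_s`, with one or more comma-separated values in square brackets), the key and value can be generated rather than typed by hand. This is a minimal sketch; `attributeMapEntry` is a hypothetical helper, and whether Kijiji actually honours a given attribute is entirely up to Kijiji.

```javascript
// Hypothetical helper: build an attributeMap search parameter in the
// attributename + "_s" format described above, with values in square brackets.
function attributeMapEntry(attribute, values) {
    const list = Array.isArray(values) ? values : [values];
    return [`attributeMap[${attribute}_s]`, `[${list.join(",")}]`];
}

const searchParams = {};
const [key, value] = attributeMapEntry("numberbedrooms", [2, 3, 4]);
searchParams[key] = value;
console.log(key, value); // prints attributeMap[numberbedrooms_s] [2,3,4]
```

The ad type parameter discussed in this thread (`attributeMap[adType]`) does not follow the `_s` pattern, so this helper doesn't cover it.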

As you mentioned, if it's not possible via search, you could always filter based on the various attributes in the KijijiAd object (such as numberbedrooms, in your case). There are many more attributes there than are exposed on the website. If Kijiji keeps making changes like this, that may be more reliable, at the cost of overfetching data.
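Filtering the returned KijijiAd objects for "2 or 3 bedrooms" could look like the sketch below. It assumes each ad carries an `attributes` object like the ones dumped in this thread; `filterByAttribute` is a hypothetical helper, and ads that lack the attribute are dropped rather than guessed at.

```javascript
// Hypothetical helper: keep only ads whose attribute equals one of the allowed
// values. Ads without the attribute (undefined) never match.
function filterByAttribute(ads, attribute, allowedValues) {
    const allowed = new Set(allowedValues);
    return ads.filter((ad) => ad.attributes !== undefined && allowed.has(ad.attributes[attribute]));
}

// Minimal stand-ins for KijijiAd objects, with only the fields used here
const sampleAds = [
    { title: "A", attributes: { numberbedrooms: 1 } },
    { title: "B", attributes: { numberbedrooms: 2 } },
    { title: "C", attributes: { numberbedrooms: 3 } },
    { title: "D", attributes: {} }
];

const matches = filterByAttribute(sampleAds, "numberbedrooms", [2, 3]);
console.log(matches.map((ad) => ad.title)); // [ 'B', 'C' ]

// With the scraper, the same call would run on the search results, e.g.:
// kijiji.search(params).then((ads) => filterByAttribute(ads, "numberbedrooms", [2, 3]));
```

Exact matching is strict: the ads in this thread include values such as 1.5 bedrooms, which a list of whole numbers will not match.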

So are you able to do it with numberbedrooms_s?

That does work, yes. I was having trouble with non-numeric attributes like "type", which can be OFFER or WANTED. I'm going to try to work through these right now, so I'll post if I encounter any difficulties. Glad to have the updated categories; I'll pull the changes and continue my work!

Update: it seems that my old params are not working either anymore! :(

...
searchParams["attributeMap[numberbedrooms_s]"] = "[2]";
searchParams["attributeMap[adType]"] = "[OFFER]";
...
kijiji.search(searchParams).then(function(ads) {
    ads.forEach((ad) => {
        console.log(ad.attributes.type); // 'adType' is undefined
        console.log(ad.attributes.numberbedrooms); // This was working fine earlier
    });
});

This results in:
[Screenshot: Results]

Note that the Kijiji b-search.html request has this parameter mentioned:
[Screenshot: Kijiji Params]

Any idea what I'm doing wrong?

Hm, so in the ad objects you get back, type and numberbedrooms are undefined? I would expect that not every ad may have the number of bedrooms attribute, but lack of ad type surprises me.

The fact that it doesn't work in the search unfortunately sounds like a change on Kijiji's end. To rule that out, does it work if you just resend the same request from your browser (with the exact same HTTP headers)?

Hm, so in the ad objects you get back, type and numberbedrooms are undefined?

Here's a sample ad object:

{
  title: 'RÉNOVÉ 3. 5 PLATEAU, PRÈS DE MCGILL, TOUT INCLUS, SEPT/OCT  2019',
  description: '3.5 RÉNOVÉ...',
  date: 2019-08-12T12:53:47.000Z,
  image: 'https://i.ebayimg.com/00/s/MTA2M1gxNjAw/z/hMwAAOSwDvBc0dbH/$_35.JPG',
  images:
   [ 'https://i.ebayimg.com/00/s/MTA2M1gxNjAw/z/hMwAAOSwDvBc0dbH/$_57.JPG',
     'https://i.ebayimg.com/00/s/MTA2NFgxNjAw/z/uT4AAOSwvQdc0dbG/$_57.JPG' ],
  attributes:
   { forrentbyhousing: 'reprofessional',
     unittype: 'apartment',
     numberbedrooms: 1.5, // here!
     numberbathrooms: 10,
     petsallowed: 1,
     agreementtype: 'one-year',
     furnished: 0,
     laundryinunit: 0,
     laundryinbuilding: 1,
     dishwasher: 0,
     fridgefreezer: 1,
     microwave: 0,
     airconditioning: 0,
     yard: 0,
     balcony: 0,
     smokingpermitted: 0,
     gym: 0,
     pool: 0,
     sauna: 0,
     yogaroom: 0,
     theatreinbuilding: 0,
     gamesroom: 0,
     partyroom: 0,
     concierge: 1,
     twentyfourhoursecurity: 0,
     bicycleparking: 0,
     storagelocker: 0,
     elevator: 1,
     wheelchairaccessible: 0,
     braillelabels: 0,
     audioprompts: 0,
     barrierfreeentrancesandramps: 1,
     visualaids: 0,
     accessiblewashroomsinsuite: 0,
     hydro: 0,
     heat: 0,
     water: 0,
     cabletv: 0,
     internet: 0,
     landline: 0,
     numberparkingspots: 0,
     price: 1100,
     location:
      { latitude: 45.5141933,
        longitude: -73.5766012,
        mapAddress: '3777 Saint-urbain, Montreal, QC, H2W 1T5',
        province: 'quebec',
        mapRadius: 0 },
     type: 'OFFER', // here
     visits: 14951 },
  url: 'https://www.kijiji.ca/v-appartement-condo/ville-de-montreal/renove-3-5-plateau-pres-de-mcgill-tout-inclus-sept-oct-2019/1416477946',
  scrape: [Function],
  isScraped: [Function] }

I would expect that not every ad may have the number of bedrooms attribute, but lack of ad type surprises me.

A quick test can help us figure that out. Note that these results are with this exhaustive list of parameters:

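const kijiji = require("kijiji-scraper");
const { locations, categories } = kijiji;

let searchParams = {
    locationId: locations.QUEBEC.GREATER_MONTREAL.CITY_OF_MONTREAL,
    categoryId: categories.REAL_ESTATE,
    minPrice: 600,
    maxPrice: 1500
};

searchParams["attributeMap[hydro_s]"] = "[1]";
searchParams["attributeMap[heat_s]"] = "[1]";
searchParams["attributeMap[water_s]"] = "[1]";
searchParams["attributeMap[numberbedrooms_s]"] = "[2]";

// Print the bedroom count and ad type of every result
kijiji.search(searchParams).then((ads) => {
    ads.forEach((ad) => {
        console.log(ad.attributes.numberbedrooms + " and " + ad.attributes.type);
    });
});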

Which results in...

1 and OFFER
2.5 and OFFER
1.5 and OFFER
2 and OFFER
undefined and OFFER
1 and OFFER
1 and OFFER
1 and OFFER
0 and OFFER
1 and OFFER
2 and OFFER
2 and OFFER
2 and OFFER
0 and OFFER
1 and OFFER
undefined and OFFER
2 and OFFER
1 and OFFER
1 and OFFER
0 and OFFER

It seems most objects have numberbedrooms defined, but not all. Weirdly enough, all objects returned were of type "OFFER"; none were of type "WANTED". To confirm, I reran the script with this parameter added to the original list above: searchParams["attributeMap[adType]"] = "[WANTED]", and the results remained unchanged. Together with the bedroom counts other than 2 in the output, this implies that neither parameter was applied as a filter.
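A quick way to check whether a filter was honoured, as in the test above, is to count how often each value of an attribute appears in the results. This is a minimal sketch; `tallyAttribute` is a hypothetical helper, and the sample ads only reuse the first few bedroom counts from the output above.

```javascript
// Hypothetical helper: count how many ads have each value of an attribute.
// Missing values are counted under the key "undefined".
function tallyAttribute(ads, attribute) {
    const counts = new Map();
    for (const ad of ads) {
        const value = String(ad.attributes ? ad.attributes[attribute] : undefined);
        counts.set(value, (counts.get(value) || 0) + 1);
    }
    return counts;
}

// Stand-in ads built from the first five lines of the output above
const ads = [1, 2.5, 1.5, 2, undefined].map((n) => ({
    attributes: { numberbedrooms: n, type: "OFFER" }
}));

// If attributeMap[numberbedrooms_s] = "[2]" had been honoured, only "2" would appear
console.log(tallyAttribute(ads, "numberbedrooms"));
console.log(tallyAttribute(ads, "type"));
```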

The fact that it doesn't work in the search unfortunately sounds like a change on Kijiji's end. To rule that out, does it work if you just resend the same request from your browser (with the exact same HTTP headers)?

Hm, so what I noticed is that the browser sends a b-search.html request when I click search; however, on applying filters, that request isn't sent. Instead, a different GET request is sent and the query params update in the URL. Check these out:
https://www.kijiji.ca/b-a-louer/ville-de-montreal/c30349001l1700281r10.0?ad=offering&address=H3G%201M8&ll=45.496987%2C-73.578808&price=600__2000&siteLocale=en_CA

This is a bookmarked URL I kept with my filters. Here we see ad=offering, which is the filter I applied. If I click on "All" instead of "Offering", a b-search.html request isn't sent; instead, this one is:
https://www.kijiji.ca/b-a-louer/ville-de-montreal/c30349001l1700281r10.0?price=600__2000&address=H3G+1M8&ll=45.496987,-73.578808

Likewise, clicking on wanted results in:
https://www.kijiji.ca/b-a-louer/ville-de-montreal/c30349001l1700281r10.0?ad=wanted&price=600__2000&address=H3G+1M8&ll=45.496987,-73.578808

If I click on the Search button to trigger the b-search.html request, it simply resets all the filters.

Sorry for the delayed responses.

I just tried it out with a few different attributes and it looks like Kijiji isn't respecting any attributeMap properties anymore :( Based on the fact that numberbedrooms_s went from working to not working for you in a matter of days, it sounds like Kijiji has been changing some things on their end lately. I'll update the readme.

Hm, so what I noticed is that the browser sends a b-search.html request when I click search; however, on applying filters, that request isn't sent [...]

I've noticed that as well for some requests but not all (for example, changing search radius sends a b-search.html request with radius defined, but offer type doesn't send one with the ad type set). I've noticed in the past that for each filter that doesn't trigger a b-search.html request, there is a corresponding parameter for b-search.html requests nonetheless. Kijiji isn't sending them properly when you click search - but they exist and the server knows how to handle them. I haven't been proven wrong on this yet. After a little looking I found you can filter ads by offer/wanted using adType. Example:

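const kijiji = require("kijiji-scraper");

let params = {
    locationId: kijiji.locations.ONTARIO.OTTAWA_GATINEAU_AREA.OTTAWA,
    categoryId: kijiji.categories.REAL_ESTATE,
    sortByName: "priceAsc",
    adType: "WANTED"  // or "OFFER", or undefined for both
};

kijiji.search(params).then((ads) => {
    // Print the ad type of each result to check that the filter was applied
    console.log(ads.map((ad) => ad.attributes.type));
}).catch(console.error);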

The Kijiji UI is incorrectly sending attributeMap[adType] when it should be sending adType instead. Interestingly, I found this in the b-search.html request that gets fired after setting the search radius, of all places. It sends adType, but clicking the search button doesn't. Weird! Give it a try :)

As for filtering bedroom count - as we saw, it used to be a category type but they removed it and now there's no UI option. Unfortunately it looks like you'll have to do that one in code.

As for the b-search.html URLs vs the ugly generated ones you get redirected to: I thought about building the scraper around the auto-generated ones, but I found b-search.html URLs more structured and less likely to change over time (which has remained true until now). It makes the client code more readable too.

For example, to set price and search radius using b-search.html the following properties must be set:

{
    minPrice: 1000,
    maxPrice: 2000,
    radius: 100, // km
    address: "H3G1M8"  // center of circle
}

to do this via the cryptic generated URL you would need to set:

{
    price: "1000__2000",
    address: "H3G1M8",
    r: 10.0
}
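If a filter only shows up in one of the generated URLs, its query string can be translated into b-search.html-style properties by hand. The sketch below handles only the two mappings seen in this thread: `price=min__max` to `minPrice`/`maxPrice`, and `ad=offering`/`ad=wanted` to `adType: "OFFER"`/`"WANTED"`. The helper name `paramsFromGeneratedUrl` is hypothetical, and the radius (which sits in the URL path, as in `r10.0`) and `ll` are left out.

```javascript
// Hypothetical helper: translate the query string of a generated Kijiji URL
// into b-search.html-style properties. Only "price" and "ad" are handled.
function paramsFromGeneratedUrl(generatedUrl) {
    const query = new URL(generatedUrl).searchParams;
    const params = {};

    // price=600__2000 -> minPrice: 600, maxPrice: 2000
    const price = query.get("price");
    if (price) {
        const [min, max] = price.split("__");
        if (min) params.minPrice = Number(min);
        if (max) params.maxPrice = Number(max);
    }

    // ad=offering / ad=wanted -> adType: "OFFER" / "WANTED"; no "ad" means both
    const adTypes = new Map([["offering", "OFFER"], ["wanted", "WANTED"]]);
    const ad = query.get("ad");
    if (adTypes.has(ad)) params.adType = adTypes.get(ad);

    return params;
}

// One of the generated URLs from this thread (the "wanted" filter)
const url = "https://www.kijiji.ca/b-a-louer/ville-de-montreal/c30349001l1700281r10.0?ad=wanted&price=600__2000&address=H3G+1M8&ll=45.496987,-73.578808";
console.log(paramsFromGeneratedUrl(url));
// { minPrice: 600, maxPrice: 2000, adType: 'WANTED' }
```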

After a little looking I found you can filter ads by offer/wanted using adType.

Huh! I'll check that out.

Unfortunately it looks like you'll have to do that one in code.

You mean getting all the responses and filtering the ones that have my attribute -- yes? Just confirming.

I thought about building the scraper around the auto-generated ones, but I found b-search.html URLs more structured and less likely to change over time.

It's a shame that they don't have proper filters in their UI, but even more of a shame that they don't provide an API for us to filter those results out manually. :( I guess nothing can be done about that.

You mean getting all the responses and filtering the ones that have my attribute -- yes? Just confirming.

Yes, exactly.

even more of a shame that they don't provide an API for us

Yeah, it's really too bad. And now they're removing useful functionality. Ideally this module should not have to exist, but hopefully the API-like abstraction it provides is useful.

I'm going back and forth on whether it would be beneficial to include attribute filtering functionality as a feature. It's convenient, but the scraper would still overfetch ads you don't care about. Then again, that happens anyway now. Really, it'd just be a different way of doing the same filtering you're already doing (for example, with attributes specified in the search parameters). Do you think it would be worth it? Thinking about it, I think you get more flexibility doing it with your own code, because you can do comparisons besides equality (greater than, less than, arithmetic operations, string comparison).
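To illustrate the flexibility point, filtering in your own code can use any comparison, not just equality. The sketch below combines several conditions with AND; `where` is a hypothetical helper (not part of kijiji-scraper), and the attribute names are taken from the ad objects earlier in this thread.

```javascript
// Hypothetical helper: build a predicate that passes only if every condition
// holds. Each condition is a function of one attribute value; a missing
// attribute fails its condition.
function where(conditions) {
    return (ad) =>
        Object.entries(conditions).every(([attribute, test]) =>
            ad.attributes !== undefined &&
            ad.attributes[attribute] !== undefined &&
            test(ad.attributes[attribute])
        );
}

const wanted = where({
    numberbedrooms: (n) => n >= 2,          // "at least 2" bedrooms
    price: (p) => p <= 1500,                // numeric upper bound
    unittype: (t) => t === "apartment",     // string comparison
    type: (t) => t === "OFFER"
});

const ads = [
    { title: "A", attributes: { numberbedrooms: 2.5, price: 1400, unittype: "apartment", type: "OFFER" } },
    { title: "B", attributes: { numberbedrooms: 1.5, price: 1100, unittype: "apartment", type: "OFFER" } },
    { title: "C", attributes: { price: 900, unittype: "apartment", type: "OFFER" } }
];
console.log(ads.filter(wanted).map((ad) => ad.title)); // [ 'A' ]
```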

Do you think it would be worth it?

Probably not. You don't control the changes that are happening on Kijiji's end. Also:

you get more flexibility doing it with your own code

Precisely this! I think a README with examples would do wonders, rather than providing it as a feature. As long as the core functionality remains intact (it seems precarious at the moment), I don't think you need to make any major updates. Documenting findings, on the other hand? That'd be great.

I can't control what Kijiji does, but I like reducing annoyance where I can (which is why this exists :p). I have to agree with you though: give people the power and let them run with it. I can only account for so much.

It seems like the things most likely to change are ad attributes and the ability to filter on them. For this reason, I never bothered to document them because I didn't want people relying on them never changing. However, I think examples of how to filter on arbitrary attributes would indeed be useful.

The readme is getting fairly large. I think it could use some splitting up into smaller files. I can do that and add some filtering examples when I get some free time. In the meantime, PRs are welcome :)

Hi mwpenny, I am having trouble applying a carmake filter. Example:

const params = {
    locationId: 1700273,
    categoryId: 174,
    distance: 176,
    sortByName: "dateDesc",
    address: '434 Bay St., Toronto, ON M5G 1P5, Canada',
    radius: 176.0,
    minPrice: 700,
    maxPrice: 2500,
    carmake: ["honda", "lexus", "nissan", "toyota", "mazda", "acura"],
    carmileageinkms: 300000,
    forsaleby: 'ownr',
};

These are my params, but the carmake param is not working. If you have any idea why, I'd really appreciate it. Thank you!