tomquirk / linkedin-api

👨‍💼Linkedin API for Python

`search_people` endpoint doesn't work anymore

agusmdev opened this issue · comments

I used this endpoint 2 days ago and it was working correctly, but it seems LinkedIn updated their API, and now instead of using the endpoint /search/blended they use /graphql?variables=.....

When I use search_people I always get a 403 response, is somebody experiencing the same issue?

PS: My cookie session is working properly

I'm seeing similar behavior, but for me it happens roughly 70% of the time. I'm not sure how to resolve it.

Got the same problem too

Same problem here

Did someone find a solution to replace the current code?

        res = self._fetch(
            f"/search/blended?{urlencode(default_params, safe='(),')}",
            headers={"accept": "application/vnd.linkedin.normalized+json+2.1"},
        )
        data = res.json()

Same problem :(

Please post an update if anyone finds a resolution.

I am also willing to work on this with anyone who is interested, maybe on a call. Let me know.

same here, toying around with it atm

@agusmdev do you happen to know where in the documentation it mentions the new endpoint?

Nowhere. I found it by running a search query from the browser with my LinkedIn account.

Gotcha. I was digging through the docs and couldn't find anything, so that makes sense. I'll take a look there.
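
For anyone repeating that browser check, here is a minimal sketch of pulling the `queryId` and the raw `variables` string out of a request URL copied from the network tab. The helper name `split_graphql_url` and the shortened `captured` URL are made up for illustration; only the `queryId` value comes from this thread, and it may change whenever LinkedIn updates the endpoint.

```python
from urllib.parse import urlsplit, parse_qs


def split_graphql_url(url: str) -> dict:
    """Return the queryId and the raw variables string of a captured URL."""
    query = urlsplit(url).query
    # parse_qs skips empty fields, so a stray "&&" in the URL is harmless here
    params = parse_qs(query)
    return {
        "query_id": params.get("queryId", [""])[0],
        "variables": params.get("variables", [""])[0],
    }


# Shortened, illustrative URL (not a full real request)
captured = (
    "https://www.linkedin.com/voyager/api/graphql"
    "?variables=(start:0,origin:FACETED_SEARCH)"
    "&queryId=voyagerSearchDashClusters.b0928897b71bd00a5a7291755dcd64f0"
)
parts = split_graphql_url(captured)
print(parts["query_id"])
print(parts["variables"])
```

Comparing these two values between searches shows which part of the request changes with the search parameters and which part stays fixed.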

The code below is my basic implementation for getting the first 10 employees of a company, parsing the response, and returning some basic data. One request returns exactly 10 results, so offset can be used to start from, say, the 10th employee instead of the 1st. Very little is parsed, since I don't need all the data, but I think it might be a good starting point.

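import json
import os

# `API` is assumed to be an authenticated linkedin_api.Linkedin instance created elsewhere.
def fetch_employees(company_id, offset=0):
    cache = f"companies/{company_id}/employees_{offset}.json"
    if os.path.exists(cache):
        with open(cache) as f:
            r = json.load(f)
        print(f"[fetch_employees()]: OK! Using cached file \"{cache}\".")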
    else:
        uri = f"/graphql?includeWebMetadata=true&variables=(start:{offset},origin:COMPANY_PAGE_CANNED_SEARCH,query:(flagshipSearchIntent:SEARCH_SRP,queryParameters:List((key:currentCompany,value:List({company_id})),(key:resultType,value:List(PEOPLE))),includeFiltersInResponse:false))&&queryId=voyagerSearchDashClusters.b0928897b71bd00a5a7291755dcd64f0"
        r = API._fetch(uri)

        if not r.ok:
            print(f"[fetch_employees()]: Fail! LinkedIn returned status code {r.status_code} ({r.reason})")
            return

        print(f"[fetch_employees()]: OK! LinkedIn returned status code {r.status_code} ({r.reason})")
        r = r.json()

        # Cache request
        os.makedirs(f"companies/{company_id}", exist_ok=True)
        with open(cache, "w") as f:
            json.dump(r, f)

        if not r["data"]["searchDashClustersByAll"]:
            print(f"Bad json. LinkedIn returned error:", r["errors"][0]["message"])
            os.remove(cache)
            return

    return r["data"]["searchDashClustersByAll"]


def get_employees(company_id, offset=0):
    def get_item_key(item, keys):
        if isinstance(keys, str):
            keys = [keys]

        cur = item
        for key in keys:
            if cur and key in cur.keys():
                cur = cur[key]
            else:
                return ""

        return cur

    j = fetch_employees(company_id, offset)  # forward offset so paging actually works
    if not j:
        return []

    if not j["_type"] == "com.linkedin.restli.common.CollectionResponse":
        return []

    employees = []
    for it in j["elements"]:
        if not it["_type"] == "com.linkedin.voyager.dash.search.SearchClusterViewModel":
            continue

        for item in it["items"]:  # separate name so the outer loop variable is not shadowed
            if not item["_type"] == "com.linkedin.voyager.dash.search.SearchItem":
                continue

            e = item["item"].get("entityResult")
            if not e or not e["_type"] == "com.linkedin.voyager.dash.search.EntityResultViewModel":
                continue

            try:
                #print("\nEmployee:")
                #print("    ", get_item_key(e, ["title", "text"]))
                #print("    ", get_item_key(e, "entityUrn"))
                #print("    ", get_item_key(e, ["primarySubtitle", "text"]))
                #print("    ", get_item_key(e, ["secondarySubtitle", "text"]))

                employees.append({
                    "title": get_item_key(e, ["title", "text"]),
                    "entityUrn": get_item_key(e, "entityUrn"),
                    "primarySubtitle": get_item_key(e, ["primarySubtitle", "text"]),
                    "secondarySubtitle": get_item_key(e, ["secondarySubtitle", "text"]),
                })
            except Exception as exc:  # renamed so it does not shadow the entity `e`
                print(f"Exception {exc} while processing employees of id {company_id}")
                exit(1)

    return employees
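
Since one request returns 10 results, collecting more than one page means stepping the offset in tens. Below is a minimal sketch of that loop. It assumes `get_employees(company_id, offset)` really forwards `offset` to the request; `iter_all`, `PAGE_SIZE`, `max_pages` and the fake data are hypothetical names used only so the example runs without a LinkedIn session.

```python
from typing import Callable, Iterator

PAGE_SIZE = 10  # per the post above, one request returns 10 results


def iter_all(get_page: Callable[[int], list], max_pages: int = 5) -> Iterator[dict]:
    """Yield items page by page until a page comes back short or empty."""
    for page in range(max_pages):
        items = get_page(page * PAGE_SIZE)
        yield from items
        if len(items) < PAGE_SIZE:
            break  # a short page is treated as the last one


# Stand-in for get_employees(company_id, offset): 23 fake employees
fake = [{"title": f"Person {i}"} for i in range(23)]


def fake_page(offset: int) -> list:
    return fake[offset:offset + PAGE_SIZE]


collected = list(iter_all(fake_page))
print(len(collected))
```

With the real function, something like `iter_all(lambda off: get_employees(company_id, off))` would play the role of `fake_page`. The `max_pages` cap is there to keep the number of requests small.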

Is this code working?

It is working for me in my program :)

Oh okay, I will try it on my end. If it doesn't work, would you be able to connect with me on a Meet call?

I don't think I'm the right person to answer this kind of question ;). All I did in my code was pure guessing plus looking at lots of JSON responses. But if anything less serious comes up, you could try writing here, so it will also help others who stumble upon the same problem.

What is the company_id here?

Wrote a small function for getting company_id

from urllib.parse import quote

# `api` is assumed to be an authenticated client defined elsewhere; it is not shown in this post.
def getCompanyID(company_link):
    try:
        # Take only the path segment right after /company/
        company_username = company_link.split('.com/company/')[1].split('/')[0]
    except IndexError:
        print("Wrong company URL. Expected format: https://www.linkedin.com/company/company_Username/")
        return None

    api_link = 'https://www.linkedin.com/voyager/api/organization/companies?decorationId=com.linkedin.voyager.deco.organization.web.WebCompanyStockQuote-2&q=universalName&universalName={}'.format(quote(company_username))
    resp = api._get(api_link).json()
    company_id = resp.get('elements')[0].get('entityUrn').split(':')[-1]
    return company_id

company_id is the numerical id of the company (google = 1441, facebook = 76987811). It can be retrieved as a URN from linkedin_api and then converted to a numerical id using the built-in helper function.

Example snippet:

from linkedin_api.utils import helpers

company = API.get_company("google")
company_id = helpers.get_id_from_urn(company["entityUrn"])
employees = get_employees(company_id)

# Print name of first 10 employees
for e in employees:
    print(e["title"])
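
To show what the URN-to-id conversion amounts to, here is a small standalone sketch. It is not the library's implementation of `helpers.get_id_from_urn`; `id_from_urn` is a hypothetical stand-in, and the URN string is an assumed example built around the Google id mentioned above.

```python
def id_from_urn(urn: str) -> str:
    """Return the last colon-separated segment of a URN."""
    return urn.rsplit(":", 1)[-1]


# Assumed example URN; the exact prefix returned by the API may differ
print(id_from_urn("urn:li:fs_normalized_company:1441"))
```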

PS: There were some minor typos in my initial code (#313 (comment)) which I fixed already. So just re-paste it.

might be a little out of the loop here, how does this code fix the search function?

As an update, I think I was able to get the mappings right for the new search endpoint. Currently 16 of the 24 tests fail, all of them tied to the search function, so I'll fork the repo and see if I can bring it back up to 24/24.

might be a little out of the loop here, how does this code fix the search function?

It doesn't. I wanted to use search_people in my project, but it was broken. So I wrote my own small variation of it and posted it in case someone needed it. It can output only minimal information, but that's okay for me, since that was all I needed. If anyone needs more than that, I thought that code would've been a nice little foundation.

I used this endpoint 2 days ago and it was working correctly, but it seems LinkedIn updated their API, and now instead of using the endpoint /search/blended they use /graphql?variables=.....

Hey! Where can I find the information about this new API? Share the links to the reference please

#313 (comment)

The output from the new endpoint is a little confusing. Does anyone know how to make sense of it? It seems to return multiple attributes that come together to make a single profile on the website.
[Screenshot 2023-06-05 at 1 44 48 PM]
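
One way to get an overview of a response like that is to group its entries by type before looking at individual fields. The sketch below assumes each entry carries its type under a `$type` key (an assumption about the normalized response) and falls back to the `_type` key used by the code earlier in this thread. `group_by_type` and the sample entries are made up for illustration.

```python
from collections import defaultdict


def group_by_type(included: list[dict]) -> dict[str, list[dict]]:
    """Group response entries by their type key."""
    groups = defaultdict(list)
    for entry in included:
        # "$type" is assumed; "_type" is the key seen earlier in this thread
        kind = entry.get("$type") or entry.get("_type") or "unknown"
        groups[kind].append(entry)
    return dict(groups)


# Made-up sample; real responses contain many more fields and types
sample = [
    {"$type": "com.linkedin.voyager.dash.search.EntityResultViewModel", "entityUrn": "urn:li:example:1"},
    {"$type": "example.OtherType", "entityUrn": "urn:li:example:2"},
    {"$type": "com.linkedin.voyager.dash.search.EntityResultViewModel", "entityUrn": "urn:li:example:3"},
]
grouped = group_by_type(sample)
for kind, entries in grouped.items():
    print(kind, len(entries))
```

Printing the counts per type makes it easier to see which entries are the profile results and which are supporting records.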

I've noticed they use two endpoints to get people by different params. The first returns only URN ids, and the second returns a list of profiles for a list of URN ids. I can fetch the URN ids, but for some reason the second endpoint returns 400 for me. It probably needs some specific headers or something; I don't know yet.

My solution:

def graphql_search_people(
    self,
    job_title: str,
    regions: list[str],
    limit: int | None,
    offset: int,
) -> list[dict]:
    """Get list of user's urns by job_title and regions."""
    count = Linkedin._MAX_SEARCH_COUNT
    if limit is None:
        limit = -1

    results = []
    while True:
        # when we're close to the limit, only fetch what we need to
        if limit > -1 and limit - len(results) < count:
            count = limit - len(results)

        default_params = {
            "origin": "FACETED_SEARCH",
            "start": len(results) + offset,
        }

        res = self._fetch(
            (f"/graphql?variables=(start:{default_params['start']},origin:{default_params['origin']},"
             f"query:(keywords:{job_title},flagshipSearchIntent:SEARCH_SRP,"
             f"queryParameters:List((key:geoUrn,value:List({','.join(regions)})),"
             f"(key:resultType,value:List(PEOPLE))),"
             f"includeFiltersInResponse:false))&=&queryId=voyagerSearchDashClusters"
             f".b0928897b71bd00a5a7291755dcd64f0"),
            headers={"accept": "application/vnd.linkedin.normalized+json+2.1"},
        )

        self.logger.debug(res.text)
        data = json.loads(res.text)

        elements = data.get("included", [])
        self.logger.debug(f"Profile urns: {elements}")

        # Take at most 10 urns; slicing avoids an IndexError on short pages
        new_elements = [element["entityUrn"] for element in elements[:10]]

        results.extend(self._get_people_by_urns(urns=new_elements))

        # break the loop if we're done searching
        # NOTE: we could also check for the `total` returned in the response.
        # This is in data["data"]["paging"]["total"]
        if (
            (-1 < limit <= len(results))  # if our results exceed set limit
            or len(results) / count >= Linkedin._MAX_REPEATED_REQUESTS
        ) or len(new_elements) == 0:
            break

        self.logger.debug(f"results grew to {len(results)}")

    return results

def _get_people_by_urns(self, urns: list[str]) -> list[dict]:
    """Get profiles info by urns."""
    profiles = []

    for urn in urns:
        # Keep only the id part after the last colon
        clear_urn = urn.split(":")[-1]
        profiles.append(self.get_profile(urn_id=clear_urn))

    return profiles
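
The request string in `graphql_search_people` interpolates `job_title` directly, so a keyword containing spaces, commas or parentheses could break the nested `(...)` syntax. Below is a minimal sketch of building the same URI with the keyword percent-encoded. `build_people_search_uri` is a hypothetical helper, the geo id `123` is a placeholder, and whether the endpoint accepts percent-encoded keywords or a plain `&queryId=` (instead of the `&=&queryId=` above) is an untested assumption.

```python
from urllib.parse import quote

# Taken from this thread; LinkedIn may change it at any time
QUERY_ID = "voyagerSearchDashClusters.b0928897b71bd00a5a7291755dcd64f0"


def build_people_search_uri(keywords: str, geo_urns: list[str], start: int = 0) -> str:
    """Assemble the /graphql URI for a people search."""
    params = ["(key:resultType,value:List(PEOPLE))"]
    if geo_urns:
        params.insert(0, f"(key:geoUrn,value:List({','.join(geo_urns)}))")
    variables = (
        f"(start:{start},origin:FACETED_SEARCH,"
        # safe='' also encodes "/" so nothing in the keyword stays unescaped
        f"query:(keywords:{quote(keywords, safe='')},flagshipSearchIntent:SEARCH_SRP,"
        f"queryParameters:List({','.join(params)}),"
        f"includeFiltersInResponse:false))"
    )
    return f"/graphql?variables={variables}&queryId={QUERY_ID}"


uri = build_people_search_uri("data engineer", ["123"])
print(uri)
```

Keeping the string assembly in one small function also makes it easy to check that the parentheses stay balanced when filters are added or removed.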

URL to fetch profiles (always returns 400):
https://www.linkedin.com/voyager/api/graphql?variables=(lazyLoadedActionsUrns:List(urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAAA6ZpN0B-fPBL3atd5cCsIS9cl7w3zXLylw,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAAAAJcNcBZWx8gvYiUs_1cLtFiwXhXoNQihc,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAAAD-cOsB2wB0EldN_R22uvya2ZcYuefBKPI,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAAADEXysBWdPqwfO-p8MyOQOwaWMB2qO0Umg,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions:(urn:li:fsd_profileActions:(ACoAAAO9jNABuhihN_wVSgFGgDry9xrGYM-cmzU,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAAAfUQGcBG3VTivwWqKm9Gw5g8F3Rt8gUwQ8,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions:(urn:li:fsd_profileActions:(ACoAABdSlasBRb9Dp9rwdkpKS3_atJQPLkAt0jY,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAABfd6ZoBNCHS45DdfDVHMABssw9S57AH4-Y,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAACKT0KABrXki4zf6VnGenRUxSBmG-udwtag,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAACWuoz0BNF2Tcij9PyIymEc65yt_mlrzAfk,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP))) &=&queryId=voyagerSearchDashLazyLoadedActions.9efa2f2f5bd10c3bbbbab9885c3c0a60

Has someone found a solution?

Same here, it's not working: it always returns an empty list!

Any news on the LinkedIn search? It's a very important feature. Thanks for the contribution.

Still down unfortunately :/

I created a draft PR with the changes suggested by @17314642 and @Timur-Gizatullin + a few modifications.
The search_people and search_companies endpoints work for me with these changes and the parameters of my use case, but I haven't tested all the other combinations.

Feel free to add any improvements or suggest changes! I might take a look again at it if I get some time and try to do a cleaner fix, if there is one.