patcg-individual-drafts / topics

The Topics API

Home Page: https://patcg-individual-drafts.github.io/topics/

Different topics on different domains

eysegal opened this issue

Hi, I thought the purpose of Topics was to show which topics a user reads about across sites without disclosing the user ID.
However, in my personal browser I see different topics on different sites. Why is that? Is it just based on the context of the page I'm currently visiting?

Each week you get 5 new topics. And each site you visit that calls the API will get one of those 5, and it's sticky (that site keeps getting that same topic for the remainder of the week). The reason that we distribute it this way is explicitly to make it harder to track users across pages. Because site A will see topic 1 and site B will see topic 2 for the same user, it's harder for A+B to collude and determine that it's the same person based on topics than if both sites saw the same topic.
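
One way to picture the sticky per-site assignment described above is a keyed hash over the week and the calling site. The sketch below is only an illustration: `pick_topic`, the `user_key` secret, and the HMAC-based selection are assumptions made for this example, not a description of how any browser actually implements it.

```python
import hashlib
import hmac
from typing import Sequence

def pick_topic(user_key: bytes, epoch: int, site: str, top_topics: Sequence[str]) -> str:
    """Deterministically map (user, epoch, site) to one of the user's top topics.

    Hypothetical helper: the same site gets the same topic for the whole epoch,
    while different sites are mapped independently of each other.
    """
    message = f"{epoch}|{site}".encode("utf-8")
    digest = hmac.new(user_key, message, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(top_topics)
    return top_topics[index]

# Example data (made up): one user's top 5 topics for a given week.
top_five = ["News", "Investing", "Travel", "Fitness", "Cooking"]
key = b"per-user secret kept by the browser"

# Repeated calls from the same site in the same week agree with each other.
print(pick_topic(key, 12, "site-a.example", top_five))
print(pick_topic(key, 12, "site-a.example", top_five))
# A different site is mapped independently, so it may or may not see the same topic.
print(pick_topic(key, 12, "site-b.example", top_five))
```

Because the key never leaves the browser in this sketch, sites A and B cannot recompute each other's pick and can only compare whatever topics they were handed.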

Thank you for your answer. But the point of Topics is to compensate for the loss of cross-site tracking. If I get "News" on a general news site and "Investing" on a financial site, that doesn't help advertising, because when I'm on a site I already know its context. What I need is what the user is reading about on other sites (as we get with cookies).

Which topic you get on which site is randomly chosen from the top 5 from the previous week's browsing. The chosen topic is not otherwise correlated to the page you're currently visiting.
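
Even with a random choice, the returned topic can coincide with the current site's own topic fairly often. As a rough worked example, assume (hypothetically) that each site's own topic is one of the user's top 5 and that the pick is uniform and independent per site; the function name `chance_of_any_match` is made up for this sketch.

```python
from fractions import Fraction

def chance_of_any_match(num_sites: int, top_n: int = 5) -> Fraction:
    """Probability that at least one of `num_sites` sites is handed its own topic."""
    miss_one_site = Fraction(top_n - 1, top_n)  # a single site does NOT get its own topic
    return 1 - miss_one_site ** num_sites

print(chance_of_any_match(1))  # 1/5
print(chance_of_any_match(2))  # 9/25
```

Under these assumptions, seeing "News" on a news site and "Investing" on a finance site is unusual but not implausible for a purely random assignment.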

Ok, so it's only one topic once a week?
And weird, it seems that it is correlated, but maybe it's by chance.

@jkarlin, have you considered returning more than one topic in case the current site topic the user is visiting is equal to the topic that the browser is about to return?
According to the Nov 21st update, it looks like Topics will return two of the top five topics, but what about making sure at least one of them is different from the current website's topic?

@jkarlin, have you considered returning more than one topic in case the current site topic the user is visiting is equal to the topic that the browser is about to return?

The API returns up to 3 topics (one for each of the past 3 weeks), so we technically do return multiple per call. Note that even though up to 3 topics are returned, the maximum rate at which one site can learn new topics is one per week.
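
To see why up to three topics per call still means at most one new topic per week, here is a small sketch. It assumes, purely for illustration, that the per-site pick for each week is sticky and derived from a keyed hash, and that repeated topics are returned only once; `pick_for_epoch`, `topics_for_call`, and the sample history are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

def pick_for_epoch(user_key: bytes, epoch: int, site: str, top: List[str]) -> str:
    """Sticky pick: the same (epoch, site) pair always maps to the same topic."""
    digest = hmac.new(user_key, f"{epoch}|{site}".encode("utf-8"), hashlib.sha256).digest()
    return top[int.from_bytes(digest[:8], "big") % len(top)]

def topics_for_call(user_key: bytes, current_epoch: int, site: str,
                    top_by_epoch: Dict[int, List[str]]) -> List[str]:
    """Return one topic for each of the last three completed epochs, without duplicates."""
    result: List[str] = []
    for epoch in range(current_epoch - 3, current_epoch):
        top = top_by_epoch.get(epoch)
        if not top:
            continue  # no browsing history recorded for that week
        topic = pick_for_epoch(user_key, epoch, site, top)
        if topic not in result:
            result.append(topic)
    return result

# Made-up history: the user's top 5 topics for weeks 1 through 5.
history = {week: [f"topic-{week}-{i}" for i in range(5)] for week in range(1, 6)}
key = b"per-user secret"

during_week_5 = topics_for_call(key, 5, "site-a.example", history)  # covers weeks 2, 3, 4
during_week_6 = topics_for_call(key, 6, "site-a.example", history)  # covers weeks 3, 4, 5
newly_learned = set(during_week_6) - set(during_week_5)
print(len(during_week_6), len(newly_learned))
```

The two calls overlap on weeks 3 and 4, so the only topic the site has not already seen is its pick for week 5.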

Ensuring that the returned topic is different from the current site's topic is relatively expensive, as it would require running our classifier on the current site when determining which topics to return. It could be done, but we prefer to run the classifier only once per week.

Thanks @jkarlin for the answer.
As for returning a topic that is different from the current site's: since the current website was already classified within the epoch, can we pull this data locally?
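
The idea in the question above could look roughly like the following sketch. Everything here is hypothetical: `epoch_cache` stands in for whatever per-site classification results the browser may have kept from the epoch, and `exclude_current_site_topics` is an invented helper, not part of the Topics API.

```python
from typing import Dict, List, Set

def exclude_current_site_topics(candidates: List[str], current_host: str,
                                epoch_cache: Dict[str, Set[str]]) -> List[str]:
    """Drop candidates that match the current site's cached topics.

    A cache miss (site not classified during the epoch) leaves the list unchanged,
    so no classifier run is needed at call time.
    """
    own_topics = epoch_cache.get(current_host, set())
    return [topic for topic in candidates if topic not in own_topics]

# Made-up cache: hosts classified earlier in the epoch and the topics assigned to them.
cache = {"news.example": {"News"}}
print(exclude_current_site_topics(["News", "Investing"], "news.example", cache))   # ['Investing']
print(exclude_current_site_topics(["News", "Investing"], "other.example", cache))  # ['News', 'Investing']
```

The lookup only helps for sites the user actually visited during the epoch; for any other site the filter has nothing to compare against.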