internetarchive / openlibrary

One webpage for every book ever published!

Home Page: https://openlibrary.org

Aggregate want-to-read counts from works onto authors

RayBB opened this issue

Problem

Perhaps @cdrini can edit this issue with some more details about how to do this and related tickets.

A clear and concise description of what you want to happen

We want author-level want-to-read counts for some Wikidata-related work (though they are also nice to have in general).
We will later use these counts as a proxy for author popularity, for example when showing the most popular authors by country.

Expected behaviour / screenshots (ex: Figma design screenshots for UI feature)

  • This only needs to be added to solr
  • It doesn't need to be shown in the UI

Additional Context

Proposal & Constraints

What is the proposed solution / implementation?

Is there a precedent of this approach succeeding elsewhere?

Which suggestions or requirements should be considered for how feature needs to appear or be implemented?

Related files

We currently populate the Solr author record by running a separate Solr search for every author:

async with httpx.AsyncClient() as client:
    response = await client.get(
        base_url,
        params=[  # type: ignore[arg-type]
            ('wt', 'json'),
            ('json.nl', 'arrarr'),
            ('q', 'author_key:%s' % author_id),
            ('sort', 'edition_count desc'),
            ('rows', 1),
            ('fl', 'title,subtitle'),
            ('facet', 'true'),
            ('facet.mincount', 1),
        ]
        + [('facet.field', '%s_facet' % field) for field in facet_fields],
    )
    reply = response.json()

We want to update this Solr query to also aggregate the want-to-read and ratings data. We will likely need to switch to the Solr JSON Facet API, which makes aggregations such as sum easy. We'll basically want:

('json.facet', {
    "ratings_count_1": "sum(ratings_count_1)",
    "ratings_count_2": "sum(ratings_count_2)",
    "ratings_count_3": "sum(ratings_count_3)",
    "ratings_count_4": "sum(ratings_count_4)",
    "ratings_count_5": "sum(ratings_count_5)",
    "readinglog_count": "sum(readinglog_count)",
    "want_to_read_count": "sum(want_to_read_count)",
    "currently_reading_count": "sum(currently_reading_count)",
    "already_read_count": "sum(already_read_count)",
})
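As a rough sketch of how this parameter could be built and its results read back: the field names below come from the list above, while `SUM_FIELDS`, `build_json_facet`, and `extract_sums` are hypothetical names introduced only for illustration. It assumes the `json.facet` value is sent as a JSON string and that Solr returns the aggregates under a top-level `facets` key, omitting a stat when no documents match.

```python
import json

# Field names are taken from the issue; the helper names are hypothetical.
SUM_FIELDS = [
    *(f"ratings_count_{i}" for i in range(1, 6)),
    "readinglog_count",
    "want_to_read_count",
    "currently_reading_count",
    "already_read_count",
]


def build_json_facet(fields=SUM_FIELDS) -> str:
    """Serialize the sum() facets into a JSON string for the json.facet param."""
    return json.dumps({field: f"sum({field})" for field in fields})


def extract_sums(reply: dict, fields=SUM_FIELDS) -> dict:
    """Read the aggregated sums from a reply, defaulting missing stats to 0.

    Assumption: the sums live under reply["facets"] and may come back as floats.
    """
    facets = reply.get("facets", {})
    return {field: int(facets.get(field, 0)) for field in fields}
```

The resulting string could then be appended to the existing `params` list as `('json.facet', build_json_facet())`, and `extract_sums(reply)` called on the parsed response.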

And then, using the results, compute ratings_average, ratings_sortable, and ratings_count by passing the 1..5 counts to work_ratings_summary_from_counts.
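To make the arithmetic concrete, here is a minimal stand-in for that step. `summarize_ratings` is a hypothetical function, not the real `work_ratings_summary_from_counts`; it computes only the average and the total, leaves `ratings_sortable` to the real helper, and returning an empty dict for an unrated author is a choice made for this sketch.

```python
def summarize_ratings(counts: list) -> dict:
    """Hypothetical stand-in: derive average and total from the 1..5 star counts.

    counts[0] is the number of 1-star ratings, counts[4] the number of 5-star ones.
    ratings_sortable is intentionally not computed here.
    """
    total = sum(counts)
    if total == 0:
        # Sketch-only choice: no ratings means no ratings fields.
        return {}
    weighted = sum(stars * n for stars, n in enumerate(counts, start=1))
    summary = {"ratings_average": weighted / total, "ratings_count": total}
    summary.update(
        {f"ratings_count_{stars}": n for stars, n in enumerate(counts, start=1)}
    )
    return summary
```

For example, counts of `[0, 0, 1, 1, 2]` give four ratings with a weighted total of 3 + 4 + 10 = 17, so an average of 4.25.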

Then override the build method of AuthorSolrBuilder so that it looks like that of WorkSolrBuilder:

doc |= self.build_ratings() or {}
doc |= self.build_reading_log() or {}
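The `or {}` in those two lines guards against a builder method returning `None`, since `dict |= None` raises a `TypeError`. A small illustration of the pattern follows; `AuthorDocSketch` and its fields are hypothetical and do not mirror the real AuthorSolrBuilder, and the code assumes Python 3.9+ for the `|=` dict operator.

```python
from typing import Optional


class AuthorDocSketch:
    """Hypothetical builder showing the `doc |= part or {}` merge pattern."""

    def __init__(self, key: str, sums: dict):
        self.key = key
        self.sums = sums  # aggregated counts, e.g. from the Solr facet reply

    def build_ratings(self) -> Optional[dict]:
        total = sum(self.sums.get(f"ratings_count_{i}", 0) for i in range(1, 6))
        # Return None when there is nothing to add to the document.
        return {"ratings_count": total} if total else None

    def build_reading_log(self) -> Optional[dict]:
        count = self.sums.get("want_to_read_count", 0)
        return {"want_to_read_count": count} if count else None

    def build(self) -> dict:
        doc = {"key": self.key}
        # `or {}` turns a None result into a no-op merge.
        doc |= self.build_ratings() or {}
        doc |= self.build_reading_log() or {}
        return doc
```

With this shape, an author with no ratings but some want-to-read entries still gets a valid document that simply lacks the ratings fields.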

Stakeholders

@cdrini

Note: Before making a new branch or updating an existing one, please ensure your branch is up to date.

Hello, could you assign me to this task? It looks like a lot of fun!

Sorry about the radio silence on this issue. I think I've got a working version of it now, I just need to properly test it.

So, I've gotten it to work as far as I can tell; however, the tests seem to be failing for an unrelated reason. https://pastebin.com/qeKd5U1v

As far as I can tell, this has nothing to do with the changes I have made to the code and more to do with peculiarities of my local host.

Awesome, nice! Open a draft PR and we can check it out later this week!

Hmm, that error might be related to #9443. Do you still get that error on master? If so, please create a new issue for it 👍