Question: filtered list of articles for a project
tim-moody opened this issue
Is it possible to get the list of articles for a project that have changed since some date?
For example, the list of all Project WikiMed articles is in medicine.tsv. I would like to have a list of only those articles that changed since some (recent) date.
@tim-moody Why not download the versions you need and make a diff?
The purpose is to calculate the list of articles that will change in the next version without having to create the next version to do so.
q = 'https://www.mdwiki.org/w/api.php?action=query&format=json&list=recentchanges&rclimit=max&rctoponly&rcprop=redirect|title'
q += '&rcnamespace=0&rcstart=now&rcend=' + since
This gets all changes since the given date, but for the entire wiki.
For enwp, I only want the pages associated with WikiProject Medicine.
I thought this might be functionality available in the WP1 API.
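A minimal sketch of the query above, with the filtering done on the client side. The helper names `build_recentchanges_url` and `changed_project_titles` are hypothetical, and the sketch assumes the project list (for example the titles read from medicine.tsv) uses the same title form the API returns. It does not follow the API's continuation, so a long date range would need more than one request.

```python
from urllib.parse import urlencode

API_URL = "https://www.mdwiki.org/w/api.php"  # endpoint from the snippet above


def build_recentchanges_url(since, api_url=API_URL):
    """Return a recentchanges query URL covering now back to `since`."""
    params = {
        "action": "query",
        "format": "json",
        "list": "recentchanges",
        "rclimit": "max",
        "rctoponly": "1",  # boolean flag: only the latest change per page
        "rcprop": "redirect|title",
        "rcnamespace": "0",
        "rcstart": "now",
        "rcend": since,
    }
    return api_url + "?" + urlencode(params)


def changed_project_titles(response, project_titles):
    """Keep only the changed titles that also appear in the project list."""
    changes = response.get("query", {}).get("recentchanges", [])
    changed = {change["title"] for change in changes}
    return changed & set(project_titles)
```

The intersection is computed locally because the recentchanges list has no notion of a project; that part is what a project-aware API would have to provide.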
OK, let's take it from the beginning: what is the problem you want to solve, and how is it related to building (offline) selections?
The purpose is to calculate the list of articles that will change in the next version without having to create the next version to do so.
This allows mdwiki-cacher to use its cache for articles that have not changed, and only query the source for articles that have changed since the last run.
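To illustrate the idea, here is a rough sketch of how a run could be split once the set of changed titles is known. This is not mdwiki-cacher's actual code; `plan_fetch` and its arguments are hypothetical names chosen for the example.

```python
def plan_fetch(project_titles, changed_titles, cached_titles):
    """Split the project list into titles served from cache and titles to refetch.

    A title is reused only if it is already cached and has not changed;
    everything else (changed, or never cached) is fetched from the source.
    """
    changed = set(changed_titles)
    cached = set(cached_titles)
    reuse, refetch = [], []
    for title in project_titles:
        if title in cached and title not in changed:
            reuse.append(title)
        else:
            refetch.append(title)
    return reuse, refetch


reuse, refetch = plan_fetch(
    ["Aspirin", "Ibuprofen", "Malaria"],  # full project list
    {"Ibuprofen"},                         # changed since the last run
    {"Aspirin", "Ibuprofen"},              # already in the cache
)
print(reuse)    # ['Aspirin']
print(refetch)  # ['Ibuprofen', 'Malaria']
```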
How is this related to this project? As far as I know, the WP1 selection tools have "nothing" directly to do with MDWiki.
It would allow me to know which enwp articles have changed since the last time medicine.tsv was created. My understanding is that WP1 produces a list of articles from enwp for various projects. I am only asking whether that list could optionally be limited to articles that have changed since the last time the list was produced.
We only keep track of the time when the article's quality or importance score changed, not when the article itself changed. Is that what you're looking for?
I'm looking for changes to the article, not the score. I'm trying to avoid reading every article in the list into my cache if the article hasn't changed since the last time I read it. I was hoping you would have an API for that, but it seems not.
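One possible workaround, sketched here as an assumption rather than anything WP1 provides, is to ask the MediaWiki API for the latest revision id of each listed title and compare it with the id stored in the cache. The sketch uses `prop=revisions` with `formatversion=2` and batches of 50 titles, the usual per-request limit for ordinary clients. `build_revision_urls`, `stale_titles`, and the `cached_revids` mapping are hypothetical names, and the cache is assumed to key on the normalized titles the API returns.

```python
from urllib.parse import urlencode

BATCH_SIZE = 50  # usual cap on titles per request for ordinary clients


def build_revision_urls(titles, api_url="https://en.wikipedia.org/w/api.php"):
    """Return one query URL per batch, each asking for the latest revision ids."""
    urls = []
    for start in range(0, len(titles), BATCH_SIZE):
        batch = titles[start:start + BATCH_SIZE]
        params = {
            "action": "query",
            "format": "json",
            "formatversion": "2",
            "prop": "revisions",
            "rvprop": "ids|timestamp",
            "titles": "|".join(batch),
        }
        urls.append(api_url + "?" + urlencode(params))
    return urls


def stale_titles(response, cached_revids):
    """Return titles whose latest revision id differs from the cached one."""
    stale = []
    for page in response.get("query", {}).get("pages", []):
        revisions = page.get("revisions")
        if not revisions:
            continue  # missing or invalid title: nothing to compare
        if cached_revids.get(page["title"]) != revisions[0]["revid"]:
            stale.append(page["title"])
    return stale
```

This still touches every title in the list, but each request returns only ids and timestamps, not article content.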
Yes, it is not the role of this project to handle the cache strategy for upstream wikis.
With access to enwiki_p, the replica database, you could probably do something with rev ids, but I'm not sure.
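A tentative sketch of what the rev id approach might look like, assuming read access to the replica and a MySQL driver that uses `%s` placeholders (for example pymysql). It relies on the standard MediaWiki `page` table, where `page_latest` holds the current revision id and `page_title` stores underscores instead of spaces. `build_latest_rev_query` is a hypothetical helper; it only builds the statement and its parameters and does not connect to anything.

```python
def build_latest_rev_query(titles):
    """Return (sql, params) selecting the latest revision id for each title."""
    if not titles:
        raise ValueError("at least one title is required")
    # The page table stores titles with underscores instead of spaces.
    db_titles = [title.replace(" ", "_") for title in titles]
    placeholders = ", ".join(["%s"] * len(db_titles))
    sql = (
        "SELECT page_title, page_latest FROM page "
        "WHERE page_namespace = 0 AND page_title IN (" + placeholders + ")"
    )
    return sql, db_titles
```

Comparing the returned `page_latest` values with the rev ids recorded at the previous run would then show which articles changed.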