lucaswerkmeister / m3api

minimal modern MediaWiki API client

Home Page: https://www.npmjs.com/package/m3api

Automatically combine compatible requests

lucaswerkmeister opened this issue

The MediaWiki Action API, especially the “query” action, allows performing many different actions in a single API request. For example, you can get the categories, outgoing links, and recent revisions of several pages, together with general site information, all at once:

session.request( {
    action: 'query',
    prop: [ 'categories', 'links', 'revisions' ],
    meta: 'siteinfo',
    titles: [ 'Page 1', 'Page 2' ],
} );
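On the wire, the Action API receives each multi-valued parameter as a single value joined with `|`. The following is a minimal sketch of that encoding step. `encodeParams` is a hypothetical helper, not m3api's API. It treats `true` as "parameter present" and `false` as "parameter omitted", following the Action API's boolean convention, and it ignores edge cases such as values that themselves contain `|`.

```javascript
// Hypothetical helper (not m3api's real code): turn a params object into an
// Action API query string. Multi-valued parameters (arrays or Sets) are joined
// with "|", the Action API's default multi-value separator.
function encodeParams(params) {
    const search = new URLSearchParams();
    for (const [key, value] of Object.entries(params)) {
        if (value === undefined || value === null || value === false) {
            continue; // a false boolean is expressed by leaving the parameter out
        }
        if (value === true) {
            search.append(key, ''); // boolean parameters are true if present
        } else if (Array.isArray(value) || value instanceof Set) {
            search.append(key, [...value].join('|'));
        } else {
            search.append(key, String(value));
        }
    }
    return search.toString();
}

console.log(encodeParams({
    action: 'query',
    prop: ['categories', 'links', 'revisions'],
    meta: 'siteinfo',
    titles: ['Page 1', 'Page 2'],
}));
// → action=query&prop=categories%7Clinks%7Crevisions&meta=siteinfo&titles=Page+1%7CPage+2
```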

However, this requires that the request parameters are combined at some point, which usually takes some programmer effort. Either you hard-code the combined parameters directly, as above, or you encapsulate the request parts in several functions and split each of them into two phases: first each function adds to the shared parameter set, then you make the request, and finally each function extracts its relevant part from the common response (example).
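The manual two-phase approach looks roughly like the following sketch. All names here (the part objects with their `addParams`/`extract` methods, `runParts`, `fakeSendRequest`) are hypothetical. The canned response stands in for a real API call, so no network access is involved.

```javascript
// Hypothetical illustration of the manual two-phase pattern: each "part"
// contributes parameters before the request and extracts its own piece of
// the shared response afterwards.
const siteNamePart = {
    addParams(params) {
        params.meta.add('siteinfo');
        params.siprop.add('general');
    },
    extract(response) {
        return response.query.general.sitename;
    },
};

const pageCountPart = {
    addParams(params) {
        params.meta.add('siteinfo');
        params.siprop.add('statistics');
    },
    extract(response) {
        return response.query.statistics.pages;
    },
};

async function runParts(parts, sendRequest) {
    // Phase 1: every part adds to one shared parameter set.
    const params = { action: 'query', meta: new Set(), siprop: new Set() };
    for (const part of parts) {
        part.addParams(params);
    }
    // A single request covers all parts.
    const response = await sendRequest(params);
    // Phase 2: every part extracts its piece of the common response.
    return parts.map((part) => part.extract(response));
}

// Stand-in for a real API call, returning a canned response (illustration only).
async function fakeSendRequest(_params) {
    return { query: { general: { sitename: 'Example Wiki' }, statistics: { pages: 1234 } } };
}

runParts([siteNamePart, pageCountPart], fakeSendRequest)
    .then(([siteName, pageCount]) => console.log(siteName, pageCount));
```

This works, but every feature has to be written in this split form, and the parts have to be collected in one place before the request is made.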

In JavaScript, we can do better. Since all API requests are asynchronous, we can postpone each request very slightly and, just before actually sending it, check whether it can be combined with any other requests that haven’t been sent yet either. This can happen at the library level, so that application-level code barely has to worry about it; for example:

async function getSiteName( session ) {
    const response = await session.request( {
        action: 'query',
        meta: new Set( [ 'siteinfo' ] ),
        siprop: new Set( [ 'general' ] ),
    } );
    return response.query.general.sitename;
}

async function getPageCount( session ) {
    const response = await session.request( {
        action: 'query',
        meta: new Set( [ 'siteinfo' ] ),
        siprop: new Set( [ 'statistics' ] ),
    } );
    return response.query.statistics.pages;
}

const [ siteName, pageCount ] = await Promise.all( [
    getSiteName( session ),
    getPageCount( session ),
] );

Here, m3api should detect that the two requests can be combined into a single request with siprop: new Set( [ 'general', 'statistics' ] ). Sets will be used for multi-valued parameters that are order-insensitive and can therefore be combined; arrays will be used for order-sensitive ones, which can’t be combined.
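One possible shape for this logic is sketched below. It is not m3api's implementation and rests on a few assumptions. `combineParams` and `BatchingSession` are hypothetical names. `send` stands in for whatever actually performs the HTTP request. Requests are postponed with `queueMicrotask`, which is one of several ways to delay them "a tiny little bit".

```javascript
// Hypothetical sketch of automatic request combining; not m3api's actual code.

// Merge two parameter objects, or return null if they can't be combined.
// Sets (order-insensitive) are unioned; arrays (order-sensitive) and single
// values must be identical in both requests.
function combineParams(a, b) {
    const merged = { ...a };
    for (const [key, value] of Object.entries(b)) {
        if (!(key in merged)) {
            merged[key] = value;
            continue;
        }
        const existing = merged[key];
        if (existing instanceof Set && value instanceof Set) {
            merged[key] = new Set([...existing, ...value]);
        } else if (Array.isArray(existing) && Array.isArray(value)) {
            const same = existing.length === value.length &&
                existing.every((item, i) => item === value[i]);
            if (!same) {
                return null;
            }
        } else if (existing !== value) {
            return null;
        }
    }
    return merged;
}

class BatchingSession {
    // `send` is an assumed transport: (params) => Promise<response>.
    constructor(send) {
        this.send = send;
        this.pending = [];
    }

    request(params) {
        return new Promise((resolve, reject) => {
            this.pending.push({ params, resolve, reject });
            if (this.pending.length === 1) {
                // Postpone the actual send by one microtask, so that other
                // requests made synchronously (e.g. inside Promise.all) can join.
                queueMicrotask(() => this.flush());
            }
        });
    }

    flush() {
        const queue = this.pending;
        this.pending = [];
        const batches = []; // each batch: { params, waiters }
        for (const entry of queue) {
            let placed = false;
            for (const batch of batches) {
                const combined = combineParams(batch.params, entry.params);
                if (combined !== null) {
                    batch.params = combined;
                    batch.waiters.push(entry);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                batches.push({ params: entry.params, waiters: [entry] });
            }
        }
        for (const batch of batches) {
            // Every waiter receives the same combined response and extracts its own part.
            Promise.resolve()
                .then(() => this.send(batch.params))
                .then(
                    (response) => batch.waiters.forEach((w) => w.resolve(response)),
                    (error) => batch.waiters.forEach((w) => w.reject(error)),
                );
        }
    }
}

// Usage with a stand-in transport that only logs what would be sent.
const demoSession = new BatchingSession(async (params) => {
    console.log('sending', params);
    return { query: { general: { sitename: 'Example' }, statistics: { pages: 1 } } };
});
demoSession.request({ action: 'query', meta: new Set(['siteinfo']), siprop: new Set(['general']) });
demoSession.request({ action: 'query', meta: new Set(['siteinfo']), siprop: new Set(['statistics']) });
// Should log a single "sending" line whose siprop contains both values.
```

Deferring by a single microtask is enough for the Promise.all example, because both request() calls happen synchronously before the queue is flushed. A request issued only after some other await would land in a later batch. A longer delay, such as setTimeout, would catch more requests at the cost of latency.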

Disclaimer: We’ve previously implemented something very similar at WMDE for Wikidata Bridge (BatchingApi.ts). However, I haven’t looked at that code in over a year, and plan to reimplement the idea from scratch (and in JavaScript, not TypeScript), without direct reference to the old code. I believe this does not infringe on WMDE’s copyright (on the code that was largely written by me in the first place).

Hm, apparently combining requests is a bit more fragile than I anticipated. I was aware of some potential issues. For instance, if you make a request without formatversion and expect that to mean the default formatversion=1, you’ll have problems if it gets combined with an explicit formatversion=2 request (the solution being to always specify formatversion explicitly). But the interaction with continuation seems to be a bit more complicated.
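The formatversion pitfall can be illustrated with a merge rule that copies over any key that only one request specifies. Under that rule, a request relying on the default silently picks up the other request's formatversion=2. `naiveCombine` is a hypothetical helper and, to keep the illustration short, it only handles single-valued parameters.

```javascript
// Hypothetical naive merge (single-valued parameters only): a key that only one
// request specifies is copied into the combined request; differing explicit
// values are treated as a conflict.
function naiveCombine(a, b) {
    const merged = { ...a };
    for (const [key, value] of Object.entries(b)) {
        if (key in merged && merged[key] !== value) {
            return null; // explicit conflict: can't combine
        }
        merged[key] = value;
    }
    return merged;
}

// Request A relies on the default formatversion=1 by omitting the parameter...
const implicitV1 = { action: 'query', meta: 'siteinfo' };
// ...while request B asks for formatversion=2 explicitly.
const explicitV2 = { action: 'query', meta: 'siteinfo', formatversion: 2 };

// Combined without complaint: A would now receive a formatversion=2 response.
console.log(naiveCombine(implicitV1, explicitV2));
// Specifying formatversion explicitly on both turns this into a detectable conflict.
console.log(naiveCombine({ ...implicitV1, formatversion: 1 }, explicitV2)); // null
```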

Specifically: if you specify generator= (edited; originally written as continue=), then any titles= are simply… ignored? Really? https://en.wikipedia.org/w/api.php?action=query&format=json&titles=A|B|C&generator=allpages&gapfrom=!&gapto=!!!

I think for now I’ll hard-code generator (edited; originally continue) as incompatible with titles, pageids and revids (the ApiPageSet parameters), but this is probably another thing to revisit before 1.0.0.

Hm, but my edits to the above comments notwithstanding, continue is also incompatible with titles. https://en.wikipedia.org/w/api.php?action=query&format=json&titles=A|B|C&continue=-||

Ohh, the reason generator is incompatible with titles is that the titles become the input of the generator, e.g. generator=links&titles=Main Page for the outgoing links of the main page. (Same for pageids and revids, presumably.)
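A conservative, hard-coded version of the rule from these comments could look like the following sketch. The names are hypothetical and this is not m3api's code. If one request uses generator or continue and the other contributes any ApiPageSet parameter (titles, pageids, revids), the two are not combined. For generator, the reason is the one above: the other request's titles would become the generator's input. For continue, the comments only establish that it is incompatible. The check looks only at parameter presence, so it may refuse some combinations that would actually be safe.

```javascript
// Hypothetical hard-coded compatibility rule (a sketch, not m3api's code).
const PAGE_SET_PARAMS = ['titles', 'pageids', 'revids'];
const PAGE_SET_CONFLICTS = ['generator', 'continue'];

function hasAny(params, keys) {
    return keys.some((key) => params[key] !== undefined);
}

// True if combining a and b would mix one request's page set parameters
// with the other request's generator or continue.
function pageSetConflict(a, b) {
    return (hasAny(a, PAGE_SET_CONFLICTS) && hasAny(b, PAGE_SET_PARAMS)) ||
        (hasAny(b, PAGE_SET_CONFLICTS) && hasAny(a, PAGE_SET_PARAMS));
}

// titles=A|B|C next to generator=allpages: not combinable.
console.log(pageSetConflict(
    { action: 'query', titles: new Set(['A', 'B', 'C']) },
    { action: 'query', generator: 'allpages', gapfrom: '!', gapto: '!!!' },
)); // true
// titles next to a continued request: not combinable either.
console.log(pageSetConflict(
    { action: 'query', titles: new Set(['A']) },
    { action: 'query', continue: '-||' },
)); // true
// Two plain titles requests remain combinable.
console.log(pageSetConflict(
    { action: 'query', titles: new Set(['A']) },
    { action: 'query', titles: new Set(['B']) },
)); // false
```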