macbre / nodemw

MediaWiki API and WikiData client written in Node.js

Home Page: https://www.npmjs.com/package/nodemw

Encourage non-conflicting edits

Krinkle opened this issue

When making changes to existing content (i.e. not just appending or prepending text), it is important that bots don't accidentally overwrite edits by other users.

The way bots should do this is to record the revision timestamp when fetching the existing content and pass it to the edit module. This way, if another edit has been made since then, the edit will be rejected. At this point the bot can either try again, or skip the item for the time being.
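
The retry-or-skip flow described above could look roughly like the following. This is only a sketch: `editWithRetry`, `fetchPage` and `savePage` are hypothetical names supplied by the caller and are not part of mwbot, and it assumes the failed edit surfaces as an error object whose `code` is `'editconflict'` (the code the MediaWiki edit API uses for conflicts).

```javascript
// Hypothetical helper, not part of mwbot.
// fetchPage(title, cb) -> cb(err, { title, content, timestamp })
// savePage(page, text, cb) -> is assumed to send page.timestamp as basetimestamp
function editWithRetry(fetchPage, savePage, title, change, maxAttempts, callback) {
  var attempt = 0;

  function tryOnce() {
    attempt++;
    fetchPage(title, function (err, page) {
      if (err) {
        return callback(err);
      }
      savePage(page, change(page.content), function (err, result) {
        if (err && err.code === 'editconflict' && attempt < maxAttempts) {
          // Someone else edited in the meantime: refetch and reapply the change
          return tryOnce();
        }
        // Success, a non-conflict error, or retries exhausted
        callback(err, result);
      });
    });
  }

  tryOnce();
}
```

When the retries are exhausted the conflict error is passed to the callback, so the caller can decide to skip the page for now.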

mwbot provides a getArticle method, but it doesn't expose any metadata besides the page content.

Please provide an easy way for developers to use mwbot to make edits that, by default, do not overwrite human edits. Ignoring conflicts could be an option, but it probably should not be the default.

Ideas:

// getPage(string name) -> API query revisions, rvprop=content|timestamp
client.getPage(name, function (err, data) {
  // data.title
  // data.content
  // data.timestamp

  var newContent = change(data.content);

  // Method 1: edit( pageName, content, summary, params, callback )
  client.edit(data.title, newContent, '', { basetimestamp: data.timestamp }, function (err, data) { });

  // Method 2: edit( string|Object pageData, content, summary, callback )
  client.edit(data, newContent, '', function (err, data) { });
});

The second method is probably easiest and encourages developers to use it without hardcoding details of parameters.
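
For the proposed getPage, the fetching side needs little more than the standard query parameters and a small unwrapping step. A rough sketch, assuming the legacy JSON response layout (formatversion=1), where pages are keyed by page ID and the revision content sits under the `'*'` key; `buildPageQuery` and `parsePageData` are hypothetical names, not existing mwbot methods:

```javascript
// Hypothetical helpers, not part of mwbot.

// Parameters for: API query revisions, rvprop=content|timestamp
function buildPageQuery(title) {
  return {
    action: 'query',
    prop: 'revisions',
    rvprop: 'content|timestamp',
    titles: title,
    format: 'json'
  };
}

// Turn the raw API response into { title, content, timestamp },
// or null if the page is missing or has no revisions.
function parsePageData(response) {
  var pages = (response.query && response.query.pages) || {};
  var ids = Object.keys(pages);
  if (ids.length === 0) {
    return null;
  }
  var page = pages[ids[0]];
  if (!page.revisions || page.revisions.length === 0) {
    return null;
  }
  var rev = page.revisions[0];
  return {
    title: page.title,
    content: rev['*'],
    timestamp: rev.timestamp
  };
}
```

The returned object has the shape used in the ideas above, so it could be handed to the second edit variant unchanged.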

I created a quick draft of this for my own bot. See https://gist.github.com/Krinkle/8e1e0e41baaae63f9839d86d918d512c/c1788e5ececff3ddf373944d3dc41ced99be08af#file-wmf-tour-bot-js-L107-L139. Feel free to use it in any way you like.

    client.edit = function(pageData, content, summary, minor, callback) {
        var params = {
            text: content,
            // Avoid accidentally editing as anonymous user if session expires
            assert: 'user'
        };

        if (typeof minor === 'function') {
            callback = minor;
            minor = undefined;
        }

        if (minor) {
            params.minor = '';
        } else {
            params.notminor = '';
        }

        var title;
        if (typeof pageData === 'object') {
            params.basetimestamp = pageData.revision.timestamp;
            params.starttimestamp = new Date().toISOString();
            // Avoid accidentally creating a new page (e.g. if title string got corrupted,
            // or if page was deleted meanwhile).
            params.nocreate = '';
            title = pageData.title;
        } else {
            title = pageData;
        }

        this.doEdit('edit', title, summary, params, callback);
    };

This protects bots against various problems. These are the same protections that the MediaWiki web interface uses when you edit via a web browser:

  • Session loss or expiration (assert=user). Avoid making edits as an anonymous user if the server lost the session.
  • Edit conflict (basetimestamp). Avoid overwriting other edits.
  • Re-create (starttimestamp). Avoid re-publishing text that was deleted.
  • Edit type (nocreate/createonly). Avoid unintentionally creating a new page (nocreate) or editing an existing one (createonly). If the page title gets corrupted (e.g. spaces are cut off, or there is a character encoding problem), the content can be fetched from A but saved to B, causing a duplicate page to be created. For example, you fetch from "Foo bar" and save to "Foo".

See https://www.mediawiki.org/wiki/API:Edit

@Krinkle, thanks for the code snippet. assert=user can be added automatically when the bot operates in logged-in mode.