recover best copy of troubled pages



WardCunningham opened this issue 8 years ago

Various attempts to do the right thing have resulted in a mix of character encodings both among and within wiki pages. We have scattered resources for resolving many, if not most, of these issues, which we will catalog here. We will also lump in a few other poorly defined losses that might share solutions. This is probably the biggest blocker for #2 and the restoration implied there.

filesystem errors, fsck failure and raid resync
robot edit wars, weeks of conflict before going read-only
tar file omissions, tar -t lists 33 more files than tar -x extracts
Ruby encoding errors, a rescue clause skips about 1000 files
unwanted encodings, such as �, =XX, and &XX; in pages
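One way to find which 33 names go missing is to diff the archive listing against what actually lands on disk. The Ruby sketch below does that; the helper names (`tar_listing`, `extracted_files`, `compare_listing`) are hypothetical, not part of any existing wiki tooling. It also reports names that appear more than once in the listing, since a duplicated member would be counted twice by `tar -t` but extracted as a single file. That is one possible explanation for the gap, not a confirmed cause.

```ruby
require "set"

# Normalize a tar member name: compare as raw bytes (so names with
# non-UTF-8 bytes still match), drop the trailing newline and a leading "./".
def normalize_member(name)
  name.b.chomp.delete_prefix("./")
end

# Compare the names tar lists against the files found after extraction.
def compare_listing(listed, extracted)
  # Directory entries end in "/" and are not files, so skip them.
  names = listed.map { |n| normalize_member(n) }
                .reject { |n| n.empty? || n.end_with?("/") }
  on_disk = extracted.map { |n| normalize_member(n) }.to_set
  {
    missing: names.uniq.reject { |n| on_disk.include?(n) },
    duplicates: names.tally.select { |_name, count| count > 1 }
  }
end

# Hypothetical helpers for gathering the two lists.
def tar_listing(archive)
  IO.popen(["tar", "-tf", archive], &:readlines)
end

def extracted_files(dir)
  Dir.glob("**/*", File::FNM_DOTMATCH, base: dir)
     .select { |path| File.file?(File.join(dir, path)) }
end
```

For example, `compare_listing(tar_listing("pages.tar"), extracted_files("restore"))`, where `pages.tar` and `restore` are placeholder names, would return the missing names and any duplicated members.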

To this list I will add possible remediations and check them off as they are fully applied. Suggestions are welcome. Don't be concerned if I remove comments once they have been understood and incorporated in this list. I will also remove remediations that don't work out.

consult filesystem-level backups for missing files
consult application-level history for missing or abused pages
convert or ignore bad UTF-8 characters using Ruby's encoding mechanisms
consult textfiles and other scraped markup archives
consult archive.org or other scraped html archives
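The UTF-8 remediation could look something like this Ruby sketch, which reads raw bytes instead of letting a decode error reach a `rescue` and skip the file. It assumes, hypothetically, that bytes which are not valid UTF-8 came from Latin-1 (ISO-8859-1); `normalize_page` and its `strategy:` option are illustrative names. It uses only core `String` methods: `force_encoding`, `valid_encoding?`, `encode`, and `scrub`.

```ruby
# Decode raw page bytes to UTF-8 without raising.
# Assumption (hypothetical): invalid UTF-8 bytes are Latin-1 (ISO-8859-1).
def normalize_page(bytes, strategy: :convert)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding? # already clean UTF-8, keep as-is

  case strategy
  when :convert
    # Reinterpret the whole page as Latin-1; every byte maps to a
    # character there, so this conversion never raises.
    bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
  when :ignore
    # Drop the invalid byte sequences entirely.
    text.scrub("")
  else
    raise ArgumentError, "unknown strategy: #{strategy}"
  end
end
```

For example, `normalize_page(File.binread(path))` would return a UTF-8 string for any page file. Reinterpreting a whole page as Latin-1 garbles any valid UTF-8 sequences it already contains, so pages with encodings mixed within them would need finer-grained handling, and `:ignore` silently drops characters.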

Finally, I will assemble a list of suspicious pages that illustrate particular malfunctions; these will serve as test cases for improved algorithms and workflows.

http://wiki.c2.com/?WikiCitoyen, � in non-English words
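Candidate test cases could be gathered by scanning pages for the unwanted encodings listed earlier. This Ruby sketch flags three patterns. Reading `=XX` as a quoted-printable hex escape (e.g. `=E9`) and `&XX;` as an HTML character reference (e.g. `&eacute;`) are assumptions about what those placeholders mean, and `suspicious_encodings` is a hypothetical helper. Both patterns can also match legitimate text, so hits are candidates for review rather than proof of damage.

```ruby
# Patterns for the kinds of unwanted encodings seen in pages (assumed forms).
SUSPECTS = {
  replacement_char: /\uFFFD/,       # U+FFFD, left behind by a lossy decode
  quoted_printable: /=[0-9A-F]{2}/, # assumed "=XX" form, e.g. "=E9"
  html_entity:      /&#?\w+;/       # assumed "&XX;" form, e.g. "&eacute;" or "&#233;"
}.freeze

# Returns the names of the patterns found in a page's raw text.
def suspicious_encodings(raw)
  # Treat the bytes as UTF-8 and replace invalid sequences with U+FFFD,
  # so undecodable bytes are reported as :replacement_char as well.
  text = raw.dup.force_encoding(Encoding::UTF_8).scrub
  SUSPECTS.select { |_name, pattern| text.match?(pattern) }.keys
end
```

A page such as WikiCitoyen would be expected to report `:replacement_char`.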

Ward Cunningham · Mon Nov 14 2016 01:48:29 GMT+0800 (China Standard Time)

From my post announcing the remodeling ...

Ward Cunningham · Tue Dec 04 2018 09:58:47 GMT+0800 (China Standard Time)

Some progress in this pull request: #32