spencermountain/dumpster-dive
roll a wikipedia dump into mongo
Stargazers: 236
Watchers: 12
Issues: 62
Forks: 45
spencermountain/dumpster-dive Issues
Error messages when loading wikipedia pages into mongodb | Updated a year ago | 3 comments
Expose the instance of WTF to the options so that it's possible to use .extend() | Updated a year ago | 7 comments
ERR_REQUIRE_ESM | Closed 2 years ago | 3 comments
Adding page view counts? | Closed 3 years ago | 2 comments
wtf_wikipedia doc.wikidata() is always null | Updated 3 years ago | 1 comment
Update wtf_wikipedia and allow wtf.extend() | Closed 3 years ago | 2 comments
Default options when not using `bin/dumpster.js` | Updated 3 years ago | 1 comment
db_url is always taken from config.js | Closed 3 years ago | 3 comments
Not all pages are loaded into the afwiki database | Closed 4 years ago | 7 comments
Secured Database URL is not accepted | Closed 4 years ago | 4 comments
MongoServerSelectionError: connection <monitor> to 127.0.0.1:27017 timed out | Closed 4 years ago | 3 comments
Process stopped dumping after a while | Closed 4 years ago | 3 comments
Other feeds | Updated 4 years ago | 1 comment
MongoError: BSONObj Size | Closed 4 years ago | 4 comments
Any suggestions on running in this in the cloud? | Closed 4 years ago | 2 comments
The process does not end on Windows | Closed 4 years ago | 2 comments
Pages not imported | Closed 4 years ago | 6 comments
Missing ids on windows | Closed 4 years ago | 37 comments
--plaintext not working | Closed 4 years ago | 2 comments
Parsing bug with multiple workers | Updated 5 years ago | 6 comments
Title parsing error in custom function | Closed 5 years ago | 3 comments
missing pages, silent | Closed 6 years ago | 12 comments
Couldn't find some tags after importing the dump | Closed 6 years ago | 6 comments
Support wikia dumps | Closed 6 years ago | 3 comments
Encoding of table keys | Closed 6 years ago | 11 comments
16mb record issue | Closed 6 years ago | 29 comments
xml title passes through doc object | Closed 6 years ago | 12 comments
Markdown mode collapses paragraphs | Closed 6 years ago | 1 comment
Error on keys with periods | Closed 6 years ago | 1 comment
html conversion doesn't work for links/references | Closed 6 years ago | 15 comments
Output | Closed 6 years ago | 11 comments
disambiguation-page parsing | Closed 6 years ago | 2 comments
cpu worker process same records multiple times | Closed 6 years ago | 2 comments
Number of CPU workers doesn't work | Updated 6 years ago | 2 comments
missing important pages in parsed results | Closed 6 years ago | 9 comments
Wiktionary parses but may be missing some pages | Updated 6 years ago | 2 comments
TypeError: i.json is not a function | Closed 6 years ago | 14 comments
Cannot insert pages with section titles starting with $ | Closed 6 years ago | 8 comments
Article count of enwiki | Closed 6 years ago | 10 comments
this doesn't work `wp2mongo /path/to/my-wikipedia-article-dump.xml.bz2` | Closed 6 years ago | 9 comments
node src/worker.js not working | Closed 6 years ago | 3 comments
Is the dev branch supposed to be working at all? | Closed 6 years ago | 6 comments
Great progress ! ... console is outputting "Error: Cannot find module '../../infobox/infobox' ..." | Closed 6 years ago | 35 comments
pageid field | Closed 6 years ago | 11 comments
Type error while loading data into mongo using dumpster | Closed 6 years ago | 3 comments
Suggested option to omit redirects and disambiguation pages | Closed 6 years ago | 2 comments
wp2mongo hangs during article insertion. | Closed 6 years ago | 60 comments
Not importing | Closed 6 years ago
wp2mongo fails with an error | Closed 6 years ago | 1 comment
TypeError with worker.js | Closed 7 years ago | 1 comment