dosyago / DownloadNet

💾 DownloadNet - All content you browse online available offline. Search through the full-text of all pages in your browser history. ⭐️ Star to support our work!

Home Page: https://localhost:22120

Type error upon loading - index.json getting clobbered

marcboivin opened this issue · comments

Great idea! I was dumping as much stuff as I could into it, because it solves a real problem I have.

Thing is, I realized my index didn't have everything in it. When I looked at the archive folder, all the pages were there.

I tried opening a new session and started re-indexing content. Same issue: not everything was showing up in the index.

Started a third time and got:

TypeError: Cannot destructure property 'id' of 'Mo.Index.get(...)' as it is undefined.
    at /Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8330661
    at Array.map (<anonymous>)
    at n.flex (/Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8330639)
    at Object.search (/Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8330865)
    at async /Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8357152
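A defensive sketch of what the stack trace suggests is going wrong: `Index.get(...)` returns `undefined` for a key that was lost when the index got clobbered, and destructuring `{ id }` from `undefined` throws exactly this TypeError inside a `.map`. The `Index` Map and entry shapes below are illustrative assumptions based on the error message and the index dump later in this thread, not the project's actual code:

```javascript
// Minimal reproduction + guard for:
//   TypeError: Cannot destructure property 'id' of 'Index.get(...)' as it is undefined
// A hypothetical Index Map, mirroring the entry shapes seen in index.json.
const Index = new Map([
  [4, 'http://www.lockwiki.com/index.php/Main_Page'],
  ['http://www.lockwiki.com/index.php/Main_Page',
    { date: 1641520105000, id: 4, ndx_id: 1000016, title: 'Lockwiki' }],
]);

function lookup(keys) {
  return keys
    .map(key => Index.get(key))
    .filter(entry => entry !== undefined)      // drop keys missing from a clobbered index
    .map(({ id, title }) => ({ id, title }));  // now safe to destructure
}
```

Filtering before destructuring would turn a hard crash into silently shorter search results, which at least keeps the tool usable while the index is incomplete.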

Now I can't use the tool and my indexed content is unusable.

Attached is the MASSIVE error log I got from trying to restart diskernet.

Any way to solve this?
out.log

Thanks

commented

Thank you! I'm really sorry about this issue. I've seen it as well.

I still have not isolated the cause.

Basically, what has happened is that the index.json file has been clobbered.

So all your cached resources are still there, and I believe the cache.json file should still be OK.

This is a really terrible thing to happen to your index, I'm sorry!

I don't have a solution right now but I believe it may be possible to rebuild the index.json file and recover it.

A patch I'm intending to release will keep a backup index.json and recover from it if the main one gets clobbered. It will also add a check before any write that we are not overwriting an existing index, and in any case save the existing one out to the backup before writing.

I still can't isolate where index.json is overwritten with an empty copy, as there are only a couple of places where that write occurs.

commented

Thanks again for the report @marcboivin ! I really appreciate it and I'm very sorry for you that this happened 😢

commented

I just checked out the out.log -- that is an impressively long error, isn't it? 😂 😆

It's basically just dumped the entirety of the bundled JavaScript for the entire project out of the executable. I'm still not sure why that happens on crash -- it used to happen with nexe and still occurs with pkg.

I think it is happening because it's trying to output the line where the error occurred. But of course the built JS is all one single "line" (8 MB long...).

Anyway, this is not the cause of the crash / corruption.

(Edit: corrected typo)

I can confirm the cache looks intact.

I could rebuild the index. Don't mind trying at least.

Pretty sure it's looking for an array index that doesn't exist, because my index.json is nothing like what I would expect it to be:


[
  [
    "http://www.lockwiki.com/index.php/Main_Page",
    {
      "date": 1641520105000,
      "id": 4,
      "ndx_id": 1000016,
      "title": "Lockwiki"
    }
  ],
  [
    4,
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "http://bjoernkarmann.dk/project_alias",
    {
      "date": 1641520105765,
      "id": 6,
      "ndx_id": 1000017,
      "title": "Bjørn Karmann › project_alias"
    }
  ],
  [
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer",
    {
      "date": 1641520104911,
      "id": 5,
      "ndx_id": 1000015,
      "title": "The Digital Services Playbook — from the U.S. Digital Service"
    }
  ],
  [
    "ndx1000003",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    5,
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000004",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    6,
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000005",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000006",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000007",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000008",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000009",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000010",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000011",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000012",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000013",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000014",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000015",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000016",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000017",
    "http://bjoernkarmann.dk/project_alias"
  ]
]
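Judging from the dump above, index.json looks like a serialized Map of `[key, value]` pairs mixing three entry kinds: URL → record, numeric id → URL, and `"ndx<ndx_id>"` → URL. A hypothetical recovery helper (an assumption based only on this dump, not the project's actual format) could regenerate the reverse entries from the URL records alone:

```javascript
// Sketch: rebuild the reverse (id -> URL and ndx id -> URL) entries
// from the canonical URL -> record pairs of a partially clobbered index.
function rebuildReverseEntries(pairs) {
  const rebuilt = [];
  for (const [key, value] of pairs) {
    // Keep only URL -> record entries (string key, object value)...
    if (typeof key === 'string' && value !== null && typeof value === 'object') {
      rebuilt.push([key, value]);
      // ...and regenerate both reverse mappings from the record itself.
      rebuilt.push([value.id, key]);               // numeric id -> URL
      rebuilt.push(['ndx' + value.ndx_id, key]);   // ndx id -> URL
    }
  }
  return rebuilt;
}
```

Note this deliberately drops stale `ndx…` entries (like `ndx1000003`–`ndx1000014` above) that no surviving record points at; whether those are safe to discard is an open question.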

commented

That's awesome, how did you rebuild the index??

By hand, my good sir.

Backed up the public folder, and with a bit of ls and grep magic figured out all the URLs I wanted to index, then revisited them ;)

Doing that, I found out some of the archive was still not processed and diskernet asked if I wanted to recover. I did, and got some back.

So something tells me the process fails at some point, but we have no way of knowing when.

Also, I noticed that one folder was corrupted and Finder (I'm on a Mac) wouldn't open it.

If you're curious, I used this as a starting point:

grep -r GEThttp ./ | cut -d ':' -f 4 | cut -d '?' -f 1

commented

You're awesome! That's so good! 😆 😂 ✊🏻 !!