dosyago / DownloadNet

💾 DownloadNet - All content you browse online available offline. Search through the full-text of all pages in your browser history. ⭐️ Star to support our work!

Home Page: https://localhost:22120

Type error upon loading - index.json getting clobbered

marcboivin opened this issue · comments

Great idea! I was dumping as much stuff as I could into it, because it solves a real problem I have.

Thing is, I realized my index didn't have everything in it. When I looked at the archive folder, all the pages were there.

I tried opening a new session and started re-indexing content. Same issue: not everything was showing up in the index.

Started a third time and got:

TypeError: Cannot destructure property 'id' of 'Mo.Index.get(...)' as it is undefined.
    at /Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8330661
    at Array.map (<anonymous>)
    at n.flex (/Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8330639)
    at Object.search (/Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8330865)
    at async /Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8357152
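A defensive sketch of what the stack trace suggests is going wrong: `Index.get(...)` returns `undefined` for a key that was lost when the index got clobbered, and destructuring `{ id }` from `undefined` throws exactly this TypeError inside a `.map`. The `Index` Map and entry shapes below are illustrative assumptions based on the error message and the index dump later in this thread, not the project's actual code:

```javascript
// Minimal reproduction + guard for:
//   TypeError: Cannot destructure property 'id' of 'Index.get(...)' as it is undefined
// A hypothetical Index Map, mirroring the entry shapes seen in index.json.
const Index = new Map([
  [4, 'http://www.lockwiki.com/index.php/Main_Page'],
  ['http://www.lockwiki.com/index.php/Main_Page',
    { date: 1641520105000, id: 4, ndx_id: 1000016, title: 'Lockwiki' }],
]);

function lookup(keys) {
  return keys
    .map(key => Index.get(key))
    .filter(entry => entry !== undefined)      // drop keys missing from a clobbered index
    .map(({ id, title }) => ({ id, title }));  // now safe to destructure
}
```

Filtering before destructuring would turn a hard crash into silently shorter search results, which at least keeps the tool usable while the index is incomplete.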

Now I can't use the tool and my indexed content is unusable.

Attached is the MASSIVE error log I got from trying to restart diskernet.

Any way to solve this?
out.log

Thanks

commented

Thank you! I'm really sorry about this issue. I've seen it as well.

I still have not isolated the cause.

Basically, what has happened is that the index.json file has been clobbered.

So all your cached resources are still there, and I believe the cache.json file should still be OK.

This is a really terrible thing to happen to your index, I'm sorry!

I don't have a solution right now but I believe it may be possible to rebuild the index.json file and recover it.

A patch I'm intending to release will keep a backup index.json and recover from it if the main one gets clobbered. It will also add a check before any write that we are not overwriting an existing index, and in any case save the existing one out to the backup before writing.

I still can't isolate where index.json is overwritten with an empty copy, as there are only a couple of places where that write occurs.

commented

Thanks again for the report @marcboivin ! I really appreciate it and I'm very sorry for you that this happened 😢

commented

I just checked out the out.log -- that is an impressively long error, isn't it? 😂 😆

It's basically just dumped the entirety of the bundled JavaScript for the entire project out of the executable. I'm still not sure why that happens on crash -- it used to happen with nexe and still occurs with pkg.

I think it is happening because it's trying to output the line where the error occurred. But of course the built JS is all one single "line" (8 MB long...).

Anyway, this is not the cause of the crash / corruption.

(Edit: corrected typo)

I can confirm the cache looks intact.

I could rebuild the index. Don't mind trying at least.

Pretty sure it's looking for an array index that doesn't exist, because my index.json is nothing like what I would expect it to be:


[
  [
    "http://www.lockwiki.com/index.php/Main_Page",
    {
      "date": 1641520105000,
      "id": 4,
      "ndx_id": 1000016,
      "title": "Lockwiki"
    }
  ],
  [
    4,
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "http://bjoernkarmann.dk/project_alias",
    {
      "date": 1641520105765,
      "id": 6,
      "ndx_id": 1000017,
      "title": "Bjørn Karmann › project_alias"
    }
  ],
  [
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer",
    {
      "date": 1641520104911,
      "id": 5,
      "ndx_id": 1000015,
      "title": "The Digital Services Playbook — from the U.S. Digital Service"
    }
  ],
  [
    "ndx1000003",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    5,
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000004",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    6,
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000005",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000006",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000007",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000008",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000009",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000010",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000011",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000012",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000013",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000014",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000015",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000016",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000017",
    "http://bjoernkarmann.dk/project_alias"
  ]
]
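Judging from the dump above, index.json looks like a serialized Map of `[key, value]` pairs mixing three entry kinds: URL → record, numeric id → URL, and `"ndx<ndx_id>"` → URL. A hypothetical recovery helper (an assumption based only on this dump, not the project's actual format) could regenerate the reverse entries from the URL records alone:

```javascript
// Sketch: rebuild the reverse (id -> URL and ndx id -> URL) entries
// from the canonical URL -> record pairs of a partially clobbered index.
function rebuildReverseEntries(pairs) {
  const rebuilt = [];
  for (const [key, value] of pairs) {
    // Keep only URL -> record entries (string key, object value)...
    if (typeof key === 'string' && value !== null && typeof value === 'object') {
      rebuilt.push([key, value]);
      // ...and regenerate both reverse mappings from the record itself.
      rebuilt.push([value.id, key]);               // numeric id -> URL
      rebuilt.push(['ndx' + value.ndx_id, key]);   // ndx id -> URL
    }
  }
  return rebuilt;
}
```

Note this deliberately drops stale `ndx…` entries (like `ndx1000003`–`ndx1000014` above) that no surviving record points at; whether those are safe to discard is an open question.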

commented

That's awesome, how did you rebuild the index??

By hand, my good sir.

Backed up the public folder, and with a bit of ls and grep magic figured out all the URLs I wanted to index, then revisited them ;)

Doing that, I found out some of the archive was still not processed and diskernet asked if I wanted to recover. I did, and got some back.

So something tells me the process fails at some point, but we have no way of knowing when.

Also, I noticed that one folder was corrupted and Finder (I'm on a Mac) wouldn't open it.

If you're curious, I used this as a starting point:

grep -r GEThttp ./ | cut -d ':' -f 4 | cut -d '?' -f 1

commented

You're awesome! That's so good! 😆 😂 ✊🏻 !!