tomayac / local-reverse-geocoder

Local reverse geocoder for Node.js based on GeoNames data

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Crash when replacing yesterday's geodata (Extract.writer.error [Error: ENOENT: no such file or directory, lstat '/tmp/geonames/cities/cities1000.txt'])

tkafka opened this issue · comments

Reproduction steps:

  1. Run local-reverse-geocoder, let it create files with dates in name (eg. cities/cities1000_2021-03-19.txt)
  2. Wait another day (or rename file to older date, eg. cities/cities1000_2021-03-19.txtcities/cities1000_2021-03-15.txt)
  3. Start local-reverse-geocoder again.

What I expected:

Geocoder updates its files.

What actually happens

Geocoder crashes with Extract.writer.error [Error: ENOENT: no such file or directory, lstat '/tmp/geonames/cities/cities1000.txt'].

The cause is this:

  • we call unzip.Extract
  • Extract extracts files from downloaded zip into a folder and emits close
  • We rename and unlink
  • Extract uses fstream module, and it calls stat on extracted file after calling close - see code!
  • Calling stat on now nonexistent file causes crash (error is emitted from fstream, and not handled by anyone).

Fix ideas:

  1. Fix the underlying stream libs (fstream shouldn't touch the extracted file after it passes close event)
  2. Ignore error event from unzip.Extract
extractStream
          .on('error', function (err) {
            if (err.code === 'ENOENT') {
              // ignore - fstream writer runs stat/lstat after extract is finished (and .close sent), and it crashes, because we have already used and moved/deleted it's file
            } else {
              throw err
            }
          })
  1. Use unzip to extract just a single file we need, in memory, without writing to disk
  2. Keep the extracted file (eg. cities1000.txt), so that we allow fstream to touch it however it wants after the extraction.

Beware:

All _get*data methods have this problem, it needs to be fixed everywhere.

It is very curious that no one has reported this before, as it happens every time you don't clean up the data and restart local-reverse-geocoder. I may have discovered it by moving dumpDirectory to /tmp/geonames (which I believe should be a default place, the geo database shouldn't be just dumped inside node_modules).

I think what you're asking for is making GEONAMES_DUMP user-configurable. I'm happy to merge a PR that adds this.

@tomayac Hi Thomas, that was already configurable - see here.

I dived into the error with streams, but I think that the real problem is that one of node-unzip-2, fstream, or their interactions is wrongly written. However, there was no need to actually store the unzipped files on a disk, so I forked local-reverse-geocoder to https://github.com/tkafka/local-reverse-geocoder and rewrote it to use unzip-stream to extract the wanted file from stream to disk.

See tkafka@6e96500

@tomayac Hi Thomas, that was already configurable - see here.

Oh, sorry. It's been a while that I touched this code.

I dived into the error with streams, but I think that the real problem is that one of node-unzip-2, fstream, or their interactions is wrongly written. However, there was no need to actually store the unzipped files on a disk, so I forked local-reverse-geocoder to https://github.com/tkafka/local-reverse-geocoder and rewrote it to use unzip-stream to extract the wanted file from stream to disk.

Thanks for working on this, but…

See tkafka@6e96500

…I wonder why you went for forking this. I am and always was happy to merge PRs. I'd be happy to add you to the list of contributors.

@tomayac Please don't take it personally, I just needed it fixed quickly. I just created a pull request #40, and I will be happy to delete local-reverse-geocoder-2 from NPM after this fix is published.

I have been running it on my server for past few days, and finally the geocoder reloads smoothly without crash, even when server is rebooted a day after previous start.

Feel free to add me on a contributor list - thank you!