DrHyde / perl-modules-Number-Phone

Number::Phone and friends

DBM::Deep, __DATA__, preforking, and things blowing up

xsawyerx opened this issue · comments

At $work we preload Number::Phone, and the UK data module crashes:

DBM::Deep: Cannot write to a readonly filehandle at /.../lib/Number/Phone/UK.pm line 107
DBM::Deep: '30': Don't know what to do with type 'â' at /.../lib/Number/Phone/UK.pm line 107
DBM::Deep: Cannot read sector at 38987384 in get_bucket_list() at /.../lib/Number/Phone/UK.pm line 107

Our test code:

use strict;
use warnings;
use Number::Phone::UK;
use Number::Phone::UK::Data;
my $pid = fork();
if ( $pid ) {
  Number::Phone->new('+44 1234 567890');
  wait();
} else {
  Number::Phone->new('+44 1234 567890');
}

$ perl dbm-deep-debug.pl
DBM::Deep: Cannot write to a readonly filehandle at /.../lib/Number/Phone/UK.pm line 107
$ perl dbm-deep-debug.pl
Can't locate object method "find_md5" via package "DBM::Deep::Sector::File::Scalar" at /.../lib/DBM/Deep/Sector/File/Reference.pm line 295.

(This might be related to https://rt.cpan.org/Public/Bug/Display.html?id=114349.)

In any case, there are several ways to handle this and ensure Number::Phone preloads properly:

  1. Use something else instead of DBM::Deep. We've tested it with Sereal and although it's significantly slower, it works.
  2. Same as 1. but only optionally.
  3. Allow an easy way to exhaust all of the data from within DBM::Deep so it could be done during preforking phase.
  4. Your idea here.

I've got a dev release out (3.4001_02) that moves the database file out from __DATA__ into a separate file, so maybe that will fix your problem. If it doesn't let me know and I'll investigate further.

In addition if you can test it on Windows (I don't have access to that platform) that would be great!

  5. Wrap the database in an accessor that detects if we've forked since it was opened and re-opens if necessary. Eeuuww.
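The re-opening accessor idea can be sketched roughly as follows. This is only an illustration of the technique, not the code that ended up in Number::Phone: the `ForkAwareHandle` package and its `get` method are hypothetical names, and the opener here returns a plain hash as a stand-in for the real database handle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Number::Phone): caches whatever the
# opener returns and calls the opener again when the process ID has
# changed since the handle was opened, i.e. when we are in a forked child.
{
    package ForkAwareHandle;

    sub new {
        my ($class, $opener) = @_;
        return bless { opener => $opener, pid => 0, handle => undef }, $class;
    }

    sub get {
        my ($self) = @_;
        if (!defined $self->{handle} || $self->{pid} != $$) {
            $self->{handle} = $self->{opener}->();
            $self->{pid}    = $$;
        }
        return $self->{handle};
    }
}

# Stand-in opener. In real use this might be something like
# sub { DBM::Deep->new(file => $path) } (an assumption, not shown here).
my $opens = 0;
my $db = ForkAwareHandle->new(sub { $opens++; return { opened_by => $$ } });

$db->get for 1 .. 3;             # repeated access in the parent opens once
my $parent_opens = $opens;

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    close $reader;
    my $handle = $db->get;       # PID changed, so the opener runs again
    print {$writer} ($handle->{opened_by} == $$ ? "reopened\n" : "stale\n");
    close $writer;
    exit 0;
}
close $writer;
my $child_report = <$reader>;
close $reader;
waitpid($pid, 0);
chomp $child_report;
print "parent opens: $parent_opens, child: $child_report\n";
```

The check costs one integer comparison per access, which is why it can sit in front of every lookup without much overhead.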

I try to build the dist target from trunk, but I get this:

Illegal or missing directory 'share' at /usr/local/perl/5.18.2/site/lib/File/ShareDir/Install.pm line 105.
        File::ShareDir::Install::_add_dir('HASH(0x7fba628182b8)', 'share') called at /usr/local/perl/5.18.2/site/lib/File/ShareDir/Install.pm line 42
        File::ShareDir::Install::install_share() called at Makefile.PL line 6

Is there some configuration needed before making dist?

I don't think Sereal is an option: it appears to de-serialize the database into an in-memory perl data structure, and the whole point of using DBM::Deep is to not do that. Earlier versions used an in-memory structure and it was far too big. That said, DBM::Deep is designed to be very memory-efficient, so you could maybe work around the problem by not loading the database until after you've forked.

@39832 you need to build several bits of it using build-data.sh. That will in turn run a few other scripts. Those scripts have several extra perl dependencies that aren't declared in Makefile.PL, and will also require curl, git, and probably some other stuff I've forgotten.

Sure, but dist is building for us as of the last release. File::ShareDir::Install is a new dependency and it seems to require something new in the build environment, in addition to simply being installed.

… Really I'm only trying to test the latest code as a distribution. How may I download this dev release 3.4001_02?

… Ah, we only need to mkdir share because, being an empty directory considering the .gitignore, it doesn't exist from a git clone.

The problem with delaying loading the data from DBM::Deep is two-fold: 1. We would lose speed because we wouldn't be able to preload that data. Considering its size, loading it adds a major runtime burden to each request. 2. We cannot guarantee that nothing else will accidentally load it.

Are there no alternatives to DBM::Deep that are fork-friendly?

If not, is it at least possible to introduce an optional preloadable scenario? An example could be:

# Use something other than DBM::Deep:
local $Number::Phone::DBM_DEEP = 0;
require Number::Phone::UK::Data;
Number::Phone::UK::Data->import();

(Or maybe a better one I haven't come up with on the spot.)

By using the N::P::UK::Data module you're not actually pre-loading anything apart from a few insignificant lines of code. perl doesn't slurp files into memory until they're read (DATA counts as a file in 3.4001 and earlier) and DBM::Deep seek()s all over it and tries to avoid slurping it into memory.
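The thread does not pin down the root cause, but one well-known hazard with a handle that is opened before fork() and then seek()ed is that parent and child share a single file offset. The sketch below demonstrates that on a POSIX system using a temporary file; it does not touch DBM::Deep or Number::Phone, and whether this is what actually corrupts the reads reported above is an assumption.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_CUR);
use File::Temp qw(tempfile);

# Create a small scratch file to stand in for the database file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} 'x' x 100;
close $tmp or die "close failed: $!";

# Open it BEFORE forking, as a preloading parent process would.
open(my $fh, '<', $path) or die "open failed: $!";
my $before = sysseek($fh, 0, SEEK_CUR) + 0;    # "0 but true" becomes 0

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # The child moves the offset on its inherited copy of the handle.
    sysseek($fh, 42, SEEK_SET) or die "sysseek failed: $!";
    exit 0;
}
waitpid($pid, 0);

# The parent never seeked, yet its offset has moved as well.
my $after = sysseek($fh, 0, SEEK_CUR) + 0;
print "offset before fork: $before, after child seek: $after\n";
```

With several workers seeking on the same inherited handle at once, each can end up reading from a position another process chose, which would be consistent with errors that change from one run to the next.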

You have a valid point about other stuff loading it, although there's no reason for anything outside Number::Phone::UK to do so.

I'm not aware of any other data serialisation module that:

  • doesn't slurp the data into memory;
  • allows random access to nested data structures

I'd welcome a patch to make it optionally pre-load everything into memory though. I'd do it myself, except that I'm off on holiday for the next few days.

Before you do that, try ^^^ that nasty hack.

I can't replicate the error using your script, at least not on the machine I'm on right now, but maybe it'll help. Exactly what versions of perl and DBM::Deep are you using?

Commit 72cb37b has a Thing for slurping the database into memory.
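For readers wondering what slurping a nested structure into memory looks like in general, here is a minimal sketch. It is not taken from commit 72cb37b: `slurp_structure` is a hypothetical helper, and the sample data is a made-up stand-in for the on-disk database. DBM::Deep also documents an `export()` method that returns an in-memory copy of a database; whether the commit uses it is not stated in this thread.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical helper: recursively copy a nested hash/array structure into
# plain Perl data. reftype() looks through blessing, so containers that are
# objects wrapping a hash or array are walked too.
sub slurp_structure {
    my ($node) = @_;
    my $type = reftype($node) // '';
    if ($type eq 'HASH') {
        my %copy;
        $copy{$_} = slurp_structure($node->{$_}) for keys %$node;
        return \%copy;
    }
    if ($type eq 'ARRAY') {
        return [ map { slurp_structure($_) } @$node ];
    }
    return $node;    # plain scalar, or a reference type we leave untouched
}

# Made-up sample data standing in for the real database contents.
my $source = { example => { name => 'sample', lengths => [ 10, 11 ] } };
my $copy   = slurp_structure($source);

print "name: $copy->{example}{name}, lengths: @{ $copy->{example}{lengths} }\n";
```

Once such a copy exists, lookups no longer touch the file handle at all, at the cost of holding the whole data set in memory in every process that is not sharing pages with its parent.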

Thank you, David! We will try this out and let you know.

Do please try the re-opening thingy as well as the slurping. Even if you end up slurping it all into memory so it runs faster once loaded, I'd be interested to know if I've still got a forking problem to solve.

Can I assume everything worked OK?

Didn't get to test it yet. :/

Unfortunately not yet. I'll try this week!
(Thanks for pinging!)

We've deployed the DBM::Deep reopening fix since pull request #74 was opened. It works perfectly. We haven't tried deploying the slurp. Assuming it prevents the file handle from being touched, it should probably work, given that we previously solved the issue by slurping.

Thanks, I'll mark this as closed.

The next release is imminent and will include that fix.