yhirose / maxminddb

Pure Ruby GeoIP2 MaxMind DB reader.

Getting 'invalid file format' errors when a lookup for an ip does not find a value

DemitryT opened this issue

Hey there,

I use your gem to do lookups for the city, postal code, country code and lat and long coordinates from a purchased version of the MaxMind GeoIP2-City database.

It seems as though when a lookup for a certain IP fails, the gem raises an "invalid file format" error, when it should simply return nil. Here's the line I'm talking about: https://github.com/yhirose/maxminddb/blob/master/lib/maxminddb.rb#L49

Any ideas here?

Thanks!

Hey, just wondering if you have any ideas here or if you could walk me through the code in the lookup method?

Sorry for the delayed reply.

The method def lookup(ip) tries to look up the geo data for the given ip by searching the 'Binary Search Tree Section' described in the MaxMind DB File Format Specification document.
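To make the search concrete, here is a minimal sketch of that tree walk. It is not the gem's actual code: `walk_tree` and the in-memory `tree` array (one `[left, right]` pair per node) are hypothetical names used only for illustration, and it assumes an IPv4-only tree, so it reads 32 bits starting from the most significant one.

```ruby
require 'ipaddr'

# Hypothetical in-memory search tree: tree[n] = [left_record, right_record].
# A real reader would decode these records from the database file instead.
def walk_tree(tree, ip, bit_count = 32)
  node_count = tree.size
  addr = IPAddr.new(ip).to_i
  record = 0 # the search starts at node 0

  (bit_count - 1).downto(0) do |i|
    bit = (addr >> i) & 1        # 0 selects the left record, 1 the right
    record = tree[record][bit]
    # A value >= node_count is no longer a node number, so the walk ends.
    return record if record >= node_count
  end

  record # ran out of bits while still pointing at a node
end

# Toy tree with 3 nodes; the values 3 and 24 are made up for this example.
tree = [[1, 3], [2, 3], [24, 3]]
walk_tree(tree, '0.0.0.0')   # => 24 (bits 0,0,0 lead to a value above node_count)
walk_tree(tree, '128.0.0.0') # => 3  (first bit is 1; value equals node_count)
```

The caller still has to decide what the returned record value means; the walk itself only follows bits.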

We shouldn't reach the line raise 'invalid file format' that you mentioned unless the database is corrupted. Of course, your database is correct, because you bought it from MaxMind. So my search logic must have a bug...

Can you please tell me which ip address causes the problem? I'll check the logic thoroughly with it, and get back to you later. (I actually don't have the purchased version, but hopefully I can reproduce the problem with the free version.)

Thanks for your help!

Thanks for the reply! I'll try downloading the file again and reuploading it, as the possibility of a corrupt file is a valid point.

Here are some of the IPs that were throwing that error:

  • 185.23.124.1
  • 178.72.254.1
  • 95.153.177.210
  • 200.148.105.119
  • 195.59.71.43
  • 179.175.47.87
  • 202.67.40.50

Please let me know if you need more or if you will need to test the issue with the full database.

Thanks again!

@DemitryT, are you using the latest version of the database? @yhirose, if you can't reproduce this with GeoLite2 City, we would be happy to provide you with a copy of GeoIP2 City to diagnose the issue.

Hi All,

I tried the IPs with GeoLite2 City, but I can't reproduce the problem. I successfully looked up correct results for all of the IPs. (Please see a new test case.)

I confirmed that the record size of GeoLite2 City is 28 bits. I guess GeoIP2 City uses a 32-bit record size, which I have never tested with.

@oschwald, thank you for allowing me to use the GeoIP2 City data. Can I send a message to the gmail address on your GitHub profile, so that you can give me a link to download the file? (Or please let me know if you can think of a better way. My only concern is that I don't want to expose my personal email address or the location of your licensed database file.) I'll delete it right away after I finish debugging.

@yhirose, yes, please email me at goschwald@maxmind.com.

The record size on GeoIP2 City is also 28 bits so I am not sure where the issue would be.

@oschwald, thanks for the database. I fixed the problem. It helped me a lot in locating where the problem was. The mmdblookup tool and the maxminddb.c source code also helped me to understand the MaxMind DB file format deeply.

@DemitryT, could you check if v1.0.3 fixes your problem?

Glad to help!

@yhirose Yes, the problem has been fixed with v1.0.3. Thanks!

When you have time could you explain what the problem was and how you were able to test and fix it? I don't fully understand the current lookup method, but started looking at the maxmind specification docs you mentioned above. I'm just really curious to understand how this all ties together a bit more so that I can actually debug and create a PR next time.

@oschwald Thanks a lot for providing the full db to test with and helping us out.

@DemitryT, thanks for testing!!

In commit 0551b8a, I made 3 changes.

Two of them are simply my (stupid or embarrassing) mistakes in def read_record(node_no, flag): the bit mask value 0x7 has been changed to 0xf, and the + operator has been changed to +=. (maxminddb.rb#L62, maxminddb.rb#L65)
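To illustrate why a 4-bit mask is needed, here is a rough sketch of decoding a 28-bit node, based on my reading of the spec's node layout. `read_28bit_records` is a hypothetical helper, not the gem's read_record. Each node is 7 bytes, and the middle byte is shared: its high nibble holds the top 4 bits of the left record and its low nibble holds the top 4 bits of the right record.

```ruby
# Hypothetical helper for illustration only; takes the 7 raw bytes of one node.
def read_28bit_records(node_bytes)
  b = node_bytes.bytes
  raise ArgumentError, 'a 28-bit node is 7 bytes' unless b.size == 7

  # Left record: high nibble of the middle byte, then bytes 0..2.
  left  = ((b[3] & 0xf0) << 20) | (b[0] << 16) | (b[1] << 8) | b[2]
  # Right record: low nibble of the middle byte, then bytes 4..6.
  right = ((b[3] & 0x0f) << 24) | (b[4] << 16) | (b[5] << 8) | b[6]
  [left, right]
end

node = [0x12, 0x34, 0x56, 0xAB, 0x78, 0x9A, 0xBC].pack('C*')
left, right = read_28bit_records(node)
# left  == 0xA123456
# right == 0xB789ABC
# Masking the nibble with 0x7 instead of 0xf would drop its top bit:
# 0xB & 0x7 == 0x3, which would give 0x3789ABC for the right record.
```

A dropped bit like this only shows up once record values grow large enough to use it, which could explain why a smaller database never hits the problem.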

My last change can be seen in def lookup(ip). This is due to my misunderstanding of the MaxMind DB spec. I misinterpreted the sentence "If the record value is equal to the number of nodes, that means that we do not have any data for the IP address, and the search ends here" in the Search Lookup Algorithm section. Also, I didn't pay much attention to the example "a 24-bit tree with 1,000 nodes" in that section.
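As I understand that section of the spec, a record value falls into one of three cases, and the "24-bit tree with 1,000 nodes" example works through the arithmetic. The sketch below is my own reading, not the gem's code; `classify_record` and its return symbols are hypothetical names.

```ruby
# Hypothetical helper: decides what a record value means.
# record_size_bits is the size of ONE record; each node holds two records.
def classify_record(record, node_count, record_size_bits)
  tree_size = node_count * record_size_bits * 2 / 8 # search tree size in bytes

  if record < node_count
    [:node, record]       # keep walking: this is the next node number
  elsif record == node_count
    [:not_found, nil]     # no data for this IP; a lookup should return nil
  elsif record < node_count + 16
    [:invalid, nil]       # would point into the 16-byte data section separator
  else
    # File offset of the data: (record - node_count) + tree size in bytes.
    [:data, (record - node_count) + tree_size]
  end
end

# The spec's example: 24-bit records, 1,000 nodes, so the tree is 6,000 bytes.
classify_record(500,  1000, 24) # => [:node, 500]
classify_record(1000, 1000, 24) # => [:not_found, nil]
classify_record(1016, 1000, 24) # => [:data, 6016]  (first byte of the data section)
classify_record(6000, 1000, 24) # => [:data, 11000] (offset 4,984 into the data section)
```

Only the `:invalid` case corresponds to a malformed file; the `:not_found` case is a normal outcome for an IP with no data.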

When I started debugging, I first tried to find out which IP caused the problem. Then I tested that IP with the mmdblookup command line tool found in https://github.com/maxmind/libmaxminddb. Of course, the tool didn't have the problem. Then I compiled the tool with the MMDB_DEBUG preprocessor symbol used in maxminddb.c, so that it would print log information about how the binary search proceeds. I put similar debug prints in my lookup code as well. I compared my output with the output from mmdblookup, and finally found the cause of the problem. I really enjoyed debugging it!!

In order to understand the MaxMind DB file format, I recommend reading the spec document over and over. It is well written, with clear explanations, charts, formulas and examples. Hopefully it will eventually make sense to you. It should also be helpful to read maxminddb.c to see how the code implements the spec.

Pull requests from you are always welcome!!