Enriching destination data by using henleyglobal.com
iamCristYe opened this issue
I noticed that henleyglobal.com offers more destinations than passportindex.org.
Should we integrate data from both sources for a more comprehensive result?
I also noticed there's an API at https://api.henleypassportindex.com/api/passports/US/countries, and I can help with a PR if you think that's a good idea.
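For anyone wanting to script this, here's a minimal sketch of querying that endpoint. The response schema is unverified, so the fetch helper returns the raw parsed JSON for inspection rather than assuming field names:

```python
import json
from urllib import request

# URL pattern inferred from the example endpoint above; the two-letter
# passport code is the only variable part.
BASE = "https://api.henleypassportindex.com/api/passports/{}/countries"

def henley_url(passport_code: str) -> str:
    """Build the (assumed) Henley API URL for a two-letter passport code."""
    return BASE.format(passport_code.upper())

def fetch_destinations(passport_code: str):
    """Fetch the endpoint and return the parsed JSON as-is.

    The response schema is not documented anywhere I know of, so inspect
    the structure before building a parser around it.
    """
    with request.urlopen(henley_url(passport_code)) as resp:
        return json.load(resp)
```

Looping this over all passport codes would give the full Henley matrix for comparison.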
Hi @iamCristYe
Thanks for sharing this API endpoint - I wasn't aware of it. It looks like they have 227 destinations whereas PassportIndex has 199. The latter provides the type of visa-free access though (i.e. number of visa-free days, e-visa, etc.), so I'm conflicted about whether one should mix the two. Also - what do you do when the sources conflict 🙈?
I guess with the tidy datasets one could add the extra destinations (rows) from Henley and record the source (passportindex vs henley) in a new column. It's trickier with the matrix format, where that column would have to be dropped - but maybe we don't need the matrix format at all (or I overestimate the importance of tracking the data source).
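In tidy form that could look something like this - a sketch with made-up rows; the column names are illustrative, not the dataset's actual schema:

```python
import pandas as pd

# Toy tidy data: one row per (passport, destination) pair.
pi = pd.DataFrame({
    "passport": ["US", "US"],
    "destination": ["JP", "DE"],
    "access": ["visa free", "visa free"],
})
henley = pd.DataFrame({
    "passport": ["US", "US"],
    "destination": ["DE", "XK"],  # XK only present in Henley here
    "access": ["visa free", "visa free"],
})

# Keep PassportIndex as the base; add only the Henley rows for pairs it lacks.
extra = henley.merge(
    pi[["passport", "destination"]],
    on=["passport", "destination"],
    how="left",
    indicator=True,
).query("_merge == 'left_only'").drop(columns="_merge")

combined = pd.concat(
    [pi.assign(source="passportindex"), extra.assign(source="henley")],
    ignore_index=True,
)
print(combined)
```

The `indicator=True` / `left_only` filter is what guarantees Henley rows never override an existing PassportIndex pair.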
If you're willing to expand the existing notebook to query the API for missing passports/destinations, I'd be happy to review. I'd use the existing passportindex source as base and only add extras from Henley.
Sorry for my late reply. I downloaded the data from HPI, only to see a huge difference: for ~200 regions there are around 40,000 combinations, and 7,000+ of them differ. I manually checked the differences; PI is right in some cases and HPI in others. I'm thinking of using Wikipedia to resolve the conflicts 🤦‍♂️🤦‍♂️
I've created a file for your reference: https://github.com/iamCristYe/passport-index-dataset/blob/master/diff.csv
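For anyone wanting to reproduce such a diff, a sketch along these lines works once both sources are in long format. The DataFrames below are toy stand-ins for the two downloaded datasets (in practice they'd be read from CSV), and the column names are illustrative:

```python
import pandas as pd

# Toy stand-ins for the PI and HPI downloads, one row per pair.
pi = pd.DataFrame({
    "passport": ["US", "US", "CL"],
    "destination": ["JP", "BR", "CN"],
    "access": ["90", "visa required", "visa required"],
})
hpi = pd.DataFrame({
    "passport": ["US", "US", "CL"],
    "destination": ["JP", "BR", "CN"],
    "access": ["90", "visa free", "visa required"],  # disagrees on US -> BR
})

# Align the two sources on the (passport, destination) key and keep
# only the pairs where the recorded access type differs.
merged = pi.merge(hpi, on=["passport", "destination"], suffixes=("_pi", "_hpi"))
diff = merged[merged["access_pi"] != merged["access_hpi"]]
print(f"{len(diff)} of {len(merged)} pairs disagree")  # 1 of 3 here
```

Writing `diff` out with `diff.to_csv("diff.csv", index=False)` would give a file like the one linked above.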
Oh wow. The issue is, one can "solve" it once, but with monthly updates it's probably best to stick to one source or the other. Maybe there should be another repository called henley-passport-index-dataset?
Sure, a separate repo would solve the problem, but isn't our aim to build a dataset that reflects the actual situation in the real world? By that I mean we should provide the most accurate data we can, which means checking and verifying multiple data sources. By using Wikipedia, I was thinking of scripts that update the results automatically rather than manually.
My two cents here: there's an IATA-based product called Timatic that manages exactly this type of information. The problem is that it seems to be a paid service. I really don't know yet, but I'm looking into it.
I'll circle back soon with more information.
Maybe we can get access somehow... but it seems that there are about 70 rule changes every day.
https://gist.github.com/hezhao/b6fe9f5aa5c70e7d93fc
I also found that it's possible to get a lot of information from a Timatic Web trial account, though it's sadly limited in the number of requests. Try this in Postman. You can modify pvh_destinationcountrycode and pvh_nationalitycode (e.g. US) for the pairs of interest - those two are the only fields required to fetch data.
```shell
curl --location --request POST 'https://www.timaticweb2.com/request/pvh' \
--header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Origin: https://www.timaticweb2.com' \
--header 'Cookie: language=en_EN; AWSALB=mbrleow/5Cp1zsPcQy91LomZ6M2sRvr9+SzUpqthOm38FCM+UJzMTmaJBVEVEh/y/8oxVJIMs8LpD1K9BnRXUo81SsHxeKd8EkWk8CdzeEP/AotbLxnc6UCmzs8l; AWSALBCORS=mbrleow/5Cp1zsPcQy91LomZ6M2sRvr9+SzUpqthOm38FCM+UJzMTmaJBVEVEh/y/8oxVJIMs8LpD1K9BnRXUo81SsHxeKd8EkWk8CdzeEP/AotbLxnc6UCmzs8l; KONSOLIDATETRACKER=a85ca51f2a697fd3303e09ba5f37f65c; language=en_EN; __utma=232171546.2099270134.1673550295.1673550295.1673550403.2; __utmb=232171546.40.10.1673550403; __utmc=232171546; __utmt=1; __utmz=232171546.1673550403.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); accept=true; __gads=ID=da53c44494b9a43b:T=1673546709:RT=1673551161:S=ALNI_MYjOdf-rDhMVitmDMAvBO-6FqZ1pg; __gpi=UID=0000057818a8a1f1:T=1673546709:RT=1673546709:S=ALNI_MYDxysu3VqhdNxbIGs7Bziv1tD0MA; PHPSESSID=5n54fkkt8u774cokv7aapba7ad; language=en_EN; AWSALB=T/pqKm7e9qAzj+6oW1TL5Lp5TUtB0ShyLIAFBRhSx1DoHRTFd9vzL6wdSDmHoTIRUaFvrVz0VifOXdEGTAXTCSX8N+8Fy09DD2VPi0R4mDCy30nU4qhH9aN1Vsbt; AWSALBCORS=T/pqKm7e9qAzj+6oW1TL5Lp5TUtB0ShyLIAFBRhSx1DoHRTFd9vzL6wdSDmHoTIRUaFvrVz0VifOXdEGTAXTCSX8N+8Fy09DD2VPi0R4mDCy30nU4qhH9aN1Vsbt; KONSOLIDATETRACKER=a85ca51f2a697fd3303e09ba5f37f65c' \
--header 'Accept-Language: en-US,en;q=0.9' \
--header 'Host: www.timaticweb2.com' \
--header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.2 Safari/605.1.15' \
--header 'Referer: https://www.timaticweb2.com/home/CH' \
--header 'Accept-Encoding: gzip, deflate, br' \
--header 'Connection: keep-alive' \
--data-urlencode 'interfaceprocessor=force' \
--data-urlencode 'command=dynamicpvhrequestform' \
--data-urlencode 'templateid=' \
--data-urlencode 'pvh_id=19988100' \
--data-urlencode 'pvh_template=0' \
--data-urlencode 'pvh_template_name=0' \
--data-urlencode 'pvh_template_save=0' \
--data-urlencode 'trialuser=1' \
--data-urlencode 'pvh_destinationcountrycode=US' \
--data-urlencode 'pvh_destinationcountrycode_code=' \
--data-urlencode 'pvh_arrivaldate_day=12' \
--data-urlencode 'pvh_arrivaldate_month=01' \
--data-urlencode 'pvh_arrivaldate_year=2023' \
--data-urlencode 'pvh_arrivaldate=2023-01-12' \
--data-urlencode 'pvh_departairportcode=' \
--data-urlencode 'pvh_departairportcode_code=' \
--data-urlencode 'pvh_departairportcode=' \
--data-urlencode 'pvh_carriercode=00' \
--data-urlencode 'pvh_stayduration_days=' \
--data-urlencode 'pvh_stayduration_type=days' \
--data-urlencode 'pvh_returndate=' \
--data-urlencode 'pvh_stayduration=' \
--data-urlencode 'pvh_staytype=' \
--data-urlencode 'pvh_ticket=' \
--data-urlencode 'pvh_transitcountrycode_1=' \
--data-urlencode 'pvh_transitcountrycode_code_1=' \
--data-urlencode 'pvh_transit_arrivaldate_day_1=12' \
--data-urlencode 'pvh_transit_arrivaldate_month_1=01' \
--data-urlencode 'pvh_transit_arrivaldate_year_1=2023' \
--data-urlencode 'pvh_transit_arrivaldate_hour_1=0:01' \
--data-urlencode 'pvh_transit_arrivaldate_1=' \
--data-urlencode 'pvh_transit_departuredate_day_1=12' \
--data-urlencode 'pvh_transit_departuredate_month_1=01' \
--data-urlencode 'pvh_transit_departuredate_year_1=2023' \
--data-urlencode 'pvh_transit_departuredate_hour_1=1:00' \
--data-urlencode 'pvh_transit_departuredate_1=' \
--data-urlencode 'pvh_transitcountrycode_ticket_1=' \
--data-urlencode 'pvh_transitcountrycode_2=' \
--data-urlencode 'pvh_transitcountrycode_code_2=' \
--data-urlencode 'pvh_transit_arrivaldate_day_2=12' \
--data-urlencode 'pvh_transit_arrivaldate_month_2=01' \
--data-urlencode 'pvh_transit_arrivaldate_year_2=2023' \
--data-urlencode 'pvh_transit_arrivaldate_hour_2=0:01' \
--data-urlencode 'pvh_transit_arrivaldate_2=' \
--data-urlencode 'pvh_transit_departuredate_day_2=12' \
--data-urlencode 'pvh_transit_departuredate_month_2=01' \
--data-urlencode 'pvh_transit_departuredate_year_2=2023' \
--data-urlencode 'pvh_transit_departuredate_hour_2=1:00' \
--data-urlencode 'pvh_transit_departuredate_2=' \
--data-urlencode 'pvh_transitcountrycode_ticket_2=' \
--data-urlencode 'pvh_transitcountrycode_3=' \
--data-urlencode 'pvh_transitcountrycode_code_3=' \
--data-urlencode 'pvh_transit_arrivaldate_day_3=12' \
--data-urlencode 'pvh_transit_arrivaldate_month_3=01' \
--data-urlencode 'pvh_transit_arrivaldate_year_3=2023' \
--data-urlencode 'pvh_transit_arrivaldate_hour_3=0:01' \
--data-urlencode 'pvh_transit_arrivaldate_3=' \
--data-urlencode 'pvh_transit_departuredate_day_3=12' \
--data-urlencode 'pvh_transit_departuredate_month_3=01' \
--data-urlencode 'pvh_transit_departuredate_year_3=2023' \
--data-urlencode 'pvh_transit_departuredate_hour_3=1:00' \
--data-urlencode 'pvh_transit_departuredate_3=' \
--data-urlencode 'pvh_transitcountrycode_ticket_3=' \
--data-urlencode 'pvh_transitcountrycode_4=' \
--data-urlencode 'pvh_transitcountrycode_code_4=' \
--data-urlencode 'pvh_transit_arrivaldate_day_4=12' \
--data-urlencode 'pvh_transit_arrivaldate_month_4=01' \
--data-urlencode 'pvh_transit_arrivaldate_year_4=2023' \
--data-urlencode 'pvh_transit_arrivaldate_hour_4=0:01' \
--data-urlencode 'pvh_transit_arrivaldate_4=' \
--data-urlencode 'pvh_transit_departuredate_day_4=12' \
--data-urlencode 'pvh_transit_departuredate_month_4=01' \
--data-urlencode 'pvh_transit_departuredate_year_4=2023' \
--data-urlencode 'pvh_transit_departuredate_hour_4=1:00' \
--data-urlencode 'pvh_transit_departuredate_4=' \
--data-urlencode 'pvh_transitcountrycode_ticket_4=' \
--data-urlencode 'pvh_transitcountrycode_5=' \
--data-urlencode 'pvh_transitcountrycode_code_5=' \
--data-urlencode 'pvh_transit_arrivaldate_day_5=12' \
--data-urlencode 'pvh_transit_arrivaldate_month_5=01' \
--data-urlencode 'pvh_transit_arrivaldate_year_5=2023' \
--data-urlencode 'pvh_transit_arrivaldate_hour_5=0:01' \
--data-urlencode 'pvh_transit_arrivaldate_5=' \
--data-urlencode 'pvh_transit_departuredate_day_5=12' \
--data-urlencode 'pvh_transit_departuredate_month_5=01' \
--data-urlencode 'pvh_transit_departuredate_year_5=2023' \
--data-urlencode 'pvh_transit_departuredate_hour_5=1:00' \
--data-urlencode 'pvh_transit_departuredate_5=' \
--data-urlencode 'pvh_transitcountrycode_ticket_5=' \
--data-urlencode 'pvh_nationalitycode=CL' \
--data-urlencode 'pvh_documenttype=passport' \
--data-urlencode 'pvh_issuecountrycode=CL' \
--data-urlencode 'pvh_issuedate_day=00' \
--data-urlencode 'pvh_issuedate_month=00' \
--data-urlencode 'pvh_issuedate_year=0000' \
--data-urlencode 'pvh_issuedate=' \
--data-urlencode 'pvh_expirydate_day=00' \
--data-urlencode 'pvh_expirydate_month=00' \
--data-urlencode 'pvh_expirydate_year=0000' \
--data-urlencode 'pvh_expirydate=' \
--data-urlencode 'pvh_residentcountrycode=CL' \
--data-urlencode 'pvh_residencydocument=' \
--data-urlencode 'pvh_birthdate_day=00' \
--data-urlencode 'pvh_birthdate_month=00' \
--data-urlencode 'pvh_birthdate_year=1980' \
--data-urlencode 'pvh_birthdate=' \
--data-urlencode 'pvh_passportseries=' \
--data-urlencode 'pvh_secondarydocumenttype=' \
--data-urlencode 'pvh_documentfeature='
```

(Note: I dropped the captured `Content-Length: 3150` header - curl computes the length itself, and a hardcoded value breaks the request once you change the country codes. The session cookie will expire, so substitute one from your own trial login.)
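If this were ever to be scripted for all passport pairs, the curl call above could be reduced to something like the sketch below. The reduced set of form fields is an assumption based on the comment that only the two country codes are required; the full form from the trial page may still be needed in practice, and a valid trial-session cookie has to be supplied:

```python
from urllib import parse, request

def timatic_pvh(destination: str, nationality: str, cookie: str) -> str:
    """POST a minimal Timatic trial request; returns the HTML response body.

    The field subset here is a guess - if the endpoint rejects it, fall back
    to sending the full form captured in the curl command above.
    """
    data = parse.urlencode({
        "interfaceprocessor": "force",
        "command": "dynamicpvhrequestform",
        "trialuser": "1",
        "pvh_destinationcountrycode": destination,
        "pvh_nationalitycode": nationality,
        "pvh_documenttype": "passport",
        "pvh_issuecountrycode": nationality,
        "pvh_residentcountrycode": nationality,
    }).encode()
    req = request.Request(
        "https://www.timaticweb2.com/request/pvh",
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Origin": "https://www.timaticweb2.com",
            "Cookie": cookie,  # paste a live trial-session cookie here
        },
    )
    with request.urlopen(req, timeout=30) as resp:
        return resp.read().decode()
```

Given the trial account's request limit and the ~70 rule changes per day mentioned above, this would only be practical for spot-checking disputed pairs, not for a full monthly crawl.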