ilyankou / passport-index-dataset

Passport Index 2024: visa requirements for 199 countries, in .csv

Enriching destination data by using henleyglobal.com

iamCristYe opened this issue

I noticed that henleyglobal.com covers more destinations than passportindex.org.

Should we integrate data from both sides for a more comprehensive result?

I noticed there's an API behind the Henley Passport Index, e.g. https://api.henleypassportindex.com/api/passports/US/countries, and I can help with a PR if you think that's a good idea.

Hi @iamCristYe

Thanks for sharing this API endpoint - I wasn't aware of it. It looks like they have 227 destinations whereas PassportIndex has 199. The latter provides types of visa-free access though (i.e. number of visa-free days, e-visa, etc.), so I'm conflicted about whether one should mix the two. Also - what do you do when there is conflicting info 🙈?

I guess with tidy datasets one could add extra destinations (rows) from Henley, and record the source (passportindex vs henley) in a new column. It's trickier with the matrix format, where that column would have to be dropped - but maybe we don't need the matrix format at all (or I overestimate the importance of the data source).

If you're willing to expand the existing notebook to query the API for missing passports/destinations, I'd be happy to review. I'd use the existing passportindex source as the base and only add extras from Henley.
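The "passportindex as base, extras from Henley" idea could be sketched roughly as below. This is only an illustration: the column names (`Passport`, `Destination`, `Requirement`, `Source`) and the function name `add_henley_extras` are assumptions, and each source is assumed to be already loaded into a dict keyed by (passport, destination) pair.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (passport, destination), e.g. ("US", "FR")


def add_henley_extras(pi: Dict[Pair, str], henley: Dict[Pair, str]) -> List[dict]:
    """Return tidy rows: every PassportIndex pair, plus Henley-only pairs.

    Hypothetical helper: the column names are assumptions, not the
    repository's confirmed schema.
    """
    # Every PassportIndex row is kept as-is and tagged with its source.
    rows = [
        {"Passport": p, "Destination": d, "Requirement": req, "Source": "passportindex"}
        for (p, d), req in pi.items()
    ]
    # Henley rows are added only when PassportIndex has no entry for the pair,
    # so a conflicting Henley value never overrides the base source.
    rows += [
        {"Passport": p, "Destination": d, "Requirement": req, "Source": "henley"}
        for (p, d), req in henley.items()
        if (p, d) not in pi
    ]
    return rows
```

With this shape the tidy file keeps a `Source` column per row; a matrix export would have to drop it, as noted above.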

Sorry for my late reply. I downloaded the data from HPI (Henley Passport Index), only to find a huge difference. For 200 regions there are around 40,000 passport/destination combinations, and 7,000+ of them differ. I manually checked the differences: PI (PassportIndex) is right in some cases and HPI is right in others. I'm thinking of using Wikipedia to resolve the conflicts 🤦‍♂️🤦‍♂️

I've created a file for your reference: https://github.com/iamCristYe/passport-index-dataset/blob/master/diff.csv
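For reference, a comparison like this could be reproduced along the following lines. This is a minimal sketch, not the script that produced diff.csv: the function name `diff_sources` is hypothetical, and it assumes both sources have already been normalised to the same requirement vocabulary, which the real data would need first (e.g. PI reports day counts where HPI may not).

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (passport, destination)


def diff_sources(
    pi: Dict[Pair, str], hpi: Dict[Pair, str]
) -> Tuple[int, List[Tuple[Pair, str, str]]]:
    """Count pairs present in both sources and list those whose values differ."""
    # Only pairs covered by both sources can conflict.
    shared = pi.keys() & hpi.keys()
    conflicts = sorted(
        (pair, pi[pair], hpi[pair]) for pair in shared if pi[pair] != hpi[pair]
    )
    return len(shared), conflicts


if __name__ == "__main__":
    # Toy values for illustration only; not real visa data.
    pi = {("AA", "BB"): "visa free", ("AA", "CC"): "visa required"}
    hpi = {("AA", "BB"): "visa free", ("AA", "CC"): "visa free"}
    total, conflicts = diff_sources(pi, hpi)
    print(total, conflicts)
```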

Oh wow. The issue is, one can "solve" it once, but with monthly updates it's probably best to stick to one source or the other. Maybe there should be a separate repository called henley-passport-index-dataset?

Sure, a separate repo would solve the problem, but isn't our aim to build a dataset that reflects the actual situation in the real world? By that I mean we should provide the most accurate data we can, so we should check and verify multiple data sources. With Wikipedia, I was thinking of using automation scripts to update the results automatically rather than manually.

My two cents here: there's an IATA-based product called Timatic that manages this exact type of information. The problem is that it seems to be a paid alternative. I really don't know, but I'm into it.

I'll circle back soon with more information.

Maybe we can get access somehow... but it seems that there are about 70 rule changes every day.
https://gist.github.com/hezhao/b6fe9f5aa5c70e7d93fc

Sherpa is another service that I found, which could give you more up-to-date insights.

Also. 👀

I also found that it's possible to get a lot of information from a Timatic Web trial account, though it is sadly limited in the number of requests. Try this in Postman. You can modify pvh_destinationcountrycode and pvh_nationalitycode (e.g. US) for the pairs of interest; these are the only two fields you need to change to fetch data.

curl --location --request POST 'https://www.timaticweb2.com/request/pvh' \
--header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Origin: https://www.timaticweb2.com' \
--header 'Cookie: language=en_EN; AWSALB=mbrleow/5Cp1zsPcQy91LomZ6M2sRvr9+SzUpqthOm38FCM+UJzMTmaJBVEVEh/y/8oxVJIMs8LpD1K9BnRXUo81SsHxeKd8EkWk8CdzeEP/AotbLxnc6UCmzs8l; AWSALBCORS=mbrleow/5Cp1zsPcQy91LomZ6M2sRvr9+SzUpqthOm38FCM+UJzMTmaJBVEVEh/y/8oxVJIMs8LpD1K9BnRXUo81SsHxeKd8EkWk8CdzeEP/AotbLxnc6UCmzs8l; KONSOLIDATETRACKER=a85ca51f2a697fd3303e09ba5f37f65c; language=en_EN; __utma=232171546.2099270134.1673550295.1673550295.1673550403.2; __utmb=232171546.40.10.1673550403; __utmc=232171546; __utmt=1; __utmz=232171546.1673550403.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); accept=true; __gads=ID=da53c44494b9a43b:T=1673546709:RT=1673551161:S=ALNI_MYjOdf-rDhMVitmDMAvBO-6FqZ1pg; __gpi=UID=0000057818a8a1f1:T=1673546709:RT=1673546709:S=ALNI_MYDxysu3VqhdNxbIGs7Bziv1tD0MA; PHPSESSID=5n54fkkt8u774cokv7aapba7ad; language=en_EN; AWSALB=T/pqKm7e9qAzj+6oW1TL5Lp5TUtB0ShyLIAFBRhSx1DoHRTFd9vzL6wdSDmHoTIRUaFvrVz0VifOXdEGTAXTCSX8N+8Fy09DD2VPi0R4mDCy30nU4qhH9aN1Vsbt; AWSALBCORS=T/pqKm7e9qAzj+6oW1TL5Lp5TUtB0ShyLIAFBRhSx1DoHRTFd9vzL6wdSDmHoTIRUaFvrVz0VifOXdEGTAXTCSX8N+8Fy09DD2VPi0R4mDCy30nU4qhH9aN1Vsbt; KONSOLIDATETRACKER=a85ca51f2a697fd3303e09ba5f37f65c' \
--header 'Accept-Language: en-US,en;q=0.9' \
--header 'Host: www.timaticweb2.com' \
--header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.2 Safari/605.1.15' \
--header 'Referer: https://www.timaticweb2.com/home/CH' \
--header 'Accept-Encoding: gzip, deflate, br' \
--header 'Connection: keep-alive' \
--data-urlencode 'interfaceprocessor=force' \
--data-urlencode 'command=dynamicpvhrequestform' \
--data-urlencode 'templateid=' \
--data-urlencode 'pvh_id=19988100' \
--data-urlencode 'pvh_template=0' \
--data-urlencode 'pvh_template_name=0' \
--data-urlencode 'pvh_template_save=0' \
--data-urlencode 'trialuser=1' \
--data-urlencode 'pvh_destinationcountrycode=US' \
--data-urlencode 'pvh_destinationcountrycode_code=' \
--data-urlencode 'pvh_arrivaldate_day=12' \
--data-urlencode 'pvh_arrivaldate_month=01' \
--data-urlencode 'pvh_arrivaldate_year=2023' \
--data-urlencode 'pvh_arrivaldate=2023-01-12' \
--data-urlencode 'pvh_departairportcode=' \
--data-urlencode 'pvh_departairportcode_code=' \
--data-urlencode 'pvh_departairportcode=' \
--data-urlencode 'pvh_carriercode=00' \
--data-urlencode 'pvh_stayduration_days=' \
--data-urlencode 'pvh_stayduration_type=days' \
--data-urlencode 'pvh_returndate=' \
--data-urlencode 'pvh_stayduration=' \
--data-urlencode 'pvh_staytype=' \
--data-urlencode 'pvh_ticket=' \
--data-urlencode 'pvh_transitcountrycode_1=' \
--data-urlencode 'pvh_transitcountrycode_code_1=' \
--data-urlencode 'pvh_transit_arrivaldate_day_1=12' \
--data-urlencode 'pvh_transit_arrivaldate_month_1=01' \
--data-urlencode 'pvh_transit_arrivaldate_year_1=2023' \
--data-urlencode 'pvh_transit_arrivaldate_hour_1=0:01' \
--data-urlencode 'pvh_transit_arrivaldate_1=' \
--data-urlencode 'pvh_transit_departuredate_day_1=12' \
--data-urlencode 'pvh_transit_departuredate_month_1=01' \
--data-urlencode 'pvh_transit_departuredate_year_1=2023' \
--data-urlencode 'pvh_transit_departuredate_hour_1=1:00' \
--data-urlencode 'pvh_transit_departuredate_1=' \
--data-urlencode 'pvh_transitcountrycode_ticket_1=' \
--data-urlencode 'pvh_transitcountrycode_2=' \
--data-urlencode 'pvh_transitcountrycode_code_2=' \
--data-urlencode 'pvh_transit_arrivaldate_day_2=12' \
--data-urlencode 'pvh_transit_arrivaldate_month_2=01' \
--data-urlencode 'pvh_transit_arrivaldate_year_2=2023' \
--data-urlencode 'pvh_transit_arrivaldate_hour_2=0:01' \
--data-urlencode 'pvh_transit_arrivaldate_2=' \
--data-urlencode 'pvh_transit_departuredate_day_2=12' \
--data-urlencode 'pvh_transit_departuredate_month_2=01' \
--data-urlencode 'pvh_transit_departuredate_year_2=2023' \
--data-urlencode 'pvh_transit_departuredate_hour_2=1:00' \
--data-urlencode 'pvh_transit_departuredate_2=' \
--data-urlencode 'pvh_transitcountrycode_ticket_2=' \
--data-urlencode 'pvh_transitcountrycode_3=' \
--data-urlencode 'pvh_transitcountrycode_code_3=' \
--data-urlencode 'pvh_transit_arrivaldate_day_3=12' \
--data-urlencode 'pvh_transit_arrivaldate_month_3=01' \
--data-urlencode 'pvh_transit_arrivaldate_year_3=2023' \
--data-urlencode 'pvh_transit_arrivaldate_hour_3=0:01' \
--data-urlencode 'pvh_transit_arrivaldate_3=' \
--data-urlencode 'pvh_transit_departuredate_day_3=12' \
--data-urlencode 'pvh_transit_departuredate_month_3=01' \
--data-urlencode 'pvh_transit_departuredate_year_3=2023' \
--data-urlencode 'pvh_transit_departuredate_hour_3=1:00' \
--data-urlencode 'pvh_transit_departuredate_3=' \
--data-urlencode 'pvh_transitcountrycode_ticket_3=' \
--data-urlencode 'pvh_transitcountrycode_4=' \
--data-urlencode 'pvh_transitcountrycode_code_4=' \
--data-urlencode 'pvh_transit_arrivaldate_day_4=12' \
--data-urlencode 'pvh_transit_arrivaldate_month_4=01' \
--data-urlencode 'pvh_transit_arrivaldate_year_4=2023' \
--data-urlencode 'pvh_transit_arrivaldate_hour_4=0:01' \
--data-urlencode 'pvh_transit_arrivaldate_4=' \
--data-urlencode 'pvh_transit_departuredate_day_4=12' \
--data-urlencode 'pvh_transit_departuredate_month_4=01' \
--data-urlencode 'pvh_transit_departuredate_year_4=2023' \
--data-urlencode 'pvh_transit_departuredate_hour_4=1:00' \
--data-urlencode 'pvh_transit_departuredate_4=' \
--data-urlencode 'pvh_transitcountrycode_ticket_4=' \
--data-urlencode 'pvh_transitcountrycode_5=' \
--data-urlencode 'pvh_transitcountrycode_code_5=' \
--data-urlencode 'pvh_transit_arrivaldate_day_5=12' \
--data-urlencode 'pvh_transit_arrivaldate_month_5=01' \
--data-urlencode 'pvh_transit_arrivaldate_year_5=2023' \
--data-urlencode 'pvh_transit_arrivaldate_hour_5=0:01' \
--data-urlencode 'pvh_transit_arrivaldate_5=' \
--data-urlencode 'pvh_transit_departuredate_day_5=12' \
--data-urlencode 'pvh_transit_departuredate_month_5=01' \
--data-urlencode 'pvh_transit_departuredate_year_5=2023' \
--data-urlencode 'pvh_transit_departuredate_hour_5=1:00' \
--data-urlencode 'pvh_transit_departuredate_5=' \
--data-urlencode 'pvh_transitcountrycode_ticket_5=' \
--data-urlencode 'pvh_nationalitycode=CL' \
--data-urlencode 'pvh_documenttype=passport' \
--data-urlencode 'pvh_issuecountrycode=CL' \
--data-urlencode 'pvh_issuedate_day=00' \
--data-urlencode 'pvh_issuedate_month=00' \
--data-urlencode 'pvh_issuedate_year=0000' \
--data-urlencode 'pvh_issuedate=' \
--data-urlencode 'pvh_expirydate_day=00' \
--data-urlencode 'pvh_expirydate_month=00' \
--data-urlencode 'pvh_expirydate_year=0000' \
--data-urlencode 'pvh_expirydate=' \
--data-urlencode 'pvh_residentcountrycode=CL' \
--data-urlencode 'pvh_residencydocument=' \
--data-urlencode 'pvh_birthdate_day=00' \
--data-urlencode 'pvh_birthdate_month=00' \
--data-urlencode 'pvh_birthdate_year=1980' \
--data-urlencode 'pvh_birthdate=' \
--data-urlencode 'pvh_passportseries=' \
--data-urlencode 'pvh_secondarydocumenttype=' \
--data-urlencode 'pvh_documentfeature='
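
To loop over pairs of interest, the form body from the curl command above could be regenerated per pair instead of edited by hand. The sketch below only builds the URL-encoded body and sends nothing. The helper name `build_pvh_body` is hypothetical, `base_fields` is assumed to be a dict holding the `--data-urlencode` fields copied from the command, and whether related fields such as pvh_issuecountrycode or pvh_residentcountrycode must also match the nationality has not been verified.

```python
from typing import Dict
from urllib.parse import urlencode


def build_pvh_body(base_fields: Dict[str, str], nationality: str, destination: str) -> str:
    """Return a form-encoded body with the two pair-specific fields overridden.

    Hypothetical helper: base_fields is assumed to hold the form fields
    from the curl command above; all other fields are passed through unchanged.
    """
    fields = dict(base_fields)  # copy so the caller's template is not mutated
    fields["pvh_nationalitycode"] = nationality
    fields["pvh_destinationcountrycode"] = destination
    return urlencode(fields)


if __name__ == "__main__":
    # Tiny subset of the fields, for illustration only.
    template = {
        "pvh_documenttype": "passport",
        "pvh_nationalitycode": "CL",
        "pvh_destinationcountrycode": "US",
    }
    print(build_pvh_body(template, "DE", "JP"))
```

Given the trial account's request limit mentioned above, any loop built on this would need to stay well within that limit.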