A tool to check if the ISP is blocking you for any of the Alexa top 1M websites
The idea is simple. Send requests to the website and analyse the response that you get. Depending on that we determine if it's getting blocked or not.
PR_CONNECT_RESET
-RST
packet, in return.- Check final redirect to http://www.airtel.in/dot/
- Redirected page has string : "The website has been blocked as per order of Ministry of Electronics and Information Technology under IT Act, 2000." (http://www.airtel.in/dot/)
- Check DNS filtering by comparing the final IP with the ones in the
filtering.yaml
. Usually the IP returned by resolving from the ISPs DNS is under their ASN, so that gives a higher confidence of saying that it's blocked by the ISP.
data/citizenlabs-lists
- https://github.com/citizenlab/test-listsdata/India/potentially_blocked_unique_hostnames.txt
- https://github.com/kush789/How-India-Censors-The-Web-Data/blob/master/potentially_blocked_unique_hostnames.txtdata/India/airtel-fiber-blocked-hosts.txt
- https://github.com/captn3m0/airtel-blocked-hostsdata/India/act_blocked_list_26_may.txt
- https://github.com/qurbat/blocked-hosts/blob/main/output/may_26_2022-blocked_hosts.txt
- Option to use restricted domains from lists like : CitizenLabs/test-lists, Domains Project etc
- Create this as a CLI tool. See bubble tea golang lib.
- d3.js or some other tool to create a heat map - Partially done with the map, need to add the folder here.
- Replace http client with retryable http client - https://github.com/hashicorp/go-retryablehttp
- Keep in DB stats for last run, like : 1. Scan Time 2. Domains scanned 3. Accessible, Non-accessible, blocked, connection timed out domains 4. Location 5. ISP 6. Evil or not 7. Time of scan 8. Type of filtering
- Save all data as base64 encoded into file.
- Check DNS Filtering.
- Add data from different ISPs from India and world. Airtel, JIO, ACT, Hathaway, Tata, Vodafone etc.
- Save results with Folder and list as well. Upload to Github.
- Add
goreleaser
to automatically publish new version
- Add
proxy
support to run checks for different ISPs anywhere in the world. - Based on country automatically choose the list from
citizenlabs
. - Option to save stats in DB - sqlite, postgres etc.
-
debug
option to print all data like which websites wereblocked
,inaccessible
,accessible
etc. - Check DNS filtering using
net.LookupIP
or server IP from the request's response . - Update
README
with the way it is being checked. Mentioning each of the block strategies and how different ISPs are using it. How the tool is checking these ? - When choosing
cisco
list warn people about the bandwidth usage and how much data + time it might consume.
- How can we use this to collect data from around the world, making sure it's reliable ( not tampered with ) ?
- Ideally this should be like :
- Cost efficient - Maybe create a serverless lambda to send data to.
- Integration with CLI : Give user option in CLI tool to send data to their server
- Open source : Thus, people can review the code or compile their own binaries and build on top of it.
- Option to just check for a particular type of filtering like DNS , HTTP, SNI etc
- Can check for
www
subdomain, where the answer to aGET
request isno such host
. - Better Blocking Check : Can check if page is blocked by checking similarity from a non-blocked source ( like a s3 bucket that saves the pages daily ) ?
- Create an un-censored source of truth.
- Corroborate data with some un-censored source of truth to be sure of filtering. One way I propose is to use similarity detection by HTML tree.
- Try out bypasses for common techniques. Keep this as an option in the cli tool.
- Run multiple times, to avoid flaky results
- Decide number of goroutines on the basis of internet connection. A low bandwidth connection will get choked and all websites' will get timed out. Also timeout should be decided on this basis. Can use speedtest-go.
- Keep unique domains in the list to scan and remove subdomains - Currently 264k unique domains. Takes ~1330 seconds on a ~200Mbps internet connection with 15 second timeout and 3 retries. Try to get this to max 100k domains
- Why isn't alexa top 1 M included in the list ?
Because alexa service is discontinued, as of 1st May 2022, check here.