Python script for finding blocked sites on Indian ISPs

  • Thread starter: JB701
  • Replies: 37
  • Views: 7,773

JB701

🇵🇸🤝🇮🇳
Messages
2,391
Location
Kochi, KL
ISP
Airtel

If you want to see which major sites are blocked on your ISP, you can use this script.

Feel free to run the script and share the output here.

I have added a new method that checks for blocked sites by changing the SNI in the TLS handshake. Because the connection goes to a different server IP, the site being tested doesn't need to support SSL/TLS itself. It appears to be faster and uses less CPU.
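For anyone curious how an SNI probe like this can work, here is a minimal sketch. It is not the script's actual code. The function name `sni_is_reset` and its parameters are made up for illustration, and it assumes `server_ip` is any reachable TLS server you choose. It treats a connection reset during the handshake as a hint of SNI-based blocking and everything else as inconclusive.

```python
import socket
import ssl


def sni_is_reset(server_ip, sni_hostname, port=443, timeout=5.0):
    """Return True if a TLS handshake carrying `sni_hostname` gets reset.

    The TCP connection goes to `server_ip`, which does not have to host
    `sni_hostname`; only the SNI field in the ClientHello names the site.
    """
    ctx = ssl.create_default_context()
    # The server's certificate will not match the SNI, so skip verification.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((server_ip, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=sni_hostname):
                return False  # handshake completed, nothing interfered
    except ConnectionResetError:
        return True  # reset during the handshake: possible SNI filtering
    except OSError:
        return False  # refused, timed out, TLS alert, etc.: inconclusive
```

A single reset is only a hint, so it is probably worth retrying a host a couple of times before counting it as blocked.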

 
Btw how long does it take to check those million websites? 😛
 
Not every Indian ISP uses Netsweeper, but this covers most of them. You can also look for TLS connection resets.
 
Great stuff. If someone has already checked for Tata Sky Broadband then it would save my time.

Would the ISP get suspicious as to why your IP is trying to access 1 million sites in 4-5 hrs? Will that break some terms and conditions? Stupid question, I know, but still.
 
1M GET requests and 1M DNS requests is less traffic than loading a thousand pages of a random news website.
 


Not every Indian ISP uses Netsweeper, but this covers most of them. You can also look for TLS connection resets.

The reason I didn't go with TLS connection resets is that a reset could also happen for reasons other than blocking, for example the website could be blocking... Maybe I could try checking whether ISP reset packets are any different from regular reset packets.

Also, sites without TLS don't return RST Packets.

If you have any ISPs that don't match my regex, please do tell me the HTML of the blocking page.



Would the ISP get suspicious as to why your IP is trying to access 1 million sites in 4-5 hrs? Will that break some terms and conditions? Stupid question, I know, but still.

Tata Sky should be the same as my Namecheap VPN because both are on TATA transit. I'll upload the complete results on GitHub.

I'm not sure if ISPs will notice. You could turn max_workers down to 2 or 3 to make it run slower, so it is less stressful on the ISP's network and your own.
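As a rough illustration of what lowering max_workers does, here is a small sketch built on `concurrent.futures.ThreadPoolExecutor`. This assumes the script uses a thread pool, which the thread does not confirm. `check_all` and `check_site` are hypothetical names, and `check_site` stands in for whatever per-site check you run.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(sites, check_site, max_workers=3):
    """Run check_site(site) for every site, at most max_workers at a time.

    Returns a dict mapping each site to the result of check_site.
    """
    sites = list(sites)  # iterated twice below, so materialize it once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs sites with results.
        return dict(zip(sites, pool.map(check_site, sites)))
```

With max_workers=2 or 3 there are never more than that many requests in flight, so the scan takes longer but creates far fewer simultaneous connections and DNS lookups.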

I have been running it with max_workers=20 throughout the night over VPN. My pfSense state table is 50% full and CPU usage is at 70%, mainly because of AdGuard Home, which isn't suited to such a large number of queries at once. I'll add an option for custom DNS to the script.

Maybe I could also set up a max size cap to make it easier on bandwidth, as blocked-site pages are quite small.
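One possible way to do such a size cap, sketched with the standard library only. `fetch_capped` is a hypothetical helper, not part of the script, and the 4096-byte default is just an assumed value for "block pages are quite small".

```python
from urllib.request import urlopen


def fetch_capped(url, max_bytes=4096, timeout=10):
    """Fetch at most max_bytes of the response body, then stop reading."""
    with urlopen(url, timeout=timeout) as resp:
        # read(n) returns up to n bytes; the rest of the body is never
        # downloaded because the response is closed on leaving the block.
        return resp.read(max_bytes)
```

The returned bytes can then be matched against the block-page pattern as usual, without downloading full pages of sites that are not blocked.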
 
It looks like ~620 of the top 1 million sites are blocked.


It is possible that my script missed a few when PPPoE reconnected or something.

Another possible, less intense method of finding blocked sites is to use DNS, but only certain ISPs (Airtel is one of them) use DNS blocking.
 
The reason I didn't go with TLS connection resets is that a reset could also happen for reasons other than blocking, for example the website could be blocking... Maybe I could try checking whether ISP reset packets are any different from regular reset packets.

Also, sites without TLS don't return RST Packets.

Sites without TLS mostly won't reply on :443. Some of them are behind a CDN or similar and will send you a certificate error. I have not seen a TLS peer reset in the wild other than from middleboxes.
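The three outcomes described above (no reply on :443, a certificate error, a reset) can be told apart roughly like this. It is only a sketch: `classify_tls_error` and `probe_443` are made-up names, and the mapping from exceptions to labels is my assumption about how Python's `ssl` module surfaces each case.

```python
import socket
import ssl


def classify_tls_error(exc):
    """Map the outcome of a TLS handshake attempt to a coarse label."""
    if exc is None:
        return "tls-ok"
    if isinstance(exc, ConnectionResetError):
        return "reset"  # the case attributed to middleboxes
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "cert-error"  # e.g. a CDN answering with the wrong cert
    if isinstance(exc, (socket.timeout, ConnectionRefusedError)):
        return "no-tls"  # nothing is answering on :443
    return "other"


def probe_443(host, timeout=5.0):
    """Attempt a verified TLS handshake with host:443 and classify it."""
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((host, 443), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host):
                return classify_tls_error(None)
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return classify_tls_error(exc)
```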

If you have any ISPs that don't match my regex, please do tell me the HTML of the blocking page.
Railwire, for example, uses ck-block and has many different block pages, including a plain 404, so matching on the HTML is useless there. You can look for ck-block in the header though.
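A header check along those lines could be as simple as the sketch below. The thread doesn't say which header carries ck-block, so this hypothetical helper, `mentions_ck_block`, just searches every header name and value for the string.

```python
def mentions_ck_block(headers):
    """Return True if any header name or value contains 'ck-block'.

    `headers` is an iterable of (name, value) pairs, for example the
    list returned by http.client.HTTPResponse.getheaders().
    """
    return any(
        "ck-block" in name.lower() or "ck-block" in value.lower()
        for name, value in headers
    )
```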
 
I'm now running it on my Oracle Free VPS. Let's see if the blocked sites are different from my original run (though it'll take a few days to complete). Oracle uses both TATA and Airtel transit (mostly preferring Airtel, it looks like).

I have tried it with 10 million sites, but it ends up eating all the RAM on my laptop. I probably need to split the list into chunks to make it work.
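Splitting the list into chunks can be done lazily, so the whole file is never in memory at once. A minimal sketch, where `iter_chunks` is a hypothetical helper name:

```python
from itertools import islice


def iter_chunks(lines, size):
    """Yield lists of up to `size` stripped lines from any iterable."""
    it = iter(lines)
    while True:
        # islice pulls at most `size` items without reading further ahead.
        chunk = [line.strip() for line in islice(it, size)]
        if not chunk:
            return
        yield chunk
```

Passing an open file object as `lines` and a chunk size such as 100000 (an arbitrary example) means only one chunk of hostnames, plus its results, has to be held at a time.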

I'll make a similar script for checking TCP resets soon.
 
If anyone knows any public Indian DNS servers that block sites, please do tell me. I know the Jio LTE default DNS returns 49.44.79.236 for many blocked sites, but LTE is unreliable for the large number of DNS queries I want to do.
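A DNS-based check of this kind could look roughly like the sketch below. The helper names are made up. It resolves through the system resolver, so it only tells you something when the machine is actually using the ISP's DNS, and the only block address it knows is the Jio one mentioned above.

```python
import socket

# Addresses a blocking resolver is known to return instead of the real one.
# Only the Jio LTE address from this thread is listed; extend as needed.
KNOWN_BLOCK_IPS = {"49.44.79.236"}


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the system resolver gives."""
    try:
        infos = socket.getaddrinfo(
            hostname, None, socket.AF_INET, socket.SOCK_STREAM
        )
    except socket.gaierror:
        return set()  # NXDOMAIN or resolver failure
    return {info[4][0] for info in infos}


def looks_dns_blocked(addresses, block_ips=KNOWN_BLOCK_IPS):
    """True if any resolved address is a known block-page address."""
    return bool(set(addresses) & set(block_ips))
```

Usage would be something like `looks_dns_blocked(resolve_ipv4(site))` for each site in the list.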
 
My Oracle server is already at 290 blocked websites with ~325k sites scanned. I don't know if it's because of the different transit or because my original run missed sites due to network issues.
 
~4899/10 million sites blocked https://raw.githubusercontent.com/s...iteslist/tlsblockedsites-oracle-10million.txt


Many are subdomains though; Airtel/TATA are likely blocking by regex or wildcards.

I have combined multiple lists of potentially blocked sites with the blocked sites among the top 1 million and top 10 million, removed duplicates and sorted: ~11896 sites. There are still www subdomains that need to be removed, though.


After removing subdomains and duplicates, it's 6651 sites: https://raw.githubusercontent.com/s...-oracle-combined-nosubdomains-duperemoved.txt
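For reference, a cleanup pass like this can be sketched in a few lines. This is not necessarily how the list above was produced. The hypothetical helper `normalize_sites` only strips a leading "www." and removes exact duplicates. Collapsing other subdomains to their parent domain would need something like the Public Suffix List, which is not handled here.

```python
def normalize_sites(sites):
    """Lowercase, strip a leading 'www.', drop duplicates, and sort."""
    cleaned = set()
    for site in sites:
        host = site.strip().lower().rstrip(".")
        if host.startswith("www."):
            host = host[4:]
        if host:  # skip blank lines
            cleaned.add(host)
    return sorted(cleaned)
```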

This should be a significant chunk of all the blocked sites, but it is clearly not even close to the full total.



NOW IT'S TIME FOR 260 MILLION
 