Python script for finding blocked sites on Indian ISPs

  • Thread starter: JB701
Now I'm running the search on 260 million domains. The CSV file was ~4 GB, so I had to split it into ~130 smaller files with 2 million domains each.
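For anyone who wants to do the same split, here is a minimal sketch. It assumes one domain per line; `split_file`, the `chunk_NNNN.csv` naming and the output directory are made up for illustration and are not from my actual script.

```python
from itertools import islice
from pathlib import Path


def split_file(src, out_dir, lines_per_chunk=2_000_000):
    """Split a one-domain-per-line file into numbered chunk files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(src, "r", encoding="utf-8") as fh:
        while True:
            # Read at most lines_per_chunk lines without loading the whole file.
            chunk = list(islice(fh, lines_per_chunk))
            if not chunk:
                break
            path = out_dir / f"chunk_{len(paths):04d}.csv"
            path.write_text("".join(chunk), encoding="utf-8")
            paths.append(path)
    return paths
```

With 260 million lines and 2 million lines per chunk this gives 130 files, and each file can then be handed to a separate run of the checker.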

After ~14hrs of running it has already found ~620 domains which was kinda surprising considering the large amount to be checked. I'm running the same on another Oracle VPS to make sure no domain is missed.
 
The 260 million search is only partly done, but I've combined all the blocked domains from both VPSes with all the blocked domains from earlier searches (1 million, 10 million, adult and gambling sites etc.) and removed subdomains and duplicates. Right now it's at 12896 blocked domains.
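The merge step can be sketched roughly like this. It is only one way to read "removed subdomains": an entry is dropped when one of its parent domains is already in the combined list. `collapse_domains` is a made-up name, and the sketch does not use a public suffix list.

```python
def collapse_domains(domains):
    """Deduplicate domains and drop entries whose parent domain is also listed."""
    # Normalise case, surrounding whitespace and a trailing dot before comparing.
    cleaned = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    kept = set()
    for domain in cleaned:
        labels = domain.split(".")
        # Candidate parents: every shorter suffix that still has two or more labels.
        parents = (".".join(labels[i:]) for i in range(1, len(labels) - 1))
        if not any(parent in cleaned for parent in parents):
            kept.add(domain)
    return sorted(kept)
```

Feeding it the concatenation of every result file gives one sorted list with no duplicates. A subdomain stays in the list only if its parent was never found blocked on its own.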

 
This is not completely related to the original topic but I thought it was worth mentioning.

Apparently MikroTik RouterOS has a feature to match packets by TLS host. This can be used to easily policy-route blocked sites over a VPN without relying on IP-based policy routing (where blocked sites are resolved through DNS), which can have issues because of subdomains, IP changes etc. It's a shame that this feature isn't available on pfSense.

 
OK, so one of the Oracle servers is done.

It found 20350 blocked domains. That said, I think it missed a few domains that the other VPS should pick up.

I'm trying to run the same on my home server on TriplePlay. My script tries connecting to 1.1.1.1 with various domains from the sites list.

On TriplePlay, 1.1.1.1 is routed through Jio. This means I'm hitting Jio's middleboxes whenever a site is blocked. I've been finding some odd blocked sites. For example, "fantasyfacesbybelle.com.au" is blocked on Jio for some reason. It makes zero sense for the site to be blocked: it's not an Indian site, it isn't pornographic, it doesn't promote anything illegal and it doesn't violate trademarks. But when I open it on TriplePlay it gives me a connection reset (on Excitel it opens fine).

So it seems Jio is blocking sites that the government doesn't even want blocked!

Edit: It appears that fantasyfacesbybelle.com.au was a torrent site in the past. An archived version of the page from 2018 is titled "Torrent Downloads - download free torrents!", which explains why it was blocked lol (strange name for a torrent proxy site, it might have been trying to evade blocking). But still, it's only blocked on Jio transit and not on Excitel or on Namecheap VPN (TATA transit).
 


Can anyone on Jio access tataskysales.com? I think this site is blocked on Jio but not on Airtel.


I also found certain sites on Airtel transit that were blocked over HTTP but not over HTTPS.
 
I've found a list of sites with ~570 million domains. Currently I'm running that on my Oracle VPS. The 570 million list contains many domains that aren't listed in zone files because they have no DNS records. These sites aren't active at all but are still blocked because the blocklists aren't updated. This is part of the reason why checking the 260 million domains from zone files returned only ~18500 blocked domains. Combining the zone file results with blocked lists from older sources got me >20k domains.

ISP middleboxes seem to return RST packets right after the TLS ClientHello (the ClientHello contains the SNI). Because of this, there is no need to continue with the TLS handshake to check for blocking.

Sending just the ClientHello and nothing else will reduce CPU load a ton and let me query a lot more domains in less time.

Unfortunately, I haven't found a Python module that can send just the ClientHello without doing the full handshake. I'm trying to make my own with scapy: create a TCP connection, send a TLS ClientHello packet, and treat the site as blocked if an RST comes back. If anyone has more knowledge of this than I do, please do tell.
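The idea above can also be sketched with nothing but the standard library: build the ClientHello bytes by hand and send them over an ordinary TCP socket. This is a rough sketch and not my working script. `build_client_hello` and `is_reset` are made-up names, the two cipher suites are an arbitrary choice, and I'm assuming the middlebox only looks at the SNI extension and nothing else in the message.

```python
import os
import socket
import struct


def build_client_hello(domain: str) -> bytes:
    """Build a minimal TLS 1.2 ClientHello whose only extension is SNI."""
    name = domain.encode("ascii")
    # server_name extension: a list holding one host_name (type 0) entry.
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni_data = struct.pack("!H", len(entry)) + entry
    extensions = struct.pack("!HH", 0x0000, len(sni_data)) + sni_data
    # Two common suites; which ones are offered should not matter for SNI filtering.
    suites = struct.pack("!HH", 0xC02F, 0x009C)
    body = (
        b"\x03\x03"                      # client_version: TLS 1.2
        + os.urandom(32)                 # random
        + b"\x00"                        # empty session_id
        + struct.pack("!H", len(suites)) + suites
        + b"\x01\x00"                    # one compression method: null
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    # Record header: handshake (0x16), record version 3.1, payload length.
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


def is_reset(domain, host="1.1.1.1", port=443, timeout=5.0) -> bool:
    """Return True if the connection is reset right after the ClientHello."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            sock.sendall(build_client_hello(domain))
            sock.recv(1)  # any reply (ServerHello, alert, EOF) means no reset
        except ConnectionResetError:
            return True
        except socket.timeout:
            return False
    return False
```

`is_reset("example.com")` opens one connection, sends one packet and reads at most one byte, so no key exchange or certificate parsing happens on the client side. A timeout is counted as "not blocked" here, which may hide middleboxes that silently drop packets instead of sending an RST.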
 
I found this neat Python library for sending a ClientHello packet.


Idk how to customize it yet, but regardless, with the module's default settings the handshake fails and no ServerHello or key exchange occurs.

Here is a code snippet. I'm trying to get this working with threading so I can query a ton of sites at once, but there were some problems with threads not working as I expected.

Code:
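from socket import create_connection

from tlslite.api import TLSConnection


def test_tls(domain):
    # Open a fresh TCP connection per domain; a socket can't be reused after a reset.
    sock = create_connection(("1.1.1.1", 443), timeout=5)
    connection = TLSConnection(sock)
    try:
        # The ClientHello carries the domain as SNI; a middlebox answers it with an RST.
        connection.handshakeClientAnonymous(serverName=domain)
    except ConnectionResetError:
        print(domain + " is blocked")
    except Exception:
        # Any other handshake failure (alert, timeout) is not counted as blocked.
        pass
    finally:
        connection.close()


test_tls("1337x.to")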

Wireshark output:

wireshark.png
 
I've created a script for checking a large number of domains based on the code above. Unfortunately, further changes are needed to make it faster than my first TLS-checking script.

It has been very frustrating trying to make this thing go faster.


But the good thing is that the lack of a full TLS handshake means it uses far less bandwidth.
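One way to run many checks at once is a thread pool where every call opens its own connection, so no socket is shared between threads. This is only a sketch of the idea, not my script. `check_all` and the `probe` argument are made-up names; `probe` stands for any function that takes a domain and returns True when it looks blocked.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(domains, probe, workers=64):
    """Run probe(domain) across a thread pool and return the blocked domains."""
    domains = list(domains)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so they line up with domains.
        results = pool.map(probe, domains)
        return [d for d, blocked in zip(domains, results) if blocked]
```

The work is almost all waiting on the network, so threads should help despite the GIL. `probe` needs to catch its own exceptions, otherwise the first error stops the whole batch when the results are read.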
 
Airtel blocks oss-cn-hangzhou.aliyuncs.com for some reason but other regions aren't blocked. I don't think anyone from the government asked them to block this one. It's not blocked on Jio and it's not there in the uploaded lists.
 
