The Deep Web is where all the criminals hang out, right?
Well, yes and no. The Deep Web is essentially all of the non-indexed, non-searchable content on the Internet. To really understand this, we need to quickly talk about what the Internet IS.
A very basic summary of what the Internet really is:
This is a simplification of what is really going on. If you already know the intimate details about any of the following: IP, FTP, DNS, or similar – jump down to the next section. This part is not for you.
The Internet, in its simplest form, is a group of computers talking to each other. Computer A asks Computer B for a piece of information and Computer B sends it back. The piece of information could be a webpage, an advertisement, a calculation: pretty much anything.
Each computer uses a unique address during this communication. That address is its IP address (IP stands for Internet Protocol; an IPv4 address is formatted like this: 220.127.116.11). Because IP addresses are hard for humans to remember, we give them friendlier names: domain names. When you type “Facebook.com,” your computer (Computer A) quietly asks a DNS (Domain Name System) server, usually one run by your Internet Service Provider (Comcast, Google Fiber, Time Warner, Shaw), to swap that domain name for the IP address of a Facebook server (Computer B). Your computer then sends its request to that address, and Facebook (Computer B) sends back the information that you want to look at, at home, on the go, or at work (I’m not judging).
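The name-to-address lookup described above is handled by DNS. Here is a minimal Python sketch using only the standard library; it relies on whatever resolver your operating system is configured to use (often your ISP's), and the helper names `resolve` and `describe` are our own, made up for illustration. It also shows that a dotted address like 220.127.116.11 is really just one 32-bit number written as four pieces.

```python
import ipaddress
import socket


def resolve(hostname: str) -> str:
    """Ask the operating system's DNS resolver for an IPv4 address for hostname."""
    # gethostbyname returns a single IPv4 address as a dotted-quad string.
    return socket.gethostbyname(hostname)


def describe(address: str) -> str:
    """Show that a dotted-quad IPv4 address is one 32-bit integer."""
    ip = ipaddress.IPv4Address(address)  # raises ValueError if malformed
    return f"{ip} is the 32-bit integer {int(ip)}"


if __name__ == "__main__":
    # "localhost" is normally answered from the local hosts file, so no network is needed.
    print(resolve("localhost"))
    # Swap in "facebook.com" to watch a real DNS lookup (requires network access).
    print(describe("220.127.116.11"))
```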
This is the same for any communication carried out on the Internet, including the Deep Web. The difference between the Deep Web and the rest of the Internet is whether or not you can search it.
I can’t, for example, search for that conversation your teenager had on WhatsApp last week, nor can I search for data transmitted from your Internet-enabled fridge to your iPhone telling you, “Your beer is now cold.” Another example of data that you can’t search is the mindless chatter that computers constantly spew out to check on the status, health, or performance of other computers, or other equipment, in the system. Data like this makes up the majority of the “Deep Web.” It's mostly boring information that is not useful to the vast majority of people. According to some sources, the “Deep Web” is approximately 500 times bigger than the “normal web.” What is rarely addressed is how much of the “Deep Web” is human-readable rather than machine-readable.
But what about the criminals and the “Silk Road”?
First, we'll need to discuss the difference between the "Dark Web" and the "Deep Web". According to the incredibly reliable Wikipedia, the Deep Web is "the content on the World Wide Web that is not indexed by standard search engines". This is what we've been discussing so far.
The Dark Web, on the other hand, is "the World Wide Web content that exists on darknets, overlay networks which use the public Internet but which require specific software, configurations or authorization to access. The dark web forms a small part of the Deep Web...".
"Huh?" (that's what we said)
To access the "Dark Web", not only do you have to know where to look, you have to have the right tools to get access: a programmatic key, or secret knock.
In fact, some of these hidden corners predate the mainstream, popular Internet. Early internet users hung out in chatrooms on Internet Relay Chat (IRC), and some of the criminal activity on the "Dark Web" today grew out of those IRC communities.
The "Deep Web" also includes hosted web pages and other informational sites that are simply not searchable. Keeping a site out of search results is routine: every time an organization is working on a new website that isn’t quite ready to show the world, it can tell Google and other search engines not to read and index it.
This is done with a plain-text file called “robots.txt” that sits at the root of the website. You can look at most sites' robots.txt files by simply adding “/robots.txt” to the end of the domain name.
For example, take Organization X. To take its site off the searchable Internet, all an IT guy needs to do is add the lines “User-agent: *” and “Disallow: /” to the robots.txt file. Google interprets this as: “I don’t want you to read anything on my site. It is not ready yet!”
NOTE: One of the simplest things you can do for your website's SEO (search engine optimization) is to make sure your live site's robots.txt does not contain “Disallow: /”!
That doesn’t mean that you cannot access the site, if you know where to find it. It only means that you can’t search it using Google, Bing, or another search engine.
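To see how a well-behaved crawler treats a file like Organization X's, here is a small sketch using Python's built-in `urllib.robotparser`. The site URL and the `crawler_may_fetch` helper are hypothetical. Note that the check happens on the crawler's side: robots.txt is a polite request, and nothing stops a browser (or a rude bot) from fetching the page anyway.

```python
from urllib.robotparser import RobotFileParser

# "Go away, we're not ready": every path is off-limits to every crawler.
HIDDEN_SITE = """\
User-agent: *
Disallow: /
"""

# An empty Disallow value means nothing is blocked.
OPEN_SITE = """\
User-agent: *
Disallow:
"""


def crawler_may_fetch(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a crawler that obeys robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Record a fetch time; can_fetch() treats rules with no fetch time as not yet loaded.
    parser.modified()
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://organization-x.example/new-product.html"  # hypothetical site
    print(crawler_may_fetch(HIDDEN_SITE, url))  # everything disallowed
    print(crawler_may_fetch(OPEN_SITE, url))    # everything allowed
```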
And with millions of sites and billions of possible addresses, if you aren’t told where to look, you are probably not going to find a site that someone wants to hide.
In essence, this is how people, not just criminals, keep their websites hidden in plain view: security through obscurity.
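A rough back-of-the-envelope sketch shows why a random, unlinked address is so hard to stumble on. It assumes the hidden page lives at a path built from 16 random bytes and that an attacker can make a billion guesses per second; both numbers, and the helper names, are illustrative assumptions. The catch with obscurity is that it only lasts until the address leaks, for example through a link, a log file, or a shared bookmark.

```python
import math
import secrets


def hidden_path(num_bytes: int = 16) -> str:
    """Generate a random, URL-safe path segment from num_bytes of randomness."""
    return "/" + secrets.token_urlsafe(num_bytes)


def expected_guesses(num_bytes: int) -> float:
    """On average, a blind guesser must try half of all 2**(8*num_bytes) paths."""
    return 2 ** (8 * num_bytes) / 2


if __name__ == "__main__":
    print("Unlisted page:", hidden_path())
    guesses = expected_guesses(16)
    per_second = 1e9  # assumption: a billion guesses per second
    years = guesses / per_second / (60 * 60 * 24 * 365)
    print(f"roughly 10^{math.log10(years):.0f} years to find it by guessing")
```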
There are a number of other tricks that can be used to hide something on the Internet, but most nefarious websites are hidden in plain sight.
There are tricks! Things like Tor
The “Silk Road” was a notorious drug trafficking website that was shut down by the FBI in October 2013. (Its successor, Silk Road 2.0, was taken down in November 2014.)
There were a couple of additional security measures in place (other than just being unindexed) that prevented the average user from accidentally stumbling across the site.
One such countermeasure is forcing your users to connect to your site through something called Tor. Tor is an Internet tool that helps users stay anonymous. It does this through a technique called onion routing.
Essentially, Tor sends all of your computer’s traffic through a chain of other computers, called nodes, before it reaches the final computer. A typical Tor circuit uses three nodes, picked from thousands of volunteer-run ones. Nodes, also called relays, can be just about any computer that has been set up with the Tor software, which is freely available from the Tor Project. People might set up a node because they strongly believe in the anonymous browsing movement. Your traffic is wrapped in several layers of encryption, and each relay peels off only one layer, so no single relay sees both who you are and where you are going. Those layers resemble the layers of an onion, hence ‘onion routing.’
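To make the onion metaphor concrete, here is a toy Python sketch of layered encryption over a three-relay circuit. It is for illustration only: the XOR-with-SHA-256 "cipher", the fixed 32-byte header, and the relay names are assumptions made for this example, and real Tor uses vetted ciphers and negotiates a separate key with each relay. What it does show is the idea: the sender wraps the message once per relay, and each relay can remove exactly one layer and learn only the next hop.

```python
import hashlib
import secrets

HEADER_LEN = 32  # toy format: fixed-size slot holding the next hop's name


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Toy cipher (NOT secure): XOR data with a SHA-256 keystream. Applying it twice undoes it."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def wrap(message: bytes, destination: str, circuit: list) -> bytes:
    """Build the onion: innermost layer for the exit, outermost for the guard."""
    next_hop, payload = destination, message
    for name, key in reversed(circuit):
        header = next_hop.encode().ljust(HEADER_LEN, b"\0")
        payload = xor_layer(key, header + payload)
        next_hop = name  # the next layer out tells the previous relay to forward here
    return payload


def peel(key: bytes, onion: bytes) -> tuple:
    """A relay removes one layer and learns only where to send what is left."""
    plain = xor_layer(key, onion)
    return plain[:HEADER_LEN].rstrip(b"\0").decode(), plain[HEADER_LEN:]


if __name__ == "__main__":
    circuit = [(name, secrets.token_bytes(32)) for name in ("guard", "middle", "exit")]
    onion = wrap(b"GET /index.html", "example.com", circuit)
    for name, key in circuit:
        next_hop, onion = peel(key, onion)
        print(f"{name} forwards to {next_hop}")
    print("destination receives:", onion)
```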
Tor was designed to protect its users’ identities, and it is extremely effective.
So who built Tor? Was it a secret group of hackers?
Actually, it was the US Government. The US Government developed and refined onion routing, the technology behind Tor, to protect its own anonymity and communication channels.
“The core principle behind Tor, namely, ‘onion routing’, was originally funded by the US Office of Naval Research in 1995, and the development of the technology was helped by DARPA in 1997,” writes Joseph Babatunde Fagoyinbo. Tor was finally released to the public in 2002.
Is there a way of hacking Tor? If someone is on Tor, can I find out?
Again, yes and no. To our current understanding, it is infeasible to break the cryptography Tor relies on. In theory, you could try to trace traffic backward through the maze of computers to its source. Unfortunately, by the time you were done, your great-great-great-grandchildren would be very old.
As humans are fallible, it is much easier to leverage human nature to get an idea of what people are doing in the deep corners of the Dark Web. Tor also doesn’t protect you from downloading malware that broadcasts your location to would-be attackers.
The Silk Road had a nearly perfect system. If it weren’t for a series of fortunate tips and mistakes by its founder, it would probably still be running.
For a similarly interesting story, WWII’s Enigma cipher was cracked in large part by exploiting human habits, such as operators’ predictable messages and careless settings.
You can often tell whether someone is using Tor (though very sophisticated adversaries can hide even that). An exit node is the last Tor computer that traffic passes through before it reaches the target site, so its IP address is the one your website sees. Because the Tor network publishes its list of relays, exit nodes are well known and can be mapped with reasonable certainty.
End result? If someone wants to search something online anonymously, there is very little you can do about it. You can, however, have a pretty good idea how many people are using Tor to access your website!
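Here is a minimal sketch of the exit-node check described above. It assumes you have already downloaded a plain-text list of exit IP addresses, one per line (the Tor Project publishes one at check.torproject.org/torbulkexitlist), and that you have a list of visitor IPs from your web server logs. The function names and the sample addresses (taken from the reserved documentation ranges) are made up for the example. A match tells you a request came through Tor, not who sent it.

```python
import ipaddress


def load_exit_list(text: str) -> set:
    """Parse a plain-text exit list: one IP per line, '#' starts a comment."""
    exits = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            exits.add(ipaddress.ip_address(entry))
    return exits


def tor_share(visitor_ips: list, exits: set) -> float:
    """Return the fraction of requests whose source IP is a known Tor exit."""
    if not visitor_ips:
        return 0.0
    from_tor = sum(ipaddress.ip_address(ip) in exits for ip in visitor_ips)
    return from_tor / len(visitor_ips)


if __name__ == "__main__":
    # Made-up data in the documentation address ranges, not real Tor exits.
    exit_list = "# sample exit list\n192.0.2.10\n198.51.100.7\n"
    visitors = ["203.0.113.5", "192.0.2.10", "203.0.113.9", "203.0.113.5"]
    exits = load_exit_list(exit_list)
    print(f"{tor_share(visitors, exits):.0%} of requests came from Tor exits")
```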
To wrap it up:
- The Internet is relatively simple.
- The Deep Web is HUGE. It is also pretty boring.
- Dark things do happen on the Deep Web, but not nearly as much as the media would like us to think.
- If you want to use Tor, you can download and install it from https://www.torproject.org/. It will make you far harder to identify while browsing. It will also be significantly slower! Remember, however, that it doesn’t protect you from you.
Echosec is a location-first open data search engine. Gain situational awareness through public open source data. Book a consultation to learn more.