There are many reasons why simple techniques for accessing blocked content, such as using cached versions of pages from a search engine or using a simple Web proxy, may be inadequate. In some cases these techniques are not enough to get around the blocking. In others, the restrictions affect services beyond simple Web browsing -- such as instant messaging, e-mail or video streaming -- which Web proxies cannot help with.
There may be cases when you want to bypass Internet blocking, but also want to prevent the organization doing the blocking from knowing what pages you are accessing, or prevent them from knowing you are even bypassing filters in the first place.
In these scenarios, there are many more advanced techniques available, and each of them solves different problems in different ways.
This chapter introduces some key technical concepts that will help you make an informed choice about which solution is applicable in a given situation. It also covers, in some detail, many of the different ways access to information may be blocked.
The Internet is based on a series of protocols, standardized sets of rules that govern how the networks of computers communicate. The principal set of protocols for managing connections and message packets for the Internet is TCP/IP (TCP over IP). Protocols can handle a wide range of data, with software to break long transmissions into smaller, numbered packets for transmission, and reassemble the data segments on the receiving end. The most common way to specify which protocol to use is to address packets to a specific port number. For example, HTTP for the Web normally uses port 80, and POP3 for receiving e-mail normally uses port 110. Blocking traffic to a particular port at a particular IP address disables normal access to one service at that site, while leaving the rest of the services available. The simplest way to circumvent a blocked port is to provide the service on a non-standard port, but this can only be done by the operator of the service, not by the user.
Network protocols are often described as existing in a set of layers. For the Internet, the bottom layer (called the Link Layer) is closest to the hardware, and the highest (called the Application Layer) is closest to the human user. The critical protocols in the middle two layers are TCP (Transmission Control Protocol), which is in the Transport Layer, and IP (Internet Protocol), which is "below" it in the Internet Layer. The two together are commonly referred to as TCP/IP. Less well known but also important is UDP (User Datagram Protocol), which is at the same level as TCP. Many, but not all services offered over TCP are also available over UDP, while some services are on UDP only.
The top level, called the Application Layer, includes protocols such as DNS (for domain names), FTP (file transfers), HTTP (Web), IRC (chat), NNTP (Usenet), POP3 (retrieving e-mail), SIP (Voice over IP), and SSH (encrypted communications). IANA (Internet Assigned Numbers Authority) assigns port numbers for each of these application services, such as port 53 for DNS lookup queries, 80 for HTTP, and 110 for POP3 (Post Office Protocol 3). These assignments are defaults, for convenience, and using them is not generally a technical requirement of the protocols; in fact, any sort of data could be sent over any port. There are also numerous default port assignments for UDP that operate in the same way. In some, but by no means all, cases, a service can be accessed on the same port number using either TCP or UDP. One exception is NTP (Network Time Protocol), which is provided only over UDP. UDP is also commonly used for real-time multimedia applications such as Voice over IP (VoIP) protocols, some of which are not available over TCP.
Users are normally not concerned with port assignments, which are handled automatically in the default cases. Use of the standard ports is not mandatory, however. By prior arrangement between service providers and users, system administrators can set up servers for access to standard services at non-standard port numbers. This allows software to circumvent simple port blocks intended to prevent use of these services.
Some software can be configured to use a non-standard port number. URLs also have a particularly convenient standard way of specifying a port number inside the URL. For example, the URL http://www.example.com:8000/foo/ would make an HTTP request to www.example.com on port 8000, rather than the default HTTP port 80.
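How such a URL encodes the port can be sketched with Python's standard urllib.parse module, using the example address from above:

```python
from urllib.parse import urlsplit

# Split the example URL into its components; the port is whatever
# follows the colon after the host name.
url = urlsplit("http://www.example.com:8000/foo/")

print(url.hostname)  # www.example.com
print(url.port)      # 8000 (explicit, overriding the HTTP default of 80)

# When no port is given, none is recorded in the URL itself; client
# software falls back to the default for the scheme (80 for http).
default = urlsplit("http://www.example.com/foo/")
print(default.port)  # None
```

Client software such as a browser performs exactly this split before opening a TCP connection to the named host and port.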
[Adapted from "Access Denied", Chapter 3, by Steven J. Murdoch and Ross Anderson]
The goals of deploying a filtering mechanism vary depending on the motivations of the organization deploying it. They may be to make a particular Web site (or individual Web page) inaccessible to those who wish to view it, to make access unreliable, or to deter users from even attempting to access it in the first place. The choice of mechanism will also depend on the capability of the organization that requests the filtering: what parts of the network they have access to, the people against whom they can enforce their wishes, and how much they are willing to spend. Other considerations include the number of acceptable errors, whether the filtering should be overt or covert, and how reliable it is (both against casual users and against those who wish to bypass it).
In this section we describe how particular content can be blocked once the list of resources to be blocked has been established. Building this list is a considerable challenge and a common weakness in deployed systems. Not only does the huge number of Web sites make building a comprehensive list of prohibited content difficult, but as content moves and Web sites change their IP addresses, keeping the list up to date requires a lot of effort. Moreover, if the operator of a site wishes to evade the blocking, the site can be moved more rapidly than it otherwise would be.
An IP packet consists of a header followed by the data the packet carries (the payload). Routers must inspect the packet header, as this is where the destination IP address is located. To prevent targeted hosts from being accessed, routers can be configured to drop packets destined for IP addresses on a blacklist. However, each host may provide multiple services, such as hosting both Web sites and e-mail servers, and blocking based solely on IP addresses will make all services on each blacklisted host inaccessible. Slightly more precise blocking can be achieved by additionally blacklisting the port number, which is found in the TCP (or UDP) header. Even so, it is very common for many completely different Web sites to be hosted at the same IP address on the same port number, usually port 80, so port-based blocking still cannot distinguish between them.
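The logic of this header-based blocking can be sketched as a simple lookup; the addresses and ports below are invented for illustration (drawn from the 203.0.113.0/24 documentation range):

```python
# Hypothetical blacklists, for illustration only.
IP_BLACKLIST = {"203.0.113.5"}                # blocks every service on the host
IP_PORT_BLACKLIST = {("203.0.113.6", 80)}     # blocks only the Web server

def drop_packet(dst_ip: str, dst_port: int) -> bool:
    """Return True if a router configured with these lists would drop the packet."""
    if dst_ip in IP_BLACKLIST:
        return True
    return (dst_ip, dst_port) in IP_PORT_BLACKLIST

# IP-only blocking is coarse: Web (80) and mail (25) to 203.0.113.5 both fail.
print(drop_packet("203.0.113.5", 80))   # True
print(drop_packet("203.0.113.5", 25))   # True

# IP-plus-port blocking is finer: only Web traffic to 203.0.113.6 is dropped.
print(drop_packet("203.0.113.6", 80))   # True
print(drop_packet("203.0.113.6", 25))   # False
```

Note that even the finer rule cannot tell apart two Web sites that share the same IP address and port.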
Access to ports may be controlled by the network administrator of the organization that hosts the computer you're using -- whether a private company or an Internet café, by the ISP that is providing Internet access, or by someone else such as a government censor who has access to the connections that are available to the ISP. There are many reasons other than censorship that ports may be blocked -- to reduce spam, or to control costs associated with high-bandwidth uses such as peer-to-peer filesharing.
If a port is blocked, all traffic on this port becomes inaccessible to you. Censors often block the ports 1080, 3128, and 8080 because these are the most common proxy ports. If this is the case, you have to find proxies that are listening on an uncommon port. These can be difficult to find.
You can test which ports are blocked on your connection using Telnet. Just open a command line (terminal or DOS prompt), type "telnet login.icq.com 5555" or "telnet login.oscar.aol.com 5555" and press Enter. The final number is the port you want to test. If you get some strange symbols in return, the connection succeeded.
If, on the other hand, the computer immediately reports that the connection failed, timed out, or was interrupted, disconnected, or reset, that port is probably being blocked. (Keep in mind that some ports could be blocked only in conjunction with certain IP addresses.)
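The same check can be scripted; here is a minimal sketch using Python's standard socket module (the host and port in the usage comment are placeholders, so substitute the service you actually want to test):

```python
import socket

def check_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a TCP connection and classify the result, roughly as telnet would."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # connection succeeded
    except ConnectionRefusedError:
        return "refused"           # host answered, but nothing accepted (or a filter reset it)
    except socket.timeout:
        return "timeout"           # packets silently dropped, possibly by a filter
    except OSError as exc:
        return f"error: {exc}"     # e.g. no route to host, name resolution failure

# Usage (hypothetical host): print(check_port("login.example.com", 5555))
```

A "timeout" result is the classic signature of silent packet dropping, while "refused" may mean either that no service is listening or that a filter is actively resetting the connection.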
Some of the most commonly used ports are:

20 and 21 - FTP (file transfer)
22 - SSH (secure remote access)
23 - Telnet (insecure remote access)
25 - SMTP (sending e-mail)
53 - DNS (translating domain names to IP addresses)
80 - HTTP (normal Web browsing, and sometimes proxies)
110 - POP3 (receiving e-mail)
143 - IMAP (storing and retrieving e-mail on a server)
443 - HTTPS (secure Web connections)
993 - secure IMAP
995 - secure POP3
1080 - SOCKS proxy
3128 - Squid proxy
8080 - standard HTTP-style proxy
For example, in one university, only the ports 22 (SSH), 110 (POP3), 143 (IMAP), 993 (secure IMAP), 995 (secure POP3) and 5190 (ICQ instant messaging) may be open for external connections. This means that you would need to find a proxy server or VPN server running on one of these ports, or convince a friend or contact outside the university to set up a server on such a port. The ability to run servers on ports other than those they normally run on is what makes several circumvention techniques possible in the first place.
TCP/IP header filtering can only block communication on the basis of where packets are going to or coming from, not what they contain. This is a problem for the censor if it is impossible to establish the full list of IP addresses hosting prohibited content, or if an IP address hosts enough non-infringing content to make blocking all communication with it seem unjustifiable. Finer-grained control is possible: the content of packets can be inspected for banned keywords. Because routers do not normally examine packet content, just packet headers, extra equipment may be needed.

Typical hardware may be unable to react fast enough to block an infringing packet as it passes, so other means must be used instead. Since packets have a maximum size, the full content of a communication will likely be split over multiple packets; while the offending packet gets through, the communication can still be disrupted by blocking subsequent packets. This may be achieved by blocking the packets directly, or by sending a message to both communicating parties requesting that they terminate the connection (in TCP, typically a forged reset packet).

Another effect of the maximum packet size is that keywords may be split over packet boundaries, so devices that inspect each packet individually can fail to identify them. For packet inspection to be fully effective, the stream must be reassembled, which adds complexity. Alternatively, an HTTP proxy filter can be used, as described later.
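The keyword-splitting problem can be demonstrated in a few lines; the keyword and packet contents are invented for illustration:

```python
KEYWORD = b"forbidden"

# The same request payload, split into packets at an unlucky boundary.
packets = [b"GET /some-forb", b"idden-page HTTP/1.1\r\n"]

# Per-packet inspection misses the keyword entirely...
print(any(KEYWORD in pkt for pkt in packets))   # False

# ...but inspecting the reassembled stream finds it.
stream = b"".join(packets)
print(KEYWORD in stream)                        # True
```

This is why effective keyword filtering requires stream reassembly, with the extra memory and processing cost that implies.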
Most Internet communication uses domain names rather than IP addresses, particularly for Web browsing. Thus, if the domain name resolution stage can be filtered, access to infringing sites can be effectively blocked. With this strategy, the DNS server accessed by users is given a list of banned domain names. When a computer requests the corresponding IP address for one of these domain names, an erroneous (or no) answer is given. Without the IP address, the requesting computer cannot continue and will display an error message.
Note that at the stage at which this blocking is performed, the user has not yet requested a particular page, which is why all pages under a blocked domain name become inaccessible.
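A DNS-based filter can be sketched as a resolver that consults a blacklist before answering; the domains and addresses below are invented for illustration:

```python
# Hypothetical records and blacklist, for illustration only.
DNS_RECORDS = {
    "www.example.com": "203.0.113.10",
    "blocked.example.net": "203.0.113.20",
}
BLACKLIST = {"blocked.example.net"}

def resolve(domain: str):
    """Return the IP address for a domain, or None (simulating a censored
    or missing answer) if the domain is blacklisted."""
    if domain in BLACKLIST:
        return None        # no answer: the browser shows an error page
    return DNS_RECORDS.get(domain)

print(resolve("www.example.com"))      # 203.0.113.10
print(resolve("blocked.example.net"))  # None
```

Because the decision is made before any page is requested, every page under blocked.example.net becomes unreachable, regardless of its content.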
An alternative way of configuring a network is to not allow users to connect directly to Web sites, but to force (or just encourage) all users to access those sites via a proxy server. In addition to relaying requests, the proxy server may temporarily store the Web page in a cache. The advantage of this approach is that if a second user of the same ISP requests the same page, it will be returned directly from the cache, rather than by connecting to the actual Web server a second time. From the user's perspective this is better, since the Web page appears faster, because the request never leaves the ISP's own network. It is also better for the ISP, since connecting to the remote Web server consumes (expensive) bandwidth; rather than transferring pages from a popular site hundreds of times, the ISP need only do so once.
However, as well as improving performance, an HTTP proxy can also block Web sites. The proxy decides whether requests for Web pages should be permitted, and if so, sends the request to the Web server hosting the requested content. Since the full content of the request is available, individual Web pages can be filtered, based on both page names and the actual content of the page.
As the requests intercepted by an HTTP proxy must be reassembled from the original packets, decoded, and then retransmitted, the hardware required to keep up with a fast Internet connection is very expensive. So systems exist that provide the versatility of HTTP proxy filtering at a lower cost. They operate by building a list of the IP addresses of sites hosting prohibited content, but rather than blocking data flowing to these servers, the traffic is redirected to a transparent HTTP proxy. There, the full Web address is inspected and if it refers to banned content, it is blocked; otherwise the request is passed on as normal.
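The two-stage decision made by such hybrid systems can be sketched as follows; the IP addresses and URLs are invented for illustration:

```python
# Hypothetical data, for illustration only.
SUSPECT_IPS = {"203.0.113.30"}          # hosts known to serve *some* banned content
BANNED_URLS = {"http://shared-host.example/banned.html"}

def handle_request(dst_ip: str, url: str) -> str:
    """First stage: cheap header check. Only traffic to suspect IPs is
    diverted to the (expensive) transparent HTTP proxy for URL inspection."""
    if dst_ip not in SUSPECT_IPS:
        return "forwarded"              # most traffic never touches the proxy
    # Second stage: the proxy sees the full URL and can decide precisely.
    if url in BANNED_URLS:
        return "blocked"
    return "forwarded via proxy"

print(handle_request("198.51.100.1", "http://other.example/"))                   # forwarded
print(handle_request("203.0.113.30", "http://shared-host.example/banned.html"))  # blocked
print(handle_request("203.0.113.30", "http://shared-host.example/ok.html"))      # forwarded via proxy
```

The design keeps the expensive proxy hardware off the path of most traffic, while still allowing page-level precision for the small fraction of traffic that is redirected.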
Where the organization deploying the filtering does not have the authority (or access to the network infrastructure) to add conventional blocking mechanisms, Web sites can be made inaccessible by overloading the server or network connection. This technique, known as a Denial-of-Service (DoS) attack, could be mounted by one computer with a very fast network connection; more commonly, a large number of computers are taken over and used to mount a distributed DoS (DDoS).
As mentioned earlier, the first stage of a Web request is to contact the local DNS server to find the IP address of the desired location. Storing all domain names in existence would be infeasible, so instead so-called recursive resolvers store pointers to other DNS servers that are more likely to know the answer. These servers direct the recursive resolver to further DNS servers until one, the "authoritative" server, can return the answer.
The domain name system is organized hierarchically, with country domains such as ".uk" and ".de" at the top, along with the nongeographic top-level domains such as ".org" and ".com". The servers responsible for these domains delegate responsibility for subdomains, such as example.com, to other DNS servers, directing requests for these domains there. Thus, if the DNS server for a top-level domain deregisters a domain name, recursive resolvers will be unable to discover the IP address and so make the site inaccessible.
Country-specific top-level domains are usually operated by the government of the country in question, or by an organization appointed by it. So if a site is registered under the domain of a country that prohibits the hosted content, it runs the risk of being deregistered.
Servers hosting content must be physically located somewhere, as must the administrators who operate them. If these locations are under the legal or extra-legal control of someone who objects to the content hosted, the server can be disconnected or the operators can be required to disable it.
The above mechanisms inhibit access to banned material, but they are both crude and possible to circumvent. Another approach, which may be applied in parallel with filtering, is to monitor which Web sites are being visited. If prohibited content is accessed (or access is attempted), then legal (or extra-legal) measures could be deployed as punishment.
If this fact is widely publicized, it could discourage others from attempting to access banned content, even if the technical measures for preventing access are inadequate by themselves.
Cryptography is -- among other applications -- a technical defense against surveillance that uses sophisticated mathematical techniques to scramble communications, making them unintelligible to an eavesdropper. Cryptography can also prevent a network operator from modifying communications, or at least make such modifications detectable.
Modern cryptography is thought to be extremely difficult to defeat by technical means; widely available cryptographic software can give users very powerful privacy protection against eavesdropping. On the other hand, encryption can be circumvented by several means, including targeted malware, or in general through key-management and key-exchange problems, when users cannot or do not follow the procedures necessary to use cryptography securely. For example, cryptographic applications usually need a way to verify the identity of the person or computer at the other end of a network connection; otherwise, the communication could be vulnerable to a man-in-the-middle attack where an eavesdropper impersonates one's communication partner in order to intercept supposedly private communications. This identity verification is handled in different ways by different software, but skipping or bypassing the verification step can increase one's vulnerability to surveillance.
Another surveillance technique is traffic analysis, where facts about a communication are used to infer something about the content, origin, destination, or meaning of the communication even if an eavesdropper is unable to understand the contents of the communication. Traffic analysis can be a very powerful technique and is very difficult to defend against; it is of particular concern for anonymity systems, where traffic analysis techniques might help identify an anonymous party. Advanced anonymity systems like Tor contain some measures intended to reduce the effectiveness of traffic analysis, but might still be vulnerable to it depending on the capabilities of the eavesdropper.
Social mechanisms are often used to discourage users from accessing inappropriate content. For example, families may place the PC in the living room where the screen is visible to all present, rather than somewhere more private, as a low-key way of discouraging children from accessing unsuitable sites. A library may well situate PCs so that their screens are all visible from the librarian’s desk. An Internet café may have a CCTV surveillance camera. There might be a local law requiring such cameras, and also requiring that users register with government-issue photo ID. There is a spectrum of available control, ranging from what many would find sensible to what many would find objectionable.