Past Experiments with Fastly, Cloudflare, Bittorrent Sync, and Tor Website Mirrors
In 2014, I ran some experiments using various free or very cheap content distribution networks to mirror the Tor Project's website. I set up a mirror of the Tor Project's website using their instructions. The goal was simply to see how user traffic responded.
I tried 4 methods:
1. Cloudflare,
2. Fastly,
3. BTsync, and
4. Linode with a caching proxy running lighttpd.
I'm not going to explain each service in this post, but you can learn more about each service by visiting their respective links.
Cloudflare
Cloudflare blocks a lot of potential users through some sort of filter for "bad traffic". As the account owner, you can't really control this, even at the lowest setting of "allow most". The result is that using Cloudflare doesn't offload the binaries, which make up the bulk of traffic on the mirror. From past web log analysis, the vast majority of traffic is binary downloads, linked from third-party forums and sites around the world. It seems almost no one uses my mirrors to read documentation or look at anything else on the website. Cloudflare would be great if everyone wanted to look at the documentation or other HTML pages.
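To illustrate the kind of web log analysis mentioned above, here is a minimal sketch that sums response bytes for binary downloads versus everything else. It assumes an access log in the common/combined format; the extension list, the function names, and the sample lines are made up for illustration and are not taken from the mirror's real logs.

```python
import re
from collections import Counter

# Assumed file extensions for downloadable binaries; adjust to what the mirror serves.
BINARY_EXTENSIONS = (".exe", ".dmg", ".tar.xz", ".tar.gz", ".apk", ".zip")

# Matches the request, status, and size fields of a common/combined log line,
# e.g. ... "GET /path HTTP/1.1" 200 12345 ...
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def bytes_by_kind(lines):
    """Return a Counter of bytes served, keyed by 'binary' or 'other'."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        size = 0 if match.group("size") == "-" else int(match.group("size"))
        path = match.group("path").split("?", 1)[0].lower()
        kind = "binary" if path.endswith(BINARY_EXTENSIONS) else "other"
        totals[kind] += size
    return totals

def binary_share(totals):
    """Fraction of all bytes served that were binary downloads."""
    total = sum(totals.values())
    return totals["binary"] / total if total else 0.0

# Made-up sample lines, only to show the expected input shape.
sample = [
    '203.0.113.5 - - [01/Jan/2014:00:00:00 +0000] "GET /dist/example.exe HTTP/1.1" 200 900 "-" "-"',
    '203.0.113.6 - - [01/Jan/2014:00:00:01 +0000] "GET /docs/documentation.html HTTP/1.1" 200 100 "-" "-"',
]
print(binary_share(bytes_by_kind(sample)))
```

Run against a real log, you would pass an open file object instead of the sample list, since the function only needs an iterable of lines.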
The week included a new release of Tor Browser, which generally creates some load on my mirror. The results are available here.
Fastly
Over the past few weeks, I've been experimenting with Fastly's content network to see how users of the Tor website mirror responded. As expected, Fastly speeds up access to the mirror by at least 10x in my own experiments. I conducted completely non-scientific siege tests from Australia, China, California, Brazil, Egypt, Iran, and Sweden. The increased performance of the mirror seems to result in many more downloads of Tor Browser than before Fastly. Again, this is just a mirror of torproject.org and is not advertised anywhere.
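A rough way to reproduce a speedup comparison like the one above is to time repeated fetches of the same file from the origin and from the CDN and compare medians. This is not how siege works (siege generates concurrent load); it is a simpler, equally non-scientific sketch. The function names are hypothetical, and the URLs you pass in are your own.

```python
import statistics
import time
import urllib.request

def fetch(url):
    """Download the full response body and discard it."""
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()

def median_seconds(url, runs=5, fetch_fn=fetch, clock=time.perf_counter):
    """Median wall-clock time of `runs` sequential fetches of `url`."""
    samples = []
    for _ in range(runs):
        start = clock()
        fetch_fn(url)
        samples.append(clock() - start)
    return statistics.median(samples)

def speedup(origin_url, cdn_url, runs=5, fetch_fn=fetch):
    """How many times faster the CDN URL is than the origin URL."""
    return median_seconds(origin_url, runs, fetch_fn) / median_seconds(cdn_url, runs, fetch_fn)
```

A result near 10 from `speedup(origin_url, cdn_url)` would match the "at least 10x" figure, though results will vary a lot by location and time of day.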
The results of 55 days on Fastly are available here.
Bittorrent Sync
As an experiment, I set up a BT Sync copy of the mirror as well. I sent the "folder hash" out to a few people I know around the world. They report that everything "just works". They love the automatic sync and the ability to have the latest copy of the site within a few hours of torproject.org publishing. The time lag exists because my mirror rsyncs from the torproject.org master every 2 hours.
Let me add a +1 to BitTorrent for supporting FreeBSD by default.
Linode/Lighttpd configuration
I rented a $10/mo Linode. I then installed, configured, and secured the latest version of Ubuntu and set up a lighttpd proxy configuration. I put the Linode in Asia and then advertised the link to my mirror in a few forums popular with various hacker cultures in a few Asian countries: Taiwan, China, Japan, South Korea, and Vietnam. I also set up a BTSync mirror with the hash/address in the HTML source of a dummy site.
I set an alert at Linode to notify me when the virtual machine uses more than 2 Mbps of bandwidth. After about 20 of these alerts in a day, I raised the level to 5 Mbps. It seems people really like using BTSync to mirror the site.
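To put those alert thresholds in perspective, here is a small back-of-the-envelope calculation of how much data a sustained rate moves in a day. It assumes decimal units (1 Mbps = 1,000,000 bits per second, 1 GB = 1,000,000,000 bytes) and a constant rate, which real traffic won't have; the function name is just for illustration.

```python
def gb_per_day(mbps):
    """Data moved in 24 hours at a sustained rate, in decimal gigabytes."""
    bits_per_day = mbps * 1_000_000 * 86_400  # 86,400 seconds in a day
    return bits_per_day / 8 / 1_000_000_000   # bits -> bytes -> GB

for threshold in (2, 5):
    print(f"{threshold} Mbps sustained ~ {gb_per_day(threshold):.1f} GB/day")
# 2 Mbps sustained ~ 21.6 GB/day
# 5 Mbps sustained ~ 54.0 GB/day
```

So a sustained 2 Mbps is already close to one full copy of the roughly 24GB mirror per day.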
This configuration works well. The mirror is merely proxying to my actual mirror and is easy to replicate on a low-end system with minimal disk space. A challenge with a full torproject website mirror is that it consumes about 24GB of disk space on average. While this doesn't seem like much, it typically requires a more expensive virtual machine than the cheapest option at most virtual machine companies.
Here's the relevant lighttpd configuration for the proxy:
## mod_proxy must be loaded for proxy.server to take effect
server.modules += ( "mod_proxy" )

## proxy to the torproject server for everything
$HTTP["host"] == "www.example.com" {
    proxy.server = ( "" => ( ( "host" => "IPv4 or IPv6 ADDRESS HERE", "port" => 80 ) ) )
}
Cost vs. Benefit
For the purposes of this test, the costs ranged from zero to $10/mo. This isn't true for most companies, which need a content delivery network to handle the load and deliver the site to global users as fast as possible. The other costs are not directly monetary but, for some, are worthy of consideration. By using a third party, you are giving up data on who browses the website. This is more than just the customer IP address. As explored in Customer Transparency Reports, you are offering a lot of potential data to the third party hosting the mirrors of the site. The general term for this is "metadata", which can be incredibly valuable to a number of businesses. The network at the third-party hosting company can see everything that comes and goes to your sites, from your webmasters updating them to the customers browsing and buying from them. This data may also give you more insight into your own customers.
Summary
All of the tested solutions provide the content to the world. Fastly provides content and binaries the fastest, which seems to have a sticky effect, getting users to spend more time browsing the site and downloading the software. Cloudflare blocks much traffic it deems bad. As the owner of the account, I can't really control much of this, nor can I opt out, even on the lowest level. It did speed up HTML content, but all binary downloads came from my source server (proxied through Cloudflare).
BTSync is a great solution if everyone has 24GB of space free. This seems to be more the case on desktops and laptops than cheap virtual servers.
The Linode/lighttpd solution works really well. It's the most expensive of the tested solutions, but with minimal effort and just enough disk space for the operating system, you can set up a mirror that is close to the intended audience, thereby making it faster.
There is a lot of research which shows the faster and more responsive a website is to the human browsing it, the more time they spend on the site. In the eCommerce world, this correlates to a higher percentage of browsers turning into buyers.
For another post: all of these solutions allowed me to learn who browses the torproject website, from where, how often, and for how long. This was fascinating on a whole different level.
This is not an endorsement of any product, just an experiment.