Tuesday 3: Poul-Henning Kamp, Varnish HTTP Cache Server

Usually writes operating systems; now writing a web accelerator instead

"Never argue with someone who buys ink by the barrel"

Content production not new. Not a lot has changed. Text, images, get it together and deliver and replicate. Now we do it with computers, not humans.

Content creation needs
input methods
flexible typography
cross-referencing
composition rules
user-generated content

Printing used to need massively heavy machinery
Now, with "the Heidelberg wing", much lighter.

Varnish delivers content fast and reliably.
100,000 requests per second

Setting goals
only an HTTP accelerator
focus on a content-management (CM) feature set

accelerator != cache

Configuration files are hell
What you want is a process diagram of the system with controls laid out on it
So use... basically hooks! Event-driven processing using the Varnish Configuration Language (VCL)
Instantaneous changes
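A rough sketch of what those hooks look like, in Varnish 3-style VCL (the backend address and the /admin path are made up for illustration):

```vcl
# Backend the accelerator sits in front of (illustrative address)
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# vcl_recv is the hook run on every incoming request
sub vcl_recv {
    if (req.url ~ "^/admin") {
        return (pass);    # never cache the admin interface
    }
    return (lookup);      # everything else goes through the cache
}
```

Each `sub vcl_*` hooks one step of the request-handling process diagram; leave a hook out and the built-in default behaviour applies.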

VCL rocks
compiled to C code - runs at full speed - no observable effect on performance
Load balancing with scripting based on client IP
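A hedged sketch of that client-IP scripting in Varnish 3-style VCL (the backend addresses and the ACL range are invented for illustration):

```vcl
backend www1 { .host = "192.0.2.1"; }
backend www2 { .host = "192.0.2.2"; }

# Hypothetical address range for internal clients
acl internal {
    "10.0.0.0"/8;
}

sub vcl_recv {
    # steer requests to a backend based on who the client is
    if (client.ip ~ internal) {
        set req.backend = www1;
    } else {
        set req.backend = www2;
    }
}
```

Because VCL is compiled to C, this per-request branching costs essentially nothing.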

Managing Varnish
CLI for real-time control
Varnish architecture - one binary running as two processes
Manager runs through C compiler, cacher gets a shared object
telnet :81 to configure in real-time
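A sketch of such a real-time session (`vcl.load` and `vcl.use` are the actual CLI commands; the port and file path are illustrative):

```
$ telnet localhost 81
vcl.list
vcl.load newconf /etc/varnish/updated.vcl
vcl.use newconf
```

The manager compiles the new VCL through the C compiler; once `vcl.use` succeeds, the cacher switches to the new configuration instantly, without dropping requests.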

Performance and speed
not something you add: something you subtract
If you don't have to text-process all HTTP errors, don't!
virtual memory, sendfile and other tricks

Traditional model with a bus and a CPU on it? Rubbish
These days you have CPUs with multiple cores, shared caches, disks, RAM and paging

Modern model
Caches all the way down
Virtual Page Cache (formerly known as RAM)

Performance pricelist
Things the CPU can do
Things that involve protected memory
Mechanical operations - the real bad guys

Moving around in memory, you can end up fighting the existing caches

Classical logging expensive
disk I/O
Instead use a shared memory segment, which gets written by the OS kernel
If the process crashes, the kernel writes the logfile. If the kernel crashes, you lose anyway.
Speed up - factor of 100

We should be able to deliver web content at line speeds.
The world's largest FTP server in 1996 maxed out a 100 Mbit/s line. So we should be able to do that.

Where does my traffic come from? varnishtop
What is my most popular URL? varnishtop
At newspapers this is a good indicator of which page has the most scantily-clad women
Response-time histogram w/varnishhist
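The commands behind those answers, roughly (log-tag names as in the Varnish 2/3 shared-memory log format; check varnishtop(1) for your version):

```
# What is my most popular URL?
varnishtop -i RxURL

# Where does my traffic come from?
varnishtop -i RxHeader -I Referer

# Response-time histogram
varnishhist
```

All of these read the shared-memory log, so running them does not slow the cacher down.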

If you get hit by CNN or Reddit you will need a cache.

Real-time statistics via shared memory
X-ray vision does not slow down Varnish!
If Squid runs slow, then fiddling with it to find out why slows it down even further.

Content management features
Instant purges via regexp
TTL/caching policy using VCL
Load mitigation also using VCL
Header washing - get rid of confusing headers
Understand Vary headers
Edge-side-includes
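A sketch of TTL policy and header washing in Varnish 3-style VCL (the URL patterns are illustrative):

```vcl
# vcl_fetch runs after the backend response arrives
sub vcl_fetch {
    # caching policy: override the backend's TTL for article pages
    if (req.url ~ "^/articles/") {
        set beresp.ttl = 5m;
    }
    # header washing: strip cookies the backend sets on static assets,
    # so they don't confuse caching
    if (req.url ~ "\.(css|js|png|jpg|gif)$") {
        unset beresp.http.Set-Cookie;
    }
}
```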

Purges
cache EVICTION based on exact criteria
understands Vary: variants

Bans
cache PREVENTION
Prevent cache hit based on loose criteria
These have a cost
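A hedged sketch of both, in Varnish 3-style VCL (the PURGE/BAN method convention, the ACL, and the ban pattern are illustrative conventions, not built-ins):

```vcl
# Hypothetical list of hosts allowed to purge
acl admins {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!(client.ip ~ admins)) {
            error 405 "Not allowed";
        }
        return (lookup);          # fall through to vcl_hit below
    }
    if (req.request == "BAN") {
        # ban: no eviction now, just prevent future cache hits
        # on anything matching a loose pattern
        ban("req.url ~ ^/news/");
        error 200 "Banned";
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;                    # exact eviction, including Vary: variants
        error 200 "Purged";
    }
}
```

Bans are checked lazily against cached objects on lookup, which is where their cost comes from.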

Extra features
including inline C code in VCL
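A minimal sketch of the inline-C escape hatch (the syslog call is just an illustration of running arbitrary C):

```vcl
# Top-level C block: headers for any inline C below
C{
    #include <syslog.h>
}C

sub vcl_deliver {
    C{
        /* arbitrary C, compiled into the same shared object
           as the rest of the VCL */
        syslog(LOG_INFO, "Varnish delivered a response");
    }C
}
```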

ESI includes
SGML extensions to HTML
CMS system has to spit out ESI includes
This locks you into Varnish, unfortunately
You can do tricks with JavaScript to work around the lock-in
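A sketch of the Varnish side of ESI (Varnish 3 syntax; the URL is illustrative): the CMS emits markup like `<esi:include src="/header"/>` in the page body, and VCL tells Varnish to parse it:

```vcl
sub vcl_fetch {
    if (req.url == "/frontpage") {
        # parse the fetched body for <esi:include src="..."/> tags
        # and assemble the page from separately cached fragments
        set beresp.do_esi = true;
    }
}
```

Each included fragment is fetched and cached with its own TTL, so a mostly static page can carry one fast-changing block.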

Lots more stuff

"How easy to configure for multiple vhosts?"

By default, configured on host header

"SSL"

SSL - we don't do that. We do a lot of things very well, and I don't want to do anything, sort of, eugh. I can't spot anything Varnish could add beyond what SSL software itself already does. It would be convenient, but then we'd have to deal with crypto secrets and be careful not to spill them. Can't see it happening any time soon.

"Any change visible at the browser end?"

You'll see a performance change, obviously... but chunked encoding is the only difference - and that's an improvement. Some browsers start processing when they have their first chunk.

"Separate VCL files for different virtual hosts?"

I wouldn't say it's just because I'm lazy that I haven't done that yet... there are a couple of wrinkles in the backend. Trunk now does it right, so we could put that in harmlessly.

"Compression?"

Is stuck in the queue. One of the requirements for 3.0, though.

"Logging?"

varnishncsa - writes the log in Apache/NCSA combined format.

"Varnish 3.0 in terms of timing?"

... That's a very interesting question, yes. If you set up a feature list, the entire economy could go bust and the release might not happen for five years.

"Shared caches across many servers?"

Generally it's quicker for server A to just ask server B for the object out of B's cache, and then cache it itself.

Varnish ESI layer in Drupal, for blocks and panels, is on its way.