Wednesday 2: Stuart Broz, "Making Drupal scale" (Trellon sponsored session)

Earth Day, biggest secular holiday

Collecting pledges for responsible acts - "A Billion Acts of Green" - from authenticated users. Slightly scary: the pledge counts are uncacheable numbers.

Events page - Apache Solr as the backend, with geo IP lookup.

Only a fair way in - two weeks before the day - did we realise the site architecture wasn't right:
A single DB server
One web server instance running Pressflow
And a lot of hopes and dreams
Expected traffic was 25 million hits before 10am EST - in 2009 that's when the site went down!
We didn't expect to be involved in performance work
But our audits showed the site was going to go down

New setup
New hardware
Varnish in front, multiple Apache servers serving a single Drupal instance, and a master/many-slave MySQL setup. Repackaged core and contrib to read data off the slaves.
Would have loved to use something like Cassandra, but didn't have time.
Master performance is the bottleneck.
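As a rough illustration of what "read data off slaves" means in Drupal 6 terms - a minimal sketch, not the actual Trellon patch; the wrapper name and the slave connection keys ('slave1' etc., defined in settings.php) are assumptions:

    <?php
    // Route reads to a randomly chosen slave and keep all writes on the
    // master ('default'). Hypothetical wrapper; the real change was made
    // inside the repackaged core/contrib query calls.
    function example_query($sql) {
      $args = func_get_args();
      array_shift($args);

      if (preg_match('/^\s*(INSERT|UPDATE|DELETE|REPLACE|ALTER|LOCK)/i', $sql)) {
        db_set_active('default');               // writes must hit the master
      }
      else {
        $slaves = array('slave1', 'slave2', 'slave3');
        db_set_active($slaves[array_rand($slaves)]);  // spread the reads
      }

      $result = call_user_func_array('db_query', array_merge(array($sql), $args));
      db_set_active('default');                  // switch back afterwards
      return $result;
    }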
Never switch on in a high-performance site:
The Statistics module - a DB write on every page view, which locks the DB table (see the sketch below).
The Solr module queries the entire node table - half a second to a full second each time - we had to hack around it.
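For context, this is roughly what the Drupal 6 Statistics module does on every node page view (paraphrased from memory, not copied from core):

    <?php
    // One UPDATE per page view. On a MyISAM {node_counter} table this takes
    // a table lock, so concurrent page views queue up behind each other -
    // exactly what you cannot afford at Earth Day traffic levels.
    function example_statistics_exit() {
      if (arg(0) == 'node' && is_numeric(arg(1))) {
        db_query('UPDATE {node_counter} SET daycount = daycount + 1, '
          . 'totalcount = totalcount + 1, timestamp = %d WHERE nid = %d',
          time(), arg(1));
      }
    }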

Surprises
Pledge widget featured on other sites
Web services for an iPhone application
Featured on the App Store before Earth Day
Every one of these requests had to be authenticated
Facebook app
... and one more thing: Google featured Earth Day on its homepage
NZ staff saw the traffic really early on (their day starts first): an oh-shit moment.

The DB layer in D6 is just not up to this:
When data doesn't reach your slaves in time
you get stale data
e.g. new user records not yet replicated
In the end these were edge cases
Patched modules and used the CDN module - which requires core to be patched
Lots of caching, using memcache (see the settings sketch below)
Watched the MySQL slow query log
Just don't trust contrib
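A Drupal 6 memcache setup in settings.php looks roughly like this (the module path and server addresses are placeholders):

    <?php
    // Route Drupal's cache API through memcached instead of the {cache_*}
    // tables in MySQL, taking a large chunk of read/write load off the master.
    $conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';

    // A shared pool of memcached instances used by every web node.
    $conf['memcache_servers'] = array(
      '10.0.0.11:11211' => 'default',
      '10.0.0.12:11211' => 'default',
    );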

On the day
We were worried
This was a site we ourselves didn't build
Emergency measures - drop authenticated users? Sadly that would have crippled the site
Chris and Carl sat in the coder lounge, switching things over piecemeal to memcache, slowly improving
Issue with Varnish - by checking syslog we discovered the caches were rebuilding every four seconds, which removes any value from Varnish. Check your caching mechanism over long periods.

Site stayed up
80 million page views in 48hrs
Average page load below 4 seconds - it was loading in over 20 seconds when we got it!
At the time one of the highest ever Drupal traffic densities.

"MySQL proxy?"

We didn't add much more tech to the stack, and not this either - we didn't have time to investigate many possibilities. It could have provided failover, but also more complexity.

"Hack core to do slave queries?"

Pressflow couldn't do this the way we needed: it chooses a random server, and it includes the master along with the slaves. The master was REALLY sensitive to I/O load and we didn't want to kill it, so we removed all reads from it.
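In Drupal 6 terms the connection pool is just named entries in settings.php (the hostnames and credentials below are placeholders); the fix was to make sure routed reads only ever use the slave entries, never 'default':

    <?php
    // settings.php: one master ('default') plus the slaves that the
    // read-routing code selects with db_set_active().
    $db_url = array(
      'default' => 'mysqli://drupal:secret@db-master/earthday',
      'slave1'  => 'mysqli://drupal:secret@db-slave1/earthday',
      'slave2'  => 'mysqli://drupal:secret@db-slave2/earthday',
    );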

"Varnish problem?"

A module was running cache_clear_all().
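The offending pattern was something like this - a reconstruction, not the actual module code:

    <?php
    // An unconditional cache wipe in a hook that runs constantly. In Drupal 6,
    // cache_clear_all() with no arguments flushes the expirable page and block
    // cache entries, so cached pages are thrown away and rebuilt over and over -
    // here roughly every four seconds, which removes most of the value of
    // having Varnish in front.
    function badmodule_exit() {
      cache_clear_all();   // never do this on every request
    }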

"Timescale to look at architecture?"

We got the site in March for very specific work. It was basically the week of DrupalCon when we realised, while testing our own web services under the expected load. The programmers were notified the day they were leaving.

"Superbowl day - ended up second hit for superbowl video on Google. Didn't even have any superbowl video. Varnish has one thread per conn, if you have more conn than N threads, it will queue up to N more, and then starts referring 503s."

We ran into the same problem.

"nginx zone limit - we'll allow at most 500 concurrent connections per host header. Some were getting 503s but at least Varnish didn't shut down."

Shared files are on an NFS share, which has built-in caching - once a file has been pulled it's cached on each web node, and nothing changes on the server.

Deploying the CDN takes a lot of load off Varnish.

Wishlist
CDN support in D7
Better MySQL replication
Better caching strategies
DBTNG makes horizontal scale-out much easier (see the sketch below)
Field API storage in MongoDB or Cassandra
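For example, D7's DBTNG layer lets you declare slave targets in settings.php and route individual queries to them - a minimal sketch with made-up hostnames and credentials:

    <?php
    // settings.php: master plus a slave target on the 'default' connection.
    $databases['default']['default'] = array(
      'driver' => 'mysql', 'database' => 'earthday',
      'username' => 'drupal', 'password' => 'secret', 'host' => 'db-master',
    );
    $databases['default']['slave'][] = array(
      'driver' => 'mysql', 'database' => 'earthday',
      'username' => 'drupal', 'password' => 'secret', 'host' => 'db-slave1',
    );

    // Module code: ask for the slave target; D7 falls back to the master
    // if no slave is defined.
    $nids = db_query('SELECT nid FROM {node} WHERE status = 1',
      array(), array('target' => 'slave'))->fetchCol();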

"benchmark, stress testing?"

Not really: we benchmarked for the expected concurrency using ab and two EC2 nodes, and used Munin to monitor.

"Soasta will do this for you. High-end solution. Give them a testing plan and they'll spin up."