
Optimizing Pouet (+ web in general)

category: offtopic [glöplog]
This thread might be of interest to other web server admins too. The technology is really nice, and I've used it for more than a year now, so IMHO it's worth explaining how it works. Pouet is a very good use case with all its small image files.

First some background:
I've noticed pouet sometimes has trouble delivering images, mostly screenshots: the web browser (Chrome) "stalls".

I suspect this is because Apache runs out of free processes.
Safari/Chrome, for instance, happily open 3-5 connections simultaneously, and I think recent Firefox versions do that as well.

Would it be possible to look at adding Varnish? It's a caching reverse proxy which is quite easy and non-intrusive to put in front of an existing website.

On Debian (which pouet seems to run, judging by the response headers), it's just "apt-get install varnish".

Replacing Apache with something that scales better, like lighttpd+php-fcgi, would of course be nice, but it is an order of magnitude more work and likely to cause problems initially.

Varnish can be put in front of the whole site, but I wouldn't recommend that initially. A better approach is putting it on a subdomain, images.pouet.net for instance, which loads images from the regular Apache in the background. No major change to the normal Apache setup is needed: you just list Apache's public IP in Varnish as a "backend". It's also possible to run CSS and JavaScript through there, basically all static content. You can configure Varnish with different "backends" for different domains, so it can cache other stuff on the same server too. This way Apache only does what it's best at: serving the PHP output. It's also very easy to roll back this change, since every document is "no-cache" due to the dynamic nature of the site.

The biggest change to Apache would be adding cache headers that work for Varnish. From what I can tell there are none right now, only ETag and Last-Modified, which I _think_ means the browser still connects to the server to check the dates, and that still eats Apache processes.

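The easiest to administer is Apache's expires module (mod_expires). Once loaded, it can be configured like so:
# Off by default, switched on only for the image location below
ExpiresActive Off
<VirtualHost... bla bla>
...normal config...
<Location /images>
ExpiresActive On
# Clients and caches may keep images for a year after they fetch them
ExpiresDefault "access plus 365 days"
</Location>
</VirtualHost>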

Or in a .htaccess file, as long as the expires module is loaded...
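
For the .htaccess route, a minimal sketch could look like the following. It assumes the file sits in the images directory, that mod_expires is loaded, and that the server config allows the override for that directory (AllowOverride Indexes or broader). The one-year lifetime just mirrors the example above.

```apache
# .htaccess in the images directory (the placement is an assumption)
<IfModule mod_expires.c>
    # Send Expires / Cache-Control: max-age for everything in this directory
    ExpiresActive On
    ExpiresDefault "access plus 365 days"
</IfModule>
```

The IfModule wrapper only keeps Apache from throwing errors if the module turns out not to be loaded. In that case no cache headers are added at all, so it's worth checking the response headers afterwards.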

Or, as a third but hackish option: if the strategies above are impossible to implement, one can load everything through a PHP script that adds the appropriate headers, i.e. change all image URLs to http://images.pouet.net/cache/foo/bar/baz.png and then have a PHP script act as the 404/403 error page defined in /cache/.htaccess on the Apache server.

Even though this runs everything through PHP, it will still be faster in the end, provided the data fits in the caching server.
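
The Apache side of that hack might be sketched roughly like this. The script name serve-image.php is made up for illustration, and the sketch assumes ErrorDocument is allowed in .htaccess for that directory (AllowOverride FileInfo or broader).

```apache
# /cache/.htaccess on the Apache server
# serve-image.php is a hypothetical name, not an existing pouet script.
# Requests for files that don't exist under /cache/ end up in the script.
ErrorDocument 404 /cache/serve-image.php
ErrorDocument 403 /cache/serve-image.php
```

Apache hands the originally requested path to the error document in the REDIRECT_URL environment variable, so the script can map /cache/foo/bar/baz.png back to the real image. It would most likely also have to set a 200 status itself, along with the cache headers, since otherwise the response still goes out as an error.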

Varnish can use disk or RAM for storage. I guess a RAM cache is overkill unless there's enough to spare, but Varnish handles several hundred concurrent connections with ease. It's designed around the way BSD and Linux deal with TCP/IP.

It uses B-trees internally, so even the disk cache is pretty efficient, and Varnish translates and compiles its config with gcc internally, which I think is kind of cool.
added on the 2011-09-10 16:44:40 by jaw jaw
i'm not the only fool here!
added on the 2011-09-10 21:33:38 by M77 M77
Yes you are.
added on the 2011-09-10 23:26:15 by __ __

