Archive for June, 2012

What HTTP headers do browsers send on CTRL+F5?

If you’ve been into website development, and in particular website optimization, you have probably stumbled upon this question at least once: what HTTP headers do the different browsers send when the user presses F5 or (even better) CTRL+F5, to make the web server bypass the cache system?

Well, the user “some” on Stack Overflow answered that in a very complete way here: http://stackoverflow.com/questions/385367/what-requests-do-browsers-f5-and-ctrl-f5-refreshes-generate/385491#385491
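In short (see the answer for the full per-browser tables): a plain F5 usually sends a conditional request, while CTRL+F5 asks for a completely fresh copy. A forced refresh typically carries headers along these lines, though the exact set varies per browser:

GET /page HTTP/1.1
Host: www.example.com
Pragma: no-cache
Cache-Control: no-cache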

Apache Rewrite Cheatsheet

This is the life-saver cheatsheet you need if you ever want to understand complex rewrite rules like the ones generated by the Boost module in Drupal:

http://www.askapache.com/htaccess/mod_rewrite-variables-cheatsheet.html

In particular, in RewriteRule .* - [S=5], the [S=5] flag means “skip the next 5 rules”. It can represent a considerable efficiency boost! The same can be said for [L] (last rule).
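For instance, here is a minimal sketch (not taken from Boost itself) of the classic if/else emulation with [S]: skip the rewrite when the requested file already exists on disk:

# If the requested file exists, skip the next 1 rule...
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule .* - [S=1]
# ...otherwise send everything to the front controller
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]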

A set of nice examples is also available on the same site: http://www.askapache.com/htaccess/modrewrite-tips-tricks.html

Munin 2.0 on Debian

June 25, 2012

Munin 2.0 has been released and packaged for Debian, and even backported to Squeeze (from backports.debian.org).

Even though there are still some quirks in this version (or just the Debian packaging), it is far better (more scalable, more powerful and prettier) than version 1.4.

Basically, the following article should cover it all: http://munin-monitoring.org/wiki/CgiHowto2, but it doesn’t quite get there, so far.

Let’s see together how to install it successfully on Debian Squeeze. I will, however, not cover the agent (Munin Node), as there is no significant difference between the basic installation of versions 1.4 and 2.0.

As a first significant performance improvement, Munin is now able to use rrdcached (which considerably reduces the disk I/O pressure on RRD files), and it is fairly easy to set up. Just install the rrdcached package (who would have guessed?), then add the following options to OPTS in /etc/default/rrdcached:

OPTS="-s munin -l unix:/var/run/rrdcached.sock -b /var/lib/munin/ -B -j /var/lib/munin/journal/ -F"

This will override its defaults. And of course, restart the daemon afterwards.
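In commands, that is roughly (assuming the stock Debian package and initscript):

apt-get install rrdcached
# ... edit OPTS in /etc/default/rrdcached as shown above, then:
/etc/init.d/rrdcached restart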

Adapt /etc/munin/apache.conf to your likings, in this case, we are going to uncomment all cgi and fastcgi-related blocks.

Install packages libapache2-mod-fcgid and spawn-fcgi, then download the following script and install it as an initscript on your system (e.g. as /etc/init.d/spawn-fcgi-munin-graph, then run insserv):

http://files.julienschmidt.com/public/cfg/munin/spawn-fcgi-munin-graph (though this version is still buggy and quite fragile, contact me for a slightly improved version)

apt-get install libapache2-mod-fcgid spawn-fcgi
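Installing the initscript itself would look something like this (the target file name follows the example above):

wget -O /etc/init.d/spawn-fcgi-munin-graph \
  http://files.julienschmidt.com/public/cfg/munin/spawn-fcgi-munin-graph
chmod +x /etc/init.d/spawn-fcgi-munin-graph
insserv spawn-fcgi-munin-graph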

Add users munin and www-data to group adm, and allow group adm to write to /var/log/munin/munin*-cgi-*.log:

adduser munin adm
adduser www-data adm
chmod g+w /var/log/munin/munin*-cgi-*.log

Add user www-data to group munin and vice versa:

adduser www-data munin; adduser munin www-data

Start the spawn-fcgi-munin-graph service and check that it is indeed running.
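On a sysvinit system, something like:

/etc/init.d/spawn-fcgi-munin-graph start
# the [s] trick excludes the grep process itself from the match
ps aux | grep '[s]pawn-fcgi'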

Enable the fcgid and rewrite Apache modules and restart the Apache2 service.
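On Debian, that is:

a2enmod fcgid
a2enmod rewrite
/etc/init.d/apache2 restart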

Customize /etc/munin/munin.conf to your liking, enabling the (Fast)CGI parts.

Whenever monitoring more than a single host, I recommend moving (i.e. commenting out and copying) the localhost definition to a new file under /etc/munin/munin-conf.d/, one per domain (e.g. beeznest.conf), and adding your hosts there with a meaningful domain name, as in the sketch below.
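A sketch of such a file (host names and addresses are made up):

# /etc/munin/munin-conf.d/beeznest.conf
[beeznest.net;web1]
    address 192.0.2.10
    use_node_name yes

[beeznest.net;db1]
    address 192.0.2.11
    use_node_name yes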

Course: Profiling with XHProf and common optimization techniques, with differences between PHP4 and PHP5

June 12, 2012

Since I have to prepare the program for my course anyway (Profiling with XHProf and common optimization techniques, with differences between PHP4 and PHP5), I am posting it here in case anyone is interested.

PHP

  • History of the PHP project (15′)
  • Differences between PHP 4.3, PHP 5, PHP 5.2, PHP 5.3 and PHP 5.4 (30′)

Profiling

  • Zend Debugger, XDebug and XHProf (15′)
  • Installing and configuring XHProf (30′)
  • Profiling with XHProf, Xdebug and KCacheGrind (30′)

Optimization

  • Server optimization (hardware + OS) (10′)
  • Web server optimization (10′)
  • Database optimization (10′)
  • PHP micro-optimization techniques (15′)
  • Caching: system, opcode, script/variables (30′)
  • PHP-FPM (15′)
  • Continuous integration with Jenkins PHP (30′)

References

HTTP persistent connections to a server

My work and readings have led me through a lot of things related to the simultaneity of connections from a browser (client) to a web server and the HTTP context in which all that happens. It is still a bit difficult for me to find the time to put it all in order, but below are a few good reads on the topic.

Respect for HTTP limits

First of all, you should know that HTTP 1.1’s RFC says (among many other things) that “a single-user client SHOULD NOT maintain more than 2 connections with any server or proxy”. This explains a lot as to why some websites can be very slow, while others can be very fast. It doesn’t explain, though, why a browser queues requests for full pages one at a time, but I can imagine this is a corollary of this part of the HTTP RFC.

In any case, what that means is that, if you load a page from a web server and all the CSS, JS and images on this web server are located under the same domain name, you will only ever be able to download 2 at a time. When one resource has downloaded completely, your browser will be able to initiate the call to another resource. This sometimes makes websites appear “bit by bit” in an upsetting manner.

Now browsers used to respect that recommendation in the past, but it tended to limit the viewing speed of a website, so browser developers started to allow for more. Today, for example, you can load the “about:config” page in Firefox and look for the network.http.max-persistent-connections-per-server variable. You will see it is set to 6 (which is now the default in Firefox). You can pretty much set it to 20 if you like, but beware that this is not necessarily the best setting (it might make it difficult for your browser to manage that many downloads at a time).

So, in conclusion, your browser is limited to 6 downloads (sometimes 4, sometimes 8) at a time, by default, although the standard says they should limit it to 2.

Multi-CDN-ification

And this is why major websites have started multiplying the number of CDN domains they use: if the browser is limited to 6 downloads at a time *per domain*, why not multiply the number of domains?

Well, it’s true: by increasing the number of domains (and these can also be subdomains), you increase the total number of files a single user can download simultaneously, thus potentially increasing the download speed of your website.

What’s more, if you spread these domains across several web servers, you spread the bandwidth load too, and speed is further improved.

If you abuse this system though, you might get a reverse effect: the time it takes to resolve the IP addresses of all these domains might increase the load time for your site.

Optimizing CDNs

This is why some techniques are available to cover that last problem (which isn’t to say that you should use 20 different domain names for your site), whereby you can ask the browser to “prefetch” the translation of domain names to IP addresses before any resource on those domains has been requested. You do that by adding prefetch instructions to your HTML header section, as shown below. This is apparently implemented by most browsers.
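In practice, this is a couple of link tags in your HTML head (the CDN host names below are placeholders):

<link rel="dns-prefetch" href="//cdn1.example.com" />
<link rel="dns-prefetch" href="//cdn2.example.com" />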

You can also hire a specialized service that makes CDN resolution super-fast, which can save up to 500ms on connections to your website.

Manually multi-threading PHP at the user level (sort of)

Something that is not directly an extension of the previous topics: a web server will only execute a given PHP process on one single core of a multi-core processor. So if you want to load a very heavy script twice in parallel, you have to take into account that your browser will try to limit the number of concurrent connections to a server and will actually queue them (one by one), which means it will be impossible for you to execute simultaneous requests and use the 16 cores of your super Xeon XYZ processor.

In order to “hack” this process, you can start several browsers at the same time. This will work (but there are only so many different browsers you can install on one computer). An extension to this is that you can start individual Firefox sessions by launching it this way (on the command line or in a launcher script): firefox --no-remote -ProfileManager. This way, you can manage 4 different Firefox sessions and even overload your server, if your computer can handle that many simultaneous instances of Firefox.
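Alternatively, if the goal is just to hit the server with truly simultaneous requests, a few backgrounded curl processes sidestep the browser’s per-server limit entirely. A minimal sketch (the URL is a placeholder):

# fire 4 requests in parallel, each in its own process
for i in 1 2 3 4; do
  curl -s -o /dev/null "http://www.example.com/heavy-script.php?run=$i" &
done
wait  # block until all background requests have finished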

References

This is all for today. I’ll leave you with a few references which have helped me work out these details.

DNS optimization

June 11, 2012

DNS optimization is a technique by which you optimize (i.e. reduce) the time required for any browser to find your website. It might seem ridiculous at first, but a bad DNS provider can make your site load about one full second slower than others, which, as you know, matters in terms of traffic and customers.

A few techniques to optimize this are:

  • using the prefetch technique available in most browsers
  • hiring a fast, globally distributed DNS provider, so most users get a much faster DNS translation

A few resources for reading about DNS optimization…

At BeezNest, we handle these for our customers whenever they need it, so they don’t have to worry about that.


Profiling MySQL/MariaDB queries

If you ever face an optimization issue in MySQL or MariaDB and want to know how to measure/benchmark the differences in execution between two queries, you should definitely know about the set profiling=1; command.

Here is how it works:

mysql> set profiling=1;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT count(login_user_id) FROM stats.track_e_online WHERE DATE_ADD(login_date, INTERVAL 60 MINUTE) >= '2012-06-11 11:56:20';
+----------------------+
| count(login_user_id) |
+----------------------+
|                   65 |
+----------------------+
1 row in set (0.00 sec)

mysql> show profile;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000022 |
| checking query cache for query | 0.000055 |
| checking permissions           | 0.000010 |
| Opening tables                 | 0.000012 |
| System lock                    | 0.000005 |
| Table lock                     | 0.000023 |
| init                           | 0.000027 |
| optimizing                     | 0.000010 |
| statistics                     | 0.000008 |
| preparing                      | 0.000009 |
| executing                      | 0.000004 |
| Sending data                   | 0.001994 |
| end                            | 0.000006 |
| query end                      | 0.000004 |
| freeing items                  | 0.000014 |
| storing result in query cache  | 0.000227 |
| logging slow query             | 0.000004 |
| logging slow query             | 0.000028 |
| cleaning up                    | 0.000003 |
+--------------------------------+----------+
19 rows in set (0.01 sec)

mysql> SELECT count(login_user_id) FROM stats.track_e_online WHERE '2012-06-11 11:56:20' <= DATE_ADD(login_date, INTERVAL 60 MINUTE);
+----------------------+
| count(login_user_id) |
+----------------------+
|                   65 |
+----------------------+
1 row in set (0.00 sec)

mysql> show profile;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000032 |
| checking query cache for query | 0.000083 |
| checking permissions           | 0.000014 |
| Opening tables                 | 0.000011 |
| System lock                    | 0.000005 |
| Table lock                     | 0.000031 |
| init                           | 0.000026 |
| optimizing                     | 0.000009 |
| statistics                     | 0.000008 |
| preparing                      | 0.000009 |
| executing                      | 0.000005 |
| Sending data                   | 0.001989 |
| end                            | 0.000007 |
| query end                      | 0.000003 |
| freeing items                  | 0.000015 |
| storing result in query cache  | 0.000340 |
| logging slow query             | 0.000005 |
| logging slow query             | 0.000030 |
| cleaning up                    | 0.000003 |
+--------------------------------+----------+
19 rows in set (0.00 sec)

mysql> set profiling=0;
Query OK, 0 rows affected (0.00 sec)

Well, granted, there isn’t much of a difference between those two examples. I believe that’s because these queries have been stored in the query cache already, but you get the idea…

Note that profiling is a session variable, which means it only acts in your session of the MySQL/MariaDB client. You can find more information about profiling in the MySQL documentation pages.
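As a complement, you can list the recent queries of your session with their IDs and then pull up the profile of a specific one (both commands are standard MySQL):

mysql> show profiles;
mysql> show profile for query 2;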

Of course, this also works in the great MariaDB server.

Nginx Anti-DOS filter for Fail2Ban

We are currently trying out this Fail2Ban rule on one of our servers, to automatically block simple (but very upsetting) DOS attacks on Nginx (after about 30 seconds).

New filter in /etc/fail2ban/filter.d/nginx-dos.conf:

# Fail2Ban configuration file
#
# Generated on Fri Jun 08 12:09:15 EST 2012 by BeezNest
#
# Author: Yannick Warnier
#
# $Revision: 1 $
#

[Definition]
# Option:  failregex
# Notes.:  Regexp to catch a generic call from an IP address.
# Values:  TEXT
#
failregex = ^<HOST> -.*"(GET|POST).*HTTP.*"$

# Option:  ignoreregex
# Notes.:  regex to ignore. If this regex matches, the line is ignored.
# Values:  TEXT
#
ignoreregex =

In our jail.local, we have (at the end of the file):

[nginx-dos]
# Based on apache-badbots but a simple IP check (any IP requesting more than
# 240 pages in 60 seconds, or 4p/s average, is suspicious)
# Block for two full days.
# @author Yannick Warnier
enabled = true
port    = http,8090
filter  = nginx-dos
logpath = /var/log/nginx/*-access.log
findtime = 60
bantime  = 172800
maxretry = 240
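Before enabling the jail for real, you can check the failregex against one of your actual log files with the fail2ban-regex tool (the log file name here is a placeholder):

fail2ban-regex /var/log/nginx/www.example.com-access.log /etc/fail2ban/filter.d/nginx-dos.conf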

Of course, if you were logging all resources of your site (images, CSS, JS, etc.), a normal user could easily reach those numbers. To avoid this, use the access_log off directive of Nginx, like so:

# Serve static files directly
location ~* \.(png|jpe?g|gif|ico)$ {
    expires 1y;
    access_log off;
    try_files $uri $uri/ @rewrite;
    gzip off;
}
location ~* \.(mp3)$ {
    expires 1y;
    access_log off;
    gzip off;
}
location ~* \.(css)$ {
    expires 1d;
    access_log off;
}
location ~* \.(js)$ {
    expires 1h;
    access_log off;
}

We’ll see how that works for us… (and report here)