Request For Comments: My Plone/Apache Stack

I recently took up the task of engineering my own Plone infrastructure. I want to share what I did, and get any feedback I can about the scalability and usefulness of what I’ve done.

Requirements

  • SSL – It must be secure while logged-in users are on the site*.
    • It must NOT be secure for public pages (unless the user is logged in)
    • It must force SSL for login pages.
  • Hide /Plone – the infrastructure must present the user with URLs that don’t contain the Plone instance id.
  • Admin access – the infrastructure must provide SSL-encrypted access to the root ZMI.

* I understand that in recent versions of Plone this isn’t needed to protect the user’s session cookies. However, I still want SSL encryption after login because of the content that will be stored on the site.

Approach

I’m very familiar with the Apache httpd server, so I wanted to leverage that experience as much as possible. I’m using the Apache 2.2.9 package that comes with Debian Lenny.

I used Plone 3.3.1 as I was developing this infrastructure, but I don’t think there’s anything about it that wouldn’t work with any version of Plone (as long as it’s got a virtual host monster).

We’ve settled on using one IP address (an alias) per Plone stack to keep the sites autonomous. This makes deployment and host registration easier.

I used IP-based Virtual Hosting and mod_proxy in conjunction with mod_proxy_balancer.
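
A minimal sketch of what one-IP-per-stack looks like in Apache terms, assuming a stack has been given the alias address 192.0.2.10 and the name www.example.org (both placeholders, not values from this setup). Each stack binds its listeners and virtual hosts to its own address rather than to a wildcard:

```apache
# Bind only to this stack's alias address, never to *.
Listen 192.0.2.10:80
Listen 192.0.2.10:443

<VirtualHost 192.0.2.10:80>
    ServerName www.example.org
    # ... public (non-SSL) directives for this stack ...
</VirtualHost>

<VirtualHost 192.0.2.10:443>
    ServerName www.example.org
    SSLEngine on
    # ... certificate and proxy directives for this stack ...
</VirtualHost>
```

This is a fragment, not a complete config: the SSL host still needs its certificate directives before Apache will start. The full file further down uses 127.0.0.1 in place of the alias address.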

I’m logging to the local disk, and will be setting up a log rotation scheme separately.

Here’s what it looks like:

Production Plone Infrastructure (Existing Setup)


So I’ve got 3 Zope instances, acting as Zeo clients, on ports 1040, 1041, and 1042. I’ve heard more is just wasteful and fewer tends to not be enough.

I then have three virtual hosts. One is the load balancer, and back-ends to the Zope instances. It’s listening on port 8080 (the traditional proxy port, it really could be anything).

The other two virtual hosts are for "public" traffic (port 80) and SSL (port 443). Both back-end via a ProxyPass directive to the balancer virtual host. They differ in two ways: the scheme and port they tell the Virtual Host Monster to write into URLs, and the rewrite rules that nudge the browser to the right virtual host for certain actions, like logging in and logging out. Of course, the SSL virtual host is set up to do SSL using my SSL certificate (self-signed for now).

Configuration

I’ve got the config files sort of spread out because I’m using some buildout recipes I wrote to configure all this stuff, but here’s what it looks like all mashed into one file, pretending that you’re not using my recipes:

# httpd.conf for Apache2 - inspired by:  http://www.links.org/?p=264

ServerRoot  /etc/apache2

ServerName  localhost

Listen 127.0.0.1:8080
Listen 127.0.0.1:443
Listen 127.0.0.1:80

PidFile  /var/run/apache2.pid
LockFile /var/lock/apache2/accept.lock

LoadModule alias_module            /usr/lib/apache2/modules/mod_alias.so
LoadModule auth_basic_module       /usr/lib/apache2/modules/mod_auth_basic.so
LoadModule authn_file_module       /usr/lib/apache2/modules/mod_authn_file.so
LoadModule auth_digest_module      /usr/lib/apache2/modules/mod_auth_digest.so
LoadModule authz_user_module       /usr/lib/apache2/modules/mod_authz_user.so
LoadModule authz_host_module       /usr/lib/apache2/modules/mod_authz_host.so
LoadModule authz_groupfile_module  /usr/lib/apache2/modules/mod_authz_groupfile.so
LoadModule dir_module              /usr/lib/apache2/modules/mod_dir.so
LoadModule include_module          /usr/lib/apache2/modules/mod_include.so
LoadModule mime_module             /usr/lib/apache2/modules/mod_mime.so
LoadModule ssl_module              /usr/lib/apache2/modules/mod_ssl.so
LoadModule cgi_module              /usr/lib/apache2/modules/mod_cgi.so
LoadModule rewrite_module          /usr/lib/apache2/modules/mod_rewrite.so
LoadModule headers_module          /usr/lib/apache2/modules/mod_headers.so
LoadModule proxy_module            /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_http_module       /usr/lib/apache2/modules/mod_proxy_http.so
LoadModule proxy_balancer_module   /usr/lib/apache2/modules/mod_proxy_balancer.so
#[BUILT-IN] LoadModule log_config_module  modules/mod_log_config.so
#[BUILT-IN] LoadModule logio_module       modules/mod_logio.so

# worker MPM - tuned
# StartServers: initial number of server processes to start
# MaxClients: maximum number of simultaneous client connections
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestsPerChild: maximum number of requests a server process serves
StartServers           1
MaxClients            16
MinSpareThreads        2
MaxSpareThreads        8
ThreadsPerChild        8
MaxRequestsPerChild    0

User www-data
Group www-data
ServerAdmin me@somehost.com

DirectoryIndex index.html

# The following lines prevent .htaccess and .htpasswd files from being
# viewed by Web clients.
<FilesMatch "^\.ht">
    Order allow,deny
    Deny from all
    Satisfy All
</FilesMatch>

DefaultType text/plain

<IfModule mime_module>
    TypesConfig /etc/mime.types
</IfModule>

# "combinedio" includes actual counts of actual bytes received (%I)
# and sent (%O) - requires the 'mod_logio' module

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

ErrorLog /var/log/apache2/error.log
CustomLog /var/log/apache2/access.log combinedio


### Zeo Cluster Virtual Host ###################################################
<VirtualHost 127.0.0.1:8080>
    ProxyPreserveHost On
    
    Header add Set-Cookie "balancer_zopes=route.%{BALANCER_WORKER_ROUTE}e; path=/;"
    
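    # stickysession names the cookie the balancer reads to pick a route;
    # without it the Set-Cookie header above has no effect on routing.
    ProxyPass / balancer://zeocluster/ stickysession=balancer_zopes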
    <Proxy balancer://zeocluster>
        BalancerMember http://127.0.0.1:1040 route=client1
        BalancerMember http://127.0.0.1:1041 route=client2
        BalancerMember http://127.0.0.1:1042 route=client3
    </Proxy>
</VirtualHost>

### SSL Virtual Host ###########################################################

# example from http://httpd.apache.org/docs/2.2/mod/mod_ssl.html#sslsessioncache
# I think this should be conservative
SSLSessionCache shm:/usr/local/apache/logs/ssl_gcache_data(512000)

<VirtualHost 127.0.0.1:443>
    ServerName localhost
    
    SSLEngine on
    
    # this didn't work...
    # SSLCACertificatePath ...

    SSLCertificateKeyFile /home/jj/localhost.key
    
    # SSLCertificateChainFile /home/jj/localhost.ca-bundle
    
    SSLCertificateFile /home/jj/localhost.crt
    
    # ProxyPassInterpolateEnv On
    RedirectMatch /.*logged_out$ http://localhost:80/
    
    ProxyPass /admin http://localhost:8080/VirtualHostBase/https/localhost:443/VirtualHostRoot/_vh_admin
    ProxyPass / http://localhost:8080/VirtualHostBase/https/localhost:443/plone/VirtualHostRoot/
    
</VirtualHost>

### Non-SSL Virtual Host #######################################################

<VirtualHost 127.0.0.1:80>
    ServerName localhost

    ProxyPass / http://localhost:8080/VirtualHostBase/http/localhost:80/plone/VirtualHostRoot/
    RewriteEngine On
    RewriteCond %{HTTP_COOKIE} "__ac="
    RewriteRule ^(.*) https://%{SERVER_NAME}:443$1 [L]
    RewriteRule (.*)/login_form https://%{SERVER_NAME}:443$1/login_form [R]
</VirtualHost>

I was able to start Apache using this file on a fresh Debian Lenny VM (bear in mind you have to create a self-signed certificate and a private key first).

The command looks like this:

   $ sudo /usr/sbin/apache2 -f /path/to/above/httpd.conf &

This has to be done as root (using sudo here) since ports 80 and 443 are "privileged" ports.

I, of course, have a process management daemon (runit) that handles this for me (and buildout recipes to configure it), which I will blog about at some point.

I initially had this set up for testing with SSH tunneling (ssh -L) for the ports, and it worked well. It did affect the setup, though: I had to change the VHM paths so the URLs would be rewritten to the local side of the tunnels. So if I connected like this:

    $ ssh -L1400:localhost:80 -L1443:localhost:443 jj@vmhost

The ProxyPass directives for the SSL VH look like this:

    ProxyPass /admin http://localhost:8080/VirtualHostBase/https/localhost:1443/VirtualHostRoot/_vh_admin
    ProxyPass / http://localhost:8080/VirtualHostBase/https/localhost:1443/plone/VirtualHostRoot/

And the ProxyPass directive for the other host:

    ProxyPass / http://localhost:8080/VirtualHostBase/http/localhost:1400/plone/VirtualHostRoot/

This way the URLs point back through the SSH tunnels.

Discussion

The general Apache configuration is an amalgam of the /etc/apache2/apache2.conf file that came with Debian and some website I found about optimization.

Cluster

The Zeo Cluster virtual host uses mod_proxy_balancer to round-robin the Zope clients.

To get "sticky" sessions, I set a cookie using mod_headers that holds the "route" of the client. The routes are defined as additional information passed to the BalancerMember, and manifest in an environment variable called "BALANCER_WORKER_ROUTE".

This way once someone comes into the site, as long as the cookie persists, they will always be proxied to the same client. I opted to let the cookie expire when the user closes their browser since that’s how Plone is set up by default.

SSL

The SSL virtual host is a very straightforward SSL setup. Since I used a self-signed certificate, I didn’t need a "ca-bundle" file.

The interesting bit is when I use ProxyPass to route all traffic to the load balancer. Here I utilize the Virtual Host Monster (standard issue in Zope/Plone these days) to ensure URLs are properly rewritten to use HTTPS. And while I’m at it, I also "hide" the Plone root (called /plone in my case) so that it looks like https://www.myhost.com/ is the root of the Plone site, instead of https://www.myhost.com/plone. Nice.

There’s another ProxyPass directive that proxies any access to /admin to the load balancer, but also uses some VHM magic to give the user access to the root of Zope, and the core ZMI. This is needed because / is proxied to /plone, which would otherwise leave the Zope root unreachable. As a bonus, since this directive only exists in the SSL virtual host, this also "forces" the user to use SSL to do root ZMI stuff, which helps protect the admin password (and gives the administrator warm fuzzies).
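
For anyone who hasn’t met the Virtual Host Monster’s URL syntax, here are the two SSL rules again with each path segment annotated. The directives are the same as in the config above; the comments are a sketch of how VHM is generally understood to interpret the path, not something taken from the Zope source.

```apache
# /VirtualHostBase/https/localhost:443
#     scheme and host:port that Zope writes into the URLs it generates
# /plone
#     path inside Zope that is served as the visible site root
# /VirtualHostRoot
#     objects traversed before this marker are left out of generated URLs
# /_vh_admin
#     "_vh_" segments are put back in front of generated URLs,
#     so ZMI links keep their /admin prefix

# Zope root (no /plone segment), published under /admin:
ProxyPass /admin http://localhost:8080/VirtualHostBase/https/localhost:443/VirtualHostRoot/_vh_admin
# Plone site, published as /:
ProxyPass /      http://localhost:8080/VirtualHostBase/https/localhost:443/plone/VirtualHostRoot/
```

The /admin rule has to come before the / rule, because mod_proxy uses the first ProxyPass that matches.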

I also add a RedirectMatch directive to push the user over to the non-SSL virtual host when they log out. No sense in keeping them on SSL when there’s no need.

Non-SSL HTTP

Finally, the "public" virtual host. It consists of a ProxyPass directive similar to the one used in the SSL virtual host, except in this case, we’re rewriting to port 80, standard http. The VHM bits are the same.

The last few directives are rewrite rules to get the user to use SSL if they are logged in, and redirect the user to the SSL virtual host when they click the "log in" link. I’m sure there’s more to do to sew this up (like alter the template for the login portlet), but this covers most of the bases.

The Future

Production Plone Infrastructure (Future Plans)


Logging

I’d like to set up separate logging for each virtual host, possibly to syslogd (or some other collective logging solution).
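
A minimal sketch of what that could look like, assuming syslog facility local1, the tag apache-public, and the log file names shown (all arbitrary choices for illustration). ErrorLog can write to syslog directly; access logs have to be piped through an external program such as logger:

```apache
<VirtualHost 127.0.0.1:443>
    # ... existing SSL directives ...

    # Option 1: per-host files on local disk
    ErrorLog  /var/log/apache2/ssl-error.log
    CustomLog /var/log/apache2/ssl-access.log combinedio
</VirtualHost>

<VirtualHost 127.0.0.1:80>
    # ... existing public directives ...

    # Option 2: send this host's logs to syslog
    ErrorLog  syslog:local1
    CustomLog "|/usr/bin/logger -t apache-public -p local1.info" combinedio
</VirtualHost>
```

The two hosts use different options here only to show both; a real setup would presumably pick one and apply it to all three virtual hosts.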

Virtual Crash Cart Access

I may add a fourth Zope client. This fourth client wouldn’t be in the load balancer group and is solely responsible for handling ZMI requests. This was something Joel Burton mentioned at the Advanced Bootcamp, and I’ve always been keen on it. All the Zope clients can be busy and/or locked up, apache can be hosed, and I can still get to an unhosed client’s ZMI to try to shut things down or manually fix whatever caused the problem.
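
One way this could be wired into the SSL virtual host, assuming the fourth client listens on port 1043 (a hypothetical port, picked only because it follows 1040 to 1042). This covers the "dedicated ZMI client" half of the idea; for the case where Apache itself is hosed, the client would have to be reached directly, for example over an SSH tunnel to its port.

```apache
<VirtualHost 127.0.0.1:443>
    # ... existing SSL directives ...

    # ZMI traffic bypasses the balancer and goes straight to the
    # dedicated client (port 1043 is an assumption).
    ProxyPass /admin http://localhost:1043/VirtualHostBase/https/localhost:443/VirtualHostRoot/_vh_admin

    # Everything else still goes through the balancer virtual host.
    ProxyPass / http://localhost:8080/VirtualHostBase/https/localhost:443/plone/VirtualHostRoot/
</VirtualHost>
```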

Caching

I haven’t included caching in this setup for two reasons:

  1. I don’t like to preemptively optimize, and
  2. Cache-fu does an extraordinarily good job just being there. :)

This infrastructure makes it easy to drop in a caching mechanism between the public virtual host and the load balancer virtual host if I need it.

I’m keen on mod_cache, since it can be loaded easily and keeps with my "do as much as we can in Apache" mantra. However, my unscientific tests have shown little benefit from its use. Of course, that was before I came up with this scheme, and I was trying to keep everything in one Apache host (which I’ve dubbed the "one host to unite them, one host to rule them" folly). Stay tuned for more on that.
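
For reference, a mod_cache drop-in for the public virtual host on Apache 2.2 might look roughly like this. It is an untested sketch: the cache directory is an assumption, and nothing here has been measured.

```apache
LoadModule cache_module       /usr/lib/apache2/modules/mod_cache.so
LoadModule disk_cache_module  /usr/lib/apache2/modules/mod_disk_cache.so

<VirtualHost 127.0.0.1:80>
    ServerName localhost

    # Cache responses for the whole public site on disk.
    CacheRoot   /var/cache/apache2/mod_disk_cache
    CacheEnable disk /

    ProxyPass / http://localhost:8080/VirtualHostBase/http/localhost:80/plone/VirtualHostRoot/
</VirtualHost>
```

One thing to watch: in 2.2, mod_cache answers cache hits very early in request processing, before mod_rewrite runs, so a cached page could be served without the "logged in, go to SSL" rule ever being evaluated. That interaction would need testing before relying on this.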

Even if mod_cache doesn’t work out, there are still lots of other options, from Varnish to Squid to PigeonNet.

Summary

So here’s my attempt to do Plone Up Right™, and distill all of the many tutorials that are out there into what I think is one pretty good approach. Please, leave comments and let me know if I’m out of my mind, or setting myself up for some pain later. All criticism is welcome. If I’m doing it right, that might be good to hear too :)


3 Responses to Request For Comments: My Plone/Apache Stack

  1. I recently set up something similar. Unfortunately, there is a possible hole in your configuration: the Plone logged_out page has a login form. Your configuration directs logged_out to HTTP, so if someone logs out and uses the logged_out page to log in again, I think they will send their credentials over HTTP rather than HTTPS.

    I have not confirmed this, but if I’m right, I think the best solution is to change logged_out to display a link to the login page, rather than the login form. Another option is to leave users in HTTPS mode when logging out.

  2. Dale VanZile says:

    Fix for the logged_out page with login form: just skip it and drop the user back on the page….

    RewriteCond %{REQUEST_URI} ^.*/logged_out$
    RewriteRule ^(.*) %{HTTP_REFERER} [C]
    RewriteRule ^https://(.*) http://$1 [L]

  3. Dale VanZile says:

    If one wanted to also force port 80 when *not* logged in AND *not* at the login_form, how could it be done?

    I tried this in and it didn’t work:

    RewriteCond %{HTTP_COOKIE} !"__ac="
    RewriteCond %{REQUEST_URI} !^.*/login_form.*
    RewriteRule ^(.*) http://%{SERVER_NAME}:80$1 [L]
