Multidomain CQ mappings and Apache configuration

19 July 2012
Tomasz Rekawek

It is quite common for a single CQ instance to handle multiple sites. Usually, each site has a separate domain assigned and can have a few language versions. Such a multi-domain installation requires some additional, nontrivial configuration. This post is intended to serve as a quick reference document showing how to configure 3 language versions of the well-known Geometrixx site to work on 3 domains: geometrixx.com, geometrixx.de and geometrixx.fr.

We will first configure CQ itself (with the Sling Mappings engine), then create Apache rewrite rules and VirtualHosts, eliminate the cross-domain cache injection threat, and finally refactor the Apache configuration to make it more concise.

Sling Mappings engine

According to dev.day.com, the best way to map a domain name to a web site is to use Sling Mappings. Mappings provide two useful features:

  • long links in page content are shortened to a friendly form,
  • short links are resolved to a full content path.

You can achieve short link resolution on the Apache level, using mod_rewrite. However, you cannot shorten links without esoteric modules like mod_substitute. So, following Day's advice, we will use Sling Mappings. By default, mappings are placed in the JCR /etc/map/http directory. In our projects we usually use /etc/map/http on an author instance and /etc/map.publish/http on a publish instance (so one common package with mappings can be installed on both instances). You can change this path in the configuration of the JcrResourceResolverFactoryImpl OSGi component.

This is my suggested content for /etc/map.publish/http:

00 {
01     jcr:primaryType: "sling:OrderedFolder",
02     geometrixx_com: {
03         sling:internalRedirect: ["/content/geometrixx/en.html"],
04         jcr:primaryType: "sling:Mapping",
05         sling:match: "geometrixx.com/$"
06     },
07     geometrixx.com: {
08         sling:internalRedirect: ["/content/geometrixx/en"],
09         jcr:primaryType: "sling:Mapping",
10         redirect: {
11             sling:internalRedirect: ["/content/geometrixx/en/$1","/$1"],
12             jcr:primaryType: "sling:Mapping",
13             sling:match: "(.+)$"
14         }
15     },
16     ...
17 }

After the three dots on line 16 there are similar entries for the .de and .fr domains: geometrixx_de with geometrixx.de and geometrixx_fr with geometrixx.fr. You can download a CQ package with the full version of the mappings.

Mapping geometrixx_com (lines 2-6) is responsible for redirecting to the root page. So, if the user enters geometrixx.com she or he will receive the page /content/geometrixx/en.html. The dollar sign at the end of sling:match (line 5) is a regexp control character meaning "end of the string", so this mapping does not apply if the user enters any path after the slash.

Mapping geometrixx.com (7-15) is more complex. It consists of the parent (7-15) and the child (10-14). The parent does not contain the sling:match property, so the node name (geometrixx.com) is used as a URL pattern. This entry is responsible for shortening long links to a shorter form with a domain name, e.g. /content/geometrixx/en/products will be shortened to geometrixx.com/products.html.

The child entry is responsible for URL resolution. In order to match this mapping, a URL has to begin with geometrixx.com (a domain inherited from the parent mapping) and after that it has to contain a non-empty path string (regular expression (.+)$ at line 13). sling:internalRedirect at line 11 is a list containing two entries: /content/geometrixx/en/$1 and /$1. If the user enters geometrixx.com/etc/designs/geometrixx.css, the /$1 entry will be used. If the user enters geometrixx.com/products.html, Sling will choose the /content/geometrixx/en/$1 entry and return /content/geometrixx/en/products.html.

You can play with mappings using the Apache Felix web console. Just click the Sling Resource Resolver link in the menu.

Apache mod_rewrite

After defining the mappings (and probably adding the appropriate domains to the hosts file) we can enjoy our multidomain CQ installation with short links. There is only one problem: the dispatcher. With a standard dispatcher configuration there is one cache directory for all sites. If a user requests the page geometrixx.com/products.html, the dispatcher creates the file /products.html in the cache directory. If another user then requests the page geometrixx.de/products.html, the dispatcher finds the cached English version and serves it to the German user. To avoid such problems we should reflect the JCR directory structure in the dispatcher cache. The easiest way to expand shortened paths is to use the Apache rewrite engine. Basically, we will try to simulate the Sling resolving mechanism. The following rules will do the job:

00   RewriteEngine On
01   RewriteRule ^/$ /content/geometrixx/en.html [PT,L]
02   RewriteCond %{REQUEST_URI} !^/apps
03   RewriteCond %{REQUEST_URI} !^/bin
04   RewriteCond %{REQUEST_URI} !^/content
05   RewriteCond %{REQUEST_URI} !^/etc
06   RewriteCond %{REQUEST_URI} !^/home
07   RewriteCond %{REQUEST_URI} !^/libs
08   RewriteCond %{REQUEST_URI} !^/tmp
09   RewriteCond %{REQUEST_URI} !^/var
10   RewriteRule ^/(.*)$ /content/geometrixx/en/$1 [PT,L]

At the beginning (line 1) we check if the entered URL contains an empty path (e.g. http://geometrixx.com/). If so, the user will be forwarded to the homepage. Otherwise, we check if the entered path is shortened, that is, it does not begin with /apps, /bin, /content, /etc, /home, /libs, /tmp or /var (lines 2-9). If it is, the rewrite engine will prepend /content/geometrixx/en to create the absolute path (line 10).

Apache VirtualHost

As you can see, these rules are valid only for the geometrixx.com domain, so we need similar rules for each domain and some mechanism for recognizing the current domain. Such a mechanism in Apache is called VirtualHost. A sample Apache2 VirtualHost configuration file looks as follows:

<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    ServerName geometrixx.com

    DocumentRoot /opt/cq/dispatcher/publish
    <Directory /opt/cq/dispatcher/publish>
        Options FollowSymLinks
        AllowOverride None
    </Directory>

    <IfModule disp_apache2.c>
        SetHandler dispatcher-handler
    </IfModule>

[... above rewrite rules ...]

    LogLevel warn
    CustomLog ${APACHE_LOG_DIR}/access-geo-en.log combined
    ErrorLog ${APACHE_LOG_DIR}/error-geo-en.log
</VirtualHost>

All VirtualHosts can use a shared dispatcher directory. Create similar files for each domain.
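
As an illustration of such a "similar file", here is a sketch of the VirtualHost for the French site. It only swaps the domain name and the content root in the rules shown earlier. The content root /content/geometrixx/fr and the log file names are assumptions made for this example; adjust them to your installation.

```apache
# Sketch only: VirtualHost for geometrixx.fr.
# Assumed: French content lives under /content/geometrixx/fr,
# and the log file names below are hypothetical.
<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    ServerName geometrixx.fr

    # Same shared dispatcher docroot as the geometrixx.com host
    DocumentRoot /opt/cq/dispatcher/publish
    <Directory /opt/cq/dispatcher/publish>
        Options FollowSymLinks
        AllowOverride None
    </Directory>

    <IfModule disp_apache2.c>
        SetHandler dispatcher-handler
    </IfModule>

    RewriteEngine On
    # Empty path: serve the French homepage
    RewriteRule ^/$ /content/geometrixx/fr.html [PT,L]
    # Expand shortened paths only; absolute repository paths stay untouched
    RewriteCond %{REQUEST_URI} !^/apps
    RewriteCond %{REQUEST_URI} !^/bin
    RewriteCond %{REQUEST_URI} !^/content
    RewriteCond %{REQUEST_URI} !^/etc
    RewriteCond %{REQUEST_URI} !^/home
    RewriteCond %{REQUEST_URI} !^/libs
    RewriteCond %{REQUEST_URI} !^/tmp
    RewriteCond %{REQUEST_URI} !^/var
    RewriteRule ^/(.*)$ /content/geometrixx/fr/$1 [PT,L]

    LogLevel warn
    CustomLog ${APACHE_LOG_DIR}/access-geo-fr.log combined
    ErrorLog ${APACHE_LOG_DIR}/error-geo-fr.log
</VirtualHost>
```

With this host in place, a request for geometrixx.fr/products.html is passed to the dispatcher as /content/geometrixx/fr/products.html, so it no longer shares a cache file with the English page.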

Cross-domain injection threat

Because users are able to enter a full content path after a given domain name, e.g. geometrixx.com/content/geometrixx/en/products.html, they may as well get a page that belongs to some other domain, e.g. geometrixx.com/content/geometrixx/fr/products.html. In order to avoid such a situation, we need to check all requests with a path beginning with /content and reject those that are not related to a campaign, the DAM or the current domain:

RewriteCond %{REQUEST_URI} ^/content
RewriteCond %{REQUEST_URI} !^/content/campaigns
RewriteCond %{REQUEST_URI} !^/content/dam
RewriteRule !^/content/geometrixx/en - [R=404,L,NC]

Macros

Our rewrite configuration has become quite complicated and (what is worse) has to be included in each Apache VirtualHost configuration. Fortunately, we can avoid repetitions using the Apache macro module. Add the following expand-cq-paths file to your conf.d directory:

<Macro ExpandCqPaths $path>
        RewriteEngine On

        RewriteRule ^/$ $path.html [PT,L]

        RewriteCond %{REQUEST_URI} ^/content
        RewriteCond %{REQUEST_URI} !^/content/campaigns
        RewriteCond %{REQUEST_URI} !^/content/dam
        RewriteRule !^$path - [R=404,L,NC]

        RewriteCond %{REQUEST_URI} !^/apps
        RewriteCond %{REQUEST_URI} !^/bin
        RewriteCond %{REQUEST_URI} !^/content
        RewriteCond %{REQUEST_URI} !^/etc
        RewriteCond %{REQUEST_URI} !^/home
        RewriteCond %{REQUEST_URI} !^/libs
        RewriteCond %{REQUEST_URI} !^/tmp
        RewriteCond %{REQUEST_URI} !^/var
        RewriteRule ^/(.*)$ $path/$1 [PT,L]
</Macro>

After that you can include a macro in each VirtualHost with the Use directive:

Use ExpandCqPaths /content/geometrixx/en
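
To show how little remains per domain, here is a condensed sketch of three VirtualHosts that use the macro. The content root /content/geometrixx/de is an assumption for this example, and the Directory and logging directives from the sample VirtualHost are left out for brevity.

```apache
# Sketch only: one short VirtualHost per domain, each expanding the macro
# with its own content root. /content/geometrixx/de is an assumed path.
<VirtualHost *:80>
    ServerName geometrixx.com
    DocumentRoot /opt/cq/dispatcher/publish
    <IfModule disp_apache2.c>
        SetHandler dispatcher-handler
    </IfModule>
    Use ExpandCqPaths /content/geometrixx/en
</VirtualHost>

<VirtualHost *:80>
    ServerName geometrixx.de
    DocumentRoot /opt/cq/dispatcher/publish
    <IfModule disp_apache2.c>
        SetHandler dispatcher-handler
    </IfModule>
    Use ExpandCqPaths /content/geometrixx/de
</VirtualHost>

<VirtualHost *:80>
    ServerName geometrixx.fr
    DocumentRoot /opt/cq/dispatcher/publish
    <IfModule disp_apache2.c>
        SetHandler dispatcher-handler
    </IfModule>
    Use ExpandCqPaths /content/geometrixx/fr
</VirtualHost>
```

The macro definition has to be loaded before the first Use line is read, so check the order in which your main configuration file includes conf.d and the VirtualHost files.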

Because the Macro module is an external Apache2 library, you might need to install it separately. On Debian you can install and enable it using two commands:

# apt-get install libapache2-mod-macro
# a2enmod macro

If you use any other Linux distribution or Windows, please find the appropriate version of the module and the installation instructions on the mod_macro homepage.

Dispatcher configuration

You can use the out-of-the-box dispatcher configuration. The only assumption is that its docroot is set to /opt/cq/dispatcher/publish. Apache and dispatcher configuration is available to download.

Summary

Now, you have configured a CQ installation with 3 domains, using Sling Mappings, Apache 2 mod_rewrite and VirtualHost mechanisms. We have also prevented cross-domain injection attacks and performed Apache 2 configuration refactoring using mod_macro. The configuration described above should be enough to prepare a custom multi-domain installation from scratch.

Questions?

Do not hesitate to ask! Add them as a comment.

Additional Resources