Multidomain CQ mappings and Apache configuration

It is quite common for a single CQ instance to handle multiple sites. Usually, each site has a separate domain assigned and can have a few language versions. Such a multi-domain installation requires some additional, nontrivial configuration. This post is intended to serve as a quick reference document in which you can learn how to configure 3 language versions of the well-known Geometrixx site to work on 3 domains: geometrixx.com, geometrixx.de and geometrixx.fr.
First, we will configure CQ itself (with the Sling Mappings engine), then create Apache rewrite rules and VirtualHosts and eliminate the cross-domain cache injection threat. Finally, we will refactor the Apache configuration to make it more concise.
Sling Mappings engine
According to dev.day.com, the best way to map a domain name to a web site is to use Sling Mappings. Mappings provide two useful features:
- long links in page content are shortened to a friendly form,
- short links are resolved to a full content path.
You can achieve short link resolution on the Apache level using mod_rewrite. However, you cannot shorten links without esoteric modules like mod_substitute. So, following Day's advice, we will use Sling Mappings. By default, mappings are placed in the JCR /etc/map/http directory. In our projects we usually use /etc/map/http on an author instance and /etc/map.publish/http on a publish instance (so one common package with mappings can be installed on both instances). You can change this path in the configuration of the JcrResourceResolverFactoryImpl OSGi component.
This is my suggestion for /etc/map.publish/http:
00 {
01   jcr:primaryType: "sling:OrderedFolder",
02   geometrixx_com: {
03     sling:internalRedirect: ["/content/geometrixx/en.html"],
04     jcr:primaryType: "sling:Mapping",
05     sling:match: "geometrixx.com/$"
06   },
07   geometrixx.com: {
08     sling:internalRedirect: ["/content/geometrixx/en"],
09     jcr:primaryType: "sling:Mapping",
10     redirect: {
11       sling:internalRedirect: ["/content/geometrixx/en/$1","/$1"],
12       jcr:primaryType: "sling:Mapping",
13       sling:match: "(.+)$"
14     }
15   },
16   ...
17 }
After the three dots in line 16 there are similar entries for the .de and .fr domains: geometrixx_de with geometrixx.de and geometrixx_fr with geometrixx.fr. You can download a CQ package with the full version of the mappings.
Mapping geometrixx_com (lines 2-6) is responsible for redirecting to the root page. So, if the user enters geometrixx.com, she or he will receive the page /content/geometrixx/en.html. The dollar sign at the end of sling:match (line 5) is a regexp control character meaning "end of the string", so this mapping does not apply if the user enters any path after the slash.
Mapping geometrixx.com (lines 7-15) is more complex. It consists of the parent (7-15) and the child (10-14). The parent does not contain the sling:match property, so the node name (geometrixx.com) is used as a URL pattern. This entry is responsible for shortening long links to a shorter form with a domain name, e.g. /content/geometrixx/en/products will be shortened to geometrixx.com/products.html.
The child entry is responsible for URL resolution. To match this mapping, a URL has to begin with geometrixx.com (the domain inherited from the parent mapping) followed by a non-empty path string (regular expression (.+)$ at line 13). sling:internalRedirect at line 11 is a list containing two entries: /content/geometrixx/en/$1 and /$1. Sling tries them in order and uses the first one that resolves to an existing resource. If the user enters geometrixx.com/products.html, the first entry resolves and Sling returns /content/geometrixx/en/products.html. If the user enters geometrixx.com/etc/designs/geometrixx.css, the first entry does not resolve, so the second one is used and Sling returns /etc/designs/geometrixx.css.
You can play with mappings using the Apache Felix web console. Just click the Sling Resource Resolver link in the menu.
Apache mod_rewrite
After defining mappings (and probably adding the appropriate domains to the hosts file) we can enjoy our multidomain CQ installation with short links. There is only one problem: the dispatcher. If we use a standard dispatcher configuration, there will be one cache directory for all sites. If a user requests the page geometrixx.com/products.html, the dispatcher will create the file /products.html in the cache directory. Now, if some other user requests the page geometrixx.de/products.html, the dispatcher will find the cached English version and serve it to the German user. In order to avoid such problems we should reflect the JCR directory structure in the dispatcher cache. The easiest way to expand shortened paths is to use the Apache rewrite engine. Basically, we will try to simulate the Sling resolving mechanism. The following rules will do the job:
00 RewriteEngine On
01 RewriteRule ^/$ /content/geometrixx/en.html [PT,L]
02 RewriteCond %{REQUEST_URI} !^/apps
03 RewriteCond %{REQUEST_URI} !^/bin
04 RewriteCond %{REQUEST_URI} !^/content
05 RewriteCond %{REQUEST_URI} !^/etc
06 RewriteCond %{REQUEST_URI} !^/home
07 RewriteCond %{REQUEST_URI} !^/libs
08 RewriteCond %{REQUEST_URI} !^/tmp
09 RewriteCond %{REQUEST_URI} !^/var
10 RewriteRule ^/(.*)$ /content/geometrixx/en/$1 [PT,L]
At the beginning (line 1) we check if the entered URL contains an empty path (e.g. http://geometrixx.com/). If so, the user will be forwarded to the homepage. Otherwise, we check if the entered path is shortened (it does not begin with apps, content, home, etc.; lines 2-9). If it is, the rewrite engine will add /content/geometrixx/en while creating the absolute path (line 10).
Apache VirtualHost
As you can see, this rule is valid only for the geometrixx.com domain, so we need similar rules for each domain and some mechanism for recognizing a current domain. Such a mechanism in Apache is called VirtualHost. A sample configuration file of the Apache2 VirtualHost looks as follows:
<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    ServerName geometrixx.com
    DocumentRoot /opt/cq/dispatcher/publish
    <Directory /opt/cq/dispatcher/publish>
        Options FollowSymLinks
        AllowOverride None
    </Directory>
    <IfModule disp_apache2.c>
        SetHandler dispatcher-handler
    </IfModule>
    [... above rewrite rules ...]
    LogLevel warn
    CustomLog ${APACHE_LOG_DIR}/access-geo-en.log combined
    ErrorLog ${APACHE_LOG_DIR}/error-geo-en.log
</VirtualHost>
All VirtualHosts can use a shared dispatcher directory. Create similar files for each domain.
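As an illustration of such a "similar file", here is a minimal sketch for the German site. It assumes that the German content lives under /content/geometrixx/de (following the pattern of the English site) and that the log files are named access-geo-de.log and error-geo-de.log. These names are assumptions, so adjust them to your installation.

```apache
# Sketch only: the /content/geometrixx/de path and the log file names
# are assumptions modelled on the geometrixx.com VirtualHost.
<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    ServerName geometrixx.de
    # Same shared dispatcher directory as the other VirtualHosts
    DocumentRoot /opt/cq/dispatcher/publish
    <Directory /opt/cq/dispatcher/publish>
        Options FollowSymLinks
        AllowOverride None
    </Directory>
    <IfModule disp_apache2.c>
        SetHandler dispatcher-handler
    </IfModule>

    RewriteEngine On
    # Empty path: serve the German homepage
    RewriteRule ^/$ /content/geometrixx/de.html [PT,L]
    # Shortened path: prepend the German content root
    RewriteCond %{REQUEST_URI} !^/apps
    RewriteCond %{REQUEST_URI} !^/bin
    RewriteCond %{REQUEST_URI} !^/content
    RewriteCond %{REQUEST_URI} !^/etc
    RewriteCond %{REQUEST_URI} !^/home
    RewriteCond %{REQUEST_URI} !^/libs
    RewriteCond %{REQUEST_URI} !^/tmp
    RewriteCond %{REQUEST_URI} !^/var
    RewriteRule ^/(.*)$ /content/geometrixx/de/$1 [PT,L]

    LogLevel warn
    CustomLog ${APACHE_LOG_DIR}/access-geo-de.log combined
    ErrorLog ${APACHE_LOG_DIR}/error-geo-de.log
</VirtualHost>
```

Only ServerName, the two rewrite targets and the log file names differ from the geometrixx.com file.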
Cross-domain injection threat
Because users are able to enter a full content path after a given domain name, e.g. geometrixx.com/content/geometrixx/en/products.html, they may as well get a page that belongs to some other domain, e.g. geometrixx.com/content/geometrixx/fr/products.html. In order to avoid such a situation, we need to check all requests whose path begins with /content and reject those which are not related to any campaign, the DAM or the current domain:
RewriteCond %{REQUEST_URI} ^/content
RewriteCond %{REQUEST_URI} !^/content/campaigns
RewriteCond %{REQUEST_URI} !^/content/dam
RewriteRule !^/content/geometrixx/en - [R=404,L,NC]
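For the other domains only the pattern in the last line changes. A short sketch for the French VirtualHost, assuming its content root is /content/geometrixx/fr (the path used in the example above):

```apache
# Sketch for the geometrixx.fr VirtualHost; the content root is an
# assumption based on the example path /content/geometrixx/fr/products.html.
# Applies only to requests that start with /content ...
RewriteCond %{REQUEST_URI} ^/content
# ... except the shared campaign and DAM trees ...
RewriteCond %{REQUEST_URI} !^/content/campaigns
RewriteCond %{REQUEST_URI} !^/content/dam
# ... and answers 404 for anything outside the French site.
RewriteRule !^/content/geometrixx/fr - [R=404,L,NC]
```

With this in place, a request for geometrixx.fr/content/geometrixx/en/products.html should be answered with a 404, while geometrixx.fr/content/geometrixx/fr/products.html and anything under /content/dam passes through.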
Macros
Our rewrite configuration has become quite complicated and (what is worse) has to be included in each Apache VirtualHost configuration. Fortunately, we can avoid repetition using the Apache macro module. Add the following expand-cq-paths file to your conf.d directory:
<Macro ExpandCqPaths $path>
    RewriteEngine On
    RewriteRule ^/$ $path.html [PT,L]
    RewriteCond %{REQUEST_URI} ^/content
    RewriteCond %{REQUEST_URI} !^/content/campaigns
    RewriteCond %{REQUEST_URI} !^/content/dam
    RewriteRule !^$path - [R=404,L,NC]
    RewriteCond %{REQUEST_URI} !^/apps
    RewriteCond %{REQUEST_URI} !^/bin
    RewriteCond %{REQUEST_URI} !^/content
    RewriteCond %{REQUEST_URI} !^/etc
    RewriteCond %{REQUEST_URI} !^/home
    RewriteCond %{REQUEST_URI} !^/libs
    RewriteCond %{REQUEST_URI} !^/tmp
    RewriteCond %{REQUEST_URI} !^/var
    RewriteRule ^/(.*)$ $path/$1 [PT,L]
</Macro>
After that you can include the macro in each VirtualHost with the Use directive:
Use ExpandCqPaths /content/geometrixx/en
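To show how the pieces fit together, here is a sketch of two complete VirtualHosts that rely on the macro. The content roots /content/geometrixx/de and /content/geometrixx/fr and the log file names are assumptions that follow the pattern of the English site. The macro definition has to be loaded before these files are read.

```apache
# Sketch only: content roots and log file names are assumptions.
<VirtualHost *:80>
    ServerName geometrixx.de
    DocumentRoot /opt/cq/dispatcher/publish
    <IfModule disp_apache2.c>
        SetHandler dispatcher-handler
    </IfModule>
    # Expands to the homepage rule, the cross-domain check and the
    # path-expansion rules, with $path replaced by the argument.
    Use ExpandCqPaths /content/geometrixx/de
    CustomLog ${APACHE_LOG_DIR}/access-geo-de.log combined
    ErrorLog ${APACHE_LOG_DIR}/error-geo-de.log
</VirtualHost>

<VirtualHost *:80>
    ServerName geometrixx.fr
    DocumentRoot /opt/cq/dispatcher/publish
    <IfModule disp_apache2.c>
        SetHandler dispatcher-handler
    </IfModule>
    Use ExpandCqPaths /content/geometrixx/fr
    CustomLog ${APACHE_LOG_DIR}/access-geo-fr.log combined
    ErrorLog ${APACHE_LOG_DIR}/error-geo-fr.log
</VirtualHost>
```

Each host now differs only in its ServerName, the macro argument and the log file names, so adding another language version is a matter of copying one short block.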
Because the Macro module is an external Apache2 library, you might need to install it separately. On Debian you can install and enable it using two commands:
# apt-get install libapache2-mod-macro
# a2enmod macro
If you use any other Linux distribution or Windows, please find the appropriate version of the module and the installation instructions on the mod_macro homepage.
Dispatcher configuration
You can use the out-of-the-box dispatcher configuration. The only assumption is that its docroot is set to /opt/cq/dispatcher/publish. The Apache and dispatcher configuration is available to download.
Summary
Now you have configured a CQ installation with 3 domains, using Sling Mappings, Apache 2 mod_rewrite and VirtualHost mechanisms. We have also prevented cross-domain injection attacks and refactored the Apache 2 configuration using mod_macro. The configuration described above should be enough to prepare a custom multi-domain installation from scratch.
Questions?
Do not hesitate to ask! Add them as a comment.
Additional Resources