Four approaches to integrating AEM with external systems

20 May 2014
Jan Kuzniak

It all began six years ago, when I first built a Communiqué 4 site that talked to external systems for package tracking and integrated with a legacy search engine. Back then the CMS was just a small part of the bigger infrastructure and such scenarios were not very common. When Adobe acquired Day Software, the company behind Communiqué, the number of enterprise implementations shot through the roof and enterprise customers brought more and more integration requirements. Today I rarely see a project that doesn't integrate with at least one external system. In fact, I am surprised when CQ5 / AEM is not used as the focal point where internal systems (and business departments) meet.

It all looks great in theory, but it gives a serious headache to all of us, the technical bunch. Luckily, once you have done a few of those integrations, some patterns emerge. I strongly believe that nearly all of them fall into one of four bags. In the end your page has to be assembled from CMS and external content somewhere, and you usually get to choose from about four places to do that.

01-diagram

The diagram above shows the usual scenario. Your users connect to the AEM instance via some caching / proxy layer, most often a CDN and a Dispatcher instance. Let’s explore this diagram and see where we can put the page assembly and what that would mean for the system.

Designer’s Heaven

02designers-heaven

The obvious choice is the most natural one: use AEM, the focal point, to do that. So it’s AEM that grabs the external content and puts it into the page by means of custom components. That is simple, easy to integrate with AEM features and gives a great authoring experience. It’s also easy to change and optimise, so it gives a quick time to market. Heaven. Well, almost.

A CQ5 instance was never the speed king when it comes to page rendering. While it remains to be seen how much quicker AEM is in real-life applications, we should assume the Dispatcher cache is essential. With dynamic content on a page, however, that basic whole-page cache may not be a viable option. In practice, going this route means rigorous performance and caching optimisation, and possibly big deployments.

Another challenge for some will be security, especially when customers’ sensitive data are involved (e.g. banking). If such data are to flow through CQ, the whole system should be audited, which in some cases may mean expensive external audits for each release. While at it, sharing authorisation between AEM and an external system is not to be sneezed at either.

My recommendation is to limit this approach to situations where you don’t expect much traffic, there is no authorisation, data are not sensitive and ideally cacheable on a short (but reasonable) TTL.
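To make the short-TTL idea concrete, here is a minimal sketch of a time-based cache sitting in front of an external call. It is written in Python purely for illustration (a real AEM component would be Java), and every name in it (TtlCache, fetch_tracking_status, the 30-second TTL) is an assumption made up for this example rather than anything AEM provides.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Keeps fetched values for a fixed number of seconds (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]              # still fresh, no external call
        value = self._fetch(key)         # missing or expired, hit the external system
        self._entries[key] = (now, value)
        return value


calls = []


def fetch_tracking_status(parcel_id: str) -> str:
    """Stand-in for the external system; records each call it receives."""
    calls.append(parcel_id)
    return f"status-for-{parcel_id}"


# A fake clock keeps the example deterministic.
fake_now = [0.0]
cache = TtlCache(fetch_tracking_status, ttl_seconds=30, clock=lambda: fake_now[0])

first = cache.get("ABC123")    # fetched from the external system
second = cache.get("ABC123")   # served from the cache
fake_now[0] = 31.0
third = cache.get("ABC123")    # TTL expired, fetched again
```

The point of the sketch is that the external system sees one call per key per TTL window, however many pages are rendered in between.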

Let’s fuel apps

03lets-fuel-apps

To avoid AEM being the performance bottleneck, the page assembly can be moved outside, e.g. to the third party application itself. In that scenario AEM would produce a sort of template to be populated with data by the external system. That means anything between outputting Velocity template placeholders where a price would go and producing a full-blown PHP page. One way or another, authors would manage the static content, which can easily be cached. And caching means the performance issue is gone. On top of that, the external system can manage users, keep sessions and provide its own custom flow without any complications to the AEM implementation itself.
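As a rough illustration of the placeholder idea, the sketch below uses Python's standard string.Template in place of Velocity or PHP. The markup, the placeholder names and the render_for_user function are all invented for this example; the only point is that the CMS output stays static and cacheable while the external application fills in the per-user values.

```python
import html
from string import Template
from typing import Dict

# What the CMS might publish: static, cacheable markup with placeholders.
cms_template = Template(
    '<div class="product">\n'
    "  <h2>Summer sale</h2>\n"
    "  <p>Now only $price for $customer_name</p>\n"
    "</div>"
)


def render_for_user(template: Template, data: Dict[str, str]) -> str:
    """Runs inside the external application (hypothetical function).

    Values are HTML-escaped before substitution; substitute() raises
    KeyError if the template contains a placeholder with no value.
    """
    safe = {key: html.escape(value) for key, value in data.items()}
    return template.substitute(safe)


page = render_for_user(
    cms_template,
    {"price": "19.99 EUR", "customer_name": "Alice"},
)
```

Note how the template only makes sense once the external system has processed it, which is exactly where the authoring difficulty comes from.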

This, however, comes at a price. The most obvious challenge is providing a reasonable authoring experience, especially when the template produced is not pure HTML. Whether you produce two renditions or emulate the external system’s template execution, it is always additional effort. But the biggest drawback is the general entanglement between the two systems, which will hold you back when releasing either of them.

Go Home IE6!

04-go-home-ie6

So far we have exchanged AEM performance for system entanglement. While it enables more scenarios, it clearly is not ideal. What if we moved that assembly far away from our systems? The furthest part of the diagram is the browser. That works provided you can ignore old and exotic browsers, which has recently become an option on more and more projects.

In this approach AEM again renders just a template. A plain HTML template, I must add, so that it can easily be reused for authoring. The third party application exposes the external (dynamic) part via a RESTful web service (e.g. as JSON). The browser communicates with the web service to populate the empty HTML template with content. As you can see, that is again very easy to cache, without the drawback of tangling the two systems together. It also lets you centralise all look and feel in AEM and change it without worrying about the other system.
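The data flow can be modelled in a few lines. In a real page the merge step would be JavaScript running in the browser; the Python below only imitates it to show how little the two sides need to agree on. The data-field attribute convention, the field names and the populate function are assumptions made for this sketch.

```python
import html
import json
import re

# Plain HTML as AEM might render and cache it; the slots are empty.
cached_template = (
    '<p>Balance: <span data-field="balance"></span></p>'
    '<p>Last login: <span data-field="last_login"></span></p>'
)

# What the third party web service might return for the current user.
service_response = json.dumps({"balance": "120.50", "last_login": "2014-05-19"})

# Matches an empty span carrying a data-field attribute.
_SLOT = re.compile(r'(<span data-field="([a-z_]+)">)(</span>)')


def populate(template: str, payload: str) -> str:
    """Fills each empty slot with the matching JSON value (HTML-escaped)."""
    data = json.loads(payload)

    def fill(match: "re.Match[str]") -> str:
        value = data.get(match.group(2), "")
        return match.group(1) + html.escape(str(value)) + match.group(3)

    return _SLOT.sub(fill, template)


page = populate(cached_template, service_response)
```

The only contract between the systems here is the set of field names, so either side can be released without touching the other as long as that contract holds.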

The drawbacks include a delay in serving the content: the browser reads the external content after rendering the page, which most likely results in a “blink”. That’s usually acceptable unless you optimise for mobile, where reducing the number of requests is the key to a good experience. Even so, it’s still a good idea to use this approach for multi-form flows, or for visually small dynamic components that would otherwise make page caching impossible.

Performance Heaven

05-performance-heaven

The last area that could do page assembly is the caching layer. On the CDN you can most likely execute Edge Side Includes (ESI), and on a Dispatcher, Server Side Includes (SSI), e.g. with mod_include on Apache HTTP Server. This means AEM is again rendering a template, this time with SSI / ESI include directives marking where HTML snippets from a third party app will go. It is again a trade-off: you give some HTML control away, but your page is already assembled when it reaches a mobile device (no delay).
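To show what the caching layer does with such a template, here is a toy resolver for ESI include tags, again in Python for illustration only. A real CDN or mod_include does this for you; the assemble function, the /app/basket-summary path and the fragment content are made up for the sketch. The SSI equivalent of the tag used below would be <!--#include virtual="/app/basket-summary" -->.

```python
import re
from typing import Callable

# Matches self-closing ESI include tags such as <esi:include src="/path"/>.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(template: str, fetch_fragment: Callable[[str], str]) -> str:
    """Replaces every ESI include tag with the fragment fetched for its src."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), template)


# The page as AEM might render it: static markup plus an include directive.
cached_page = (
    "<html><body>"
    "<h1>Welcome</h1>"
    '<esi:include src="/app/basket-summary"/>'
    "</body></html>"
)

# Stand-in for the third party application serving HTML snippets.
fragments = {"/app/basket-summary": "<div>3 items in your basket</div>"}

page = assemble(cached_page, lambda src: fragments[src])
```

The browser receives a single, fully assembled response, and the fragment URL never leaves the server side.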

In practice you will most likely use it to include whole “components” rather than fill in placeholders on your site. As long as your dynamic content is restricted to specific areas (components), you get caching as good as with the JavaScript approach, but with no additional requests and with the API hidden from the public. The biggest cost of this approach is a more complex Development – Test – Acceptance – Production route. The requirement for additional servers for Dev and Test can, however, be easily bypassed with tools like Sling Dynamic Include.

Summary

So there you go. When you look at it from the perspective of “who gets the hit”, it’s very straightforward. You just have to look at your performance requirements (server vs. client side assembly), availability (DoS risk, cache TTL) and content lifecycle (especially around cache invalidation). Just remember: there is no right or wrong way, and all four have their applications.