Adobe CQ5 - OpenCalais Integration

14 August 2012
Mateusz Kula

With the massive amount of information available on the Internet, it is getting more and more difficult to find relevant, valuable content and to categorize it in one way or another. Tagging this overwhelming amount of data is becoming crucial from the SEO and digital marketing point of view, as it plays an important role in site positioning and gives end users keyword search. Problems appear when editors are not scrupulous enough to add tags to new pages, press releases, blogs and tweets, or to update them when the content changes significantly. The worst case scenario is a CMS filled with a whole bunch of untagged content, where catching up with tagging may take too much time and too many resources. OpenCalais turns out to be a great solution to such problems: it allows for auto-tagging and can be easily integrated with other services.

About OpenCalais

OpenCalais (www.opencalais.com) started as a metadata generation service that makes it possible to incorporate semantic functionality into Internet services. The OpenCalais web service analyzes submitted text and generates rich semantic metadata, part of which are entities that can be used for content tagging.

OpenCalais uses many techniques to generate this metadata, including natural language processing and machine learning. The toolkit can find entities such as people, places, organizations, technical terms and many, many others.

It is important to note that the OpenCalais web service is free for private and commercial use. It offers a quota of 50,000 transactions per day and 4 transactions per second, which should be more than enough for most use cases. The website also states that this quota is negotiable if there is a need.
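A client that submits a lot of content in bulk may want to stay within these limits on its own side. The sketch below is only an illustration and not part of the integration package: `QuotaThrottle` and its parameters are hypothetical names, and it assumes that a "day" can be treated as a fixed 24-hour window starting when the throttle is created, which may not match how the service itself counts.

```python
import time
from collections import deque


class QuotaThrottle:
    """Client-side guard for a per-second and a per-day call quota (hypothetical helper)."""

    def __init__(self, per_second=4, per_day=50_000, clock=time.monotonic):
        self.per_second = per_second
        self.per_day = per_day
        self.clock = clock            # injectable so the logic can be tested
        self.recent = deque()         # timestamps of calls made within the last second
        self.day_start = clock()      # assumption: fixed 24-hour window
        self.day_count = 0

    def try_acquire(self):
        """Return True and record a call if both quotas still have room."""
        now = self.clock()
        # Start a fresh daily window once 24 hours have elapsed.
        if now - self.day_start >= 86_400:
            self.day_start = now
            self.day_count = 0
        # Forget calls that are at least one second old.
        while self.recent and now - self.recent[0] >= 1.0:
            self.recent.popleft()
        if self.day_count >= self.per_day or len(self.recent) >= self.per_second:
            return False
        self.recent.append(now)
        self.day_count += 1
        return True
```

A caller would check `try_acquire()` before each request and wait or queue the page when it returns `False`.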

Document viewer

To let you quickly get a feel for what the service has to offer, the OpenCalais team has created a very useful document viewer tool, which is available on this page. It is a good way to see what kind of metadata can be extracted from a few lines of text. To check what it can do, just go there, paste something and look at the generated results with social tags, entities, events and facts.

Obtaining keys

To use the OpenCalais web service (and to make our integration package work) you need to register and request a free API key. To get the key, visit this page.

OpenCalais integration

At Cognifide we thought it would be a very tempting idea to integrate the OpenCalais service with the Adobe WEM platform and share it with the community, to make automatic tagging easy and straightforward for editors worldwide and to take a first step towards promoting the semantic web. A fairly good way to integrate OpenCalais with CQ5 was to create a workflow step for content tagging that can be embedded into any workflow, whether fired on some CQ event or by hand. This step pulls data from a page and calls the OpenCalais integration OSGi service.
Text data is collected from the fields of the components on a page, then the integration service sends the concatenated text to the web service and pulls tags back. Finally, the workflow step adds any tags that do not exist yet to the "Calais" namespace in the CQ tag manager and applies the tags to the page.
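The steps of the tagging step can be sketched in a few lines. This is a simplified model in Python rather than the package's actual OSGi code: pages and the tag store are plain dictionaries, `extract_labels` stands in for the call to the web service, and the way labels are turned into tag IDs (`to_tag_id`) is an assumption made for illustration.

```python
import re


def collect_text(components, field_names):
    """Concatenate the configured text fields of each component on a page."""
    parts = []
    for component in components:
        for name in field_names:
            value = component.get(name)
            if isinstance(value, str) and value.strip():
                parts.append(value.strip())
    return "\n".join(parts)


def to_tag_id(label, namespace="Calais"):
    """Build a namespaced tag ID from a label (naming scheme is an assumption)."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return f"{namespace}:{slug}" if slug else None


def tag_page(page, field_names, extract_labels, known_tags):
    """Collect text, fetch labels, create missing tags and apply them to the page."""
    text = collect_text(page["components"], field_names)
    if not text:
        return []
    applied = []
    for label in extract_labels(text):   # stand-in for the web service call
        tag_id = to_tag_id(label)
        if tag_id is None:
            continue
        known_tags.setdefault(tag_id, label)   # create the tag only if it is missing
        if tag_id not in page["tags"]:
            page["tags"].append(tag_id)
            applied.append(tag_id)
    return applied
```

Passing the service call in as a function keeps the flow easy to try out with a stub, for example `tag_page(page, ["text", "jcr:title"], lambda text: ["Geometrixx"], {})`.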

The diagram below shows the data flow when a workflow containing the tagging step is fired.
Data flow within the CQ - OpenCalais integration package

Configuring the OpenCalais integration service

Once the integration package is installed, it needs some configuration to work. The most important option, and in fact the only required one, is the API key value. To set it, navigate to host:port/system/console/configMgr, find "OpenCalais Integration" in the list and click the "Edit" button. Then fill in the "API Key" field with the key that was previously sent to you.
The next field that should be configured is "Content fields", which lists the fields that will be sent to OpenCalais for analysis. Entries in this field have to be separated with a vertical bar "|" (e.g. "text | jcr:title"). Setting the API key and the content fields is enough to make tagging work, but there are two more optional settings to consider: "Allow distribution" and "Allow search". The former indicates whether the extracted metadata can be distributed by OpenCalais. The latter indicates whether future searches can be performed on the extracted metadata by OpenCalais.
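To make the "Content fields" format concrete, here is a minimal sketch of how such a value could be parsed and validated together with the other options. The names `parse_content_fields`, `IntegrationConfig` and `build_config` are hypothetical, and the `False` defaults for the two optional flags are an assumption, not the package's documented defaults.

```python
from dataclasses import dataclass


def parse_content_fields(raw):
    """Split a '|'-separated field list, trimming blanks and dropping empties and repeats."""
    fields = []
    for item in raw.split("|"):
        name = item.strip()
        if name and name not in fields:
            fields.append(name)
    return fields


@dataclass(frozen=True)
class IntegrationConfig:
    api_key: str
    content_fields: tuple
    allow_distribution: bool = False   # assumed default
    allow_search: bool = False         # assumed default


def build_config(api_key, content_fields, allow_distribution=False, allow_search=False):
    """Validate the only required option (the API key) and normalise the rest."""
    if not api_key or not api_key.strip():
        raise ValueError("API key is required")
    fields = tuple(parse_content_fields(content_fields))
    return IntegrationConfig(api_key.strip(), fields, allow_distribution, allow_search)
```

With the example value from above, `parse_content_fields("text | jcr:title")` yields the two field names `text` and `jcr:title`.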

OpenCalais tagging workflow

The installation package delivers not only a workflow step for tagging but also an example workflow containing only our new step, so you can get started without creating your own workflow.
The fastest way to get started is to navigate to some page on an author instance and run the OpenCalais tagging workflow. You can also configure a workflow launcher to trigger auto-tagging on every page update.
If everything was set up properly, new tags should be added under the "Calais" tags, as in the picture below, where we have run the workflow on the Discover Geometrixx page.
List of generated tags

The page should also be tagged with the generated tags, as in the picture below:
Tags generated for a sample page

Summary

In this post I have described the importance of content tagging, what OpenCalais is, and how it can enable content auto-tagging and help catch up with the untagged content that already exists on the web. I have shown the solution for integrating OpenCalais with Adobe CQ5 that we developed at Cognifide, and how it was implemented in a package that is available here and can be installed using the package manager. To explain how to use the integration package, I have also provided a simple example of page auto-tagging within the Geometrixx sample content that comes with the default CQ installation.