Pagination, sorting, crawling and storing data with Sitecore and Lucene

20 November 2013
Marek Musielak
Lucene indexing and search are typically used in Sitecore when working with large sets of items; for smaller sets, the Sitecore API and Sitecore queries are sufficient. That's why, when displaying Lucene search results, we need proper sorting and pagination of the result sets, and we often want to display data straight from the Lucene index without loading the Sitecore item that corresponds to each returned Lucene document.
 
This post assumes that you have basic knowledge of using Lucene with Sitecore. If you need an introduction to setting up indexing in your application, you can find it in my other blog post 'A quick guide how to setup the simplest Lucene search in Sitecore'. For information on how to create Lucene queries, please refer to my other post 'The most common Lucene queries for Sitecore'.

It's quite easy to introduce pagination and sorting when using Lucene search in Sitecore. Sorting requires passing an additional parameter of type Sort to the Search method. You can sort by a single field or by more than one criterion, e.g.:
Sort simpleSort = new Sort(new SortField("__created", SortField.STRING));

Sort sortByAuthorThenByCreatedDateDesc = new Sort(
  new SortField[]
    {
      new SortField("__created by", SortField.STRING),
      new SortField("__created", SortField.STRING, true)
    }
  );
Pagination requires additional numeric parameters, both when executing the query and when fetching results from SearchHits. Here is a simple method that retrieves the items matching a query, with pagination and sorting applied (page numbers start at 1):
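// Namespaces used: Lucene.Net.Search, Sitecore.Search,
// Sitecore.Data.Items, System.Collections.Generic, System.Linq
public List<Item> GetItems(Query query, Sort sort, int page, int itemsPerPage)
{
  using (IndexSearchContext sc = 
    SearchManager.GetIndex("Custom Index").CreateSearchContext())
  {
    // Collect enough top hits to cover every page up to the requested one.
    TopDocs docs = sc.Searcher.Search(query, null, page * itemsPerPage, sort);

    SearchHits searchHits = 
      new SearchHits(docs, sc.Searcher.GetIndexReader());

    // Skip the hits of the previous pages and take a single page.
    return searchHits
      .FetchResults((page - 1) * itemsPerPage, itemsPerPage)
      .Select(r => r.GetObject<Item>()).ToList();
  }
}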

List<Item> items = GetItems(query, sort, page, itemsPerPage);
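The page arithmetic used in GetItems can be isolated into a small helper. The sketch below is only an illustration: the PagingMath class and its method names are hypothetical and are not part of Sitecore or Lucene.NET. It assumes that page numbers start at 1, as in the method above, and that the total number of matching documents is known from the search.

```csharp
using System;

// Hypothetical helper, not part of Sitecore or Lucene.NET.
public static class PagingMath
{
    // Number of top hits the search must collect so that
    // the requested (1-based) page is included in the results.
    public static int HitsToCollect(int page, int itemsPerPage)
    {
        Validate(page, itemsPerPage);
        return checked(page * itemsPerPage);
    }

    // Zero-based index of the first hit that belongs to the requested page.
    public static int Offset(int page, int itemsPerPage)
    {
        Validate(page, itemsPerPage);
        return (page - 1) * itemsPerPage;
    }

    // Number of pages needed to show totalHits results (rounded up).
    public static int PageCount(int totalHits, int itemsPerPage)
    {
        if (totalHits < 0) throw new ArgumentOutOfRangeException("totalHits");
        if (itemsPerPage < 1) throw new ArgumentOutOfRangeException("itemsPerPage");
        return (totalHits + itemsPerPage - 1) / itemsPerPage;
    }

    private static void Validate(int page, int itemsPerPage)
    {
        if (page < 1) throw new ArgumentOutOfRangeException("page");
        if (itemsPerPage < 1) throw new ArgumentOutOfRangeException("itemsPerPage");
    }
}
```

For example, with page = 3 and itemsPerPage = 10 the search has to collect the top 30 hits and the results are fetched starting at offset 20. Validating the arguments up front avoids asking Lucene for zero or a negative number of hits when a page number comes straight from a query string.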
One thing to be aware of is that, by default, Sitecore indexes text fields as tokenized, which makes keyword search easy but prevents simple sorting on those fields. The text field types are:
  • Single-Line Text
  • Rich Text
  • Multi-Line Text
  • text
  • rich text
  • html
  • memo
  • Word Document
This means that, by default, you cannot sort by e.g. the Title field, which is Single-Line Text. Fortunately, Sitecore lets you easily extend the crawler used for indexing and add your own non-tokenized field to the index. All it takes is a very short custom crawler class inheriting from Sitecore.Search.Crawlers.DatabaseCrawler and a reference to that class in the config file:
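public class MyCrawler : Sitecore.Search.Crawlers.DatabaseCrawler
{
  protected override void 
    AddAllFields(Document document, Item item, bool versionSpecific)
  {
    // Let the default crawler add the standard fields first.
    base.AddAllFields(document, item, versionSpecific);

    // Non-tokenized copy of the title, usable for sorting.
    document.Add(CreateField("my_title", item["title"], false, 1));

    // "1" when the item is in a final workflow state, empty otherwise.
    WorkflowState state = item.State.GetWorkflowState();
    document.Add(CreateField("my_final_state", 
      state != null && state.FinalState ? "1" : "", false, 1));

    // Data field holding the title, readable straight from the index.
    document.Add(CreateDataField("data_title", item["title"]));
  }
}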
<locations hint="list:AddCrawler">
  <custom-location-1 type="My.Assembly.Namespace.MyCrawler,My.Assembly">
    <Database>master</Database>
    <Root>/sitecore/content/Home</Root>
  </custom-location-1>
</locations>
The crawler above does more than add a non-tokenized field called 'my_title' to the index. It also records whether the item's workflow state is final (information that is not stored on the item itself but on its workflow state item). In addition, it creates a data field called 'data_title', which tells Lucene to store the item's title so that it can be read directly from the index without retrieving the item from Sitecore, e.g.:
public List<string> GetTitle
    (Query query, Sort sort, int page, int itemsPerPage)
{
  using (IndexSearchContext sc = 
    SearchManager.GetIndex("Custom Index").CreateSearchContext())
  {
    // Collect enough top hits to cover every page up to the requested one.
    TopDocs docs = sc.Searcher.Search(query, null, page * itemsPerPage, sort);

    SearchHits searchHits = new SearchHits(docs, sc.Searcher.GetIndexReader());

    // Read the stored title from each Lucene document; no item is loaded.
    return searchHits
      .FetchResults((page - 1) * itemsPerPage, itemsPerPage)
      .Select(r => r.Document.Get("data_title")).ToList();
  }
}
The information in this post should be sufficient for most scenarios involving Lucene indexing and search in Sitecore. If you have any questions or comments, or if you want me to cover any other topic concerning Sitecore and Lucene, please leave a comment below. And if you're interested in more information about Lucene and Sitecore, check out my other Sitecore blog posts.