Sitecore: Indexing Patterns

Proper Search Index implementation is an essential part of development for the Sitecore platform. When you begin to work with indexes, you have to decide what index to use in your project and how to store data in it. This article will show you possible options and give you tools to make informed decisions.

The Basics: Sitecore Indexes

Let’s start with the basics. Sitecore provides three predefined indexes. (I hope every developer knows about them. Eleven more indexes were added in Sitecore 8+, but that’s not important at this point.)

  • sitecore_core_index
  • sitecore_master_index
  • sitecore_web_index

These three indexes contain every version of every item from the corresponding Sitecore database. Sitecore uses them for Item Buckets, search-based fields such as “Multilist with Search”, media item search, and so on.

Now that we know how Sitecore uses its indexes, it’s time to decide how to store the custom data that we need. Let’s talk about typical scenarios, which could include site content search, a Product Catalog, a Blog, Locations, an Employee Directory, etc. All of these scenarios have common features:

  • they all use some subset of Sitecore items.
  • those items share the same Template (sometimes a few templates).
  • they represent one business area of the site. In software engineering we call that the “Business Domain.”

So, what are our options (patterns) to store the data?

God Index

A God Index is an approach where all the data is stored in an existing sitecore_DbName_index index. That index is available out of the box and seems to be a good candidate for storing our data. Typically, a developer only needs to add computed fields for custom data to make the index work.

Despite its simplicity, the approach I’ve described has several disadvantages. If you are familiar with the “God Object” pattern you can probably guess what one of those could be: The index has too many responsibilities.

Below are issues I have seen in sites implementing that pattern:

  1. The index keeps too much data. Data for every version, in every language, for every Sitecore item is stored in that index. Furthermore, every document includes all possible fields. As a result, index updates and rebuilds become slow, and a rebuild causes significant downtime because of the index’s size.
  2. Data for multiple tenants is stored in one index. You cannot maintain your sites independently because of shared storage. A configuration or data model update that you make for one site can break other tenants.
  3. Queries are overcomplicated. We need to add extra filters to exclude data that isn’t relevant for our business logic.
  4. Every document contains lots of extra fields. That affects performance and reduces data readability and the ability to debug.
  5. All the data shares the same update strategy. You cannot update specific entities like products, blogs, pages and articles on individual schedules.
  6. Multi-region implementations use the index from different time zones. This limits the time when index maintenance can be performed.
  7. Maintenance collapse. Very soon you can arrive at a state where nobody knows how exactly the index is used. In such situations, you can’t refactor the index. Instead, you have to freeze the existing data model and flow.

All of these issues significantly complicate index maintenance. That’s the price we have to pay for simplified configuration. But there is an alternative approach, which we’ll call the “Domain Index”, that addresses the issues above.

Domain Index

With a Domain Index we are not trying to put all our business data into the existing sitecore_DbName_index index. Instead, we identify our business domains and configure a new index for every domain. That includes building individual indexes per tenant for multi-tenant implementations, and creating separate indexes for the master and web databases.

The Upside

A Domain Index needs to include only the data (documents and fields) that is relevant for our current business area. Data should be stored in the format that works best for us. This is how index names can look in a product catalog:

  • myTenant_products_master_index
  • myTenant_products_web_index
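
To make the naming concrete, here is a minimal sketch of how such a pair of indexes might be declared, assuming the Lucene provider that ships with Sitecore 8.1. The index ids come from the list above. The content root /sitecore/content/MyTenant/Products and the myTenantProductsIndexConfiguration name are hypothetical, and the type names and strategy references are written from memory of the stock configuration files, so please check them against your own installation.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">

          <!-- Authoring-side index: crawls only the product branch of the master database -->
          <index id="myTenant_products_master_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
            <!-- Hypothetical domain-specific index configuration -->
            <configuration ref="contentSearch/indexConfigurations/myTenantProductsIndexConfiguration" />
            <strategies hint="list:AddStrategy">
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/syncMaster" />
            </strategies>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>master</Database>
                <!-- Hypothetical root item for the product domain -->
                <Root>/sitecore/content/MyTenant/Products</Root>
              </crawler>
            </locations>
          </index>

          <!-- Delivery-side index: same domain, web database, updated after publishing -->
          <index id="myTenant_products_web_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
            <configuration ref="contentSearch/indexConfigurations/myTenantProductsIndexConfiguration" />
            <strategies hint="list:AddStrategy">
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
            </strategies>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore/content/MyTenant/Products</Root>
              </crawler>
            </locations>
          </index>

        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```

Because each index has its own strategies element, the product domain can follow its own update schedule without touching the out-of-the-box indexes.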

If you want to learn the advantages of Domain Index, please scroll back up to the section outlining the issues presented by the God Index. The Domain Index addresses all of them!

The Downside

The one negative of a Domain Index is a more complicated setup. The key is to configure the indexes to exclude data “noise.” The configuration below is applicable to Sitecore 8.1. The same result can be achieved in previous Sitecore versions, but the paths may be different.

  • Filter Items by Root Item – sitecore/contentSearch/configuration/indexes/index/locations/crawler/Root
  • Filter items by Template Type – sitecore/contentSearch/indexConfigurations/yourIndexConfiguration/documentOptions/include
  • Remove standard fields by turning off indexAllFields in sitecore/contentSearch/indexConfigurations/yourIndexConfiguration/documentOptions, or exclude fields one by one using the sitecore/contentSearch/indexConfigurations/yourIndexConfiguration/documentOptions/exclude hint="list:AddExcludedField" section
  • Add computed fields into the sitecore/contentSearch/indexConfigurations/yourIndexConfiguration/documentOptions/fields hint="raw:AddComputedIndexField" section
  • Configure update strategy (you can read more about that in this John West post)
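
The template, field and computed-field items from the list above can be sketched as a single documentOptions section. This is only a fragment under stated assumptions: it targets the Lucene provider, the configuration name myTenantProductsIndexConfiguration is hypothetical, the GUIDs are placeholders, and the computed field class MyTenant.Search.ComputedFields.ProductCategory does not exist anywhere but in this example. The hint and type names are written from memory of the stock 8.x configuration, so verify them against your own files.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <!-- Hypothetical configuration name. The other sections a Lucene configuration needs
             (fieldMap, analyzer, fieldReaders and so on) are omitted from this fragment. -->
        <myTenantProductsIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <documentOptions type="Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilderOptions, Sitecore.ContentSearch.LuceneProvider">

            <!-- Stop indexing every field; only the fields listed below are stored -->
            <indexAllFields>false</indexAllFields>

            <!-- Only items based on these templates become documents (placeholder GUID) -->
            <include hint="list:AddIncludedTemplate">
              <Product>{00000000-0000-0000-0000-000000000000}</Product>
            </include>

            <!-- Fields to keep once indexAllFields is off (placeholder GUID) -->
            <include hint="list:AddIncludedField">
              <ProductName>{00000000-0000-0000-0000-000000000000}</ProductName>
            </include>

            <!-- Computed field backed by a hypothetical class -->
            <fields hint="raw:AddComputedIndexField">
              <field fieldName="product_category">MyTenant.Search.ComputedFields.ProductCategory, MyTenant.Search</field>
            </fields>

          </documentOptions>
        </myTenantProductsIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

This sketch takes the "turn off indexAllFields and list what you need" route. If you would rather keep indexAllFields on and remove fields one by one, use the exclude section mentioned in the list instead of the field include.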

The Sitecore Perspective

You might ask yourself: That’s all great, but doesn’t using a Domain Index conflict with Sitecore’s strategy? Does Sitecore want us to use that pattern?

The answer is: yes! Sitecore actually uses that pattern itself.

Now it is time to look at the eleven new indexes that arrived with Sitecore 8. Here are some of their names:

  • sitecore_marketing_asset_index_master
  • sitecore_marketing_asset_index_web
  • sitecore_fxm_master_index
  • sitecore_fxm_web_index

That index naming scheme should look familiar.

And now, if you look at the three default sitecore_DbName_index indexes from a domain perspective, you will immediately realize that they also implement the Domain Index pattern. Their domain is “Content Authoring” and these indexes are well designed for that domain. So how can a Domain Index become a God Index? It happens because of you and me, the developers, when we start using the indexes for the wrong purpose.

That reminds me of Schrödinger’s cat. Sitecore indexes implement a Content Authoring Domain search out of the box. But the moment developers query them for the purposes of another business area, something strange happens. The indexes cease to be pure Domain Indexes. They take on something from the dark side of the God Index anti-pattern.

Enjoy!
