Improving Solr performance in Sitecore - Part 1
Recently I got an assignment, the most important one I’ve had since I started working with Sitecore. The project had some search components, and those components were extremely slow.
Most end users don’t care about things like architecture, design patterns, and yes, website performance. They want the most beautiful application that runs without errors and that’s it. Simple, isn’t it?
Well, when the application starts to grow, the problems start to appear.
Because Sitecore comes with Solr integration, we use it to deliver a better experience and better performance in our projects. I’m here today to explore some ways to set up an environment in which your Sitecore solution works smoothly with Solr.
First things first, what is Solr?
Solr is an open source search platform managed by the Apache Software Foundation and described as: “Solr is a search server built on top of Apache Lucene, an open-source, Java-based, information retrieval library. It is designed to drive powerful document retrieval applications - wherever you need to serve data to users based on their queries, Solr can work for you.”
Example Solr application integration (image source - apache.org)
One of the most common questions asked when starting a project/page from scratch that has a search component is: Do we have time to implement a full search component architecture?
I bet that most of the time the answer is “no”, but working with Solr is actually not complicated once you have at least a basic understanding of it.
Let’s create a sample schema to build a new index in Solr. Here is an XML example:
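As a minimal sketch of what such a patch file can look like: the index id, the template and field names, the GUIDs, and the root path below are all placeholders made up for illustration, and the type names assume a Sitecore 9-style Solr provider, so check them against your own version.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <!-- Hypothetical index id; a Solr core with the same name is assumed to exist -->
          <index id="mysite_articles_web_index"
                 type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
            <param desc="name">$(id)</param>
            <param desc="core">$(id)</param>
            <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />

            <!-- Start from the default Solr configuration and narrow down what gets indexed -->
            <configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration">
              <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
                <indexAllFields>false</indexAllFields>
                <!-- Templates to include (placeholder name and GUID) -->
                <include hint="list:AddIncludedTemplate">
                  <Article>{11111111-1111-1111-1111-111111111111}</Article>
                </include>
                <!-- Fields to include (placeholder names and GUIDs) -->
                <include hint="list:AddIncludedField">
                  <Title>{22222222-2222-2222-2222-222222222222}</Title>
                  <Keywords>{33333333-3333-3333-3333-333333333333}</Keywords>
                </include>
              </documentOptions>
            </configuration>

            <!-- When items are (re)indexed -->
            <strategies hint="list:AddStrategy">
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
            </strategies>

            <!-- Which database and start path the items come from -->
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore/content/MySite/Home</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```

Keep in mind that overriding documentOptions like this may drop computed fields and exclusions defined by the default configuration, so compare the resulting configuration with the default one before relying on it.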
Let’s go through some important points in the example above:
- You can restrict your new index to specific templates. This is a recommended step once you know which templates the index needs. If your solution is going to have more than one search component, and their content can be split by template, create a separate schema for each component.
- You can specify which template fields should be attached to each document you index. Be selective: include only what the end user can actually search for in your search component. Avoid adding generic fields such as rich-text fields (if you really need one, you can clean its content by removing all the HTML before sending it to Solr), media fields, or general link fields. The main goal is to index only the content that really needs to be searchable.
Let’s say you have a rich-text field that holds a lot of content that does not necessarily need to be indexed. You can create a new field in the same template to store only the keywords that appear in that rich-text content, and send this keyword field to Solr instead.
- You can define the strategy that tells Sitecore when items should be indexed. Read up on the available strategies and select the one that best fits your solution.
- You can also specify which database the indexed Sitecore items come from. Do not send items from the Master database to Solr unless it is strictly necessary. Remember, the end user sees the content in the Web database, so what is the real reason for indexing items from Master? Use the location property to define the database (and start path) from which Sitecore sends items to Solr.
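To make the strategy point above a bit more concrete, here is a small sketch of a strategies block for an index that reads from the Web database. The reference paths are the ones I know from the standard Sitecore configuration files, but treat them as assumptions and verify them against your version.

```xml
<strategies hint="list:AddStrategy">
  <!-- Update the index incrementally after each publish -->
  <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
  <!-- Rebuild the whole index after a full republish -->
  <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/rebuildAfterFullPublish" />
</strategies>
```

Other strategies you are likely to find in the same configuration section include manual, syncMaster, and intervalAsyncMaster. The last two are intended for indexes that read from the Master database.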
In the next article, I’m going to talk about how to override the main Sitecore indexing process to access the field values before they are indexed, and I will provide some code examples showing how to retrieve content from Solr using Solr queries.