Exploring Apache Sling Sitemap Generator With Customization in AEM
Prerequisite
AEM 6.5 with Service Pack 11 or later, or the AEM as a Cloud Service (AEMaaCS) SDK.
What is Apache Sling Sitemap Generator
A crucial part of any website is its sitemap: a file that lists the pages of the site so that search engines can crawl and index them.
Below is the sample sitemap.xml file:
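As a rough illustration of the format, the following Java sketch prints a minimal sitemap that follows the sitemaps.org protocol. The URLs and dates are placeholders, and the class and method names are hypothetical; in AEM the generator writes this file for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SampleSitemap {

    // Builds a minimal sitemap document from a map of page URL -> last-modified date.
    static String build(Map<String, String> pages) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (Map.Entry<String, String> page : pages.entrySet()) {
            xml.append("  <url>\n");
            xml.append("    <loc>").append(escape(page.getKey())).append("</loc>\n");
            xml.append("    <lastmod>").append(escape(page.getValue())).append("</lastmod>\n");
            xml.append("  </url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }

    // Escapes the characters that must be entity-encoded inside XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        // Placeholder pages; insertion order is kept so the output is stable.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("https://www.example.com/en.html", "2023-01-15");
        pages.put("https://www.example.com/en/products.html", "2023-02-01");
        System.out.print(build(pages));
    }
}
```

Each page becomes one `<url>` element with a `<loc>` and an optional `<lastmod>`; search engines read the list to discover the pages.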
The Apache Sling Sitemap Generator uses Apache Sling to generate sitemaps dynamically based on the pages and content in an AEM website. This makes it easy for website owners to keep their sitemap up-to-date as they add, modify, or delete pages on their site.
Setting up Apache Sling Sitemap Generator
Setting up the Apache Sling Sitemap Generator in AEM is a straightforward process that involves a few steps:
- sling:sitemapRoot property
There are two approaches:
a) Add this property to the jcr:content node of the page that should act as the sitemap root; the generated sitemap.xml covers that page and its descendants. It is a Boolean property set to true.
b) Go to Sites > your project > page (where you want to apply this property) > Properties > Advanced tab, and check the “Generate Sitemap” checkbox.
- Sitemap Servlet Configuration (Note: This step is only required for custom page resource types)
Go to http://<host>:<port>/system/console/configMgr, search for the “SitemapServlet” configuration, and add the page resource type to the “sling.servlet.resourceTypes” property.
- Apache Sling Sitemap – Scheduler configuration
This is the out-of-the-box (OOTB) configuration provided by Apache Sling. It generates the sitemap on the schedule defined by a cron expression and stores it under “/var/sitemaps” for the pages whose “sling:sitemapRoot” property is true. Below is the sample screenshot of the configuration:
Required properties:
a) Name: Any generic name for the sitemap scheduler.
b) Schedule: A cron expression that controls when the sitemap generation job runs, for example “0 0/5 * * * ?” (every five minutes).
c) Search Path: The path under which pages are searched. It defaults to “/content” and can be changed as required.
The scheduler runs according to the cron expression and generates the sitemap.xml file. Go to http://<host>:<port>/crx/de and look under the path “/var/sitemaps”; the sitemap.xml will be present there, as shown in the screenshot below:
We can access the sitemap by appending “.sitemap.xml” to the path of the page for which the sitemap is generated, like this: http://<host>:<port>/<page-url>.sitemap.xml.
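The URL pattern above can be sketched as a small helper. This is only an illustration: the class name, method name, host, and content path are hypothetical, and the helper assumes the page URL either ends in “.html” or has no extension.

```java
final class SitemapUrls {

    // Turns a page URL into its sitemap URL by dropping a trailing ".html"
    // (if present) and appending ".sitemap.xml".
    static String sitemapUrlFor(String pageUrl) {
        String suffix = ".html";
        String base = pageUrl.endsWith(suffix)
                ? pageUrl.substring(0, pageUrl.length() - suffix.length())
                : pageUrl;
        return base + ".sitemap.xml";
    }

    public static void main(String[] args) {
        // Hypothetical local author instance and content path.
        System.out.println(sitemapUrlFor("http://localhost:4502/content/mysite/en.html"));
    }
}
```

For the hypothetical page above, the helper returns http://localhost:4502/content/mysite/en.sitemap.xml.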
Sitemap Features
We can use a few available sitemap features based on our requirements.
- Excluding Pages Based on Robot Tags
Go to Sites > your project > page (where you want to apply this property) > Properties > Advanced tab, and set the Robots Tags value to “noindex”. A page with this value set is excluded from the sitemap. This approach is most useful for individual pages.
- Changing Sitemap Storage Location
We can change the path where the sitemaps are stored using the OOTB configuration “Apache Sling Sitemap Storage Configuration”. The new folder must be created with the same rules and permissions as those applied to “/var/sitemaps”.
Sitemap Customization
Recently we experienced a unique use case while implementing Sitemap dynamically in AEM using the Apache Sling Sitemap Generator tool. Here we will learn how to customize the OOTB generation of the sitemap based on the AEM website and business requirements. Some of the customization options are as follows:
- Excluding Pages Based on Template Types
We can exclude multiple pages from the sitemap based on factors such as the page template type or whether the page is published, by overriding the shouldInclude method of the ResourceTreeSitemapGenerator class. In our use case, we excluded pages based on their template type. For that, we created a custom OSGi configuration that lists the template types to exclude. Below is the screenshot of the custom OSGi configuration:
Below is the sample code to override the shouldInclude method:
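What follows is a minimal, framework-free sketch of the decision such an override could delegate to, not the exact implementation. The class name and template paths are hypothetical. It assumes that in AEM the template path is read from the cq:template property on the page's jcr:content node, and that the excluded list comes from the custom OSGi configuration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TemplateExclusionFilter {

    // Template paths that must not appear in the sitemap
    // (in AEM these would come from the custom OSGi configuration).
    private final Set<String> excludedTemplates;

    TemplateExclusionFilter(String... excludedTemplates) {
        this.excludedTemplates = new HashSet<>(Arrays.asList(excludedTemplates));
    }

    // Mirrors the yes/no decision a shouldInclude override would return.
    // Assumption: a page without a template path is left out of the sitemap.
    boolean shouldInclude(String templatePath) {
        if (templatePath == null || templatePath.isEmpty()) {
            return false;
        }
        return !excludedTemplates.contains(templatePath);
    }

    public static void main(String[] args) {
        // Hypothetical template paths.
        TemplateExclusionFilter filter = new TemplateExclusionFilter(
                "/conf/mysite/settings/wcm/templates/error-page",
                "/conf/mysite/settings/wcm/templates/redirect-page");

        System.out.println(filter.shouldInclude("/conf/mysite/settings/wcm/templates/content-page"));
        System.out.println(filter.shouldInclude("/conf/mysite/settings/wcm/templates/error-page"));
    }
}
```

Keeping the check in a small class of its own makes it easy to unit test without an AEM instance; the actual override would only look up the template path and call this method.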
After this step, go to the Apache Sling Sitemap – Scheduler configuration and add the fully qualified name of the above class to the “includeGenerators” property.
- Adding Custom Properties
We can add or remove custom properties like change frequency, priority, etc. by overriding the addResource method from the same ResourceTreeSitemapGenerator class.
Below is the sample code to override the addResource method:
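The sketch below is a framework-free illustration of the property handling such an override might perform, not the exact implementation. The page property names sitemapChangeFrequency and sitemapPriority and the class name are hypothetical. The allowed values follow the sitemaps.org protocol: a fixed set of change frequencies and a priority between 0.0 and 1.0. Invalid or missing values are dropped so the corresponding element is simply omitted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class SitemapEntryProperties {

    // Change-frequency values allowed by the sitemaps.org protocol.
    static final List<String> CHANGE_FREQUENCIES = Arrays.asList(
            "always", "hourly", "daily", "weekly", "monthly", "yearly", "never");

    final String changeFrequency; // null means "omit the element"
    final Double priority;        // null means "omit the element"

    private SitemapEntryProperties(String changeFrequency, Double priority) {
        this.changeFrequency = changeFrequency;
        this.priority = priority;
    }

    // Reads the two hypothetical page properties and keeps only valid values.
    static SitemapEntryProperties fromPageProperties(Map<String, Object> props) {
        String frequency = null;
        Object rawFrequency = props.get("sitemapChangeFrequency");
        if (rawFrequency != null) {
            String candidate = rawFrequency.toString().trim().toLowerCase(Locale.ROOT);
            if (CHANGE_FREQUENCIES.contains(candidate)) {
                frequency = candidate;
            }
        }

        Double priority = null;
        Object rawPriority = props.get("sitemapPriority");
        if (rawPriority != null) {
            try {
                double value = Double.parseDouble(rawPriority.toString().trim());
                if (value >= 0.0 && value <= 1.0) {
                    priority = value;
                }
            } catch (NumberFormatException e) {
                // Not a number: leave the priority unset.
            }
        }
        return new SitemapEntryProperties(frequency, priority);
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("sitemapChangeFrequency", "Weekly");
        props.put("sitemapPriority", "0.8");
        SitemapEntryProperties entry = fromPageProperties(props);
        System.out.println(entry.changeFrequency + " / " + entry.priority);
    }
}
```

In the overridden addResource method, the validated values would then be set on the URL entry that is added to the sitemap for the page.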
- Changing the Sitemap Format
The Apache Sling Sitemap Generator supports both XML and HTML sitemap formats.
Now that you know how to configure and customize Apache Sling Sitemap Generator, you can use it according to your requirements and use cases. If you want to know more about the customization features, you can easily explore them.