Content Migration in AEM using SlingPostServlet
A very basic migration flow looks as follows:
In this scenario, you have a source CMS (Sitecore, Drupal, WordPress or any other CMS) whose content needs to be migrated to AEM. To achieve this, we typically need to do the following:
- Get the content out of the source CMS in some export format (XML, CSV, etc.).
- Process this content and extract what needs to go to AEM. This includes parsing the XML/CSV exported by the source CMS and massaging it if needed.
- Import the processed content into AEM. There are various strategies for this, such as Talend, Package Manager and SlingPostServlet.
I like SlingPostServlet because it feels closer to coding than the other strategies, so this post focuses on it. Assuming the source CMS gives you XML, I created a Groovy script for the migration. Here are the steps:
- Parsing XML using Groovy XML Parser
[java]
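// file is assumed to be the java.io.File holding the XML export; records is the list of <blog> elements
def records = new XmlParser().parseText(file.text)?.blog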
[/java]
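As a minimal sketch of this step, assume the export looks like the hypothetical sample below: a root element wrapping repeated blog entries, each with the name, outline, status and subject_parent fields used later in this post. The sample XML and the root element name blogs are made up for illustration and are not the export format of any particular CMS.

[java]
// On Groovy 3+ XmlParser lives in groovy.xml; on Groovy 2.x it is
// groovy.util.XmlParser, which is imported by default (drop this import there).
import groovy.xml.XmlParser

// Hypothetical export: field names match the ones read later in the script.
def xml = '''<blogs>
  <blog>
    <name>First post</name>
    <outline>Hello AEM</outline>
    <status>Published</status>
    <subject_parent>News</subject_parent>
  </blog>
  <blog>
    <name>Second post</name>
    <outline>More content</outline>
    <status>Draft</status>
    <subject_parent>Tech</subject_parent>
  </blog>
</blogs>'''

// parseText returns the root node; .blog selects its <blog> children.
def records = new XmlParser().parseText(xml).blog

records.each { blog ->
    println "${blog.name.text()} [${blog.status.text()}]"
}
[/java]

Each record is a node whose children can be read by name, which is what the map-building step relies on.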
- Creating a map out of parsed content
[java]
// resourceType and baseContentPath are expected to be defined elsewhere in the script
records.eachWithIndex { blog, idx ->
    // Read the fields of the current <blog> element
    def name = blog.name.text()
    def content = blog.outline.text()
    def status = blog.status.text()
    def parentSubject = blog.subject_parent.text()

    // Keys are property paths relative to the page being created
    Map contentMap = [
        "./jcr:primaryType": "cq:Page",
        "./jcr:content/jcr:primaryType": "cq:PageContent",
        "./jcr:content/jcr:title": "${name}",
        "./jcr:content/blog/sling:resourceType": resourceType,
        "./jcr:content/blog/status": status,
        "./jcr:content/blog/parentSubject": parentSubject,
        "./jcr:content/blog/text/sling:resourceType": "foundation/components/text",
        "./jcr:content/blog/text/text": "${content}"
    ]

    // One POST per page: blog0, blog1, ...
    callPost("${baseContentPath}blog${idx}", contentMap)
}
[/java]
Each key in the map corresponds to a property in the JCR. The leading "./" means the path is relative to the node being posted to. If you split the rest of the key by “/”, the last element gives you the property name and the elements before it give you the node hierarchy. Taking the
[code]"./jcr:content/blog/status": "Published"[/code]
entry as an example, this key creates the hierarchy jcr:content -> blog, and the blog node gets a property status with the value Published.
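The splitting rule described above can be sketched in a few lines of Groovy. The splitKey helper is hypothetical and only mirrors the explanation; it is not how SlingPostServlet is implemented internally.

[java]
// Hypothetical helper: split a posted key into its node hierarchy and property name.
def splitKey(String key) {
    // Drop the leading "./" (relative to the posted resource), then split on "/".
    def parts = key.replaceFirst('^\\./', '').split('/') as List
    // Everything before the last element is the node hierarchy.
    def nodes = parts.size() > 1 ? parts[0..-2] : []
    return [nodes: nodes, property: parts[-1]]
}

def result = splitKey('./jcr:content/blog/status')
println "${result.nodes.join(' -> ')} : ${result.property}"
// prints: jcr:content -> blog : status
[/java]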
- Posting content to AEM
[java]
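// HTTPBuilder classes; MigrationConfiguration.client is assumed to be a groovyx.net.http.HTTPBuilder
import groovyx.net.http.ContentType
import groovyx.net.http.Method

void callPost(String baseURL, Map contentMap) {
    /* Setting basic auth on the request doesn't work; had to set it in the headers. */
    // http.auth.basic("admin", "admin")
    MigrationConfiguration.client.request(Method.POST) {
        uri.path = baseURL
        // Send the map as a URL-encoded form, which is what SlingPostServlet expects
        requestContentType = ContentType.URLENC
        headers.'Authorization' = "Basic ${"admin:admin".bytes.encodeBase64().toString()}"
        body = contentMap
        response.failure = { resp -> println "\nERROR: ${resp.statusLine} for ${uri.path}" }
    }
}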
[/java]
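If you would rather not depend on an HTTP client library, the same request can be sketched with plain JDK classes. This is an alternative under stated assumptions, not the script used above: encodeForm and postToSling are hypothetical helper names, and the host, path and credentials in the usage comment are placeholders.

[java]
import java.nio.charset.StandardCharsets

// Hypothetical helper: encode a map as an application/x-www-form-urlencoded body.
String encodeForm(Map params) {
    params.collect { k, v ->
        URLEncoder.encode(k.toString(), 'UTF-8') + '=' + URLEncoder.encode(v.toString(), 'UTF-8')
    }.join('&')
}

// Hypothetical helper: POST the map to a path and return the HTTP status code.
int postToSling(String host, String path, Map params, String user, String password) {
    HttpURLConnection conn = (HttpURLConnection) new URL(host + path).openConnection()
    conn.requestMethod = 'POST'
    conn.doOutput = true
    conn.setRequestProperty('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
    // Same idea as in callPost: send the basic auth header explicitly.
    String token = (user + ':' + password).getBytes(StandardCharsets.UTF_8).encodeBase64().toString()
    conn.setRequestProperty('Authorization', 'Basic ' + token)
    conn.outputStream.withStream { it.write(encodeForm(params).getBytes(StandardCharsets.UTF_8)) }
    return conn.responseCode
}

// Usage needs a running instance, so it is left commented out (placeholder values):
// postToSling('http://localhost:4502', '/content/blogs/blog0',
//             ['./jcr:content/jcr:title': 'Hello'], 'admin', 'admin')

println encodeForm(['./jcr:primaryType': 'cq:Page'])
// prints: .%2Fjcr%3AprimaryType=cq%3APage
[/java]

Either way, what reaches the server is an ordinary form POST whose parameter names are the property paths from the map.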
And that is all you need to do. 🙂
You can now check the content hierarchy in CRX. You can modify this Groovy script as per your use case.
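Besides browsing the repository, you can check a migrated page from a script through Sling's JSON rendering of a resource (for example a path ending in .3.json for three levels of depth). The sketch below only builds such a URL and parses a hypothetical, trimmed response; the jsonUrl helper, the host, the path and the sample JSON are all assumptions for illustration.

[java]
import groovy.json.JsonSlurper

// Hypothetical helper: URL of the JSON rendering of a path, limited to the given depth.
String jsonUrl(String host, String path, int depth) {
    return "${host}${path}.${depth}.json".toString()
}

// Hypothetical, trimmed response for one migrated page (not real output).
def sample = '''{
  "jcr:primaryType": "cq:Page",
  "jcr:content": {
    "jcr:title": "First post",
    "blog": { "status": "Published", "parentSubject": "News" }
  }
}'''

def page = new JsonSlurper().parseText(sample)

println jsonUrl('http://localhost:4502', '/content/blogs/blog0', 3)
println page['jcr:content'].blog.status
[/java]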
In addition, if you would like to know more about content migration to AEM, here’s a simple step-by-step guide on how to do it.
Please leave a comment if you have suggestions to improve this or if you face any issues with it.
Thanks!!
Wondering if anyone has tried both options and knows the performance gap.
Interestingly, we saw an unexpected performance improvement when using a REST-based POST (Oracle WCM though) as against an API-level call. Maybe the custom code that used the API call was not done right, but I am still curious whether anyone has tried both options in AEM.
Thanks for the post. It’s an interesting approach. I’m trying an OSGi custom polling importer route, but it is nice to know there’s another option.
@coloradobaugh : Thanks!! It's always good to compare different approaches. It would be great to know the approach you are trying.
Do you have any experience with how it performs? Is it still usable if you have several million nodes to migrate?
PostServlet works fine for moderate amounts of content. For larger amounts I recommend packaging the importer logic (the one parsing the XML in this example) as an OSGi bundle, deploying it into AEM and creating the JCR nodes via the JCR API. This is usually a lot faster because the request processing required by the Sling-based method is no longer needed.
I am not able to find the Groovy script. Please share the code.
@Sören : Sorry, somehow I missed the notification for your comment.
I did a dry run for about 500 pages, which created about 2000 nodes in the JCR. It took about 11 seconds.
I am not sure at what point the JCR API starts to outperform SlingPostServlet. I think the POST servlet uses the JCR API behind the scenes anyway, but it could be writing nodes in bulk, which would perform better than writing individual nodes. Request processing time is certainly part of this approach, as Michael mentioned.
What I liked about it was that I did not have to create all the nodes manually. Based on the map that I POST, all nodes are created automatically. Moreover, we need not worry about the order in which the nodes appear in the map.