Important Considerations for Migrating Content from a CMS to AEM – Blog Series – Blog 2
This is the second part of the two-blog series on ‘Important things to consider for Migrating Content to AEM’. If you haven’t read it yet, check out Blog 1 first.
Here are some more points to consider when migrating to AEM:
- Migration Strategy: Choosing the right migration approach is very important. The following three strategies, along with the criteria for each, can help you decide:
- Manual migration: If most of the content doesn’t follow a specific pattern and needs a lot of cleanup after migration, or if the content follows a defined pattern but there isn’t much of it (fewer than 1,000 pages), manual migration is usually the most efficient choice.
- Automated Migration Scripts: If most of the content follows a pattern, the volume is moderate (1,000-3,000 pages based on 1-2 templates), and the transformation is quick to program with minimal code, automated migration scripts make more sense.
- Automated Migration Scripts using an ETL Tool: For large migrations (more than 3,000 pages), an ETL tool like Talend reduces the effort of writing boilerplate code and significantly improves modularity, reusability, productivity and performance. I have written about the major advantages of an ETL tool like Talend for content migration in a separate blog post.
- Any of these approaches can also be combined to build the process that best suits your needs.
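To make the criteria above concrete, here is a hypothetical Java helper that encodes the rule-of-thumb page counts. The `MigrationStrategy` enum and the `StrategyAdvisor.suggest` method are illustrative names, and the thresholds are guidelines rather than hard limits; a real decision also depends on how easy the transformation is to program and how many templates are involved.

```java
// Hypothetical helper: encodes the blog's page-count thresholds as a starting point.
enum MigrationStrategy { MANUAL, AUTOMATION_SCRIPTS, ETL_TOOL }

final class StrategyAdvisor {
    private StrategyAdvisor() {}

    // Suggests a strategy for one group of content (e.g. all pages of one template).
    static MigrationStrategy suggest(int pageCount, boolean followsPattern) {
        // Unpatterned content needs heavy cleanup anyway; small volumes are faster by hand.
        if (!followsPattern || pageCount < 1000) {
            return MigrationStrategy.MANUAL;
        }
        // Moderate, patterned volume: simple hand-written scripts pay off.
        if (pageCount <= 3000) {
            return MigrationStrategy.AUTOMATION_SCRIPTS;
        }
        // Large volume: an ETL tool such as Talend saves boilerplate.
        return MigrationStrategy.ETL_TOOL;
    }
}
```

Calling `suggest` separately for each content group, for example each source template, naturally produces a mixed plan in which different parts of the site use different approaches.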
- Migration preparation: Pre-migration preparation includes cleanup and loading static data into global variables. Static data includes any configuration or row-specific data fetched from a properties, XML or Excel file. If this data has to be read for every row (page), fetch it once and keep it in global variables before the actual content migration starts. This eliminates the need to run file-handling code for every row of data. Storing everything from these files in global variables may seem wasteful at first, but it pays off in performance in the long run. For example, in one of my migration projects, the links to assets (documents and images) were stored in two Excel files and were connected to their pages through a matching ‘ID’. When I moved that data from the Excel files into global variables, the run time of the migration scripts for over 15k pages dropped by a third, because instead of reading from a file, the scripts only had to traverse a global list of objects.
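The caching idea can be sketched in Java as follows, assuming the Excel sheets have been exported to simple CSV files with `pageId,assetPath` rows (reading `.xlsx` directly would need a library such as Apache POI). `AssetLinkCache` is a hypothetical name, and it swaps the list for a `HashMap` keyed by page ID, so each per-page lookup is a hash lookup instead of a traversal.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache of page ID -> asset links, loaded once before migration.
final class AssetLinkCache {
    private final Map<String, List<String>> linksByPageId = new HashMap<>();

    private AssetLinkCache() {}

    // Reads "pageId,assetPath" rows exactly once.
    // Assumes simple CSV: no header row and no quoted commas.
    static AssetLinkCache load(Reader source) {
        AssetLinkCache cache = new AssetLinkCache();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split(",", 2);
                if (cols.length < 2) {
                    continue; // skip blank or malformed rows
                }
                cache.linksByPageId
                        .computeIfAbsent(cols[0].trim(), id -> new ArrayList<>())
                        .add(cols[1].trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cache;
    }

    // Called once per page during migration: an in-memory lookup, no file I/O.
    List<String> linksFor(String pageId) {
        return linksByPageId.getOrDefault(pageId, List.of());
    }
}
```

Load it once before the migration loop (for example into a static field, Java's nearest equivalent of a global variable) and call `linksFor(pageId)` for each page.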
- Persist Status Logs in a File Instead of the Console: Depending on the content size, migration is a performance-intensive task no matter how much it is optimized. The system may crash because of insufficient memory or a frozen processor, or because of external factors like a power outage. Writing migration status logs to a file rather than the console adds a little overhead to the scripts, but it records exactly how far the migration got before the failure, and sometimes even the cause of the failure, which helps you identify what needs to be reverted or fixed.
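A minimal sketch of a file-based status log using only the JDK follows. `MigrationStatusLog` and its tab-separated line format are assumptions for illustration. Each entry is flushed immediately, so the file reflects progress even if the JVM crashes; flushing hands the data to the operating system but does not force it to disk, so surviving a power outage would additionally need something like `FileChannel.force`, at a performance cost.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

// Hypothetical append-only status log for a migration run.
final class MigrationStatusLog implements Closeable {
    private final BufferedWriter out;

    MigrationStatusLog(Path file) {
        this.out = open(file);
    }

    // APPEND keeps entries from earlier (possibly crashed) runs.
    private static BufferedWriter open(Path file) {
        try {
            return Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One tab-separated line per page: timestamp, status, page ID, detail.
    void record(String status, String pageId, String detail) {
        try {
            out.write(Instant.now() + "\t" + status + "\t" + pageId + "\t" + detail);
            out.newLine();
            out.flush(); // push each entry to the OS so a JVM crash doesn't lose it
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After a crash, the last lines of the file show where the run stopped, and any `FAILED` entries point to the pages that need attention.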
- AC Handling using Package Manager: If the content to be migrated, or the parent path it is being migrated under, contains ACL nodes, decide which access control (AC) handling mode you need when migrating with AEM’s Package Manager. The mode is a package setting (the AC Handling option, stored as the acHandling property in the package’s META-INF/vault/properties.xml) rather than part of filter.xml. By default, ACL nodes are left untouched when a package is installed. The available AC handling modes are:
- Ignore – preserve ACLs in the repository
- Overwrite – overwrite ACLs in the repository
- Merge – merge both sets of ACLs
- Clear – clear ACLs
- More information about AC handling can be found in Adobe’s documentation.
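In Jackrabbit FileVault content packages (the format Package Manager installs), the AC handling mode is the `acHandling` package property in `META-INF/vault/properties.xml`, and that file uses the standard `java.util.Properties` XML format. If your migration scripts build the package themselves, a minimal sketch could write it with `Properties.storeToXML`. The group, name and version values are hypothetical placeholders, and the local `AcHandling` enum only mirrors FileVault's mode names, including `merge_preserve`, a further mode that adds package entries only for principals not already present in the repository.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Properties;

// Hypothetical builder for a content package's META-INF/vault/properties.xml.
final class VaultPackageProperties {
    // Mirrors FileVault's AC handling mode names; written lowercase in the file.
    enum AcHandling { IGNORE, OVERWRITE, MERGE, MERGE_PRESERVE, CLEAR }

    private VaultPackageProperties() {}

    static String build(String group, String name, String version, AcHandling mode) {
        Properties props = new Properties();
        props.setProperty("group", group);
        props.setProperty("name", name);
        props.setProperty("version", version);
        props.setProperty("acHandling", mode.name().toLowerCase(Locale.ROOT));
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            // Emits <properties><entry key="...">...</entry></properties> XML.
            props.storeToXML(buf, "content migration package", StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toString(StandardCharsets.UTF_8);
    }
}
```

The same mode can also be chosen in Package Manager's package edit dialog, which is usually simpler when the package is built inside AEM.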
- MIME Types of DAM Assets: The MIME type of a DAM asset, stored in the jcr:mimeType property of the jcr:content (nt:resource) node under its original rendition, needs special attention because it determines how the asset behaves for the end user. When a DAM asset is migrated, the actual file (image or document) is stored as a rendition named “original”. Because that name has no extension, the asset gets the default MIME type “application/octet-stream” instead of the extension-derived MIME type listed in the Felix console.
This creates two problems:
- It changes the asset’s behavior: a GET request to the asset’s path now downloads the file instead of opening it in a browser window, even if the browser can display that file type.
- Thumbnails for the asset are not generated, even if the DAM workflows are enabled.
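One way to avoid the octet-stream default, sketched here under assumptions, is to derive the MIME type from the source file's name before the binary is stored as “original”, and then set it explicitly: as the `jcr:mimeType` property of the original rendition's `jcr:content` node, or as the `mimeType` argument of `AssetManager.createAsset(path, inputStream, mimeType, doSave)` if you create assets through the AEM Assets Java API. `MimeTypeResolver` is a hypothetical name, and its extension table is a small illustrative subset that should be aligned with the MIME types configured in your instance.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical resolver: maps a source file name to a MIME type before upload.
final class MimeTypeResolver {
    static final String FALLBACK = "application/octet-stream";

    // Illustrative subset only; extend to match your instance's MIME configuration.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "png", "image/png",
            "gif", "image/gif",
            "svg", "image/svg+xml",
            "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");

    private MimeTypeResolver() {}

    // Uses the extension of the original file name (e.g. "brochure.pdf").
    static String fromFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK; // no usable extension, e.g. "original"
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, FALLBACK);
    }
}
```

Resolving the type from the source system's file name, rather than from the stored rendition name, is what keeps the extension information from being lost.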