Snapshot Migration to S3 and Extract Data from Snapshot using Athena

AWS, Cloud, Data & Analytics

11 / Nov / 2023 by Patel Akshaykumar 0 comments

Introduction

Sometimes We have to keep our database backup for longer retention for Security and DR Compliances in RDS. But as we know, if we have a longer backup period in RDS, it will also create more burden on the pocket, as RDS charges for storage in Normal S3 bucket format. So we can export our backup snapshots from RDS to an S3 bucket with Apache Parquet Format, which can save some charges, and we can use those snapshot data for future reports or compliance checks as well using Athena and Glue.

We can export all types of backup from RDS to S3
Steps to export to s3:
- Create an S3 bucket with Proper IAM Permission
- Create a KMS key for server-side encryption (SSE)
- We can use CLI or GUI to perform this action
No Impact on Performance as export runs in the background
Data will be exported into S3 in Apache Parquet format. It is 2x faster and stores 1/6 of the storage compared to the normal test format
We can use that exported data to create reports using Athena and EMR

Architecture of Migration Process

Used Tools and AWS Components –

AWS RDS
AWS S3
AWS KMS
AWS IAM
AWS Glue
AWS Crawler
AWS Athena

Steps Performed for this Activity

Create an S3 bucket in AWS
Create a KMS Key or use the Existing Key for Encryption
Take RDS Database Snapshot
Export Snapshot to S3 Bucket with Parquet Format
Validate the Snapshot Migration (Check Size and Cost)
Create a Glue Crawler and Run it to fetch data
Use Athena to Execute Query and generate Report – Data Will be stored in CSV format in the S3 bucket location
You can also use those output files to import data into another database.

Execution Steps –

This process has been performed on the Beta Database of PostgreSQL Snapshot.

Step 1: Export of RDS DB Snapshot to S3 Bucket

Take a Snapshot of the Beta Database from RDS
Export Database Snapshot to S3 bucket from Snapshot Menu
Created KMS key as “glue-kms” in Key Management Service with default settings and copied the arn of the key created to export db snapshot configuration, and Crawler will use same to extract data by AWS Glue.
Select S3 Bucket to upload and provide a proper IAM Role with a Newly created KMS Key to encrypt the database snapshot.
Snapshot has been exported in the S3 bucket and checked directly from the S3 Console.
After completion of the export task to the S3 bucket, I am able to see the databases and tables in the bucket named test-beta.

Step 2: Create and Run Crawler in AWS Glue to export S3 data in Glue Data Catalog

In AWS Glue Console, Go to the crawler option and click on the add crawler button. Then, give the crawler name as beta-demo and click next
Click on Next and select a data source
Select the S3 folder and provide the path that you want to extract and get data from a particular table or full database.
Here, we have selected a specific table customer_master to check.
Select the IAM role which AWS Glue and Crawler will use to access Snapshot from S3 Bucket

Create Database

1. Review the Created Crawler

2. Select Data Source for Crawler

3. Create a schedule for Crawler

4. Run The Crawler and see the process

5. Check the Database in AWS Glue, and you will find the table that the crawler extracted from the S3 Bucket

Step 3: In Athena, run of queries and store of queries output in S3 bucket

1. Check Query Editor

2. In Query Editor, You will find the Data Source as AWS Data Catalog, and demodb will be there as a database because it is the only database available right now.

3. Go to Settings and specify the location to store output, which you will get after the execution of queries in Athena , we have provided the Same S3 bucket with a New Folder in it.

4. Execute the query and see the result.

5. Check the Output in the Query editor and the S3 bucket, where it will be saved in CSV format.

You can now download the CSV file open it, and check the result to verify.

Notes:-

A crawler will be used if the requirement comes to get a report or data from any historical snapshot
Costing of the crawler will be based on the minutes of usage
Costing of Athena will be based on the data explored from snapshot

Costing Summary

S3 Storage – $0.025 per GB per month

KMS Key – $1 per key per month

KMS encrypt pricing – $0.03 for encrypting 10,000 objects

RDS Export to S3 Cost – $0.010 per GB of snapshot size

Glue Crawler Pricing – $0.44 per DPU-Hour, billed per second, with a 10-minute minimum per crawler run

Athena Pricing – $5 per TB of data scanned

Cost Estimation for the Complete Operation

Total Size of Snapshots (TB)	65

Activity	Quantity	Rate (per quantity’s dimension)	Cost	Frequency	Trial Performed on a snapshot
KMS Key	1	$1.00	$1.00	Monthly	Size of Snapshot (TB)	2.5
S3 Storage (GB)	6,500	$0.025	$162.50	Monthly	Size after Compression (TB)	0.239
S3 API (number of PUT objects)	3209205	$0.0000153	$48.95	One-time	Objects created from snapshot migration	118000
KMS Encrypt	3209205	$0.000003	$9.63	One-time	S3 API Cost	1.8
RDS Export (per GB)	65000	$0.01	$650.00	One-time
Crawler Running Minutes	240	$0.0073	$1.76	Monthly	Assumptions Derived
Athena Data Scanned (TB)	3	$5.00	$15.00	Monthly	Compression Ratio	10:1
		Total	$888.84		Number of Objects per GB after Compression	494
					API Cost per GB after compression	$0.0075
					API Cost per object	$0.000015
NOTE: Additional KMS API charges will be applicable when Crawler and Athena will decrypt the objects
					Additional Assumptions
					Crawler running hours per month	4
					Data Scanned by Athena per month (TB)	3
					Number of KMS Keys	1

Comparison Of Costing

Operation	Old Costing	Changes	New Estimated Cost
Storage	1500 USD per Month	Snapshot migrated to S3 and size will be reduced up to 8 times	190 USD per Month

Blogs

Snapshot Migration to S3 and Extract Data from Snapshot using Athena

Introduction

Architecture of Migration Process

Used Tools and AWS Components –

Steps Performed for this Activity

Execution Steps –

Step 1: Export of RDS DB Snapshot to S3 Bucket

Step 2: Create and Run Crawler in AWS Glue to export S3 data in Glue Data Catalog

Create Database

Step 3: In Athena, run of queries and store of queries output in S3 bucket

Costing Summary

Cost Estimation for the Complete Operation

Comparison Of Costing

Leave a Reply Cancel reply

Blogs

Introduction

Architecture of Migration Process

Used Tools and AWS Components –

Steps Performed for this Activity

Execution Steps –

Step 1: Export of RDS DB Snapshot to S3 Bucket

Step 2: Create and Run Crawler in AWS Glue to export S3 data in Glue Data Catalog

Create Database

Step 3: In Athena, run of queries and store of queries output in S3 bucket

Costing Summary

Cost Estimation for the Complete Operation

Comparison Of Costing

Tag -

Leave a Reply Cancel reply

Tips for writing a blog

Learn how to write a caption