Enhancing User Experience: Strategies for Addressing Missing Image Assets Hosted on S3 via CloudFront

28 Aug 2024 by Sahil Sahni

Introduction

In today’s competitive online environment, where user experience directly influences brand reputation and revenue, every element of your website needs to work. Broken image links are easy to overlook, yet they quickly erode user trust and engagement. For businesses that serve assets from Amazon S3 through CloudFront, these issues can multiply if they are not addressed promptly. In this blog, we’ll show how to use S3 server access logs to identify missing images quickly so they can be fixed, improving your website’s reliability.

Problem Statement

When content is served through a CloudFront distribution with an S3 bucket as its origin, images can go missing (for example, when objects are deleted, renamed, or never uploaded), leaving broken links that hurt the user experience. As the volume of requests and assets grows, identifying these missing assets becomes increasingly difficult. A robust monitoring system is needed to catch these issues promptly and alert the right stakeholders.

Requirement

On websites with many images spread across many pages, manually checking each page for missing images is tedious and inefficient, and as the site grows, the risk of overlooking broken links rises. A centralized monitoring solution addresses this: it lets you spot missing images across the entire site by visualizing the data in Kibana or by receiving alerts through Amazon SNS, so that missing images are detected and fixed promptly.

Solution Overview

Figure: Architecture diagram

To tackle the challenge of identifying missing images, we can build a monitoring system that analyzes S3 server access logs to track image requests that failed. The solution comprises the following components:

  • CloudFront and S3 Configuration:
    S3 is set as the origin for CloudFront. When a user requests an image that is not present in the bucket and CloudFront forwards the request to S3 (a cache miss), S3 records the failed request in its access logs.
  • S3 Access Logging:
    Enable server access logging for the S3 bucket designated as the origin. These logs capture requests to the bucket, including those for missing images, and are stored in a separate S3 bucket dedicated to analysis.
  • Log Collection with Logstash:
    Logstash collects and processes the S3 access logs. It can be configured to parse the logs, extract the relevant information about missing images, and prepare this data for further analysis.
  • Data Processing and Visualization:
    • Elasticsearch and Kibana: The processed logs are sent to Elasticsearch for indexing and searching. In Kibana, you can build dashboards that show the number of missing images over time.
    • SQS and Lambda: The processed logs are also sent to an Amazon SQS queue. New messages in the queue trigger a Lambda function, which processes them and uses SNS to send notifications.

Implementation Steps

To set up the monitoring system for identifying missing images in S3 from CloudFront requests, follow these steps:

1. Set Up S3 and CloudFront

  • CloudFront Distribution: Configure your CloudFront distribution with your S3 bucket as the origin.
  • Enable S3 Access Logging: Ensure that S3 access logging is enabled and configured to log to a separate S3 bucket.
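If you prefer to script the logging setup, the following is a minimal Python sketch, assuming a boto3 S3 client. It builds the `BucketLoggingStatus` payload that the S3 `PutBucketLogging` API expects and passes it to `put_bucket_logging`. The source bucket name `my-image-bucket` is a hypothetical placeholder; `my-log-bucket-111` and the `logs/` prefix are chosen to match the Logstash input used later. Keep in mind that the target bucket must be in the same Region and account as the source bucket, and its policy must allow the `logging.s3.amazonaws.com` service principal to write objects.

```python
from typing import Any, Dict


def build_logging_status(target_bucket: str, target_prefix: str = "logs/") -> Dict[str, Any]:
    """Build the BucketLoggingStatus payload for S3 PutBucketLogging.

    The prefix should line up with the `prefix` option of the Logstash s3 input.
    """
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,
            "TargetPrefix": target_prefix,
        }
    }


def enable_access_logging(s3_client: Any, source_bucket: str, target_bucket: str,
                          target_prefix: str = "logs/") -> Dict[str, Any]:
    """Turn on server access logging for source_bucket.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the raw API response so the caller can log or inspect it.
    """
    if source_bucket == target_bucket:
        # Writing logs into the logged bucket would make each log delivery generate more log records.
        raise ValueError("Use a separate bucket for access logs")
    return s3_client.put_bucket_logging(
        Bucket=source_bucket,
        BucketLoggingStatus=build_logging_status(target_bucket, target_prefix),
    )
```

Usage, with credentials configured: `enable_access_logging(boto3.client("s3"), "my-image-bucket", "my-log-bucket-111")`. Taking the client as a parameter keeps the function easy to test with a stub.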

2. Set Up Log Collection with Logstash

  • Install Logstash: Set up Logstash on your server.
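  • Configure Logstash to Read S3 Logs:
    Logstash reads the access logs from the designated S3 bucket and uses filters to parse them, keeping only 404 (Not Found) responses, which indicate missing images.
    Note that S3 returns 404 for a missing object only if the requester has s3:ListBucket permission on the bucket; otherwise it returns 403 (Access Denied). If your bucket policy grants CloudFront only s3:GetObject, either grant it s3:ListBucket as well or extend the filter to also match 403 responses.
  • The following Logstash configuration keeps only 404 errors: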
input {
  s3 {
    bucket => "my-log-bucket-111"  # Replace with your log bucket name
    prefix => "logs"               # Specify the path prefix for logs in your S3 bucket
    region => "us-east-1"          # Replace with your bucket's region
    codec  => "plain"
    type   => "logging"
  }
}

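filter {
  # Parse the leading fields of each S3 server access log line
  grok {
    match => {
      "message" => '%{WORD:owner} %{NOTSPACE:bucket} \[%{HTTPDATE:timestamp}\] %{NOTSPACE:clientip} %{NOTSPACE:requester} %{NOTSPACE:request_id} %{NOTSPACE:operation} %{NOTSPACE:key} "%{NOTSPACE:request_method} %{NOTSPACE:request_url} %{NOTSPACE:request_protocol}" %{NOTSPACE:status_code}'
    }
  }

  # Drop logs that are not 404 errors
  if [status_code] !~ /^404$/ {
    drop {}
  }

  # Use the request time from the log line (not the ingestion time) as @timestamp,
  # so Kibana charts and the daily index name reflect when the request happened
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }

  if [type] == "logging" {
    mutate {
      add_field => { "[@metadata][index_name]" => "s3-404-logs-%{+YYYY.MM.dd}" }
    }
  }
}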

output {
  if [type] == "logging" {
    # Output to Elasticsearch
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "%{[@metadata][index_name]}"
      manage_template => false
    }

    # Output to SQS
    sqs {
      id => "my_sqs_output_prod"
      queue => "failed-images"   # Replace with your SQS queue name
      region => "us-east-1"      # Replace with your SQS region
      codec => "json"
    }
  }
}
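Before deploying, you can sanity-check the parsing and filtering logic outside Logstash. The sketch below is a rough Python approximation of the grok pattern above, not Logstash's grok engine: `%{WORD}` becomes `\w+`, `%{NOTSPACE}` becomes `\S+`, and `HTTPDATE` is simplified to "anything inside the brackets". The sample line follows the S3 server access log layout, but its owner ID, bucket, key, and request ID are made up.

```python
import re
from typing import Dict, Optional

# Approximation of the grok pattern; only the leading fields used by the pipeline are captured.
S3_LOG_PATTERN = re.compile(
    r'(?P<owner>\w+) (?P<bucket>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'(?P<clientip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_method>\S+) (?P<request_url>\S+) (?P<request_protocol>\S+)" '
    r'(?P<status_code>\S+)'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the extracted fields, or None if the line does not match."""
    match = S3_LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def keep_event(fields: Optional[Dict[str, str]]) -> bool:
    """Mirror the Logstash filter: keep only parsed events whose status is 404."""
    return fields is not None and fields["status_code"] == "404"


# Hypothetical sample line; S3 writes the key URL-encoded.
SAMPLE = (
    '79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be my-image-bucket '
    '[28/Aug/2024:10:15:30 +0000] 192.0.2.3 - 3E57427F3EXAMPLE REST.GET.OBJECT '
    'images/hero%20banner.png "GET /images/hero%20banner.png HTTP/1.1" 404 NoSuchKey '
    '243 - 12 - "-" "Amazon CloudFront" -'
)

if __name__ == "__main__":
    fields = parse_line(SAMPLE)
    print(fields["key"], fields["status_code"], keep_event(fields))
    # prints: images/hero%20banner.png 404 True
```

Pasting a few real lines from your log bucket into `parse_line` is a quick way to confirm that the field order matches before wiring up Elasticsearch and SQS.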

3. Data Processing and Visualization

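  • Elasticsearch and Kibana:
    The Logstash pipeline above indexes the filtered logs into daily s3-404-logs-* indices in Elasticsearch. In Kibana, create an index pattern (called a data view in recent Kibana versions) for s3-404-logs-* and build visualizations and dashboards that track how often, and which, images are missing.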

Figure: Kibana visualization
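Kibana builds these charts for you, but the same numbers can also be pulled directly from Elasticsearch, for example for a scheduled report. The sketch below assumes Elasticsearch 7.2 or later at `localhost:9200` (as in the Logstash output), the `s3-404-logs-*` indices that pipeline writes, and default dynamic mapping, under which the string field `key` gets a `key.keyword` sub-field. It builds a search body that counts 404s per day and lists the most frequently missing keys, and posts it using only the standard library.

```python
import json
import urllib.request
from typing import Any, Dict

ES_URL = "http://localhost:9200"   # assumption: same host as the Logstash output
INDEX_PATTERN = "s3-404-logs-*"    # daily indices created by the Logstash pipeline


def build_missing_images_query(days: int = 7, top_n: int = 10) -> Dict[str, Any]:
    """Search body: daily 404 counts plus the most frequently missing keys."""
    return {
        "size": 0,  # only aggregations are needed, not individual hits
        "query": {"range": {"@timestamp": {"gte": f"now-{days}d/d"}}},
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "@timestamp", "calendar_interval": "day"}
            },
            # assumption: default dynamic mapping created key.keyword
            "top_missing_keys": {"terms": {"field": "key.keyword", "size": top_n}},
        },
    }


def run_query(body: Dict[str, Any]) -> Dict[str, Any]:
    """POST the search body to Elasticsearch and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{ES_URL}/{INDEX_PATTERN}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the cluster reachable, `run_query(build_missing_images_query())["aggregations"]["top_missing_keys"]["buckets"]` returns a list of `{"key": ..., "doc_count": ...}` entries, one per missing object.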

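  • SQS and Lambda:
    In addition to, or instead of, Elasticsearch, send the filtered data to an Amazon SQS queue; the Logstash configuration above does both. Configure the queue as an event source for a Lambda function so that the function is invoked when new messages arrive. The function processes the log data and uses SNS to notify the team about any missing images.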

  • Important: Replace sns_arn with your actual SNS topic ARN in the Lambda function, and make sure the function's execution role allows sns:Publish on that topic.
    import json
    import logging
    import urllib.parse

    import boto3

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    # Created once per container and reused across invocations; replace with your SNS region.
    sns_client = boto3.client("sns", region_name="us-east-1")


    def send_sns_notification(sns_arn, sns_subject, sns_message):
        """Publish one notification to the given SNS topic."""
        logger.info("SNS subject: %s", sns_subject)
        logger.info("SNS message: %s", sns_message)
        response = sns_client.publish(
            TopicArn=sns_arn,
            Subject=sns_subject,
            Message=sns_message,
        )
        logger.info(response)


    def lambda_handler(event, context):
        logger.info("Event: %s", json.dumps(event))
        sns_arn = "arn:aws:sns:us-east-1:123456789012:my-topic"  # Replace with your SNS ARN
        try:
            for record in event.get("Records", []):
                if not isinstance(record, dict):
                    continue
                body_content = record.get("body", "")
                logger.info("Raw body content: %s", body_content)

                # Each SQS message body is the JSON event emitted by the Logstash sqs output.
                try:
                    body = json.loads(body_content)
                    logger.info("Parsed body: %s", json.dumps(body))
                except json.JSONDecodeError as e:
                    logger.error("JSONDecodeError: could not parse body content: %s", e)
                    continue

                key = body.get("key")
                bucket = body.get("bucket")  # Bucket name extracted by the grok filter
                logger.info("Bucket: %s, Key: %s", bucket, key)

                if not key or not bucket:
                    logger.warning("Key or bucket name is missing in the message body.")
                    continue

                # Keys in S3 access logs are URL-encoded; decode them for a readable alert.
                file_name = urllib.parse.unquote(key)
                missing_uri = f"s3://{bucket}/{file_name}"
                sns_subject = "ToTheNew | Failed Images | Event Notification"
                sns_message = f"The file '{missing_uri}' was not found in the bucket 's3://{bucket}/'."

                send_sns_notification(sns_arn, sns_subject, sns_message)

        except Exception as e:
            logger.exception("Error processing records: %s", e)

Figure: Email alert
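To exercise the handler without waiting for real traffic, you can give it an event shaped like the one Lambda receives from SQS: a `Records` list in which each `body` is the JSON document written by the Logstash `sqs` output. The sketch below builds such an event. The helper name `build_test_event` and all field values are hypothetical; the handler above only reads `body`, and from it `bucket` and `key`.

```python
import json
import uuid
from typing import Any, Dict, List


def build_test_event(missing: List[Dict[str, str]]) -> Dict[str, Any]:
    """Wrap Logstash-style events in the SQS-to-Lambda event envelope.

    Each item in `missing` needs 'bucket' and 'key' (URL-encoded, as in S3 logs).
    """
    records = []
    for item in missing:
        body = {
            "bucket": item["bucket"],
            "key": item["key"],
            "status_code": "404",
            "type": "logging",
        }
        records.append({
            "messageId": str(uuid.uuid4()),
            "eventSource": "aws:sqs",
            "body": json.dumps(body),  # SQS delivers the message body as a string
        })
    return {"Records": records}


event = build_test_event([{"bucket": "my-image-bucket", "key": "images/hero%20banner.png"}])
```

You can paste `json.dumps(event)` into the Lambda console as a test event, or call `lambda_handler(event, None)` locally with AWS credentials configured; each record should result in one SNS notification.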

Conclusion

Monitoring missing images in a CloudFront distribution with an S3 origin is vital for a seamless user experience. By combining S3 server access logging, Logstash, Elasticsearch, Kibana, SQS, Lambda, and SNS, we can build a reliable system that detects missing assets and notifies the right teams. Beyond pinpointing problems, the collected data also offers insight into which content users request through the content delivery network.
