S3 Cost Savings

Amazon S3 

Amazon S3 is a fantastically versatile and scalable storage solution, but keeping costs under control is crucial. This blog post dives into key strategies to lower your AWS S3 costs.

Optimizing Storage With Storage Classes

S3 offers a variety of storage classes, each with different pricing based on access frequency. Here's how to leverage them:

Automated Cost Management Using Terraform Configuration

HCL

# Configure AWS Provider
provider "aws" {
  region = "us-east-1" # Replace with your desired region
}

# Define your S3 Bucket
resource "aws_s3_bucket" "my_bucket" {
  bucket = "my-bucket-name"
  acl    = "private"
}

# Define Lifecycle Policy with transition rules
resource "aws_s3_bucket_lifecycle_configuration" "lifecycle_rules" {
  bucket = aws_s3_bucket.my_bucket.id

  rule {
    id     = "transition-to-ia-and-glacier"
    status = "Enabled"

    # Apply this rule to every object in the bucket
    filter {}

    # Transition to S3 Standard-Infrequent Access 90 days after creation
    transition {
      days          = 90
      storage_class = "STANDARD_IA"
    }

    # Transition to Glacier 180 days after creation (90 days after Standard-IA)
    transition {
      days          = 180
      storage_class = "GLACIER"
    }
  }
}


Explanation

  1. Provider configuration and S3 bucket definition: These set up the AWS provider and the bucket the lifecycle policy applies to.
  2. S3 lifecycle policy: The aws_s3_bucket_lifecycle_configuration resource attaches the lifecycle rules to the bucket.
  3. Lifecycle rule: A single rule block defines two transition actions.
    1. First transition: Objects move to the S3 Standard-Infrequent Access (STANDARD_IA) storage class 90 days after creation (adjustable based on your needs). This is a cost-effective option for data accessed less often than daily but that still needs faster retrieval than Glacier offers.
    2. Second transition: 180 days after creation (i.e., 90 days after moving to Standard-IA; again, customize this period), objects move to the Glacier storage class for the most cost-effective long-term archival.

Benefits of this Approach

By tiering objects automatically as they age, you pay Standard prices only while data is fresh and frequently accessed, and older data moves to progressively cheaper storage classes without any manual intervention.

Additional Considerations

Remember, retrieving data from Glacier incurs retrieval fees, so factor in access needs when defining transition times.

Explore using S3 Intelligent-Tiering if you have unpredictable data access patterns, as it automatically manages object movement between storage classes.
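The archive tiers of Intelligent-Tiering can also be configured per bucket in Terraform. Below is a minimal sketch reusing the bucket from the earlier example; the configuration name and day thresholds are illustrative, and it only affects objects stored in the INTELLIGENT_TIERING storage class.

HCL

# Sketch: opt the whole bucket into Intelligent-Tiering archive tiers.
# Applies only to objects stored in the INTELLIGENT_TIERING storage class;
# the configuration name and day thresholds are placeholders to adjust.
resource "aws_s3_bucket_intelligent_tiering_configuration" "entire_bucket" {
  bucket = aws_s3_bucket.my_bucket.id
  name   = "EntireBucket"

  # Objects not accessed for 90 days move to the Archive Access tier
  tiering {
    access_tier = "ARCHIVE_ACCESS"
    days        = 90
  }

  # Objects not accessed for 180 days move to the Deep Archive Access tier
  tiering {
    access_tier = "DEEP_ARCHIVE_ACCESS"
    days        = 180
  }
}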

Additional Cost-Saving Strategies

Incomplete Multipart Uploads

A multipart upload is a method for efficiently uploading large files to S3: the file is broken into smaller chunks, each chunk is uploaded individually, and S3 reassembles them on the backend.

How Multipart Uploads Work

Imagine uploading a large video file to S3. With multipart uploads, S3 allows you to split the video into smaller, more manageable parts (e.g., 10 MB each). Each part is uploaded as a separate request to S3. After all parts are uploaded successfully, you send a final "complete multipart upload" instruction to S3. S3 then reassembles the individual parts into the original video file.

Incomplete Multipart Uploads

Problems can arise during the upload process. An internet outage, application crash, or other issue might interrupt the upload before all parts are sent. In such scenarios, S3 won't have all the pieces to create the complete file. These interrupted uploads are called "Incomplete Multipart Uploads."

Why They Matter

Although the complete file isn't created, S3 still stores the uploaded parts. These leftover parts occupy storage space and you get billed for them. If you have many incomplete multipart uploads, it can significantly inflate your S3 storage costs. Additionally, they clutter your S3 bucket, making it harder to manage your data.

Key Points To Remember About Incomplete Multipart Uploads

  1. They occur when an upload is interrupted before completion.
  2. They consume storage space and incur charges until they are removed.
  3. They can be identified and cleaned up manually or through automated lifecycle rules.

By managing incomplete multipart uploads, you can optimize your S3 storage usage and costs.

Code 

HCL

resource "aws_s3_bucket_lifecycle_configuration" "default" {
  bucket = aws_s3_bucket.your_bucket_name.id

  rule {
    id     = "incomplete-multipart-uploads"
    status = "Enabled"

    # Apply this rule to every object in the bucket
    filter {}

    # Abort multipart uploads that are still incomplete 7 days after they started,
    # which also frees the storage used by the already-uploaded parts
    abort_incomplete_multipart_upload {
      days_after_initiation = 7 # Change this value as needed (minimum 1)
    }
  }

  rule {
    id     = "noncurrent-versions-to-glacier"
    status = "Enabled"

    filter {}

    # Optional: move noncurrent object versions to a cheaper storage class
    # (requires versioning; independent of the abort rule above)
    noncurrent_version_transition {
      noncurrent_days = 30 # Days after a version becomes noncurrent; adjust as needed
      storage_class   = "GLACIER"
    }
  }
}

resource "aws_s3_bucket" "your_bucket_name" {
  # ... your bucket definition ...
}


Important Considerations

When a lifecycle rule aborts an incomplete multipart upload, S3 frees the storage used by the parts that were already uploaded, but parts still in flight at that moment may land afterwards, so additional cleanup (or another abort cycle) can occasionally be needed. Before removing incomplete multipart uploads, ensure they are truly abandoned and not actively being completed. By implementing this approach, you can make incomplete multipart upload removal part of your normal bucket management.

S3 Versioning 

S3 Versioning is a powerful feature offered by Amazon Simple Storage Service (S3) that allows you to keep track of all the different versions of objects you store in your S3 buckets. 
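Versioning is switched on per bucket. A minimal Terraform sketch, reusing the bucket from the earlier examples, looks like this:

HCL

# Minimal sketch: enable versioning so noncurrent versions are retained
# (and can later be cleaned up by lifecycle rules).
resource "aws_s3_bucket_versioning" "my_bucket" {
  bucket = aws_s3_bucket.my_bucket.id

  versioning_configuration {
    status = "Enabled"
  }
}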

Understanding Versioning Costs

S3 charges storage for all object versions, including the latest and all previous ones. This cost can accumulate if you retain numerous versions for extended periods.

Cost-Saving Strategies

1. Version Lifecycle Management

2. Versioning for Critical Objects Only

3. Regular Version Cleanup

4. Analyze Versioning Needs

Additional Considerations

Code 

HCL

# Configure AWS Provider
provider "aws" {
  region = "us-east-1" # Replace with your desired region
}

# Define your S3 Bucket
resource "aws_s3_bucket" "my_bucket" {
  bucket = "my-bucket-name"
  acl    = "private"
}

# Define Lifecycle Policy with versioning cleanup
resource "aws_s3_bucket_lifecycle_configuration" "versioning_cleanup" {
  bucket = aws_s3_bucket.my_bucket.id

  rule {
    id     = "delete-old-versions"
    status = "Enabled"

    # Applies to all objects. To protect specific data (e.g., "critical-data/"),
    # scope the rule with filter { prefix = "..." } instead -- lifecycle rules
    # match prefixes; they cannot exclude them.
    filter {}

    # Permanently delete versions 30 days after they become noncurrent
    noncurrent_version_expiration {
      noncurrent_days = 30
    }
  }
}


Consideration

Important Note

Once a version is deleted through a lifecycle rule, it's gone permanently and cannot be recovered. Ensure your retention period is sufficient for your rollback and audit requirements.

Request Minimization

Analyze your S3 request patterns (PUT, GET, DELETE, etc.) and identify areas for reduction. Because S3 charges per request, cutting unnecessary calls can lead to significant cost savings.
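One way to get that visibility is to enable CloudWatch request metrics on the bucket. A small sketch, assuming the bucket from the earlier examples; note that request metrics are an opt-in, separately billed CloudWatch feature, and the metric configuration name below is a placeholder:

HCL

# Sketch: publish CloudWatch request metrics (GET, PUT, DELETE, ...) for the bucket
# so high-volume request patterns can be identified and reduced.
resource "aws_s3_bucket_metric" "entire_bucket_requests" {
  bucket = aws_s3_bucket.my_bucket.id
  name   = "EntireBucket"
}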

Utilizing Caching Mechanisms

Optimizing Data Transfers

Designing for Efficient Access

Caching Mechanism for Big Data

Steps

1. Configure a Cache Bucket in S3

2. Enable Spark Result Fragment Caching

3. Optional Configuration (Advanced)

How It Works

When Spark Result Fragment Caching is enabled, Spark analyzes your Spark SQL queries. It identifies subqueries or parts of the query plan that produce frequently used results (fragments). These result fragments are then cached in the designated S3 bucket after the first execution. Subsequent queries that utilize the same cached fragments can reuse them directly from S3, significantly improving performance compared to recomputing the results.

Benefits

Things To Consider

This caching mechanism is most effective for queries that have significant data reduction after certain stages (e.g., filtering, aggregation). The effectiveness also depends on the access patterns of your queries. If subsequent queries rarely reuse the cached fragments, the benefit might be minimal. Consider enabling S3 Lifecycle Management for the cache bucket to automatically manage older versions of cached fragments.
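As a sketch of that last point, the cache bucket can get its own expiration rule so stale fragments are removed automatically; the bucket name and 30-day window below are assumptions:

HCL

# Sketch: a dedicated cache bucket whose objects expire automatically, so stale
# Spark result fragments do not keep accruing storage costs.
# The bucket name and expiry window are placeholders.
resource "aws_s3_bucket" "spark_fragment_cache" {
  bucket = "my-spark-fragment-cache"
}

resource "aws_s3_bucket_lifecycle_configuration" "spark_fragment_cache" {
  bucket = aws_s3_bucket.spark_fragment_cache.id

  rule {
    id     = "expire-cached-fragments"
    status = "Enabled"

    filter {}

    # Delete cached fragments 30 days after they were written
    expiration {
      days = 30
    }
  }
}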

Transfer Acceleration

While S3 Transfer Acceleration can boost data transfer speeds, it comes with additional costs. Evaluate if the speed improvement justifies the expense.
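Acceleration is a per-bucket toggle, so when the measured speed gain does not justify the per-GB surcharge it can simply be suspended. A minimal sketch, assuming the bucket from the earlier examples:

HCL

# Minimal sketch: control Transfer Acceleration per bucket. Set status to
# "Suspended" when the speed benefit no longer justifies the extra transfer cost.
resource "aws_s3_bucket_accelerate_configuration" "my_bucket" {
  bucket = aws_s3_bucket.my_bucket.id
  status = "Suspended" # or "Enabled"
}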

1. Evaluate Transfer Acceleration Usage

2. Leverage S3 Intelligent-Tiering

3. Optimize Data Transfer Size

4. Utilize S3 Glacier for Long-Term Archival

5. Monitor and Analyze Costs

By implementing these strategies, you can significantly reduce your S3 storage costs. Regularly monitor your usage and adapt your approach as your data needs evolve. Remember, the key lies in aligning storage classes with access requirements. There's no one-size-fits-all solution, so find the optimal mix for your specific use case.

 

 

 

 
