AWS S3 (Simple Storage Service) is a cloud storage service provided by AWS. Launched in 2006, S3 offers virtually unlimited storage space for any type of file. It solved the central issue of upfront hardware costs for many small and mid-sized companies, allowing them to focus on application development that caters to actual business needs rather than on supporting infrastructure and maintenance.

AWS provides the Simple Storage Service (also known as S3) for storing any object or file. It is a highly available, scalable, durable, and performant service. It can be used for backup and restore, disaster recovery, archival solutions (via S3 Glacier), and building an AWS-native data lake. It also acts as the storage backbone for many AWS analytical services.

You can refer to the S3 notes below if you are planning to appear for the various AWS certification exams.

In this article, we’ll talk about the following topics in detail: 

S3 Bucket

Data in S3 is stored in S3 buckets. An S3 bucket is the container that holds all the data. The S3 bucket name must be globally unique and must follow these naming conventions:

  • Bucket names must be between 3 and 63 characters long.
  • Names can contain only lowercase letters, numbers, dots (.), and hyphens (-).
  • Names must begin and end with a letter or number.
  • Names must not be formatted as an IP address (for example, 192.168.5.4).
  • Names cannot start with the prefix xn--.
  • Names cannot end with the suffix -s3alias.
  • Buckets used for S3 Transfer Acceleration cannot have dots (.) in their names.

S3 URI

  • The S3 URI (Uniform Resource Identifier) follows the format: s3://the-bucket-name/directory/nested-directory-object.extension
  • The bucket name is ‘the-bucket-name’.
  • The object key is ‘directory/nested-directory-object.extension’.
  • The object key can contain forward slashes (/), which give the impression of a directory or folder structure.
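As a quick illustration, the sketch below splits an S3 URI into its bucket name and object key and downloads the object with boto3. The bucket and key names are hypothetical placeholders.

```python
import boto3

def parse_s3_uri(uri: str):
    """Split an s3:// URI into (bucket, key)."""
    assert uri.startswith("s3://"), "not an S3 URI"
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key

# Hypothetical URI; replace with a bucket/object you own.
bucket, key = parse_s3_uri("s3://the-bucket-name/directory/nested-directory-object.extension")

s3 = boto3.client("s3")
s3.download_file(bucket, key, "local-copy.extension")  # download the object locally
```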

S3 Bucket supports the following features:

Bucket Versioning

  • Bucket versioning is a bucket-level feature.
  • It allows storing multiple versions of the same object.
  • When you delete an S3 object, S3 only adds a delete marker as the latest version.
  • You can recover the latest object version by deleting the delete marker.
  • Older object versions can also be retrieved using their version IDs.
  • It helps recover data in case of accidental deletion.
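A minimal sketch of enabling versioning on a bucket with boto3 (the bucket name and key are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Turn on versioning for the bucket; new uploads now receive a version ID.
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# List all versions (and delete markers) of a key.
versions = s3.list_object_versions(Bucket="my-example-bucket", Prefix="reports/data.csv")
```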

Multi-factor authentication (MFA) delete

  • MFA delete provides an additional layer of security to avoid unintended deletions.
  • Once enabled, additional authentication is required when changing an existing object’s versioning state or permanently deleting an object version.

Tags

  • S3 object tags are key-value pairs associated with data.
  • One S3 object can have up to ten tags associated with it.
  • Tag key and values are case-sensitive.
  • You can use it to enable fine-grained access control.
  • You can use it to enable object life cycle management with a tag-based filter.
  • Tags can be used as filters for AWS CloudWatch metrics.
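For illustration, this sketch attaches tags to an existing object with boto3; the bucket, key, and tag values are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Attach (replace) the tag set on an existing object.
s3.put_object_tagging(
    Bucket="my-example-bucket",
    Key="invoices/2023/invoice-001.pdf",
    Tagging={"TagSet": [
        {"Key": "department", "Value": "finance"},
        {"Key": "classification", "Value": "internal"},
    ]},
)

# Read the tags back.
tags = s3.get_object_tagging(Bucket="my-example-bucket", Key="invoices/2023/invoice-001.pdf")
```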

Encryption

  • AWS S3 supports both client-side encryption and server-side encryption.
  • It supports three types of S3 server-side encryption:
    • SSE-S3
      • S3 service manages the encryption/decryption keys.
    • SSE-KMS
      • AWS KMS manages the encryption/decryption keys.
    • SSE-C
      • The customer provides their own encryption/decryption keys.
      • The S3 service performs the encryption/decryption operation.
      • S3 won’t store the keys.
      • The client must send the key over HTTPS.
  • In client-side encryption, the data is encrypted at the client-side and then stored on S3.
  • The client is responsible for maintaining the encryption-decryption keys when using the client-side encryption method.
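The sketch below uploads one object with SSE-S3 and another with SSE-KMS using boto3; the bucket name and KMS key alias are hypothetical placeholders.

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: the S3 service manages the encryption keys (AES-256).
s3.put_object(
    Bucket="my-example-bucket",
    Key="reports/sse-s3-object.txt",
    Body=b"hello",
    ServerSideEncryption="AES256",
)

# SSE-KMS: a KMS key (here a hypothetical key alias) protects the data key.
s3.put_object(
    Bucket="my-example-bucket",
    Key="reports/sse-kms-object.txt",
    Body=b"hello",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-example-kms-key",
)
```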

S3 Storage class

  • S3 supports multiple storage classes.
  • Each object can have its storage class.
  • The storage classes are:
    • Standard S3
      • You can use it for frequently used data.
    • Intelligent Tiering
      • S3 service moves data intelligently between different storage classes depending on access patterns to optimize the cost.
    • Standard Infrequent Access (S3 Standard-IA)
      • Used for infrequently accessed files that need to be stored for a longer duration.
    • One Zone Infrequent Access (S3 One Zone-IA)
      • Store infrequently accessed data.
      • S3 will store the data in a single availability zone only.
    • Glacier:
      • Archive infrequently accessed data.
      • Has a retrieval latency of a few hours (typically 3-5 hours).
    • Glacier Deep Archive
      • You can use it for data that hardly requires any retrieval.
      • Retrieval time is up to 12 hours.
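As a sketch, the storage class is chosen per object at upload time; the following boto3 call stores a hypothetical object directly in S3 Standard-IA.

```python
import boto3

s3 = boto3.client("s3")

# Upload an object directly into the Standard-IA storage class.
with open("monthly-report.pdf", "rb") as f:
    s3.put_object(
        Bucket="my-example-bucket",
        Key="archive/monthly-report.pdf",
        Body=f,
        StorageClass="STANDARD_IA",  # other values include INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE
    )
```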

Intelligent Tiering Archive configuration

  • AWS S3 intelligently moves data between storage tiers to optimize the storage cost.
  • Provides two archiving options:
    • Deep Archive Access Tier
    • Archive Access Tier
  • S3 moves data that has not been accessed for at least 30 days to the Infrequent Access tier.
  • If the Archive Access tier is enabled, S3 then moves data not accessed for 90 days into it. You can configure this duration from 90 to 730 days.
  • With the Deep Archive Access tier enabled, S3 moves data not accessed for 180 days into it.
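A minimal sketch of defining an Intelligent-Tiering archive configuration with boto3, assuming a hypothetical bucket and the default 90-day and 180-day thresholds:

```python
import boto3

s3 = boto3.client("s3")

# Opt objects in the bucket into the archive tiers of Intelligent-Tiering.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="my-example-bucket",
    Id="archive-config",
    IntelligentTieringConfiguration={
        "Id": "archive-config",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```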

Server access logging

  • Once server access logging is enabled, the S3 service logs access requests to a separate S3 bucket.
  • The source bucket and log bucket must be in the same region and owned by the same account.
  • Server access logging logs include:
    • Authentication failures
    • Object operations
    • Bucket operations
    • Various fields such as total time, object size, HTTP referer, and turnaround time
    • Lifecycle transitions, object expiration, and restores
    • Logs keys collected from batch delete operations
  • There is no additional cost apart from the log storage cost.
  • The S3 service delivers the logs every few hours.
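A sketch of turning on server access logging with boto3; the source and target bucket names are hypothetical, and the target bucket must already grant the S3 logging service permission to write.

```python
import boto3

s3 = boto3.client("s3")

# Deliver access logs for "my-example-bucket" into "my-example-log-bucket" under the given prefix.
s3.put_bucket_logging(
    Bucket="my-example-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-example-log-bucket",
            "TargetPrefix": "access-logs/my-example-bucket/",
        }
    },
)
```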

AWS CloudTrail data events

  • The S3 service allows logging of object-level API operations through AWS CloudTrail data events.
  • It can deliver logs to multiple destinations.
  • You can store the logs on the cross-account bucket as well.
  • Logs include encryption operations, object operations, bucket operations.
  • Data events are typically delivered within 5 minutes; management events are delivered within 15 minutes.

Event Notification

  • S3 event notifications send out a notification in response to S3 object-level operations.
  • The object event can be a new object created event (s3:ObjectCreated:*), an object removal event (s3:ObjectRemoved:*), a restore object event (s3:ObjectRestore:*), or a replication event.
  • S3 service sends the S3 event notification to the following AWS Services:
    • AWS Lambda
      • The source S3 bucket and lambda must be in the same AWS region.
    • Amazon Simple Notification Service (Amazon SNS)
      • The source S3 bucket and SNS must be in the same AWS region.
    • Amazon Simple Queue Service (Amazon SQS)
      • The source S3 bucket and the SQS must be in the same AWS region.
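As a sketch, the following boto3 call routes object-created events to a Lambda function; the bucket name and function ARN are hypothetical, and the Lambda function must already permit S3 to invoke it.

```python
import boto3

s3 = boto3.client("s3")

# Notify a Lambda function whenever an object is created in the bucket.
s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-upload",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```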

Transfer acceleration

  • S3 transfer acceleration (S3TA) is a bucket-level feature.
  • It enables faster, secure transfer of files over long distances.
  • AWS built S3 Transfer Acceleration on top of Amazon CloudFront’s globally distributed edge locations.
  • Once transfer acceleration is enabled, you receive an accelerated endpoint, e.g., mycloudnotes.s3-accelerate.amazonaws.com.
  • Data transferred to this endpoint goes to the nearest CloudFront edge location and then travels over the internal AWS network instead of the public internet.
  • You can use S3 Transfer Acceleration when you want clients all over the world to upload data to a centralized location.
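A sketch of enabling transfer acceleration and then using the accelerated endpoint from boto3; the bucket name and file are hypothetical.

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Enable the accelerated endpoint for the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Create a client that routes requests through <bucket>.s3-accelerate.amazonaws.com.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-dataset.zip", "my-example-bucket", "uploads/big-dataset.zip")
```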

Object Lock

  • S3 object lock helps you prevent data from being deleted or overwritten for a certain amount of time or indefinitely.
  • It follows Write-once-read-many (WORM) model.
  • You need to enable the object lock feature at the time of bucket creation; you cannot enable it after the bucket is created.
  • You must enable object versioning to use the S3 Object Lock feature.
  • Object lock provides two ways for object management and retention.
    • Retention period:
      • The object remains locked for a certain amount of time and can’t be deleted or overwritten.
    • Legal Hold:
      • Similar to a retention period, but with no expiry date; the hold remains in effect until it is explicitly removed.

Requester Pays

  • In general, the S3 bucket owner bears the cost of data storage and transfer.
  • The Requester Pays feature allows the requester to pay the cost of the request and the data downloaded.
  • The bucket owner still gets charged for data storage.
  • The requester must include the x-amz-request-payer header in requests (POST, GET, and HEAD).
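A sketch of downloading from a Requester Pays bucket with boto3, which sets the required header for you; the bucket and key are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Download from a Requester Pays bucket; the caller's account is billed for the request and transfer.
response = s3.get_object(
    Bucket="some-requester-pays-bucket",
    Key="public-dataset/part-0001.csv",
    RequestPayer="requester",
)
data = response["Body"].read()
```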

Static web hosting

  • The S3 service allows you to host a static website using S3 static website hosting.
  • You can specify the index document and the error document while enabling static website hosting.
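A sketch of configuring static website hosting with boto3; the bucket name and document names are hypothetical, and the objects must be publicly readable for the site to work.

```python
import boto3

s3 = boto3.client("s3")

# Configure the bucket as a static website with an index and an error document.
s3.put_bucket_website(
    Bucket="my-example-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
```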

Block public access bucket settings

  • S3 allows or blocks public access to buckets and objects granted through access control lists, bucket policies, and access point policies.
  • The block public access setting applies to the bucket and its access points.
  • It enables blocking public access to the bucket and objects granted through:
    • New access control lists (ACLs)
    • Any access control lists (ACLs)
    • New public bucket or access point policies
  • It also enables blocking public and cross-account access to buckets and objects through any public bucket or access point policies.
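A sketch of turning on all four block public access settings for a hypothetical bucket with boto3:

```python
import boto3

s3 = boto3.client("s3")

# Block all public access to the bucket, regardless of ACLs or policies.
s3.put_public_access_block(
    Bucket="my-example-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,        # reject new public ACLs
        "IgnorePublicAcls": True,       # ignore any existing public ACLs
        "BlockPublicPolicy": True,      # reject new public bucket/access point policies
        "RestrictPublicBuckets": True,  # restrict access granted by public policies
    },
)
```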

Bucket Policy

  • S3 bucket policy is a resource-based IAM policy.
  • You can allow/disallow access to the bucket and its resources using the bucket policy.
  • The typical bucket policy includes:
    • Resources: Buckets, objects, or access points
    • Actions: Actions to be allowed or disallowed on Resources, e.g., s3:ListBucket
    • Effect: Allow or Deny
    • Principal: The account or user to allow access to resources
    • Conditions: Conditions when the policy is in effect
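For illustration, this sketch builds a simple policy as a Python dictionary and attaches it with boto3; the bucket name and account ID are hypothetical.

```python
import json
import boto3

s3 = boto3.client("s3")

# Allow a specific AWS account to list the bucket's contents.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::my-example-bucket",
        }
    ],
}

s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))
```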

Object Ownership

  • You can use S3 object ownership property to control who owns the object when uploaded by a different account/account user.
  • By default, the object ownership lies with Object Writer.
  • You can transfer the object ownership to the bucket owner by using the bucket-owner-full-control canned ACL while uploading the object.
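A sketch of a cross-account upload that grants the bucket owner full control of the new object; the bucket, key, and file are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Upload from another account while granting the bucket owner full control of the object.
with open("report.csv", "rb") as f:
    s3.put_object(
        Bucket="some-other-accounts-bucket",
        Key="shared/report.csv",
        Body=f,
        ACL="bucket-owner-full-control",
    )
```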

Access Control List (ACL)

  • S3 Access Control List is used to manage access to the bucket and objects.
  • Object and bucket ACLs are granted to a grantee.
  • The grantee can be
    • Bucket owner
    • Public (everyone)
    • Authenticated AWS user groups 
    • S3 log delivery group
  • ACL permission can be
    • READ
    • WRITE
    • READ_ACP
    • WRITE_ACP
    • FULL_CONTROL

Cross-origin resource sharing (CORS)

  • CORS allows resources belonging to one domain to interact with resources in a different domain.
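A sketch of a CORS configuration that lets a hypothetical web origin issue GET requests against the bucket:

```python
import boto3

s3 = boto3.client("s3")

# Allow browser requests from https://www.example.com to read objects from the bucket.
s3.put_bucket_cors(
    Bucket="my-example-bucket",
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["https://www.example.com"],
                "AllowedMethods": ["GET"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)
```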

Storage metrics, request metrics, replication metrics

  • S3 provides various CloudWatch metrics for usage, request, and data transfer activity.
  • It also provides replication metrics.

Lifecycle rules

  • S3 lifecycle rules allow transitioning objects from one storage class to another, archiving them, or deleting them.
  • You can configure lifecycle rules with the below actions:
    • Move the current version of the object to another storage class.
    • Move previous object versions to another storage class.
    • Expire current version of objects
    • Delete the previous version of objects
    • Delete incomplete multipart upload
    • Delete expired delete markers.
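A minimal sketch of a lifecycle configuration with boto3: objects under a hypothetical logs/ prefix transition to Standard-IA after 30 days and expire after 365 days.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "logs-retention",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                # Move current versions to Standard-IA after 30 days...
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                # ...and delete them after a year.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```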

Replication rules

  • An S3 replication rule helps replicate data from a source bucket to a destination bucket.
  • You must enable object versioning on both the source and destination buckets before setting up replication rules.
  • The replication can happen across regions (cross-region replication) or the same region (same-region replication)
  • You can replicate the data encrypted with AWS-KMS.
  • You can change the storage class for the replicated object.
  • You can enable Replication time control (RTC) to ensure 99.99% of data replication within 15 minutes. 
  • The replica modification sync setting replicates metadata changes made to replicas back to the source object.
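A sketch of a replication rule created with boto3; the bucket names and IAM role ARN are hypothetical, and both buckets are assumed to already have versioning enabled.

```python
import boto3

s3 = boto3.client("s3")

# Replicate all new objects to the destination bucket, storing replicas in Standard-IA.
s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-all",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::my-destination-bucket",
                    "StorageClass": "STANDARD_IA",
                },
            }
        ],
    },
)
```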

Inventory configuration

  • S3 inventory creates an inventory report containing the list of objects and their metadata.
  • You can deliver the report to an S3 bucket in the same account or a different account.
  • You can set the report frequency to daily or weekly.
  • The report format can be CSV, Apache ORC, or Apache Parquet.
  • You can add additional metadata fields, such as size and last modified date, to the report.

S3 Access Points

  • S3 access points are named network endpoints that are attached to a bucket.
  • The network origin can be VPC or Internet. If it is VPC, you cannot access the bucket content through the access point over the internet.
  • Each S3 access point has its own Block Public Access settings and access policy.

S3 Object Lambda Access Point

  • An S3 Object Lambda access point allows you to run a custom transformation on a stored S3 object while retrieving it.
  • It has its own S3 Object Lambda access policy and Block Public Access settings.

S3 FAQs

An AWS S3 presigned URL expires after its expiry time. Anyone with the presigned URL can download the S3 object, so it is necessary to protect the presigned URL.
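A sketch of generating a time-limited presigned URL with boto3; the bucket and key are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Generate a URL that allows downloading the object for the next hour.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-example-bucket", "Key": "reports/summary.pdf"},
    ExpiresIn=3600,  # seconds
)
print(url)
```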
AWS S3 supports hosting static websites. Dynamic websites need a server to serve the dynamic content to the visitor. Hence, it is not possible to host dynamic websites only with S3. However, it is possible to host the dynamic website by using other AWS services in conjunction with S3.
An S3 bucket can be mounted on an EC2 instance as a file system using a tool such as s3fs. It is similar to an attached network drive.
S3 doesn't support renaming an S3 bucket. You'll need to create a new bucket, copy all the data to the new bucket, and delete the old bucket.
Yes, you can use an AWS access key and secret key with the AWS SDKs to access S3 buckets and objects.
You'll need to delete all objects from the S3 bucket before deleting it, including all object versions. Also, if MFA delete is enabled, you'll need MFA authentication to delete the object versions.
The AWS S3 object key supports forward slashes ('/'). This gives the impression of having folders in the S3 bucket.
Yes. Amazon has advertised S3 as an infinitely scalable service without you needing to provision servers.
S3 is an object storage service and supports any type of object. Databases don't support S3 as a direct storage system. However, S3 can be used for backing up database instances.
Various AWS services support querying S3 data. These include S3 Select, AWS Athena, and Redshift Spectrum.
No. Empty S3 buckets don't cost money.
AWS S3 uses a proprietary storage system developed in-house.
AWS RDS doesn't use S3 for storing the database itself. However, it can be used for storing data snapshots.
Yes. It helps decrease the cost of data transferred out of S3 only when the data volume is high. For low-volume data, serving directly from S3 is cheap.
DynamoDB uses its own storage system. However, DynamoDB can use S3 for storing continuous data backups that you can use for point-in-time recovery (PITR).
Redshift uses Redshift Managed Storage to store Redshift data. Redshift Spectrum, on the other hand, can query data stored on S3. S3 data can be loaded into Redshift storage using load commands.
S3 can be configured to use a VPC endpoint, which ensures that the S3 data can only be accessed through the VPC.
S3 APIs support both HTTP and HTTPS protocols. It is recommended to use HTTPS protocol.
AWS CloudFront has edge locations across the world. When a user requests a file, CloudFront checks whether the file is cached at an edge location. If not, it retrieves the file from the S3 bucket; if cached, it serves the file content from the cache, reducing the load on S3.
An S3 bucket can be accessed through the AWS web console. An alternative is to use an access key and secret key with the AWS SDKs to access the data in S3.
A private AWS S3 bucket can be accessed using an access key and secret key with the AWS SDKs.
You can access the S3 bucket in the browser using the AWS web console.
You can specify a bucket lifecycle policy to move S3 objects to the Glacier or Glacier Deep Archive storage class after a certain duration.
You can delete an AWS S3 bucket using the AWS web console or AWS SDKs like boto3. You'll need to delete all S3 objects (including all versions) before deleting the S3 bucket.

Conclusion

In this article, we looked at various AWS S3 concepts and features.