Querying an object's access tier
Tags: aws cloud athena storage til
Amazon S3 Intelligent-Tiering moves your data to the most cost-effective S3 storage tier based on each object’s access pattern, for the price of $0.0025 per 1,000 objects it monitors. Since the service moves the data for you, you don’t know, or need to know, which access tier an object is currently in, as all objects can be retrieved synchronously by default. If you opt in to the asynchronous archive tiers, you can find out whether an object is in one of them by making a HEAD request for the object. This only works for the opt-in tiers; if you’d like to find out whether the object is in the Frequent Access, Infrequent Access or Archive Instant Access tier, you will need to refer to Amazon S3 Inventory. S3 Inventory provides a snapshot of your objects’ metadata at a daily or weekly frequency, and this snapshot can also include the S3 Intelligent-Tiering access tier, the key we are interested in.
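For the opt-in archive tiers, the HEAD request can be made from the AWS CLI. Here is a minimal sketch, assuming the AWS CLI is installed and configured; the function name `archive_status` and its arguments are placeholders of mine. As far as I know, the `ArchiveStatus` field is only returned for objects in the Archive Access or Deep Archive Access tiers, so for any other object the CLI should print `None`.

```shell
# Print the archive status of an S3 Intelligent-Tiering object.
# The bucket and key are supplied by the caller (placeholders, not real names).
archive_status() {
  local bucket="$1" key="$2"
  # head-object only reports ArchiveStatus for the opt-in archive tiers;
  # --query picks that single field out of the response.
  aws s3api head-object \
    --bucket "$bucket" \
    --key "$key" \
    --query 'ArchiveStatus' \
    --output text
}

# Usage: archive_status my-bucket path/to/object
```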
Well, not me personally, but I did answer a question on StackOverflow that asked how to get this information, so I might as well share¹ this here.
The short answer is to set up S3 Inventory on the bucket or prefix level of the objects you are interested in. You can then either download this report and query it locally or use Amazon Athena to query this report for you. I’ll be going the Amazon Athena route in the steps below.
Configure S3 Inventory
- Navigate to the bucket which has the objects you wish to query and click on the Management tab.
- Under Inventory configuration click Create inventory configuration.
- Choose an appropriate scope based on your object prefixes along with the object version, if versioning is enabled.
- Choose an output location for this report, in an S3 bucket.
- For Output format, select a format that’s appropriate for how you wish to query your data. All three formats (CSV, ORC and Parquet) are supported by Athena, although querying ORC and Parquet files is more efficient.
- For Additional metadata fields, tick Intelligent-Tiering: Access tier.
- The other options can be configured based on your requirements.
Read through the documentation for other options and considerations such as bucket policies, and encryption keys.
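If you prefer the CLI to the console, the steps above can be sketched with `put-bucket-inventory-configuration`. This is a rough equivalent rather than a complete setup: the function name, the default configuration ID `tiering-inventory`, the daily schedule and the Parquet format are all assumptions of mine, and the destination bucket still needs a bucket policy that allows S3 to write the report.

```shell
# Create a daily Parquet inventory that includes the Intelligent-Tiering access tier.
# source_bucket, dest_bucket and config_id are hypothetical values from the caller.
create_tiering_inventory() {
  local source_bucket="$1" dest_bucket="$2" config_id="${3:-tiering-inventory}"
  local config_file rc
  config_file="$(mktemp)"

  # The inventory configuration, with the access tier as an optional field.
  cat > "$config_file" <<EOF
{
  "Id": "${config_id}",
  "IsEnabled": true,
  "IncludedObjectVersions": "Current",
  "Schedule": { "Frequency": "Daily" },
  "OptionalFields": ["IntelligentTieringAccessTier"],
  "Destination": {
    "S3BucketDestination": {
      "Bucket": "arn:aws:s3:::${dest_bucket}",
      "Format": "Parquet"
    }
  }
}
EOF

  aws s3api put-bucket-inventory-configuration \
    --bucket "$source_bucket" \
    --id "$config_id" \
    --inventory-configuration "file://${config_file}"
  rc=$?

  # Clean up the temporary file and pass on the CLI's exit status.
  rm -f "$config_file"
  return "$rc"
}

# Usage: create_tiering_inventory my-source-bucket my-report-bucket
```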
Set up Athena
If it’s your first time using Athena, going to the Athena console will prompt you to set up the web editor. You can go ahead and complete this.
If you choose to use the AWS CLI to run your queries, as I will demonstrate later, set up a workgroup and a query output location for it. You can use the default workgroup or create a new one. Set the query output location to an S3 bucket you can access, and note down the workgroup name for the CLI command.
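If you would rather create a dedicated workgroup from the CLI, something like the following should work. It is only a sketch: the function name and both arguments are placeholders, and the output location is assumed to be an S3 URI you can already write to.

```shell
# Create an Athena workgroup with a query result location.
# name and output_location (e.g. s3://my-bucket/athena-results/) are placeholders.
create_athena_workgroup() {
  local name="$1" output_location="$2"
  aws athena create-work-group \
    --name "$name" \
    --configuration "ResultConfiguration={OutputLocation=${output_location}}"
}

# Usage: create_athena_workgroup tiering-queries s3://my-bucket/athena-results/
```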
Create a database and table in Athena
From the Athena web editor, create a new database. Replace DATABASE_NAME with your desired name.
CREATE DATABASE DATABASE_NAME
There is no need to run a Glue crawler or define our own table schema, as the documentation provides a table definition for each S3 Inventory output format. Pick the query that matches your output format, edit the table name, and change the location to match your S3 Inventory report’s output; make sure to copy the S3 URI of the hive/ prefix. This prefix will only appear after the first report has populated the bucket, which could take up to 48 hours.
Query the access tier from the web editor
Once the inventory report has run and the table is created, you can use the Athena editor to query the access tier of an object. Replace OBJECT_NAME with the key of your object, and DATABASE_NAME and TABLE_NAME with the resources we created before.
SELECT dt, intelligent_tiering_access_tier FROM "DATABASE_NAME"."TABLE_NAME"
WHERE key = 'OBJECT_NAME'
ORDER BY dt DESC
LIMIT 1;
The query will return the latest access tier of your object.
Query the access tier from the AWS CLI
If you wish to use the CLI, you can use the start-query-execution command. Replace OBJECT_NAME, DATABASE_NAME, TABLE_NAME and WORKGROUP_NAME to match the object you are querying and the resources previously created.
EXECUTION_ID=$(aws athena start-query-execution --query-string "SELECT dt, intelligent_tiering_access_tier FROM DATABASE_NAME.TABLE_NAME WHERE key = 'OBJECT_NAME' ORDER BY dt DESC LIMIT 1;" --work-group "WORKGROUP_NAME" --query-execution-context Database=DATABASE_NAME,Catalog=AwsDataCatalog | jq -r ".QueryExecutionId")
The command outputs the query execution ID, as Athena runs your request asynchronously. My AWS CLI defaults to JSON output, so I parse the ID using jq and store it in a shell variable to use next.
aws athena get-query-results --query-execution-id $EXECUTION_ID --output json | jq -r ".ResultSet.Rows[1].Data"
The get-query-results command returns the object’s access tier and inventory date based on our query. Again, I use jq, this time to print only the values that are relevant to us.
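Because Athena runs the query asynchronously, get-query-results can fail if it is called before the query has finished. One way to guard against that is to poll get-query-execution first. The helper below is a sketch of that idea; the function name `wait_for_query` and the `POLL_SECONDS` variable are my own, not part of the AWS CLI.

```shell
# Block until an Athena query finishes; return non-zero if it did not succeed.
wait_for_query() {
  local execution_id="$1" state
  while true; do
    # Ask Athena for the current state of the query execution.
    state="$(aws athena get-query-execution \
      --query-execution-id "$execution_id" \
      --query 'QueryExecution.Status.State' \
      --output text)" || return 1
    case "$state" in
      SUCCEEDED) return 0 ;;
      FAILED|CANCELLED)
        echo "Query ended in state: $state" >&2
        return 1 ;;
      *) sleep "${POLL_SECONDS:-2}" ;;  # still QUEUED or RUNNING, try again
    esac
  done
}

# Usage: wait_for_query "$EXECUTION_ID" && aws athena get-query-results --query-execution-id "$EXECUTION_ID"
```

Polling every couple of seconds is an arbitrary choice here; a single-object lookup usually finishes quickly, but a large inventory may take longer.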
Hopefully this helps others. It’s a bit of a niche ask, but the approach could be modified for a number of different use cases.
¹ Sharing my answer from StackOverflow under CC BY-SA 4.0; edits were made to give more information than was necessary for the StackOverflow question and to relax the language a bit.