In this article, you will learn how to elastically resize an AWS Redshift cluster to save costs.
One of the primary differences between static on-premises infrastructure and cloud-based infrastructure is elasticity. On-premises infrastructure cannot be scaled down from hundreds of nodes to tens and scaled up again, whether ad hoc or on a schedule. The cost of owned on-premises infrastructure is typically constant, and in most cases it keeps increasing due to maintenance costs. On cloud-based infrastructure, the first level of cost saving comes from managed services, where the cloud provider manages the infrastructure and its maintenance along with the technology stack. But using cloud-based services in a static mode, where the infrastructure is not scaled to match the workload, does not exploit the full cost-saving potential. AWS Redshift is one of the most popular and frequently used data services, typically used for high-volume data aggregations. Large-scale Redshift clusters can cost thousands of dollars. Elastically resizing such clusters can result in substantial cost savings. Without further ado, let’s see how it can be done.
Elastically Resize AWS Redshift Clusters
Let’s walk through a cluster resizing exercise to understand how we can elastically resize clusters. Before we get started, we need to have a Redshift cluster in place. If you are new to Redshift, consider referring to this article, Getting started with AWS Redshift, to create a new Redshift cluster. Once the cluster is in place, navigate to the clusters list and open the Actions menu, where you will find the Resize option. Click Resize to open the cluster resize page, which holds the options to configure and initiate a resize of the AWS Redshift cluster.
Once the “cluster resize” wizard starts, two types of resize are presented: Classic resize and Elastic resize. Let’s try to understand the difference between these two options.
- Classic Resize – This option creates a new AWS Redshift cluster in the background and copies the data from the source cluster to the newly created cluster. During this operation, the cluster is in read-only mode. The time it takes to complete the resize and switch existing connections over from the source cluster to the new cluster depends on the volume of data in the source cluster and the query load on it. You can read more about this option here
- Elastic Resize – The resizing process in this option is similar, with subtle differences. It is considerably faster, but it operates under certain constraints. For example, it needs a cluster snapshot in place before the resize starts. It also does not sort tables or reclaim disk space, so it is not a substitute for running vacuum commands. You can read about these differences in detail here. For now, it is enough to know that it is the faster and newer resizing feature, and we will be using it for this exercise
Below the resizing options, the page lists the details of the current cluster configuration.
In the new configuration section, select the new capacity, that is, the number of nodes the cluster should have after it is resized. Let’s say we want to scale the cluster from two nodes to four nodes. After selecting the number of nodes, the next setting is when to resize the cluster. The default selection is to resize the cluster now, which is generally used for ad-hoc resize use-cases.
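The "resize now" choice can also be made through the API. The following is a minimal sketch, assuming the boto3 library and configured AWS credentials; the cluster identifier `demo-cluster` used in the usage note is a hypothetical name. In the ResizeCluster API, `Classic=False` requests an elastic resize rather than a classic one.

```python
def build_resize_request(cluster_id: str, number_of_nodes: int, classic: bool = False) -> dict:
    """Return keyword arguments for the Redshift ResizeCluster API call."""
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    return {
        "ClusterIdentifier": cluster_id,
        "NumberOfNodes": number_of_nodes,
        "Classic": classic,  # False asks for an elastic resize
    }


def resize_now(cluster_id: str, number_of_nodes: int) -> dict:
    """Start an elastic resize immediately (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("redshift")
    return client.resize_cluster(**build_resize_request(cluster_id, number_of_nodes))
```

Calling `resize_now("demo-cluster", 4)` would request the two-to-four node resize described above. Keeping the request builder separate from the API call makes it possible to inspect the parameters before anything is sent.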
If the requirement is to resize the cluster ad hoc or as a one-time activity, but at a later point in time rather than immediately, one can schedule the resize instead. Provide a schedule name, the date and time when the resize should start, and an IAM role that has the permissions required to execute administrative actions like resizing the AWS Redshift cluster.
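A one-time scheduled resize like this maps to a Redshift scheduled action whose schedule is an `at(...)` expression in UTC. The sketch below assumes boto3 and configured credentials; the schedule name, cluster identifier and role ARN in the usage note are hypothetical placeholders.

```python
from datetime import datetime, timezone


def at_expression(when: datetime) -> str:
    """Format a timezone-aware datetime as an at() schedule expression in UTC."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return "at(" + when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S") + ")"


def build_one_time_resize(name: str, cluster_id: str, nodes: int,
                          when: datetime, iam_role_arn: str) -> dict:
    """Return keyword arguments for the CreateScheduledAction API call."""
    return {
        "ScheduledActionName": name,
        "TargetAction": {
            "ResizeCluster": {
                "ClusterIdentifier": cluster_id,
                "NumberOfNodes": nodes,
                "Classic": False,  # elastic resize
            }
        },
        "Schedule": at_expression(when),
        "IamRole": iam_role_arn,
    }


def create_schedule(params: dict) -> dict:
    """Register the scheduled action (needs boto3 and AWS credentials)."""
    import boto3  # lazy import keeps the builders usable without boto3

    return boto3.client("redshift").create_scheduled_action(**params)
```

For example, `create_schedule(build_one_time_resize("demo-resize-once", "demo-cluster", 4, datetime(2030, 1, 15, 18, 30, tzinfo=timezone.utc), "arn:aws:iam::123456789012:role/demo-scheduler-role"))` would schedule a single resize to four nodes.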
The third option is to schedule a recurring resize operation. Here, recurring means resizing the cluster to the new size and then resizing it back to the original size, repeatedly. This option is typically used for workloads where the cluster capacity needs to scale elastically with the load on the system. For example, during batch production workloads or outside office hours, one may want to scale the cluster up or down for a certain period or event and then reverse the resize operation.
Let’s say we want to resize the cluster just for demonstration purposes. Provide a schedule name, and a start time and end time that are a short gap apart, around an hour. Times are in the UTC time zone, so convert your local time to UTC before specifying them. Select the new number of nodes; here we have selected four nodes, scaling up from two. Let’s say we want to do this on a weekly basis, so select Weekly as the frequency. If you are trying out this feature just once, this selection does not matter, as you may want to terminate the cluster after the resize operation to avoid any extra costs.
The start and end values define when the cluster is scaled up and when it is scaled back to its original size. The last setting is the IAM role that has the permissions required to schedule and resize the cluster.
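The recurring setup above can be sketched as a pair of scheduled actions, one that scales up and one that scales back down, each with a weekly `cron(...)` expression in UTC. This is an illustrative sketch: the `-up` and `-down` name suffixes and all identifiers in the usage note are assumptions, not what the console generates. The helper also handles the local-to-UTC conversion, which can move the schedule to a different weekday.

```python
from datetime import datetime, timedelta, timezone

DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]  # datetime.weekday() order


def weekly_cron_utc(local_time: datetime) -> str:
    """Weekly cron expression in UTC for the weekday and time of local_time."""
    if local_time.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    utc = local_time.astimezone(timezone.utc)
    # Fields: minutes, hours, day-of-month, month, day-of-week, year
    return f"cron({utc.minute} {utc.hour} ? * {DAYS[utc.weekday()]} *)"


def build_recurring_resize(name: str, cluster_id: str, base_nodes: int, peak_nodes: int,
                           local_start: datetime, local_end: datetime,
                           iam_role_arn: str) -> list:
    """Return CreateScheduledAction arguments for the scale-up and scale-down actions."""
    def action(suffix: str, nodes: int, when: datetime) -> dict:
        return {
            "ScheduledActionName": f"{name}-{suffix}",
            "TargetAction": {
                "ResizeCluster": {
                    "ClusterIdentifier": cluster_id,
                    "NumberOfNodes": nodes,
                    "Classic": False,  # elastic resize
                }
            },
            "Schedule": weekly_cron_utc(when),
            "IamRole": iam_role_arn,
        }

    return [action("up", peak_nodes, local_start), action("down", base_nodes, local_end)]
```

With a start of Monday 02:00 at UTC+05:30, the scale-up action runs on Sunday at 20:30 UTC, which is why the weekday is taken from the converted time. Each returned dictionary could then be passed to `boto3.client("redshift").create_scheduled_action(**params)`.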
If you do not have an existing IAM role, navigate to IAM and create a new one. In the service selection step, select the Redshift Scheduler service.
In the permission step, select the required permissions. Selecting AmazonRedshiftFullAccess is not advisable in general, but for a short-lived demonstration it is a sure way to give the service the access it needs.
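As a narrower alternative to AmazonRedshiftFullAccess, the role can be given a trust relationship with the Redshift scheduler and a policy limited to resizing one cluster. The sketch below is an assumption about what a minimal setup might look like, not a verified least-privilege policy; your account may need additional actions. The role name, policy name and cluster ARN are hypothetical.

```python
import json

SCHEDULER_PRINCIPAL = "scheduler.redshift.amazonaws.com"


def scheduler_trust_policy() -> dict:
    """Trust policy that lets the Redshift scheduler assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": SCHEDULER_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def resize_only_policy(cluster_arn: str) -> dict:
    """Permissions policy limited to resizing a single cluster (assumed minimal set)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["redshift:ResizeCluster"],
                "Resource": cluster_arn,
            }
        ],
    }


def create_scheduler_role(role_name: str, cluster_arn: str) -> str:
    """Create the role and attach the inline policy (needs boto3 and AWS credentials)."""
    import boto3  # lazy import keeps the policy builders usable without boto3

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(scheduler_trust_policy()),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="resize-only",  # hypothetical policy name
        PolicyDocument=json.dumps(resize_only_policy(cluster_arn)),
    )
    return role["Role"]["Arn"]
```

The returned ARN is the value a resize schedule expects for its IAM role.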
Provide a relevant name for the role and create it.
Once the role has been created, select it in the resize schedule form. We have now provided all the details needed to create a resize schedule for the AWS Redshift cluster. Click the Create button to create the schedule.
Once the schedule is successfully created, you will find it under the Schedule tab of the cluster properties, in the resize schedule sub-section.
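The same list of resize schedules can also be read through the API. The sketch below assumes a boto3 Redshift client is passed in and, for brevity, reads only the first page of results; `demo-cluster` in the usage note is a hypothetical identifier.

```python
def summarize_resize_schedules(client, cluster_id: str) -> list:
    """Return (name, schedule, state) tuples for a cluster's resize schedules."""
    response = client.describe_scheduled_actions(
        TargetActionType="ResizeCluster",
        Filters=[{"Name": "cluster-identifier", "Values": [cluster_id]}],
    )
    return [
        (action["ScheduledActionName"], action["Schedule"], action["State"])
        for action in response.get("ScheduledActions", [])
    ]
```

With credentials configured, `summarize_resize_schedules(boto3.client("redshift"), "demo-cluster")` would list the schedules created for that cluster.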
Once the resize operation is triggered at the scheduled time, the status of the cluster changes to indicate that the resize is in progress.
After the cluster is resized, the cluster properties page shows that the number of nodes has increased from two to four.
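The status check described above can be automated by polling the cluster until it is available with the expected node count. This is a minimal sketch assuming a boto3 Redshift client is passed in; the polling interval and the maximum number of polls are arbitrary illustrative values.

```python
import time


def wait_for_resize(client, cluster_id: str, target_nodes: int,
                    poll_seconds: float = 30.0, max_polls: int = 40,
                    sleep=time.sleep) -> bool:
    """Poll until the cluster is available with target_nodes nodes.

    Returns True on success and False if max_polls is exhausted first.
    """
    for _ in range(max_polls):
        cluster = client.describe_clusters(ClusterIdentifier=cluster_id)["Clusters"][0]
        if cluster["ClusterStatus"] == "available" and cluster["NumberOfNodes"] == target_nodes:
            return True
        sleep(poll_seconds)
    return False
```

The `sleep` parameter exists only so the loop can be exercised without real waiting; in normal use the default is fine.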
Navigate to the Schedule tab and you will find that the resize operation completed very quickly and that the cluster status is available again.
Once the scale-down resize schedule triggers, the status of the cluster changes again while the resize is in progress.
After the cluster has been resized back to its original size, its status returns to available and both resize schedules show as completed.
Because AWS Redshift supports per-second billing, the cost of the cluster depends on its size at any given time. Elastic resizing therefore lets the cluster meet the capacity requirements of each workload while keeping costs down.
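To make the saving concrete, here is a small worked calculation. The price of 0.25 per node-hour and the 8-hours-per-day peak window are hypothetical figures chosen for illustration, and the sketch ignores storage, data transfer and the time spent resizing.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_cost(base_nodes: int, peak_nodes: int, peak_hours: float,
                price_per_node_hour: float) -> float:
    """Weekly compute cost when the cluster runs peak_nodes for peak_hours
    and base_nodes for the rest of the week."""
    if not 0 <= peak_hours <= HOURS_PER_WEEK:
        raise ValueError("peak_hours must be between 0 and 168")
    base_hours = HOURS_PER_WEEK - peak_hours
    node_hours = base_nodes * base_hours + peak_nodes * peak_hours
    return node_hours * price_per_node_hour


PRICE = 0.25  # hypothetical price per node-hour

# Four nodes all week versus two nodes scaled to four for 8 hours a day
static_cost = weekly_cost(4, 4, 0, PRICE)
elastic_cost = weekly_cost(2, 4, 8 * 7, PRICE)
savings = static_cost - elastic_cost
```

Under these assumptions the static cluster uses 672 node-hours per week and the elastically resized one uses 448, so the weekly compute cost drops by a third.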
Conclusion
In this article, we learned how to elastically scale AWS Redshift clusters, as well as the different options for resizing a cluster on demand or on a schedule. We also looked at the two types of resize, classic and elastic, and at how to create the IAM role used by the AWS Redshift Scheduler.