5 strategies for practicing Snowflake cost management while still enjoying all the platform’s advantages
Article by Mick Wagner
When organizations make the decision to implement the Snowflake data warehouse as a service (DWaaS) solution, cost management usually ranks high among the chief reasons. Avoiding the high costs of building and maintaining an on-prem platform and the risks of over-paying for unused system resources can certainly be attractive incentives. Even so, we often hear of cases of “sticker shock” when the first Snowflake bill arrives, or when administrators notice their credits dwindling at alarmingly fast rates.
Yes, Snowflake can enable organizations to lower overall data warehousing costs — if the solution is used in a cost-effective manner. The pay-as-you-go pricing model is comparable to using Uber as opposed to owning your own vehicle: you avoid the expense of hardware and maintenance, but every use incurs a discrete cost. If those uses are not prudently executed, tightly controlled, and closely monitored, any hoped-for overall cost savings can quickly dissipate.
Fortunately, many of the keys to more effective Snowflake cost management are built into Snowflake itself, and knowing how to use these features can mean the difference between cost savings and a budgetary fiasco. The following five strategies can be particularly effective in preventing credits from slipping away and costs from spiraling out of control.
1. Use Resource Monitors and monitor queries with Snowflake’s Query Profile tool.
With Resource Monitors, you can track Snowflake usage at both the user level and the service account level. Monitors can notify you when consumption thresholds are reached and even auto-suspend virtual warehouses or accounts that surpass those thresholds. With Snowflake’s Query Profile tool, you can see what your longest-running queries are, so you can identify ways to improve performance.
2. Set timeouts for workloads and accounts.
You can use the STATEMENT_TIMEOUT_IN_SECONDS parameter to automatically stop queries that are running too long. This will help prevent queries with errors from consuming too many resources and running up your costs.
3. Scale your virtual warehouse appropriately and use auto-suspend/auto-resume.
A good rule of thumb with Snowflake warehouses is to start small and increase the size as needed. Since the cost of a warehouse doubles each time you each size up, starting smaller with longer running queries will allow for better cost management and monitoring.
Also, by creating virtual warehouses for specific use cases, you’ll be able to customize them for each case. For example, you can set up ELT warehouses to auto-suspend immediately after completion, since they’re no longer needed after you run them. Report warehouses, on the other hand, should have some time built in before suspending (5 to 10 minutes is a good starting point), as people are running reports and the cached data in both the result cache and warehouse cache allows for quicker data retrieval.
You may also want to set up a SQL script to change the warehouse size at certain times of day based on user consumption patterns. There may be a heavy use period in the morning, for example, when people get into the office and run reports, which could warrant a larger warehouse; in the afternoon, when demand decreases, a smaller warehouse may be all you need.
4. Model your data appropriately.
When loading your data, sort it before you ingest, as Snowflake partitions the data in the order it’s ingested. Also, using the COPY INTO command instead of INSERT will allow the data to load faster, as COPY INTO uses bulk loading, which is more efficient and cost effective.
Remember to model your data to answer business questions. Star schemas are commonly used for this purpose.
You’ll also want to pre-aggregate data wherever possible, since storage is cheaper than compute. This will not necessarily work for near-real-time data, but if you are batching data several times a day, it will help keep your compute costs down, as your BI tool will not have to query as much data to present summary information.
Also review your SQL to avoid DISTINCT, GROUP BY, or ORDER BY if they are not necessary.
Create cluster keys on fields that are used for filtering (like dates and/or codes) or that are used in joins. Remember, you can create a cluster key on more than one field.
Cluster keys can help partition the data better in Snowflake micropartitions, which will provide better pruning of unnecessary data from a query.
Snowflake is one of the latest “as a service” innovations that enable businesses to take full advantage of all the cloud offers, including the potential for overall cost savings over on-prem solutions. By implementing these five strategies, your organization can enjoy flexibility, data sharing, availability, performance, speed, and other benefits of using Snowflake, while still keeping your budget in check.
Ready to put your data to work for you? We can help. Visit our page here: https://www.logic2020.com/services/advanced-analytics