The best technology is built to solve a problem in such a way that you cannot imagine going back to the old way. Snowflake is one of those technologies.
I spent most of my career studying and working with SQL Server. While I still love working with SQL Server, the past couple of years working with Snowflake have been a revelation. Let’s dive into a few of the reasons I love working with Snowflake:
- Unlimited concurrency
- Instantaneous and secure data sharing
- Immediate scaling of resources
- Platform agnosticism
Almost all of Snowflake’s incredible functionality owes its existence to one core concept: micro-partitions. All data in Snowflake tables is stored in immutable micro-partitions. Changes and updates never alter existing micro-partitions; instead, new micro-partitions are generated and the table’s metadata layer is updated to point at the new data.
This means data can be queried at the same time that data manipulation statements are running. Because readers never take locks on micro-partitions, there is effectively no limit to concurrency: a query never blocks a write, and a write never blocks a query.
These micro-partitions also allow Snowflake to let its customers share data with third parties in a secure and reliable way. Imagine not having to deal with CSV extracts at all. If you and your vendor are both on Snowflake, setting up a data share between you is a simple process. You can literally query your vendor’s data directly, as soon as the vendor loads it.
When querying a share, you use your own compute resources, not the vendor’s. Beyond storage, the vendor incurs no cost when you access the share, and the same holds in reverse when you share your data with other organizations.
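The provider-and-consumer flow described above can be sketched in Snowflake SQL roughly as follows; the database, schema, table, share, and account names here are illustrative, not real objects:

```sql
-- Provider side: create a share and grant access to specific objects.
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

-- Make the share visible to the consumer's account.
ALTER SHARE sales_share ADD ACCOUNTS = vendor_org.vendor_acct;

-- Consumer side: mount the share as a read-only database and query it.
CREATE DATABASE vendor_data FROM SHARE provider_org.provider_acct.sales_share;
SELECT COUNT(*) FROM vendor_data.public.orders;
```

Note that the consumer never copies the data; the mounted database reads the provider’s micro-partitions directly, using the consumer’s own compute.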
Provisioning new compute resources (called warehouses in Snowflake) is nearly instantaneous. A simple SQL statement can create a new warehouse or alter an existing one to increase or decrease its compute size (and therefore its cost). This scaling can be both up (increasing the cores and memory available) and out (increasing the number of clusters).
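Those “simple SQL statements” look roughly like this (the warehouse name and sizes are illustrative):

```sql
-- Create a warehouse that can scale out to three clusters under load.
CREATE WAREHOUSE IF NOT EXISTS etl_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3;

-- Scale up for a heavy batch job, then back down when it finishes.
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```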
You only spend money while your warehouses are running, so it is not uncommon to create dedicated warehouses for specific processes. Auto-suspension of these warehouses saves further cost by ensuring they are not running when they are not needed.
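Auto-suspension is just another warehouse setting; a sketch, again using the illustrative warehouse name from above:

```sql
-- Suspend after 60 seconds of inactivity; resume automatically
-- the next time a query is submitted to this warehouse.
ALTER WAREHOUSE etl_wh SET
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;
```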
Lastly, Snowflake runs on AWS, Microsoft Azure, and GCP. This matters because many businesses consider themselves a Microsoft shop, AWS-only, and so on. Those businesses can stay in their chosen environment with minimal changes from an infrastructure point of view.
This also lets Snowflake provide the ultimate in disaster preparedness. Snowflake gives every customer triple-redundant storage within a cloud platform region (think Azure East US 2, etc.). Snowflake also allows you to create a secondary account in a separate region, so if your primary region goes down you can fail over to the secondary.
Additionally, you can create another account on a different platform altogether (e.g. Azure and AWS, or Azure and GCP). With account-level replication now generally available, you can be up and running on your secondary account with a couple of SQL statements. Replication consumes compute resources and incurs egress charges, but downtime may end up costing more.
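A rough sketch of that setup using Snowflake’s failover groups; the organization, account, group, and database names are illustrative:

```sql
-- On the primary account: replicate a database to the secondary
-- account on a schedule.
CREATE FAILOVER GROUP prod_fg
  OBJECT_TYPES = DATABASES
  ALLOWED_DATABASES = prod_db
  ALLOWED_ACCOUNTS = myorg.secondary_acct
  REPLICATION_SCHEDULE = '10 MINUTE';

-- On the secondary account: create the replica, then promote it
-- to primary during an actual failover.
CREATE FAILOVER GROUP prod_fg
  AS REPLICA OF myorg.primary_acct.prod_fg;
ALTER FAILOVER GROUP prod_fg PRIMARY;
```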
For me, the answer to “Why Snowflake?” is that it solves so many of the problems traditional relational databases struggle with. After reading all this, I have to ask you…
“Why not Snowflake?”