Concurrency really matters for SQL data warehouse workloads! There is a lot of talk about what an analytics-driven organization really needs from its underlying data platform, which has to serve all of your data to all of your users.
Organizations have long been concerned with finding a platform that can serve both in-house data and the needs of its users. Do you need to store data in tables or in files? What you need is something that stores data in both forms while delivering high performance on large volumes of data. Snowflake is a single platform that handles both data lake and data warehousing workloads that need high concurrency.
Why do you need concurrency? Consider a typical BI dashboard of the kind most users rely on today to look at their data. One dashboard has 12 visuals (8 charts and 4 KPIs). This means that every time a user clicks to select a piece of data, the tool sends 12 simultaneous queries to the underlying data platform. All of those queries hit the platform at once, which can jam it up and leave the data analysts confused by slow results. Handling concurrency on a modern cloud data platform like Snowflake, which can run your data warehousing, data lake, and data engineering workloads, helps keep those workloads separate, including the data cleaning done before data is delivered to the database.
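To make the 12-queries-per-click point concrete, here is a minimal Python sketch of a dashboard refresh. It is only an illustration: `run_query` is a hypothetical stand-in that sleeps instead of calling a real warehouse, the visual names are made up, and the 50-user figure is an assumption chosen for the example, not a number from my test.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dashboard: 8 charts and 4 KPIs, one query per visual.
VISUALS = [f"chart_{i}" for i in range(1, 9)] + [f"kpi_{i}" for i in range(1, 5)]


def run_query(visual: str) -> str:
    """Stand-in for a real warehouse call; sleeps to mimic query latency."""
    time.sleep(0.05)
    return f"result for {visual}"


def refresh_dashboard(visuals):
    """Fire one query per visual at the same time, as a BI tool would on a click."""
    with ThreadPoolExecutor(max_workers=len(visuals)) as pool:
        return list(pool.map(run_query, visuals))


def queries_in_flight(users: int, visuals_per_dashboard: int = 12) -> int:
    """Queries the platform sees if every user clicks at the same moment."""
    return users * visuals_per_dashboard


if __name__ == "__main__":
    results = refresh_dashboard(VISUALS)
    print(len(results), "queries issued for one click")
    print(queries_in_flight(50), "queries if 50 users click at once")
```

The multiplication in `queries_in_flight` is the whole argument: a modest number of dashboard users turns into hundreds of simultaneous queries very quickly.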
I ran a test for my own education to understand where these technologies stand when it comes to serving ad-hoc BI and reporting workloads. I will be comparing Snowflake and Lakehouse on speed, on the number of nodes used, and on timing under an ad-hoc BI workload.
I will be giving a short analysis of the results, starting with Snowflake.
Snowflake was two times faster than Lakehouse while using half the compute. Snowflake took less than thirty seconds to scale to meet demand, beginning with one node. By the end of the test, Snowflake had used a total of 4 nodes, while Lakehouse completed the same test using 8 nodes plus 1 driver node. The result shows that Snowflake is faster and consumes less compute than Lakehouse. This also means that when data is gathered, it will be delivered to the database by Snowflake in a more concise format. Snowflake was able to recognize a high concurrency demand very quickly and address it by instantly adding additional clusters, while the data lake SQL warehouse looks much slower to respond to the requirement for higher concurrency, and its response is limited.
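The node counts reported above can be turned into a quick ratio. This is a rough sketch only: it assumes the nodes on both platforms are comparable in size, which I have not established, and the variable names are just for illustration.

```python
# Node counts reported in the test results.
snowflake_nodes = 4
lakehouse_workers = 8
lakehouse_driver = 1


def node_ratio(a: int, b: int) -> float:
    """Fraction of b's node count that a used (assumes comparable node sizes)."""
    return a / b


# Comparing against the worker nodes only.
workers_only = node_ratio(snowflake_nodes, lakehouse_workers)
# Comparing against the workers plus the driver node.
with_driver = node_ratio(snowflake_nodes, lakehouse_workers + lakehouse_driver)

print(f"vs. workers only: {workers_only:.0%}")
print(f"vs. workers + driver: {with_driver:.0%}")
```

Counting only the worker nodes gives exactly half, and counting the driver as well pushes the ratio a little below half.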
Which one to go for is your choice. With Snowflake, everything works, all data is secured and encrypted by default, and there is no configuration or maintenance required. With a lakehouse architecture, heavy concurrent workloads will result in substantially slower and inconsistent performance. I conducted the test using two tables of the kind used by companies running ad-hoc BI. One had 200 million rows and the other had 4,000 rows of customers. I loaded the data at the same time and in the same way on both platforms.