Kusto is a good name, but it is now only a nickname. Its official name is Azure Data Explorer, or ADX.
Querying data in Kusto is fast, far faster than in a traditional RDBMS such as SQL Server or MySQL, especially when the data grows to billions of rows and keeps growing by billions.
We have been using Kusto as our data analysis query engine for more than two years. In the past few months, we have also been building an Azure feature entirely on Kusto instead of SQL Server.
In this article, I try to explain why Kusto is so fast, from both the data storage and the query execution perspectives. This article may not help you write a better Kusto query immediately, but it may help you troubleshoot scripts that report errors and make sensible decisions when building an application on Kusto.
Before reading on, I assume you are somewhat familiar with Kusto queries and have a test cluster within reach. If you don’t have one, it is easy to create a test cluster by following these instructions.
Why Kusto is Fast in a Nutshell
Compared with SQL Server, Kusto’s query speed does not come from magic. The speed is a tradeoff in data processing: Kusto gains some capabilities by giving up others.
- Kusto stores data in a distributed structure. The data bytes end up spread across several disks (SSDs or traditional spinning hard drives), which means the data can be processed in parallel.
- By default, Kusto stores data in columnar form, so the engine only needs to access the columns involved in a query, instead of scanning all the data as a row store would.
- A Kusto cluster is a cluster of nodes, where each “node” is an Azure Virtual Machine, which means a query can be processed in parallel across nodes.
- Kusto is designed for data that is read-only, rarely deleted, and never updated.
- Kusto is designed to ingest data fast. It does not apply data constraint checks, such as the uniqueness checks that a traditional SQL database enforces.
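
To make the columnar point above more concrete, here is a minimal Python sketch that compares how many values a simple aggregation touches in a row layout versus a column layout. It is a toy model only: the table, its column names (`timestamp`, `level`, `duration_ms`), and both helper functions are made up for illustration, and it says nothing about how Kusto actually lays out bytes on disk.

```python
# Toy illustration only; the table and helpers are hypothetical,
# not Kusto's real storage format or API.
rows = [
    {"timestamp": 1, "level": "info", "duration_ms": 12},
    {"timestamp": 2, "level": "error", "duration_ms": 40},
    {"timestamp": 3, "level": "info", "duration_ms": 8},
]


def sum_row_store(rows, column):
    """Row layout: each record is stored together, so a scan reads every field."""
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched to get one field
        total += row[column]
    return total, values_read


# Column layout: all values of one column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_column_store(columns, column):
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


row_total, row_reads = sum_row_store(rows, "duration_ms")
col_total, col_reads = sum_column_store(columns, "duration_ms")
print(f"row store:    total={row_total}, values read={row_reads}")
print(f"column store: total={col_total}, values read={col_reads}")
```

Both layouts give the same total, but the column layout reads only one value per row, while the row layout reads every field of every row. On a wide table with billions of rows, that gap is a large part of why a query touching a few columns can be cheap.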