Azure Synapse Analytics for structured big data

Azure Synapse Analytics for structured big data

Many companies today are trying to transform themselves into data-driven organizations, that is, to become the type of organization where data and analytics is part of the business strategy, systems, processes, and culture. And while being data-driven is in part a mindset, it is also about having the right underlying tools and infrastructure to enable this strategic shift.

Big data analytics platforms load, store, and analyze volumes of data at high speed, providing timely insights to businesses. Data-driven organizations leverage this data, for example, for advanced analysis to market new promotions, operational analytics to drive efficiency, or for predictive analytics to evaluate credit risk and detect fraud. Customers are leveraging a mix of relational analytical databases and data warehouses to gain business insights.

Gigaom recently published a set of benchmarks comparing the price/performance of the four leading cloud-based relational databases: Amazon Redshift, Azure Synapse Analytics (formerly Azure SQL Data Warehouse), Google Big Query, and Snowflake Data Warehouse. The whitepaper found that using representative set of queries from the TPC-H standard, Azure Synapse Analytics consistently outperformed its counterparts in both price and performance. This whitepaper, which can be found at, is well worth reading.

Rather than rehash the findings of this whitepaper – it can speak for itself – I thought this might be a good opportunity to highlight some of the top features which makes Azure Synapse Analytics such a great fit for enterprise database solutions.

What is Azure Synapse Analytics?

Azure Synapse Analytics (formerly Azure SQL Data Warehouse, or SQL DW) is Microsoft’s cloud-based Platform-as-a-Service (PaaS) for massive, structured relational databases. Synapse Analytics offers petabyte-scale due to its distributed architecture and use of parallel processing. A fully managed cloud service offering independently-scaled compute and storage, Synapse Analytics is an excellent choice when looking to store big relational data on cloud with highly performant queries. Synapse Analytics, which became generally available under the SQL Data Warehouse moniker in 2015, has proven repeatedly to perform better performing competitor offerings including Amazon Redshift, Google BigQuery and Snowflake DW.

Features in Azure Synapse Analytics

Let us highlight some of the top features which makes Synapse Analytics such a great fit for enterprise database solutions:

Secondary Index

Azure Synapse Analytics, Amazon Redshift, Google BigQuery and Snowflake all provide clustered column store index (CCI). However, Synapse Analytics offers the unique capability of a secondary index on CCI tables; this is also known as secondary B-Tree indexes or non-clustered index (NCI). Aggregation and analytical queries mostly perform well with CCI tables, but when querying relatively smaller dataset aggregation or lookup, the secondary index can improve performance greatly
Best practice:

User a secondary index for the columns which are frequently used for filter, group by, or order by
Do not overuse secondary indexes, as the additional overhead required to maintain them can negatively impacts query performance. Use for 1 or 2 columns per table.

ON [ database_name . [ schema ] . | schema . ] table_name
( { column [ ASC | DESC ] } [ ,...n ] )


Data migration is a key step in any organization’s moving to the cloud. A poorly executed migration can lead to increased latency, higher costs, and risk business continuity goals.
Data migration is where Synapse Analytics shines above all. The PolyBase feature in Synapse Analytics enables lightning-fast data transfer from multiple sources and allows data queries directly from the source without moving beforehand, creating extremely rapid execution by comparison. PolyBase can query structured data stored in Hadoop Distributed File System (HDFS), Blob Storage, Azure Data Lake Storage, SQL Server, Oracle, Teradata and MongoDB. PolyBase can also query non-relational data, like delimited files.

PolyBase offers the fastest throughput compared to BCP, SQLBulkCopy/ADF or SSIS. Every data engineer has faced two very common hurdles when joining SQL data to external data sources: either transfer all the data to same location with same format,or query them individually and bring the query result to a centralized location to perform independent joins. PolyBase removes this hassle and allows data engineers to join using T-SQL.

Compute Gen2 Tiers

One of the important features of Synapse Analytics is that it can scale storage and compute independently. To achieve this, Microsoft re-architected the product to utilize Azure Blob Storage; because blob storage is persistent, this allows massively parallel processing databases (MPP) to be “paused” and “resumed”. Of course, SSD-based storage is even faster than Blob Storage, which has led to new, compute optimized tiers.

This new tier allows the data to be cached in NVMe SSDs, an incredible innovation as it still uses Azure Blob Storage. Consequently, Synapse Analytics gains local SSD performance along with independently-scaled compute and storage. This provides up to 5x improvement in query performance compared to Gen1.

Business Continuity

Perhaps the single most important part of any data service’s security and availability is reliable backup and configuration restoration. Synapse Analytics provides automatic restore points every 30 mins. Automatic restore points cannot be deleted by any users and are used to maintain recovery SLAs. Users can restore their data warehouse using any of restore points within the previous seven days.

In addition to Automatic Restore Points, Synapse Analytics also allows user-defined, point-in-time restore points. Up to 42 can be created per Synapse Analytics instance, with custom retention period. Synapse Analytics also provides Geo-Backups. As the name implies, Geo-Backups allow users to backup their Synapse Analytics data to a paired datacenter, enabling restoration when users cannot access their primary region. The geo-backups can be restored to any data centers which support Synapse Analytics.


A robust, data estate infrastructure is a foundational enabler of any move towards becoming a data-driven organization. Deploying and leveraging cloud-based solutions such as Azure Synapse Analytics are an important step in that journey. That said, preparing for this new data-driven world requires more than just a partner that can upgrade your database platform. It requires the unique blend of expertise in business strategy, people and processes, governance, and technology that only a company like Neal Analytics can provide. Contact us and let us help you begin this journey with a professional assessment.