RSS

Tag Archives: sql

Data Warehouse Solutions in Azure

Date Warehousing Solutions at a Glance

With today’s big data requirements where data could be structured, unstructured, batch, stream and come in many other forms and size, traditional data warehouse is not going to cut it.

Typically, there are 4 types of data stage:

  • Ingest
  • Store
  • Processing
  • Consuming

Different technology is required at different stage. This also depends heavily on size and form of data and the 4 Vs: Volume, Variety, Velocity, Veracity.

Consideration for the solutions sometime also depends on:

  • Ease of management
  • Team skill sets
  • Language
  • Cost
  • Specification / requirements
  • Integration with existing / others system.

Azure Services

Azure offers many services for data warehouse solutions. Traditionally, data warehouse has been ETL process + relational database storage like SQL Data Warehouse. Today, that may not always be the case.

Some of Azure services for data warehousing:

  • Azure HDInsight
    Azure offers various cluster types that comes with HDInsight, fully managed by Microsoft, but still require management from users. Also supports Data Lake Storage. More about HDInsight. HDInsight sits on “Processing” data stage.
  • Azure Databricks
    Its support for machine learning, AI, analytics and stream / graph processing makes it a go-to solution for data processing. It’s also fully integrated with Power BI and other source / destination tools. Notebooks in Databricks allows collaboration between data engineers, data scientist and business users. Compare to HDInsight.
  • Azure Data Factory
    The “Ingest” part of data stage. Its function is to bring data in and move them around different system. Azure Data Factory supports different pipelines across Azure services to connect the data and even on-premise data. Azure Data Factory can be used to control the flow of data.
  • Azure SQL Data Warehouse
    Typically the end destination of data and to be consumed by business users. SQL DW is platform as a service, require less management from users and great for team who already familiar with TSQL and SSMS (SQL Management Studio). You can also scale it dynamically, pause / resume the compute. SQL DW uses internal storage to store data and include the compute component. SQL Data Warehouse sits on “Consuming” stage.
  • Database services (RDBMS, Cosmos, etc)
    SQL database, or other relational database system, Cosmos are part of the storage solutions offered in Azure Services. This is typically more expensive than Azure Storage, but also offer other features. Database services are part of “Storage” stage.
  • Azure Data Lake Storage
    Build on top of Azure Storage, ADLS offers unlimited storage and file system based on HDFS, allowing optimization for analytics purpose, like Hadoop or HDInsight. ADLS is part of “Storage” stage.
  • Azure Data Lake Analytics
    ADLA is a high-level abstraction of HDInsight. Users will not need to worry about scaling and management of the clusters at all, it’s an instant scale per job. However, this also comes with some limitations. ADLA support USQL, a SQL-like language that allows custom user defined function in C#. The tooling is also what developers are already familiar with, Visual Studio.
  • Azure Storage
  • Azure Analysis Services
  • Power BI

Which one to use?

There’s no right or wrong answer. The right solution depends on many others things, technical and non-technical as well as the considerations mentioned above.

Simon Lidberg and Benjamin Wright Jones have a really good presentation around this topic. See the link at reference for their full talk. But, basically, the flowchart to make decision looks like this:

data-warehouse-solutions-in-azure

Reference

https://myignite.techcommunity.microsoft.com/sessions/66581

Advertisements
 
Leave a comment

Posted by on January 20, 2019 in General

 

Tags: , , , , , , , , , , , , , , , , , ,

What is SQL Server “Included Columns” ?

SQL Server organized indexes in a table as B-tree structure, which look like this.

B-tree-structure

A table that has `clustered index` store actual data rows at leaf level.
A table that has `non-clustered index`, only store (at leaf level) the value from the indexed column and a `row locator` point to actual data rows. It does not store actual data rows.

Specifying Included Columns in `non-clustered index` tell SQL Server you want to store actual data rows at leaf level.

For example:

CREATE NONCLUSTERED INDEX IX_Employee_Department
ON Employee(EmployeeFirstName)
INCLUDE (EmployeeAddress, EmployeeDOB)
GO

This example will create `non-clustered index` and its leaf level will store value from indexed column (EmployeeFirstName) and data rows from EmployeeAddress and EmployeeDOB columns.

 
Leave a comment

Posted by on December 3, 2018 in General

 

Tags: , , , , , ,

SQL Server Configuration Manager: “Cannot connect to WMI provider. You do not have permission”

This caused by uninstalling certain version of SQL Server and re-installing different version.

See this for fixes.

Note: Run command prompt as Administrator.

 
Leave a comment

Posted by on November 13, 2018 in General

 

Tags: , , , , ,

SQL Server Concurrency Effects

Lost Update

Happens when one transaction update overwrites another, causing the update to be lost.

Dirty Reads

Reading uncommitted records.

Phantom Reads

Reading while another transaction is adding new record, result in different number of rows being returned.

Repeatable Reads

When the same query to read is executed, it will return the same result.

Reference

MSDN

 
Leave a comment

Posted by on May 22, 2018 in General

 

Tags: , , , ,

Columnstore Index

  • New in SQL Server 2012.
  • Stored by columns, a column-based index, instead of row-based like in traditional index. For example, if row-based index is consisted of Firstname and Lastname columns, the column-based index would have 2 different indexes: Firstname in its own index and Lastname in its own index.
  • The index is compressed, allowing high performance.
  • The compressed data is stored in-memory, reducing needs to read off the disks.
  • Compression ratio is generally high because the same data type in a column.
  • Generally a better choice for wide table with many columns, as commonly found in data warehouse tables.
  • Clustered and non-clustered index.
  • Can be combined with row-based index.

Reference
Microsoft Docs

 
Leave a comment

Posted by on May 16, 2018 in General

 

Tags: , , , , ,

SQL Server Isolation Level

Definition

Isolation is the “I” in ACID principal which rules the transactional state and its concurrency. The higher the isolation level, the lower the concurrency effects. And vice versa.

ISO Standard Isolation

The ISO standard defines 4 different isolation levels, from lowest to highest:
– Read uncommitted
– Read committed
– Repeatable read
– Serializable, where transactions are completely isolated from one another.

SQL Server Database Isolation

SQL Server Database supports all levels of isolation defined by ISO standard. In addition, it adds another isolation levels: the Snapshot isolation level (ALLOW_SNAPSHOT_ISOLATION).

SQL Server Database also add another implementation of Read Committed isolation level: Read Committed Snapshot Isolation or RCSI (READ_COMMITTED_SNAPSHOT). The original Read Committed implementation is referred to RC.

In Azure SQL Database, the default setting for Read Committed is RCSI (both ALLOW_SNAPSHOT_ISOLATION and READ_COMMITTED_SNAPSHOT are set to ON).

In on-premise SQL Server, the default is RC (both ALLOW_SNAPSHOT_ISOLATION and READ_COMMITTED_SNAPSHOT are set to OFF). But can be set to use RCSI by setting them to ON.

These two isolation levels, Snapshot isolation and RCSI, use row versioning that is maintained in tempdb.

Code

To check database settings for Snapshot isolation:

SELECT name, snapshot_isolation_state, is_read_committed_snapshot_on FROM sys.databases

To check current connection’s transaction isolation level:

SELECT CASE transaction_isolation_level
WHEN 0 THEN 'Unspecified'
WHEN 1 THEN 'ReadUncommitted'
WHEN 2 THEN 'ReadCommitted'
WHEN 3 THEN 'Repeatable'
WHEN 4 THEN 'Serializable'
WHEN 5 THEN 'Snapshot' END AS TRANSACTION_ISOLATION_LEVEL
FROM sys.dm_exec_sessions

To set transaction isolation level for current connection. Transaction isolation level is per session / connection.

SET TRANSACTION ISOLATION LEVEL { READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SNAPSHOT | SERIALIZABLE }

Isolation Level and Concurrency Effects Matrix

Isolation Level Dirty Read Lost Update Non Repeatable Read Phantom
Read uncommitted Yes Yes Yes Yes
Read committed No Yes Yes Yes
Repeatable read No No No Yes
Snapshot No No No No
Serializable No No No No

Performance

In a load test, performance is significantly higher in RCSI, but it requires a lot higher throughput in tempdb (some 50x larger I/O). So planning on tempdb scaling is very important.

Reference
Microsoft Docs
Technet Blog
MSDN Blog

 
Leave a comment

Posted by on May 7, 2018 in General

 

Tags: , , , , , , ,

SQL Logical Order of Operations

Following is SQL statement’s Logical Order of Operations (not particular to SQL Server):

  • FROM
  • WHERE
  • GROUP BY
  • Aggregations (COUNT, MAX, etc)
  • HAVING
  • WINDOW
  • SELECT
  • DISTINCT
  • UNION, INTERSECT, EXCEPT
  • ORDER BY
  • OFFSET
  • LIMIT, FETCH, TOP

That is why, running following SQL statement in SQL Server:

SELECT FirstName, LastName, COUNT(*)
FROM dbo.User
GROUP BY FirstName

Will throw:

Column ‘dbo.User.LastName’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

GROUP BY is run first and grouped User’s columns by FirstName therefore LastName columns and all other columns are not visible by SELECT clause.

Another not-so-obvious example:

SELECT FirstName, COUNT(*)
FROM dbo.User
WHERE COUNT(*) > 0
GROUP BY FirstName

Refer to logical order above, when WHERE clause is executed, COUNT(*) > 0 is not yet evaluated and thus SQL Server throw exception.

Read more here.

 
Leave a comment

Posted by on May 2, 2018 in General

 

Tags: , , , , ,

 
%d bloggers like this: