Azure Resource Manager

Azure Resource Manager Overview

Azure Resource Manager, or ARM, is the deployment and management layer for resources in Azure. With ARM, we can define our platform, or infrastructure, as code.

One of the benefits of managing platform / infrastructure as code is that it can be checked into source control, so changes can be tracked over time. It also allows us to reuse parts of the code for other deployments.

Azure Resource Manager also has the following benefits:

  • Resource management. Deploy, add, and remove resources with ease.
  • Resource grouping. Group resources into logical sets that make sense to you, e.g. by environment, location, etc.
  • Resource dependencies. Handle dependencies between different resources.
  • Repeatable deployments. An ARM template can be used to perform the same deployment repeatedly, with consistent results.
  • Templates. Templates define the platform / infrastructure as code and are reusable.

Architecture

The ARM architecture has a few key components:

  • Resource providers. These supply the functionality behind resources, such as compute, storage, etc.
  • Resource types. The actual Azure service resources that will be deployed.
  • ARM REST APIs. Allow ARM operations to be invoked through its REST APIs.

In general, an ARM template has the following structure:

  • Schema (required)
  • Content Version (required)
  • Parameters
  • Variables
  • Resources
  • Outputs

How it Works

You can also abstract the parameters for an ARM template into their own template parameter file. The parameter file must also have a schema and a content version. The version doesn’t have to match the one in the template file.

During an ARM template deployment, Azure automatically detects dependencies between resource types and deploys the depended-on resources first. For example, if a template deploys an App Service Plan and an App Service Web Site, ARM will deploy the App Service Plan first, since the Web Site depends on it.

You can also specify dependencies explicitly in the template, and resources with no dependencies on each other are deployed in parallel, e.g. deploying 4 App Service Web Sites at the same time.

Azure Resource Manager will only deploy resources in the template that do not yet exist; when a specified resource already exists, it is skipped.

Within an ARM template, you can also specify deployment of the application itself. This is useful, for example, when the App Service Web Site deployment should include populating the site with the web application from source control.
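
As an illustration, here is a hedged C# sketch of submitting a template deployment with the Azure fluent management SDK (Microsoft.Azure.Management.Fluent). The class name, credentials file path, resource group name and the templateJson / parametersJson strings are placeholders, and the exact API surface may differ between SDK versions.

DeployTemplate.cs

using Microsoft.Azure.Management.Fluent;
using Microsoft.Azure.Management.ResourceManager.Fluent;
using Microsoft.Azure.Management.ResourceManager.Fluent.Core;
using Microsoft.Azure.Management.ResourceManager.Fluent.Models;

public class DeployTemplate
{
    public void Deploy(string templateJson, string parametersJson)
    {
        // Authenticate with a service principal credentials file (placeholder path)
        var credentials = SdkContext.AzureCredentialsFactory.FromFile("azureauth.properties");
        var azure = Azure.Authenticate(credentials).WithDefaultSubscription();

        // Submit the template; ARM resolves resource dependencies and, in incremental
        // mode, leaves resources that already exist as they are
        azure.Deployments.Define("sampleDeployment")
            .WithExistingResourceGroup("sample-rg")
            .WithTemplate(templateJson)
            .WithParameters(parametersJson)
            .WithMode(DeploymentMode.Incremental)
            .Create();
    }
}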

Reference

https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-authoring-templates

https://github.com/Azure/azure-resource-manager-schemas


Data Warehouse Solutions in Azure

Data Warehousing Solutions at a Glance

With today’s big data requirements, where data can be structured or unstructured, batch or streaming, and come in many other forms and sizes, a traditional data warehouse is not going to cut it.

Typically, data goes through 4 stages:

  • Ingest
  • Store
  • Processing
  • Consuming

Different technologies are required at different stages. The choice also depends heavily on the size and form of the data and the 4 Vs: Volume, Variety, Velocity, and Veracity.

Choosing a solution sometimes also depends on:

  • Ease of management
  • Team skill sets
  • Language
  • Cost
  • Specification / requirements
  • Integration with existing / other systems

Azure Services

Azure offers many services for data warehouse solutions. Traditionally, a data warehouse has been an ETL process plus relational database storage, like SQL Data Warehouse. Today, that is not always the case.

Some of the Azure services for data warehousing:

  • Azure HDInsight
    Azure offers various cluster types with HDInsight. The platform is managed by Microsoft, but still requires some management from users. It also supports Data Lake Storage. More about HDInsight in the section below. HDInsight sits in the “Processing” stage.
  • Azure Databricks
    Its support for machine learning, AI, analytics and stream / graph processing makes it a go-to solution for data processing. It’s also fully integrated with Power BI and other source / destination tools. Notebooks in Databricks allow collaboration between data engineers, data scientists and business users. See the comparison with HDInsight below.
  • Azure Data Factory
    The “Ingest” stage. Its function is to bring data in and move it between different systems. Azure Data Factory supports pipelines across Azure services, and even on-premises data, and can be used to control the flow of data.
  • Azure SQL Data Warehouse
    Typically the end destination of the data, to be consumed by business users. SQL DW is a platform as a service, requires less management from users, and is great for teams already familiar with T-SQL and SSMS (SQL Server Management Studio). You can also scale it dynamically and pause / resume the compute. SQL DW uses internal storage for data and includes the compute component. SQL Data Warehouse sits in the “Consuming” stage.
  • Database services (RDBMS, Cosmos, etc)
    SQL Database, other relational database systems, and Cosmos DB are part of the storage solutions offered in Azure. These are typically more expensive than Azure Storage, but also offer additional features. Database services are part of the “Storage” stage.
  • Azure Data Lake Storage
    Built on top of Azure Storage, ADLS offers virtually unlimited storage and an HDFS-based file system, optimized for analytics workloads like Hadoop or HDInsight. ADLS is part of the “Storage” stage.
  • Azure Data Lake Analytics
    ADLA is a higher-level abstraction over HDInsight. Users do not need to worry about scaling or managing clusters at all; it scales instantly per job. However, this also comes with some limitations. ADLA supports U-SQL, a SQL-like language that allows custom user-defined functions in C#. The tooling is also what developers are already familiar with: Visual Studio.
  • Azure Storage
  • Azure Analysis Services
  • Power BI

Which one to use?

There’s no right or wrong answer. The right solution depends on many other things, technical and non-technical, as well as the considerations mentioned above.

Simon Lidberg and Benjamin Wright Jones have a really good presentation on this topic; see the reference link for their full talk. Basically, the decision flowchart looks like this:

[Flowchart: data warehouse solutions in Azure]

Reference

https://myignite.techcommunity.microsoft.com/sessions/66581

Entity Framework, .Net and SQL Server Table Valued Parameter

This is a step-by-step setup for using a SQL Server TVP (Table Valued Parameter) in a .Net application with EF (Entity Framework). In this example, I use SQL Server 2016 (SP2-CU3), .Net 4.5.1 and EF 6.2.0.

1. Create a table to store data.

CREATE TABLE [dbo].[Something] (
    [Id]            INT	IDENTITY(1,1)   NOT NULL,
    [Name]          VARCHAR(150)        NOT NULL,
    [Status]        CHAR(1)             NOT NULL,
    CONSTRAINT [PK_Something_Id] PRIMARY KEY CLUSTERED ([Id] ASC)
);
GO

2. Create a `User Defined Table Type` in SQL Server. For simplicity, in this example the type’s columns are the same as the table created in step 1 (minus the identity column). In the real world, the type’s columns could be significantly different from the table where we store the data; the type might even be used to join with other tables.

CREATE TYPE [dbo].[udt_Something] AS TABLE (
    [Name]          VARCHAR(150)        NOT NULL,
    [Status]        CHAR(1)             NOT NULL
);
GO

3. Create a stored procedure that takes a parameter of the `User Defined Table` type we created earlier and performs the necessary work to persist our data.

CREATE PROCEDURE [dbo].[sp_SaveSomething]
	@udt_Something [dbo].[udt_Something] READONLY
AS
BEGIN
    INSERT INTO [dbo].[Something] ([Name], [Status])
        SELECT [Name], [Status]
        FROM @udt_Something
END
GO

4. Create an extension method to convert an `IEnumerable<T>` to a `DataTable`. In order to use a SQL TVP, we have to pass the parameter value as a `DataTable`; this method handles the conversion.
Helper.cs

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Reflection;

namespace QC
{
    public static class Helper
    {
        public static DataTable ToDataTable<T>(this IEnumerable<T> enumerable, IEnumerable<string> orderedColumnNames)
        {
            var dataTable = new DataTable();

            // Get all properties of the object
            PropertyInfo[] properties = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
            PropertyInfo[] readableProperties = properties.Where(w => w.CanRead).ToArray();

            // Determine column names: use the provided order, or fall back to the properties' order
            var columnNames = (orderedColumnNames ?? readableProperties.Select(s => s.Name)).ToArray();

            // Add columns to data table
            foreach (string name in columnNames)
            {
                dataTable.Columns.Add(name, readableProperties.Single(s => s.Name.Equals(name)).PropertyType);
            }

            // Add rows to data table from object
            foreach (T obj in enumerable)
            {
                dataTable.Rows.Add(columnNames.Select(s => readableProperties.Single(s2 => s2.Name.Equals(s)).GetValue(obj)).ToArray());
            }

            return dataTable;
        }
    }
}

5. For the purpose of this example, let’s say we want to save a collection of objects. This is our object definition.
Something.cs

namespace QC
{
    public class Something
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Status { get; set; }
    }
}

6. Using EF, call the stored procedure we created and pass in a `SqlParameter` whose value is the collection of objects converted to a `DataTable`. Don’t forget to specify the parameter type as `Structured` and set the `TypeName` to the `User Defined Table Type`.
DataAccess.cs

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

namespace QC
{
    public class DataAccess
    {
        public void Save(IEnumerable<Something> data)
        {
            // Columns for ordering, the order of the columns must be the same as user defined table type
            var orderedCols = new[] { "Name", "Status" };

            // SQL parameter to pass to stored procedure
            var param = new SqlParameter("@udt_Something", SqlDbType.Structured);
            param.Value = data.ToDataTable(orderedCols);
            param.TypeName = "dbo.udt_Something";

            try
            {
                // QCDB is our EF entities
                using (var db = new QCDB())
                {
                    // Call stored procedure and pass in table valued parameter
                    db.Database.ExecuteSqlCommand("EXEC dbo.sp_SaveSomething @udt_Something", param);
                }
            }
            catch
            {
                throw;
            }
        }
    }
}

7. Example of usage.
OpsController.cs

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Web.Http;

namespace QC
{
    public class OpsController : ApiController
    {
        public void SaveSomething()
        {
            var data = new List<Something>();
            data.Add(new Something { Id = 1, Name = "Chap", Status = "A" });
            data.Add(new Something { Id = 2, Name = "Stick", Status = "D" });

            var dataAccess = new DataAccess();
            dataAccess.Save(data);
        }
    }
}

Unit Test Internal Members

Applies to .Net applications in Visual Studio 2010 and up.

Say I have a C# project (call it the QC project). In the project, I have a class with internal member(s) like this:

namespace QC
{
    public class PersonService
    {
        public bool PromotePerson(string userName)
        {
            // ... promotion logic omitted for brevity
            return FindPerson(userName);
        }

        internal bool FindPerson(string userName)
        {
            // ... lookup logic omitted; trivial placeholder so the sample compiles
            return !string.IsNullOrEmpty(userName);
        }
    }
}

For unit tests, I created a separate project (call it the QCTest project). Because internal members can only be accessed within the same assembly, I won’t be able to invoke the `QC.PersonService.FindPerson(string userName)` method from the QCTest project.

To overcome this, in the `AssemblyInfo.cs` of the project being tested (in my case, the QC project), simply add:

[assembly: InternalsVisibleTo("QCTest")]

After rebuilding my solution, I am able to access `QC.PersonService.FindPerson(string userName)` in my QCTest project.
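
For example, a minimal MSTest class in the QCTest project can now exercise the internal method directly (a sketch; the test name and the assertion assume the trivial placeholder `FindPerson` implementation shown above):

PersonServiceTest.cs

using Microsoft.VisualStudio.TestTools.UnitTesting;
using QC;

namespace QCTest
{
    [TestClass]
    public class PersonServiceTest
    {
        [TestMethod]
        public void FindPerson_IsAccessibleFromTestProject()
        {
            // This call compiles only because the QC assembly declares InternalsVisibleTo("QCTest")
            var service = new PersonService();

            Assert.IsTrue(service.FindPerson("jane.doe"));
        }
    }
}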

As an added bonus, if you use a mocking library like Moq or Rhino Mocks and try to mock internal members, you will most likely also need to add:

[assembly: InternalsVisibleTo("DynamicProxyGenAssembly2")]

What is `DynamicProxyGenAssembly2`?

DynamicProxyGenAssembly2 is a temporary assembly built by mocking systems that use Castle DynamicProxy, such as Moq or NSubstitute. It is generated when the mock is needed and disposed of after the tests are finished.
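
With that attribute in place, Moq can override the internal member. Here is a hedged sketch; note that Moq can only override members that are virtual, so this assumes `FindPerson` is declared `internal virtual`:

PersonServiceMockTest.cs

using Microsoft.VisualStudio.TestTools.UnitTesting;
using Moq;
using QC;

namespace QCTest
{
    [TestClass]
    public class PersonServiceMockTest
    {
        [TestMethod]
        public void FindPerson_CanBeMocked()
        {
            // Requires FindPerson to be 'internal virtual' and both InternalsVisibleTo
            // attributes ("QCTest" and "DynamicProxyGenAssembly2") in the QC project
            var personService = new Mock<PersonService>();
            personService.Setup(s => s.FindPerson("jane.doe")).Returns(true);

            Assert.IsTrue(personService.Object.FindPerson("jane.doe"));
        }
    }
}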

Reference: https://stackoverflow.com/a/17179923

Unit Testing DTO Classes

The reason I unit test my DTO (Data Transfer Object) classes is that I pass these classes to a client app, which may use them and expect certain properties in the response. Sometimes I change DTO properties that the client app uses and forget to refactor the client app.
By unit testing my DTO classes, I get a reminder to refactor the client app whenever a DTO property (or its name) changes.

For example, I have a Person DTO.

Person.cs

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Age { get; set; }
}

To unit test it, I create a PersonMirror DTO.

public class PersonMirror
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Age { get; set; }
}

Then I use the following helper class to unit test the DTO.

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public class TestHelper
{
    public static void CompareType<T, TMirror>()
    {
        var errorMessage = "Type: {0}; Property: {1}";

        var instance = Activator.CreateInstance(typeof(T));
        var instanceType = instance.GetType();
        var mirror = Activator.CreateInstance(typeof(TMirror));
        var mirrorType = mirror.GetType();
        var mirrorProperties = mirrorType.GetProperties();

        // Remove 'Mirror' convention
        var mirrorTypeName = mirrorType.Name;
        if (mirrorType.Name.IndexOf("Mirror") > 0)
        {
            mirrorTypeName = mirrorType.Name.Substring(0, mirrorType.Name.IndexOf("Mirror"));
        }

        // Compare type name
        Assert.AreEqual(mirrorTypeName, instanceType.Name);

        // Compare properties
        foreach (var mirrorProperty in mirrorProperties)
        {
            var instanceProperty = instanceType.GetProperty(mirrorProperty.Name);

            Assert.IsNotNull(instanceProperty, string.Format(errorMessage, instanceType.FullName, mirrorProperty.Name));
        }
    }
}

Usage (using MS Test):
PersonTest.cs

[TestClass]
public class PersonTest
{
    [TestMethod]
    public void TestPersonDto()
    {
        TestHelper.CompareType<Person, PersonMirror>();
    }    
}

WebApi .Net – Add DelegatingHandler to Get Request Body

Out of the box, the WebApi pipeline binds the request body to the action parameter; as a result, in a WebApi controller (which inherits from `ApiController`) the `Request.Content` body reads as empty.

For example, passing this json in the request body:

[
    {
        "Name": "Test",
        "Status": "C"
    }
]

WebApi will bind it to the controller action’s parameter; however, the `Request.Content` body is empty.

[Screenshot: Request.Content is empty in the controller]

To work around this, create a `DelegatingHandler` that captures the request body so it can be accessed later.

RequestHandler.cs

using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

namespace QC
{
    public class RequestHandler : DelegatingHandler
    {
        protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            // WebApi empties the request body stream when it binds the body to a parameter.
            // Reading the content here buffers it, and we also stash the raw body in the
            // request properties so it can be retrieved later (e.g. from a controller).
            if (request.Content != null)
            {
                string body = await request.Content.ReadAsStringAsync();
                request.Properties["body"] = body;
            }

            return await base.SendAsync(request, cancellationToken);
        }
    }
}

Register in WebApiConfig.

WebApiConfig.cs

public static class WebApiConfig
{
    public static void Register(HttpConfiguration config)
    {
        config.MessageHandlers.Add(new RequestHandler());
    }
}

Voilà! `Request.Content` now contains the request body.

[Screenshot: the request body is available after adding the handler]
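
As a usage sketch, a controller action can then read the stashed body like this (the "body" key matches the one set in RequestHandler above; the controller and action names are just for illustration):

EchoController.cs

using System.Web.Http;

namespace QC
{
    public class EchoController : ApiController
    {
        [HttpPost]
        public IHttpActionResult Post()
        {
            // Retrieve the raw request body stored by RequestHandler
            object body;
            Request.Properties.TryGetValue("body", out body);

            return Ok(body as string ?? string.Empty);
        }
    }
}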

Code is from here.

What is Azure HDInsight?

Hadoop and Azure HDInsight

Azure HDInsight is Azure’s version of Hadoop as a service. It lives in the cloud, just like other Azure services, and it’s a managed service, so we don’t have to worry about some of the maintenance that a Hadoop cluster normally requires.

Underneath, Azure HDInsight uses Hortonworks Data Platform (HDP)’s Hadoop components.

Each Azure HDInsight version has its own cloud distribution of HDP along with other components. Different versions of HDInsight ship with different versions of HDP. See the reference link for the technology stack and its versions.

When you create an Azure HDInsight cluster, you will be asked to choose a cluster type. The cluster type is the Hadoop technology you want to use: Hive, Spark, Storm, etc. More cluster types are being added; to see what’s currently supported, see the reference link.

Azure HDInsight can be a great data warehouse solution that lives in the cloud.

Azure HDInsight and Databricks

While Azure HDInsight is a fully managed service, there is still some management that we as users have to do. HDInsight also supports Azure Data Lake Storage and Apache Ranger integration. The downside to HDInsight is that it doesn’t auto-scale and you can’t pause the deployment, which means you pay for the cluster as long as it lives. The typical model is to spin the service up whenever it’s needed, process the data, store the results in permanent storage, and then tear the cluster down.

This is in contrast to Databricks, another data warehouse solution offered by Azure, which can auto-scale. Databricks, however, is less about the ETL process and more about processing data for analytics, machine learning and the like. Needless to say, it has built-in libraries for this purpose.

The language support is also different. Language support in HDInsight depends on which cluster type you choose when you spin up the service; for example, a Hive cluster supports HiveQL (a SQL-like language) in its Hive editor. Databricks supports Python, Scala, R, SQL and many others.

Reference

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning

https://docs.microsoft.com/en-us/azure/hdinsight/