
Azure Resource Manager

Azure Resource Manager Overview

Azure Resource Manager, or ARM, is a way to manage resources in Azure. With ARM, we can define our platform, or infrastructure, as code.

One of the benefits of using code to manage platform / infrastructure is that we can check it into source control and track changes over time. It also allows us to reuse parts of the code for other deployments.

Azure Resource Manager also has the following benefits:

  • Manage resources. Deploy, add, and remove resources with ease.
  • Resource grouping. Group resources into logical sets that make sense to you, e.g. by environment, location, etc.
  • Resource dependencies. Handle dependencies between different resources.
  • Repeatable deployments. An ARM template can be used to perform the same deployment repeatedly.
  • Templates. Use templates and code to define the platform / infrastructure. Templates are also reusable.

Architecture

The ARM architecture has a few components:

  • Resource providers. These supply the functionality of the resources, such as compute, storage, etc.
  • Resource types. The actual Azure service resources that will be deployed.
  • ARM REST APIs. Allow ARM commands to be invoked through REST APIs.

In general, an ARM template has the following structure (a minimal skeleton is sketched after the list):

  • Schema (required)
  • Content version (required)
  • Parameters
  • Variables
  • Resources
  • Outputs
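
To make that structure concrete, here is a minimal template skeleton. This is only a sketch: the parameter and variable names are made up for illustration, and the resources array is left empty.

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "siteName": { "type": "string" }
  },
  "variables": {
    "planName": "[concat(parameters('siteName'), '-plan')]"
  },
  "resources": [],
  "outputs": {}
}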

How it Works

You can also abstract out the parameters of an ARM template into their own template parameter file. An ARM template parameter file must also have a schema and a content version. The version doesn't have to match the template file's version.
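
A parameter file sketch, reusing the made-up siteName parameter from the skeleton above; note that it uses its own schema and its own content version:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "siteName": { "value": "my-site" }
  }
}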

In an ARM template deployment, Azure automatically detects dependencies between resource types and deploys a resource's dependencies first. For example, if a template specifies an App Service Plan and an App Service Web Site, the ARM REST API will deploy the App Service Plan first, since the Web Site depends on it.

You can also specify dependencies explicitly in an ARM template, and resources that have no dependencies are deployed simultaneously, e.g. deploying 4 App Service Web Sites in parallel.
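
An explicit dependency is declared with the dependsOn property. Below is a sketch of a single entry in the resources array, reusing the made-up names from above; the apiVersion shown is illustrative, so check the schemas repository in the references for valid versions.

{
  "type": "Microsoft.Web/sites",
  "apiVersion": "2018-02-01",
  "name": "[parameters('siteName')]",
  "location": "[resourceGroup().location]",
  "dependsOn": [
    "[resourceId('Microsoft.Web/serverfarms', variables('planName'))]"
  ]
}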

Azure Resource Manager will only deploy resources in the template that do not yet exist. When a specified resource already exists, it is skipped.

Within an ARM template, you can also specify deploying the application itself. This is useful, for example, when an App Service Web Site deployment includes populating the site with a web application from source control.

Reference

https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-authoring-templates

https://github.com/Azure/azure-resource-manager-schemas


Data Warehouse Solutions in Azure

Data Warehousing Solutions at a Glance

With today's big data requirements, where data can be structured or unstructured, batch or streaming, and come in many other forms and sizes, a traditional data warehouse is not going to cut it.

Typically, data goes through 4 stages:

  • Ingest
  • Store
  • Process
  • Consume

Different technologies are required at different stages. The choice also depends heavily on the size and form of the data and the 4 Vs: Volume, Variety, Velocity, Veracity.

Choosing a solution sometimes also depends on:

  • Ease of management
  • Team skill sets
  • Language
  • Cost
  • Specification / requirements
  • Integration with existing / other systems

Azure Services

Azure offers many services for data warehouse solutions. Traditionally, a data warehouse has been an ETL process plus relational database storage, like SQL Data Warehouse. Today, that may not always be the case.

Some of the Azure services for data warehousing:

  • Azure HDInsight
    Azure offers various cluster types that come with HDInsight, fully managed by Microsoft but still requiring some management from users. It also supports Data Lake Storage. More about HDInsight below. HDInsight sits in the "Process" stage.
  • Azure Databricks
    Its support for machine learning, AI, analytics, and stream / graph processing makes it a go-to solution for data processing. It's also fully integrated with Power BI and other source / destination tools. Notebooks in Databricks allow collaboration between data engineers, data scientists, and business users. Compare to HDInsight.
  • Azure Data Factory
    The "Ingest" part of the data stages. Its function is to bring data in and move it around between different systems. Azure Data Factory supports pipelines across different Azure services to connect data, even on-premises data. Azure Data Factory can be used to control the flow of data.
  • Azure SQL Data Warehouse
    Typically the end destination of data, to be consumed by business users. SQL DW is platform as a service, requires less management from users, and is great for teams already familiar with T-SQL and SSMS (SQL Server Management Studio). You can also scale it dynamically and pause / resume the compute. SQL DW uses internal storage to store data and includes the compute component. SQL Data Warehouse sits in the "Consume" stage.
  • Database services (RDBMS, Cosmos, etc)
    SQL Database, or other relational database systems, and Cosmos DB are part of the storage solutions offered in Azure. These are typically more expensive than Azure Storage but also offer other features. Database services are part of the "Store" stage.
  • Azure Data Lake Storage
    Built on top of Azure Storage, ADLS offers unlimited storage and a file system based on HDFS, allowing optimization for analytics workloads like Hadoop or HDInsight. ADLS is part of the "Store" stage.
  • Azure Data Lake Analytics
    ADLA is a high-level abstraction of HDInsight. Users don't need to worry about scaling or managing the clusters at all; it scales instantly per job. However, this also comes with some limitations. ADLA supports U-SQL, a SQL-like language that allows custom user-defined functions in C#. The tooling is also what developers are already familiar with: Visual Studio.
  • Azure Storage
  • Azure Analysis Services
  • Power BI

Which one to use?

There's no right or wrong answer. The right solution depends on many other things, technical as well as non-technical, plus the considerations mentioned above.

Simon Lidberg and Benjamin Wright Jones have a really good presentation around this topic. See the link in the reference for their full talk. But, basically, the flowchart for making a decision looks like this:

[Flowchart: data warehouse solutions in Azure, decision tree from the referenced talk]

Reference

https://myignite.techcommunity.microsoft.com/sessions/66581

 

What is Azure HDInsight?

Hadoop and Azure HDInsight

Azure HDInsight is Azure's version of Hadoop as a service. It lives in the cloud, just like other Azure services, and it's a managed service, so we don't have to worry about some of the maintenance that's required with a Hadoop cluster.

Underneath, Azure HDInsight uses Hadoop components from the Hortonworks Data Platform (HDP).

Each Azure HDInsight version has its own cloud distribution of HDP along with other components. Different versions of HDInsight come with different versions of HDP. See the reference link for the technology stack and its versions.

When you create an Azure HDInsight cluster, you will be asked to choose the cluster type. The cluster type is the Hadoop technology you want to use: Hive, Spark, Storm, etc. More cluster types are being added; to see what's currently supported, see the reference link.

Azure HDInsight can be a great data warehouse solution that lives in the cloud.

Azure HDInsight and Databricks

While Azure HDInsight is a fully managed service, there is still some management we as users have to do. HDInsight also supports Azure Data Lake Storage and Apache Ranger integration. The downside to HDInsight is that it doesn't auto-scale and you can't pause a deployment. This means you pay for the cost as long as the service lives. The typical model is to spin the service up whenever it's needed, compute the data, store the results in permanent storage, and kill the service.

This is as opposed to Databricks, another data warehouse solution offered on Azure, which can auto-scale. Databricks, however, is less about the ETL process and more about processing data for analytics, machine learning, and the like. Needless to say, it has built-in libraries for this purpose.

The language support is also different. Language support in HDInsight depends on the cluster type you choose when you spin up the service; for example, a Hive cluster supports HiveQL (a SQL-like language) in its Hive editor. Databricks supports Python, Scala, R, SQL, and many others.

Reference

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning

https://docs.microsoft.com/en-us/azure/hdinsight/

 

ECMAScript 2015 Destructuring and Spread

Destructuring

Destructuring is a way to unpack values from arrays, or properties from objects, into their own variables.

var o = {p: 42, q: true};
var {p: foo, q: bar} = o;

console.log(foo); // 42
console.log(bar); // true
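
Arrays destructure the same way. Here is a small sketch that also shows two related ES2015 features, default values and a rest element:

var [first, second = 10, ...rest] = [1, undefined, 3, 4];

console.log(first); // 1
console.log(second); // 10 (the default applies because the unpacked value is undefined)
console.log(rest); // [3, 4]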

Spread

Spread allows a collection to be expanded in places where zero or more arguments are expected (in function calls), elements are expected (in array literals), or key-value pairs are expected (in object literals).

function sum(x, y, z) {
  return x + y + z;
}

const numbers = [1, 2, 3];

console.log(sum(...numbers));
// expected output: 6

console.log(sum.apply(null, numbers));
// expected output: 6
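
Spread works the same way in array literals, and, since ES2018, in object literals. For example:

const parts = ['shoulders', 'knees'];
const lyrics = ['head', ...parts, 'and', 'toes'];

console.log(lyrics);
// expected output: ["head", "shoulders", "knees", "and", "toes"]

const obj = { a: 1 };
const copy = { ...obj, b: 2 };

console.log(copy);
// expected output: { a: 1, b: 2 }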

Reference

 

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Destructuring_assignment

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax

 

Memoization, huh?

Memoization

To put it simply, it's caching, under a fancier name.

Memoization is usually used in functions. In other words, memoization is caching in the scope of a function. The purpose of memoization is to avoid redoing computations that need to be done repeatedly, essentially to save time and resources.

The way functions achieve memoization is by caching results and returning the cached result if it has been computed previously.

A couple of use cases for memoization:

  • Pure functions that require intensive computation and are called repeatedly.
  • Pure, recursive functions.

See the Fibonacci example in the reference link; a minimal sketch is also below.
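
As a minimal sketch of the idea (not the referenced article's code), a memoized Fibonacci in JavaScript could look like this:

function createFib() {
  var cache = {};
  return function fib(n) {
    if (n in cache) {
      return cache[n]; // return the previously computed result
    }
    var result = n < 2 ? n : fib(n - 1) + fib(n - 2);
    cache[n] = result; // cache the result for future calls
    return result;
  };
}

var fib = createFib();
console.log(fib(40)); // 102334155, computed without redundant recursion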

Reference

https://medium.com/@chialunwu/wtf-is-memoization-a2979594fb2a

 

Quick Glance at Hadoop Ecosystems

As mentioned in the Hadoop post, the community around Hadoop has built tremendous tools and technologies to support developers. These have become the Hadoop ecosystem. Some of the most popular ones are:

  • Hive
    Hadoop is based on Java, but not everyone knows Java. Hive is software built on top of Hadoop that exposes a SQL interface, allowing SQL developers to use the powerful Hadoop system in a familiar language. If you know SQL, you don't need Java experience to leverage Hadoop. Hive uses the HiveQL language, which is very SQL-like.
  • HBase
    Basically a non-relational database on top of Hadoop. Even though it's non-relational, you can integrate it with other systems just like a traditional database.
  • Pig
    A tool in the Hadoop ecosystem used to manipulate data, transforming unstructured data into structured data. It also has an interface to query the data, just like Hive.
  • Storm
    An event stream processor that lives in Hadoop, used to process streams of data (as opposed to batch data). An example would be processing a stream of IoT data, where data from an IoT device keeps flowing through the system.
  • Oozie
    A workflow management system that coordinates between different Hadoop technologies.
  • Flume / Sqoop
    More of an integration system that transfers data to and from the Hadoop system. If you have data that lives outside of Hadoop and needs to be processed in Hadoop, Flume / Sqoop will do the job.
  • Spark
    A distributed compute engine within Hadoop. It's used to process large amounts of data, prepping it for analytics, machine learning, etc. Needless to say, it has a lot of built-in libraries for machine learning, artificial intelligence, analytics, stream processing, and graph processing. Spark also supports various languages: Scala, Python, R, etc.

This is definitely an oversimplified explanation of the Hadoop ecosystem, and there are lots of other technologies not covered here. But this should give you a quick sense of each of them.

 

NgRx Entity

NgRx Entity

NgRx Entity is a library for handling entities within the NgRx framework. An entity in this context is one of the application's domain objects, like User or Employee.

The purposes of NgRx Entity are basically to:

  • Reduce boilerplate code.
  • Search and sort entities quickly.

The first thing we need in order to use NgRx Entity is to create an entity adapter, like so:

import { createEntityAdapter } from '@ngrx/entity';

const adapter = createEntityAdapter();

NgRx Entity also defines the `EntityState<V>` interface. The interface is extensible should additional properties be needed in the application state.

The shape of EntityState is something like this:

interface EntityState<V> {
  ids: string[];
  entities: { [id: string]: V };
}

What this allows us to do is:

  • Find an entity quickly using the `entities` dictionary.
  • Maintain the order of the list, which is good for sorting.

Some of the boilerplate code that is reduced when using NgRx Entity:

  • No need to spell out the properties of the state interface.
  • Adding, removing, and updating entities in the state are handled by the entity adapter.
  • The entity adapter provides the most commonly used selectors.
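
Putting it together, here is a minimal sketch in TypeScript, assuming a made-up User entity; createEntityAdapter, getInitialState, addOne, and getSelectors are part of the @ngrx/entity API:

import { createEntityAdapter, EntityAdapter, EntityState } from '@ngrx/entity';

interface User {
  id: string;
  name: string;
}

// The adapter knows how to add, remove, and update User entities in state.
const adapter: EntityAdapter<User> = createEntityAdapter<User>();

// Extend EntityState with any additional state properties.
interface UserState extends EntityState<User> {
  selectedUserId: string | null;
}

const initialState: UserState = adapter.getInitialState({ selectedUserId: null });

// Add an entity without hand-written boilerplate (as you would in a reducer).
const nextState = adapter.addOne({ id: '1', name: 'Ada' }, initialState);

// The most commonly used selectors come for free.
const { selectIds, selectEntities, selectAll, selectTotal } = adapter.getSelectors();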

Reference

https://medium.com/ngrx/introducing-ngrx-entity-598176456e15

 

 