Sunday Reading List

Quickstart: Deploy Linux containers to Service Fabric

Azure Kubernetes Service is a great place to deploy your containers. What most people don’t know is that you can deploy the same container to Service Fabric. The benefit? You don’t have VMs to manage in Service Fabric.


Optimize the performance of Azure Cosmos DB by using partitioning and indexing strategies

If you’re using Azure Cosmos DB, you probably understand how the partitioning strategy can directly affect the query performance of your application. This learning module explains in great detail how it actually works.


GitLab and The Forrester Wave™: Cloud-Native Continuous Integration Tools, Q3 2019

The Q3 2019 Forrester Wave report on cloud-native continuous integration tools. Hint: Google, Microsoft, AWS, CircleCI and GitLab are among the leaders.


Single Page Applications and ASP.NET Core 3.0

If you’re thinking about building a single page application with ASP.NET Core, definitely check out this post. There are 3 different ways to build a SPA using ASP.NET Core, and they are discussed in detail.


Effective Dictionary Usage(C#): Avoid If Statements

I don’t know how I feel about this post. If you’re going to read it, take it with a grain of salt. I like it because Muhammed presents a different approach to the `if` statement, which sounds to me like an implementation of the strategy pattern. Read at your own risk. 🙂
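
As a rough illustration of the idea (not the code from the linked post), a dictionary can map a key to a handler function in place of an `if`/`elif` chain. The sketch below uses Python rather than C#, and the `shipping_cost` function, its handlers and their rates are all hypothetical:

```python
from typing import Callable, Dict


# Hypothetical handlers: each one computes a shipping cost for a weight in kg.
def standard(weight_kg: float) -> float:
    return 5.0 + 1.0 * weight_kg


def express(weight_kg: float) -> float:
    return 10.0 + 2.0 * weight_kg


def overnight(weight_kg: float) -> float:
    return 25.0 + 3.0 * weight_kg


# The dictionary replaces an if/elif chain on the method name.
SHIPPING_STRATEGIES: Dict[str, Callable[[float], float]] = {
    "standard": standard,
    "express": express,
    "overnight": overnight,
}


def shipping_cost(method: str, weight_kg: float) -> float:
    """Look up the handler for `method` and apply it to the weight."""
    try:
        strategy = SHIPPING_STRATEGIES[method]
    except KeyError:
        raise ValueError(f"unknown shipping method: {method}") from None
    return strategy(weight_kg)
```

Adding a new case means adding one dictionary entry instead of another branch, which is where the resemblance to the strategy pattern comes from.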


What to Read this Week?

Integrating Cosmos DB with OData (Part 1)

Azure Cosmos DB is great, but what if you need to expose it through the OData standard, including the query-in-URI convention? Well, you integrate it with OData. Hassan covers the basics of Cosmos DB, what it is and how to set it up, and how to integrate it with OData.

40 Visual Studio Code Plugins I Have

One can never have enough Visual Studio Code plugins. Here’s another 40 for you.

Open Neural Network Exchange (ONNX)

Machine learning is on the rise and everyone develops their own standard. ONNX (read `onix`) is an open standard format for machine learning models. This explains what it is in more detail, as well as where to get the pre-built ONNX models (model zoo).

Introduction to Big O Notation and Time Complexity (Data Structures & Algorithms #7)

CSDojo explains Big O Notation in a simple, easy-to-understand way, including how time and space complexity are calculated. There is some math involved, but it is pretty basic.

C# Data Structures

Along with Big O Notation, these articles go over the data structures that are available in C#. Knowing when to use each of these data structures is important for building fast algorithms.

Data Warehouse Solutions in Azure

Data Warehousing Solutions at a Glance

With today’s big data requirements, where data can be structured, unstructured, batch or stream, and come in many other forms and sizes, a traditional data warehouse is not going to cut it.

Typically, there are 4 data stages:

  • Ingest
  • Store
  • Processing
  • Consuming

Different technologies are required at different stages. The choice also depends heavily on the size and form of the data and the 4 Vs: Volume, Variety, Velocity, Veracity.

The choice of solution sometimes also depends on:

  • Ease of management
  • Team skill sets
  • Language
  • Cost
  • Specification / requirements
  • Integration with existing / other systems

Azure Services

Azure offers many services for data warehouse solutions. Traditionally, a data warehouse has been an ETL process plus relational database storage like SQL Data Warehouse. Today, that may not always be the case.

Some of the Azure services for data warehousing:

  • Azure HDInsight
    HDInsight offers various cluster types that are managed by Microsoft but still require some management from users. It also supports Data Lake Storage. More about HDInsight. HDInsight sits in the “Processing” data stage.
  • Azure Databricks
    Its support for machine learning, AI, analytics and stream / graph processing makes it a go-to solution for data processing. It’s also fully integrated with Power BI and other source / destination tools. Notebooks in Databricks allow collaboration between data engineers, data scientists and business users. Compare to HDInsight.
  • Azure Data Factory
    The “Ingest” part of the data stages. Its function is to bring data in and move it between different systems. Azure Data Factory supports pipelines that connect data across Azure services, and even on-premises data. It can also be used to control the flow of data.
  • Azure SQL Data Warehouse
    Typically the end destination of the data, where it is consumed by business users. SQL DW is a platform as a service, requires less management from users and is great for teams who are already familiar with T-SQL and SSMS (SQL Server Management Studio). You can also scale it dynamically and pause / resume the compute. SQL DW uses internal storage to store data and includes the compute component. SQL Data Warehouse sits in the “Consuming” stage.
  • Database services (RDBMS, Cosmos, etc)
    SQL Database, other relational database systems and Cosmos DB are part of the storage solutions offered in Azure. They are typically more expensive than Azure Storage, but also offer additional features. Database services are part of the “Store” stage.
  • Azure Data Lake Storage
    Built on top of Azure Storage, ADLS offers unlimited storage and a file system based on HDFS, allowing optimization for analytics workloads such as Hadoop or HDInsight. ADLS is part of the “Store” stage.
  • Azure Data Lake Analytics
    ADLA is a high-level abstraction of HDInsight. Users do not need to worry about scaling and managing clusters at all; it scales instantly per job. However, this also comes with some limitations. ADLA supports U-SQL, a SQL-like language that allows custom user-defined functions in C#. The tooling is also something developers are already familiar with: Visual Studio.
  • Azure Storage
  • Azure Analysis Services
  • Power BI

Which one to use?

There’s no right or wrong answer. The right solution depends on many other things, both technical and non-technical, as well as the considerations mentioned above.

Simon Lidberg and Benjamin Wright Jones have a really good presentation on this topic. See the link under Reference for their full talk. Basically, the decision flowchart looks like this:

[Figure: data warehouse solutions in Azure decision flowchart]

Reference

https://myignite.techcommunity.microsoft.com/sessions/66581

Optimistic Concurrency x Eventual Consistency

Optimistic Concurrency

Less strict locking to support more simultaneous access. In optimistic concurrency, multiple users are able to perform actions on the same resource without locking each other out. For example, one user can write without blocking another user who is reading the same resource. Some actions will still lock the resource exclusively, for example, a schema change.
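
One common way to implement optimistic concurrency is a version check at write time: readers and writers take no lock, and a write is rejected if the record changed since it was read. The following is a minimal in-memory sketch of that technique, not any particular database's behaviour; `VersionedStore`, `Record` and `ConflictError` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class ConflictError(Exception):
    """Raised when a write is based on a stale version of the record."""


@dataclass
class Record:
    value: str
    version: int


class VersionedStore:
    """Toy single-threaded store; a real system must make the
    check-and-update step in `write` atomic."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def create(self, key: str, value: str) -> int:
        self._records[key] = Record(value=value, version=1)
        return 1

    def read(self, key: str) -> Tuple[str, int]:
        # Reads never block and never take a lock.
        record = self._records[key]
        return record.value, record.version

    def write(self, key: str, value: str, expected_version: int) -> int:
        record = self._records[key]
        if record.version != expected_version:
            # Someone else wrote first; the caller must re-read and retry.
            raise ConflictError(
                f"expected version {expected_version}, found {record.version}"
            )
        record.value = value
        record.version += 1
        return record.version
```

If two users both read version 1, the first `write(..., expected_version=1)` succeeds and bumps the version to 2, while the second raises `ConflictError` and has to re-read before retrying.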

Pessimistic Concurrency

The opposite: stricter locking is used. When a user is performing an action that requires a lock, other users won’t be able to do anything that would conflict with the lock until it is released by its owner (the first user).
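
A minimal sketch of the pessimistic approach, assuming threads stand in for users and a counter stands in for the shared resource. `LockedCounter` and `run_workers` are hypothetical names; `threading.Lock` is from the Python standard library.

```python
import threading


class LockedCounter:
    """A shared resource where every update takes an exclusive lock first."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def increment(self) -> None:
        # Other threads block here until the current owner releases the lock.
        with self._lock:
            current = self.value
            self.value = current + 1

    def try_increment(self) -> bool:
        # Non-blocking variant: give up immediately if the lock is held.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            self.value += 1
            return True
        finally:
            self._lock.release()


def run_workers(counter: LockedCounter, workers: int, increments: int) -> None:
    """Start `workers` threads that each call increment() `increments` times."""

    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Because every update waits for the lock, no increment is lost, at the cost of the workers queuing up behind each other.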

Eventual Consistency

Eventual consistency favors availability over immediately consistent data. This is achieved by prioritizing availability (not locking the resource) rather than waiting for the data to be replicated.
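
To make that trade-off concrete, here is a toy simulation, not a model of any real database: writes are acknowledged by a primary copy straight away and reach the replicas only when replication runs later. `Replica`, `EventuallyConsistentStore` and `replicate` are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def read(self, key: str) -> Optional[str]:
        return self.data.get(key)


class EventuallyConsistentStore:
    def __init__(self, replica_count: int = 2) -> None:
        self.primary = Replica()
        self.replicas: List[Replica] = [Replica() for _ in range(replica_count)]
        self._pending: Deque[Tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> None:
        # Acknowledge immediately; do not wait for the replicas.
        self.primary.data[key] = value
        self._pending.append((key, value))

    def replicate(self) -> None:
        # Apply queued writes to every replica, in the order they were made.
        while self._pending:
            key, value = self._pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value
```

Between `write` and `replicate`, a read from a replica can return stale data (or nothing); once replication has caught up, all copies agree.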

Strong Consistency

The opposite of eventual consistency: it prioritizes consistent data across the system over availability.

BASE

Eventual consistency is classified as BASE (Basically Available, Soft state, Eventual consistency) semantics, as opposed to the ACID principles.

Conclusion

Eventual / strong consistency is similar to optimistic / pessimistic concurrency. The difference is that the terms eventual / strong consistency are often used for distributed systems, whereas optimistic / pessimistic concurrency is used more at a lower level, for a single entity such as a database.

Azure Cosmos DB consistency levels, from strongest to weakest:
– Strong consistency
– Bounded staleness
– Session
– Consistent prefix
– Eventual consistency