AZ-305 - Databases

04 July, 2022
Back

Selecting an Azure DB

Types of DB on Azure

types of db on azure

  1. Select deployment option
  • SQL virtual machines consider when migrating apps that require OS level access.
  • Managed instance consider when lifting and shifting to the cloud.

    • Single instance
    • Instance pool. You need to forecast the number of vCores you need upfront and allocate them to the DBs.
  • SQL DB consider when building new apps.

    • Single instance
    • Elastic pool. You need to forecast the utilization for number multiple DBs. All DBs will consume resources from a common pool. Works for both DTU and vCore purchasing model. Read.
  1. Select purchasing model
  • DTU model when you are fine with bundling compute, storage and I/O together.

    • N/A for Managed Instance since you only specify vCores.
  • vCore model when you want to select compute and storage size independently.

Azure SQL HA

Azure SQL DB is a PaaS, as opposed to SQL Server which runs on VM.

Availability for SQL DB and MI comes in three offerings:

  1. General purposed - spare compute nodes for the DB are always on standby and data layer is stored in Blob storage.
get-tech
  1. Business critical - single primary and 3 replicas.
get-tech
  1. Hyperscale
get-tech

Hyperscale is designed with four components. Architecture diagram:

  • Compute - where OLTP and SSD-based cache resides. Made redundant with secondary compute nodes.
  • Page server - is also a compute component where data is sharded and served from across multiple pages of 128GB to 1TB each. LTR (long term retention) goes into Azure storage component.
  • Log service - Used for replicating data from primary compute to secondaries. Logs are also persisted in Azure storage. Also used for CDC.
  • Azure storage - where the data is really stored for LTR. Data is stored across regions and snapshots are used for backups.

Azure SQL MI

It's very hard for me to decide when to use SQL DB vs SQL MI. But the guiding principle is to compare both features and match them to the use case. There are generally more features in MI.

If features were not an issue, I'd go with SQL DB.

Azure DB optimization

  1. Automatic tuning

    • Uses AI to learn horizontally from all DBs.
    • Available for Azure SQL DB and SQL Managed Instance.
    • Recommended to be turned on only during low CPU, Data IO, and Log IO utilization periods.
  2. Query Performance Insights

    • Identifies top consuming and long-running queries.
    • Shows CPU, Data IO, Log IO, query duration, and execution count.
get-tech

Query Performance Insights dashboard

Protecting data in Azure

Dynamic Data Masking

  • Fully or partially masks data for privileged users (ie. credit card number XXXX-XXXX-XXXX-1956).
  • Allows users to query the DB directly without running additional expensive queries.
  • Available for:

    • Azure SQL DB
    • SQL MI
    • Azure Synapse Analytics
  • DDM policy:

    • SQL users are excluded from masking. They can see original data.
    • Masking rules define fields, columns, table name and schema name that should be masked.
    • Masking functions control exposure of data for different scenarios.

Transparent Data Encryption

  • Also known as data at rest encryption.
  • Uses multiple levels of encryption in a hierarchy:
get-tech
  • Protects from attempts to recover data from a stolen DB.

Always Encrypted

  • Encryption takes places in the client before being stored in the DB. Decryption also happens at the client after retrieving data from the DB.
  • Always Encrypted-enabled driver must be installed on the client side.
  • Two keys are involved:

    • Column encryption keys encrypts the data.
    • Column master key encrypts the column encryption key.
    • The columns master key is stored in KV, Windows Certificate Store or a HSM.
  • Available for:

    • SQL Server
    • Azure SQL DB
    • SQL MI
  • Protects data in-transit (in addition to TLS), because data is encrypted at the client before transit.
  • Always Encrypted comes in two selections:

    • Deterministic encryption - Encrypted values are always the same for a given plain text. Not suitable predictable texts (True/False, Male/Female).
    • Randomized encryption - Encrypted values are always different from a given plain text. Prevents searching, grouping, indexing and joining on encrypted values.

Compare TDE vs Always Encrypted.

T-SQL

  • Transact-SQL is a Microsoft query language that is built with added features on top of SQL for Microsoft and Azure DB.
  • Differences between SQL and T-SQL.

Azure Synapse Analytics

  1. SQL pool
  • A data warehouse and Big Data feature for Synapse Analytics.
  • Stores data in a relational table with columnar storage, which improves query performance.
  • Differences between Synapse Analytics and Azure Data Factory.
get-tech

Best practice for dedicated SQL pool.

get-tech

Azure Data Factory

  • Managed big data service.
  • Performs ETL, ELT and data integration from source to sink.
  • Pipeline is a logical group of activities that peforms a unit of work (so that you don't have to manage many individual units). They can be scheduled and run sequential or parallel.
  • Copy Activity copies data from supported source to sink.
  • Linked service is the connection to the data store to send data to Data Factory.
  • Dataset is the structure and destination of the data at the source and sink.
  • For example, copying data from Blob storage to SQL DB requires you to create two linked services (Blog Storage and SQL DB) and two datasets (for the Blog Storage and SQL Table dataset).
  • Pipeline run defines the execution of a pipeline.
  • Triggers for a pipeline can be a schedule, window, storage event or custom event.
  • Integration runtime (IR) is the compute infrastructure which can be in Azure (VM), self-hosted (on-premise Windows or Bring-You-Own-Cloud) or Azure-SSIS (SQL Server Integration Services).
  • Mapping data flows no-code GUI to visualize actions in ADF.

Cosmos DB consistency

The consistency of NoSQL dbs are different from that of ACID in SQL dbs.

  • Supports multi-master writes.
  • Supports multi-region writes.
  • Geo-redundancy follows region pairs (ie. Southeast Asia with East Asia, Hong Kong).

While Dynamodb has two levels of consistency (strong and eventual), CosmosDB has five.

cosmosdb-consistency
  1. Strong

    • Clients from different regions read exactly the same data.
    • Uncommitted writes are never read by the client.
    • Data consistency can tolerate region failures.
  2. Boundless state

    • Staleness in a different region is restricted by number of versions (K) or time (T). Multi-region default minimum is K=100 000 writes or T=300 sec.
    • Suitable for clients apps that require low latency but can tolerate staleness within a window. ie: File sharing, pub-sub apps.
  3. Session

    • Client apps with the same session tokens will receive strongly consistent reads.
    • Other clients will receive eventual reads, but in the same order.
  4. Consistent prefix

    • Order of data is guaranteed, but may arrive at different times.
    • Suitable for apps that prioritize writes over reads, and where read is not immediately required. ie: data warehousing
  5. Eventual

    • High write throughput, but data does not arrive in order.
    • Suitable storing non-serialized data. ie 'Likes', Friends list.

Other DB concepts

SSIS (SQL Server Integration Service) is not a service specific to Azure but can be used together with it for:

  • Copying and downloading files.
  • Loading data warehouses.
  • Data cleansing and mining.
  • Managing SQL Server objects and data.

Elastic pools

  • One elastic pool can have many dedicated database instances. Only one pool is needed for 30 apps.
  • One elastic pool and one SQL db server only serves one region. An SQL db server (not to be confused with SQL server on VM) needs to be created to host an elastic pool.
  • An SQL server hosts an elastic pool, which consists of multiple DB (three layers of abstraction).

SQL Transactional Replication

  • Transactional Replication is very similar to CDC. Performs a one-time full replication from Publisher to Subscriber and allows updates for subsequent changes.
  • Useful to perform replication while migration is happening.
  • SQL DB migration.

DB Migration

Transactional replication allows you to migrate an on-premise MS SQL Server on-premise to Azure SQL Server with minimal disruption to the PROD db.

Ref

DB Backup

Database backup allows you to restore your db in three ways:

  1. Full restore
  2. Differential restore
  3. Transactional logs backups
  4. Snapshot backups - for hyperscale DBs.

Long-term backup retention can extend up to 10 years by backing to Blob storage.

This goes beyond the 7 to 35 days backups for Azure SQL DB and Azure SQL Managed Instance.


Back