Live Online training

Designing and Implementing Big Data Solutions with Azure

Architect data analytics solutions in the cloud with Azure Data Lake, HDInsight, and Spark

Anindita Basak

This training gives you hands-on experience with Azure and its advanced data solutions for your day-to-day work. You will explore data ingestion for batch and interactive processing, design and provision HDInsight clusters, and apply security to them.

Next, you will learn how to design a Lambda Architecture and real-time big data processing solutions, illustrated with use cases.

You will then provision and develop complex Azure Data Factory pipelines, create alerts, and monitor workflows. You will also learn how to extract, transform, and load (ETL) data for analysis, with a walkthrough of an Azure ML experiment. Finally, you will deploy cloud analytics, with an overview of an end-to-end ARM template for a real-time data processing pipeline.

Overall, the course combines guided instruction with ample hands-on labs on Azure advanced data solutions. It provides an end-to-end overview of Azure big data services, from batch processing to interactive data processing and, finally, operationalizing data workflows on Azure.

What you'll learn, and how you can apply it

  • Design principles of Big Data batch processing & implementation of interactive solutions
  • Data ingestion patterns for cloud-born or on-premises data sources
  • Overview of various data storage mechanisms available on an advanced Azure analytics platform
  • Designing & provisioning of Azure Big Data Hadoop clusters
  • Optimizing Big Data security in Azure
  • Big Data real-time processing solutions on Azure
  • Design & provisioning of Azure real-time streaming Big Data clusters (e.g., HBase)
  • Workflows based on Lambda architecture design patterns
  • Architecting data analytics solution workflows using Azure managed services
  • Orchestration of data pipelines and activities in a data-driven architecture
  • Building complex cloud-based ETL solutions

This training course is for you because...

You are a Data Architect, Azure Architect, Data Management Engineer, Data Scientist, or Big Data Developer who designs and implements big data solutions on the Microsoft Azure platform. If you want to refresh and extend your skills in Azure big data solutions, this course gives you a comprehensive understanding of the relevant Azure services and tools.

Prerequisites

  • Working knowledge of Azure and cloud computing

Resources

Materials, downloads, or supplemental content needed in advance

  • Valid Azure subscription
  • Software: Visual Studio 2017 or VS Code, SQL Server 2016 or 2017, Power BI Desktop, Azure PowerShell, and Azure Storage Explorer

About your instructor

  • Anindita Basak works as a cloud and AI solution architect in data analytics and AI platforms and has worked with Microsoft Azure since its inception. With over a decade of experience, she helps enterprises on their digital transformation journeys using cloud, data, and AI.

    She has worked with various teams at Microsoft as a full-time employee (FTE) in the roles of Azure Development Support Engineer, Pro-Direct Delivery Manager, and Technical Consultant. She has also been a corporate trainer and a cloud and big data consultant.

    She co-authored the book Stream Analytics with Microsoft Azure, authored the book Hands-On Machine Learning on Azure with Packt, and has been a technical reviewer for Packt on various technologies, including data-intensive applications, Azure HDInsight, SQL Server BI, IoT, and decision science. She has also authored two video courses on Azure Stream Analytics for Packt.

Schedule

The time frames are only estimates and may vary according to how the class progresses.

DAY 1

Section 1: Ingestion of data for batch & interactive processing (30 mins)

  • Data ingestion from cloud-born or on-premises sources to Azure
  • Various data storage capabilities using Azure services
  • Perform one-time data transfer to Azure
  • Design & provisioning of Azure Hadoop clusters

Demo/Lab 1: Data ingestion from on-premises to Azure using migration tools (10 mins)
Demo/Lab 2: Data storage using Azure Data Lake Store/Blob storage (10 mins)

Break - 10 mins

Section 2: Design & provision of HDInsight clusters & apply security (30 mins)

  • Designing security in Azure HDInsight clusters
  • Data encryption & data masking, role-based & row-based security
  • Protection of PII data in Azure
  • Selection of appropriate batch processing tools, metadata definition

Demo/Lab 3: Provisioning an Azure HDInsight cluster (10 mins)
Demo/Lab 4: Implementing data masking/role-based security on the configured HDInsight cluster (15 mins)

Break - 10 mins

Section 3: Data ingestion patterns for real-time processing (30 mins)

  • Selection of data ingestion sources
  • Design principles of row-key event tables in HBase HDI clusters
  • Azure data streaming services: Azure Stream Analytics
  • Overview of Storm & Spark on Azure HDInsight

Demo/Lab 5: Provisioning an Azure Stream Analytics job & streaming units (10 mins)
Demo/Lab 6: Assigning HBase resources (10 mins)

Break - 10 mins

Section 4: Designing a Lambda Architecture for batch & real-time processing (30 mins)

  • Lambda Architecture design pattern: batch, speed & merging layers
  • Identifying Azure analytics services in a Lambda Architecture
  • Utilization of streaming data
  • The utility of batch data for analysis

Demo/Lab 7: Azure Stream Analytics real-time data processing utilization & use cases (15 mins)
Demo/Lab 8: Batch data utilization (10 mins)

DAY 2

Section 5: Orchestrating data processing with Azure Data Factory (35 mins)

  • Designing Azure Stream Analytics reference data streams and business logic, and visualizing the output
  • Overview of Azure Data Factory
  • Data Factory activities, data sources & pipeline deployment
  • Data-slicing & chaining multiple complex ADF activities

Demo/Lab 9: Provisioning an Azure Data Factory job & implementing an activity/pipeline (10 mins)
Demo/Lab 10: Developing a complex, schedule-based ADF pipeline (10 mins)

Break - 10 mins

Section 6: Monitor & Manage Azure Data Factory (35 mins)

  • Troubleshooting ADF jobs: failures & root causes
  • Alerting on & monitoring ADF jobs
  • Starting & stopping ADF jobs
  • Logging of ADF jobs

Demo/Lab 11: Creating alerts in ADF & monitoring workflows (10 mins)

Break - 10 mins

Section 7: Extract, transform & load (ETL) data for analysis (35 mins)

  • Leveraging Pig, Hive & MapReduce on Azure HDInsight for data processing
  • Copying data between on-premises & cloud data sources for HDInsight
  • Introduction to Azure Machine Learning experiment
  • Loading data into Azure SQL DB & visualizing it with Power BI

Demo/Lab 12: Walkthrough of an Azure ML experiment for batch data processing, with scoring and retraining data sources (10 mins)

Break - 10 mins

Section 8: Deployment of Cloud Analytics (30 mins)

  • Azure PowerShell for automated deployment
  • Azure Resource Manager (ARM) templates for an automated deployment strategy

Demo/Lab 13: Overview of an end-to-end ARM template for a real-time data processing pipeline (15 mins)

Wrap-up summary (15 mins)