Microsoft-70-774-ExamPrep
Perform Cloud Data Science with Azure Machine Learning
https://buildazure.com/2017/02/09/70-774-perform-cloud-data-science-with-azure-machine-learning-certification-exam/
https://azure.github.io/learnAnalytics-public/cert_guides/certification_70-774_community_guide.html
Prepare Data for Analysis in Azure Machine Learning and Export from Azure Machine Learning
- Import and export data to and from Azure Machine Learning. Import and export data to and from Azure Blob storage, import and export data to and from Azure SQL Database, import and export data via Hive Queries, import data from a website, import data from on-premises SQL
- Import from a file by clicking New in the bottom left corner
- Import from the other sources using Data Input and Output -> Import Data
- Export using Data Input and Output -> Export Data
- Explore and summarize data. Create univariate summaries, create multivariate summaries, visualize univariate distributions, use existing Microsoft R or Python notebooks for custom summaries and custom visualizations, use zip archives to import external packages for R or Python
- Univariate/Multivariate Statistics and Visualizations
  - 1: Click on the output connector below the dataset.
  - 2: Select Visualize. This will bring up a helpful screen which includes:
    - A view of the dataset
    - A histogram (or box plot, your choice) for numeric variables
    - A bar chart for categorical variables
  - 3: Select a column by clicking anywhere in the column itself. This activates the Statistics and Visualizations options on the right.
    - Statistics (for categorical variables, only the last three are provided)
      - Mean
      - Median
      - Min
      - Max
      - Standard Deviation
      - Unique Values
      - Missing Values
      - Variable Type
    - Visualizations
      - Bar Chart for categorical variables (can log scale)
      - Histogram
        - Log scale x or y
        - Change the number of bins
        - Overlay the cumulative distribution or probability density
      - Box Plot (can log scale)
  - 4: Under the compare to drop-down, select another variable for multivariate analysis.
    - If 2 quantitative variables, a scatterplot is shown (can log scale)
    - If 2 categorical variables, a cross tabulation is shown (table of frequencies)
    - If 1 categorical and 1 quantitative, side-by-side box plots are shown
  - 5: Can also use the Summarize Data module to get additional insight (kurtosis, percentiles, variance, range)
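As a rough illustration of what the Statistics pane and the cross tabulation report, here is a minimal sketch in plain Python. The function names `summarize_numeric` and `cross_tab` and the sample values are made up for this example, and the code is not how the Studio modules are implemented.

```python
import statistics
from collections import Counter


def summarize_numeric(values):
    """Univariate summary of a numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
        "std_dev": statistics.stdev(present),  # sample standard deviation
        "unique_values": len(set(present)),
        "missing_values": len(values) - len(present),
    }


def cross_tab(column_a, column_b):
    """Frequency table for two categorical columns, keyed by (a, b) pairs."""
    return Counter(zip(column_a, column_b))


# Hypothetical sample data, for illustration only.
ages = [22, 35, None, 41, 35]
summary = summarize_numeric(ages)
table = cross_tab(["red", "red", "blue"], ["S", "M", "S"])
```

The same functions could be pasted into a notebook cell to check a column by hand before reaching for the Summarize Data module.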
- Notebooks
  - Can explore a dataset in an existing notebook by simply opening the notebook on the left side
  - Can create a new notebook to explore the dataset by clicking on the output connector below the dataset and selecting Open in a new Notebook
- R
  - Under R Language Modules, drag Execute R Script onto the workspace
    - Select data input(s) and connect to the R node
    - Can also connect a zipped package as a dataset, and install via R script
- Python
  - Under Python Language Modules, drag Execute Python Script onto the workspace
    - Use the azureml_main function to connect one or two datasets into the Python node
    - Use the third input port to bring in a zipped script bundle, then load it with import mymodule
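A minimal sketch of an `azureml_main` entry point for the Execute Python Script module, assuming the module passes the connected datasets in as pandas DataFrames and expects a sequence containing one DataFrame back. The summary it builds is only an example of a custom summary, and `mymodule` is the placeholder name from the notes above.

```python
def azureml_main(dataframe1=None, dataframe2=None):
    """Entry point called with up to two pandas DataFrames."""
    # A zipped bundle on the third input port could be loaded here, e.g.:
    # import mymodule  # hypothetical module name, not a real package

    # Build a custom summary of the first dataset; reset_index keeps the
    # statistic names (count, mean, std, ...) as an ordinary column.
    summary = dataframe1.describe().reset_index()

    # Return a sequence holding the output DataFrame.
    return summary,
```

The second dataset is left unused here; it stays `None` when nothing is connected to the second input port.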
- Cleanse data for Azure Machine Learning. Apply filters to limit a dataset to the desired rows, identify and address missing data, identify and address outliers, remove columns and rows of datasets
- Filter
  - Split Data: use a regular or relative expression
  - Apply SQL Transformation: can use this to filter rows
- Missing Data
  - The Clean Missing Data node can be used to address issues of missing data
    - Select which columns to clean
    - Choose Cleaning Mode (Mean, Median, Mode, PCA, Custom, Remove row/column, MICE)
- Outliers
  - Clip Values can define a constant or percentile threshold to clip peaks or subpeaks (or both)
    - Replace with mean, median, threshold, or missing
    - Add indicator column to see if value was clipped
- Remove Columns and Rows
  - Select Columns in Dataset
  - Remove Duplicate Rows
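To make the missing-data and outlier options concrete, here is a small sketch in plain Python of mean substitution followed by clipping peaks at a constant threshold, with an indicator column. The helper names and sample values are invented for illustration and only mimic the behaviour described above, not the modules' actual code.

```python
import statistics


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def clip_peaks(values, upper):
    """Clip values above a constant threshold and flag which ones changed."""
    clipped = [min(v, upper) for v in values]
    was_clipped = [v > upper for v in values]
    return clipped, was_clipped


# Hypothetical column with one missing value and one outlier.
raw = [10, None, 12, 11, 95]
filled = fill_missing_with_mean(raw)
clipped, flags = clip_peaks(filled, upper=50)
```

Note that the outlier (95) pulls the substituted mean up to 32, which is one reason to think about the order of cleaning steps, or to prefer the median when outliers are present.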
- Perform feature engineering. Merge multiple datasets by rows or columns into a single dataset by columns, merge multiple datasets by rows or columns into a single dataset by rows, add columns that are combinations of other columns, manually select and construct features for model estimation, automatically select and construct features for model estimation, reduce dimensions of data through principal component analysis (PCA), manage variable metadata, select standardized variables based on planned analysis
- Merge
  - Join Data: connect two datasets to join
    - Select key column(s) for both datasets
    - Match case?
    - Type of join?
- Feature Selection
  - Filter Based Feature Selection: choose how many features to select and which scoring metric is used to rank them
- PCA
  - Select columns and number of dimensions
  - Specify whether to normalize to zero mean
- Variable Metadata
  - Edit Metadata can be used to change variable type and/or role
- Standardized Variables
  - Select variables to normalize and the transformation method
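The three Join Data choices (key column, match case, join type) can be sketched in plain Python as follows. `join_rows` is a hypothetical helper written for these notes, it supports only inner and left joins on a single key, and the sample rows are made up.

```python
def join_rows(left, right, key, how="inner", match_case=True):
    """Join two lists of dict rows on one key column.

    how is "inner" or "left"; match_case=False compares keys
    case-insensitively.
    """
    def norm(value):
        return value if match_case else str(value).lower()

    # Index the right-hand rows by their (normalized) key.
    index = {}
    for row in right:
        index.setdefault(norm(row[key]), []).append(row)

    joined = []
    for row in left:
        matches = index.get(norm(row[key]), [])
        for match in matches:
            # On a shared column name the left value wins.
            joined.append({**match, **row})
        if not matches and how == "left":
            # Keep unmatched left rows; right-hand columns are simply absent.
            joined.append(dict(row))
    return joined


# Hypothetical sample rows.
customers = [{"id": "A1", "name": "Ann"}, {"id": "b2", "name": "Bob"}]
orders = [{"id": "a1", "total": 30}, {"id": "B2", "total": 12}]

strict = join_rows(customers, orders, "id")                   # case-sensitive
loose = join_rows(customers, orders, "id", match_case=False)  # ignores case
kept = join_rows(customers, orders, "id", how="left")
```

With case matching on, none of the keys line up, so the inner join is empty and the left join returns the customer rows unchanged; turning case matching off pairs every customer with an order.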
Develop Machine Learning Models
- Select an appropriate algorithm or method
- Select an appropriate algorithm for predicting continuous label data, select an appropriate algorithm for supervised versus unsupervised scenarios, identify when to select R versus Python notebooks, identify an appropriate algorithm for grouping unlabeled data, identify an appropriate algorithm for classifying label data, select an appropriate ensemble
- Initialize and train appropriate models
- Tune hyperparameters manually; tune hyperparameters automatically; split data into training and testing datasets, including using routines for cross-validation; build an ensemble using the stacking method
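For the cross-validation part of this objective, the splitting logic can be sketched in a few lines of plain Python. `k_fold_indices` is a made-up helper, and the shuffling and fold assignment shown here are one simple scheme, not necessarily what the Studio modules do.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatable folds
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_rows=10, k=5))
```

Each row lands in exactly one test fold, so every row is scored once by a model that never saw it during training.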
- Validate models
- Score and evaluate models, select appropriate evaluation metrics for clustering, select appropriate evaluation metrics for classification, select appropriate evaluation metrics for regression, use evaluation metrics to choose between Machine Learning models, compare ensemble metrics against base models
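As a reminder of what the common classification and regression metrics measure, here is a small sketch in plain Python. The function names and sample labels are invented for this example; the formulas are the standard definitions.

```python
import math


def classification_metrics(actual, predicted, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    correct = sum(a == p for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def regression_metrics(actual, predicted):
    """Mean absolute error and root mean squared error."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Hypothetical labels and predictions.
clf = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
reg = regression_metrics([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```

RMSE penalizes the single large error more heavily than MAE does, which is worth keeping in mind when using these metrics to choose between models.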
Operationalize and Manage Azure Machine Learning Services
- Deploy models using Azure Machine Learning
- Publish a model developed inside Azure Machine Learning, publish an externally developed scoring function using an Azure Machine Learning package, use web service parameters, create and publish a recommendation model, create and publish a language understanding model
- Manage Azure Machine Learning projects and workspaces
- Create projects and experiments, add assets to a project, create new workspaces, invite users to a workspace, switch between different workspaces, create a Jupyter notebook that references an intermediate dataset
- Consume Azure Machine Learning models
- Connect to a published Machine Learning web service, consume a published Machine Learning model programmatically using a batch execution service, consume a published Machine Learning model programmatically using a request response service, interact with a published Machine Learning model using Microsoft Excel, publish models to the marketplace
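A hedged sketch of calling a request-response service from Python using only the standard library. The payload shape (`Inputs`, `input1`, `ColumnNames`, `Values`, `GlobalParameters`) and the bearer-token header are assumptions based on the sample code Studio shows for classic web services, so check them against the API help page of your own service. The URL, key, and column names are placeholders.

```python
import json
import urllib.request


def build_rrs_request(url, api_key, column_names, rows):
    """Build (but do not send) a request-response call to a web service."""
    payload = {
        "Inputs": {
            # "input1" is assumed to be the web service input name.
            "input1": {"ColumnNames": column_names, "Values": rows},
        },
        "GlobalParameters": {},  # web service parameters would go here
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers)


# Placeholder values; copy the real URL and key from the service dashboard.
request = build_rrs_request(
    url="https://example.invalid/execute?api-version=2.0",
    api_key="YOUR_API_KEY",
    column_names=["age", "income"],
    rows=[["35", "52000"]],
)
# To call the service: json.load(urllib.request.urlopen(request))
```

Building the request separately from sending it makes it easy to inspect the JSON body before any network call is made.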
- Consume exemplar Cognitive Services APIs
- Consume Vision APIs to process images, consume Language APIs to process text, consume Knowledge APIs to create recommendations
Use Other Services for Machine Learning
- Build and use neural networks with the Microsoft Cognitive Toolkit
- Use N-series VMs for GPU acceleration, build and train a three-layer feed forward neural network, determine when to implement a neural network
- Streamline development by using existing resources
- Clone template experiments from Cortana Intelligence Gallery, use Cortana Intelligence Quick Start to deploy resources, use a data science VM for streamlined development
- Perform data science at scale by using HDInsight
- Deploy the appropriate type of HDI cluster, perform exploratory data analysis by using Spark SQL, build and use Machine Learning models with Spark on HDI, build and use Machine Learning models using MapReduce, build and use Machine Learning models using Microsoft R Server
- Perform database analytics by using SQL Server R Services on Azure
- Deploy a SQL Server 2016 Azure VM, configure SQL Server to allow execution of R scripts, execute R scripts inside T-SQL statements