Introduction
In this article, we discuss AutoML in the Azure Machine Learning service. In previous articles, we discussed the features of Azure Machine Learning (Classic); in this article, we move on to the Azure Machine Learning service.
As you may remember, you do not need an Azure subscription to use Azure Machine Learning (Classic). The Azure Machine Learning service, however, requires an Azure subscription. To enable it, add the Azure Machine Learning service from the Azure portal, as shown below.
Once the Azure Machine Learning service is created by providing the resource group and other relevant details, you can launch it by selecting the Go to resource option and then the Launch option.
After selecting these options, you are taken to the following screen.
Before moving into the Azure Machine Learning service, let us look at AutoML in detail.
What is AutoML
In previous articles, we discussed different techniques used to solve business problems in Azure Machine Learning, such as regression analysis, classification analysis, clustering, recommender systems and anomaly detection for time series. However, applying these techniques was never just a matter of picking one: we also had to perform many other activities, such as basic data cleaning, feature selection, principal component analysis, model comparison, cross-validation and hyperparameter tuning to improve accuracy. This amount of work can be too much for a developer without much knowledge of statistics.
The following diagram shows the different steps you need to follow in the machine learning process.
As you can see in the above diagram, standard machine learning practice is complex because many steps must be completed before modelling. AutoML performs these steps for you and provides the best model, so you only need to deploy that model and use it.
AutoML in Azure Machine Learning
There are many AutoML frameworks; let us look at how Azure Machine Learning supports AutoML.
Once the New Automated ML run option is selected, you are taken to a new screen where you first need to configure a dataset for the machine learning process.
We selected the popular Vote dataset, which describes how members of the US Congress voted on different acts.
In the above example, we uploaded the vote CSV file. Once the dataset is created, you can select it as shown in the figure below.
The next step is to configure AutoML in Azure Machine Learning.
You need to provide an experiment name. Since this is a classification problem, you also need to configure the target column. In this example, Class is the target attribute; it indicates the party to which each member belongs.
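Selecting Class as the target column separates the label to be predicted from the input features. The following is a minimal local sketch of that idea, assuming a CSV file with a header row; the vote column names and sample rows are hypothetical placeholders rather than the real contents of the Vote dataset, and the helper `split_features_and_target` is illustrative only, not part of Azure Machine Learning.

```python
import csv
import io

# Hypothetical sample shaped like the vote CSV: vote columns plus the Class target.
# Column names and values are illustrative only.
SAMPLE_CSV = """handicapped-infants,water-project-cost-sharing,Class
y,n,democrat
n,y,republican
?,y,democrat
"""


def split_features_and_target(text, target="Class"):
    """Return (feature_rows, labels), removing the target column from each row."""
    reader = csv.DictReader(io.StringIO(text))
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(target))  # the label is what the model learns to predict
        features.append(row)            # the remaining columns are the inputs
    return features, labels


features, labels = split_features_and_target(SAMPLE_CSV)
print(labels)       # ['democrat', 'republican', 'democrat']
print(features[0])  # {'handicapped-infants': 'y', 'water-project-cost-sharing': 'n'}
```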
AutoML in Azure Machine Learning lets you run the machine learning process on separate compute resources. Since machine learning often involves large volumes of data, you need scalable hardware to build models. You can define the required compute by using the Create a new compute option, as shown in the screenshot below.
You can choose dedicated hardware with the configuration you need, and its pricing is shown before you use it. You can also increase or decrease the capacity later, depending on your needs.
Next, configure the settings for the compute nodes, as shown in the screenshot below. You can define the minimum and maximum number of nodes to be used during the AutoML process.
After the compute nodes are defined, you need to select the task type.
As you can see in the above screenshot, AutoML in Azure Machine Learning supports three types of tasks: Classification, Regression and Time Series forecasting. We discuss the Classification task in this article and leave Regression and Time Series forecasting for a future article.
First, we need to configure the primary metric, such as accuracy, AUC (Area Under the Curve) weighted, recall, precision, F1 measure or MCC (Matthews correlation coefficient). The model with the highest value for the selected metric is chosen as the best model. In this example, AUC weighted is selected as the primary metric.
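To make the primary metric concrete, here is a plain-Python sketch of AUC weighted, assuming the definition given in the Azure Machine Learning documentation: a one-vs-rest AUC for each class, averaged with weights equal to the number of true instances of that class. It is not Azure's implementation, and the candidate model names and predicted probabilities are made up for illustration.

```python
from collections import Counter


def binary_auc(pos_scores, neg_scores):
    """Probability that a random positive row outscores a random negative row (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_weighted(y_true, probas):
    """One-vs-rest AUC per class, weighted by class support. Needs at least two classes."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, count in support.items():
        pos = [p[cls] for y, p in zip(y_true, probas) if y == cls]
        neg = [p[cls] for y, p in zip(y_true, probas) if y != cls]
        score += binary_auc(pos, neg) * count / total
    return score


# Hypothetical validation labels and predicted class probabilities from two candidate models.
y_true = ["democrat", "republican", "democrat", "republican"]
candidates = {
    "model_a": [
        {"democrat": 0.9, "republican": 0.1},
        {"democrat": 0.2, "republican": 0.8},
        {"democrat": 0.6, "republican": 0.4},
        {"democrat": 0.7, "republican": 0.3},
    ],
    "model_b": [
        {"democrat": 0.8, "republican": 0.2},
        {"democrat": 0.3, "republican": 0.7},
        {"democrat": 0.7, "republican": 0.3},
        {"democrat": 0.4, "republican": 0.6},
    ],
}

# The candidate with the highest value of the primary metric becomes the best model.
best = max(candidates, key=lambda name: auc_weighted(y_true, candidates[name]))
print(best)  # model_b
```

For a two-class target such as democrat/republican, both one-vs-rest AUCs are equal when each row's predicted probabilities sum to 1, so AUC weighted reduces to the ordinary AUC.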
Sometimes, you may not want certain algorithms to be considered for modelling. You can add those algorithms to Blocked algorithms so that they are excluded during AutoML execution.
Since AutoML runs many iterations, we need to set an exit criterion; otherwise, the run can take a very long time. In this example, the exit criterion is a training time of 24 hours. We used a 70/30 train/validation split, and the maximum number of concurrent iterations is set to 2.
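The settings above (blocked algorithms, a time-based exit criterion, a 70/30 split and two concurrent iterations) can be illustrated with a toy search loop. This is a simplified sketch under stated assumptions, not how Azure schedules its iterations: the scores in the hypothetical `FAKE_SCORES` dictionary stand in for training and validating each model, and the deadline is only checked between batches, so a run can overshoot it by one batch. The result is a leaderboard sorted best first, like the Models tab.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def train_validation_split(rows, validation_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split them, e.g. 70% train / 30% validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def run_search(candidates, blocked, evaluate, timeout_seconds, max_concurrent=2):
    """Evaluate allowed candidates in batches of max_concurrent until the deadline passes.

    Returns (name, score) pairs sorted by score, best first."""
    allowed = [name for name in candidates if name not in blocked]  # drop blocked algorithms
    deadline = time.monotonic() + timeout_seconds
    leaderboard = []
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        for start in range(0, len(allowed), max_concurrent):
            if time.monotonic() >= deadline:  # exit criterion: time budget used up
                break
            batch = allowed[start:start + max_concurrent]
            leaderboard.extend(zip(batch, pool.map(evaluate, batch)))
    return sorted(leaderboard, key=lambda item: item[1], reverse=True)


# Hypothetical validation scores standing in for real training runs.
FAKE_SCORES = {"LogisticRegression": 0.93, "GradientBoosting": 0.97, "RandomForest": 0.95, "KNN": 0.90}

train, valid = train_validation_split(range(10))
board = run_search(list(FAKE_SCORES), blocked={"KNN"}, evaluate=FAKE_SCORES.get, timeout_seconds=60)
print(len(train), len(valid))  # 7 3
print(board[0])                # ('GradientBoosting', 0.97)
```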
With these settings configured, we are now ready to execute AutoML in Azure Machine Learning.
Results
Now let us look at the results of AutoML in Azure Machine Learning. First, you see the details of the AutoML execution.
These include the created and start times of the run and the best model found among the candidates, which in this case is MaxAbsScaler, GradientBoosting: a MaxAbsScaler feature-scaling step followed by the GradientBoosting algorithm. The value of the selected primary metric for that model is also shown.
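In the best model's name, MaxAbsScaler is a preprocessing step rather than a learning algorithm; the name matches scikit-learn's MaxAbsScaler, which divides each feature by its maximum absolute value. A minimal sketch of that transformation follows, with made-up sample columns; it illustrates the scaling idea and is not Azure's code.

```python
def max_abs_scale(columns):
    """Divide each feature column by its maximum absolute value, so values fall in [-1, 1].

    The data is not shifted or centered, and an all-zero column is left unchanged."""
    scaled = []
    for column in columns:
        peak = max(abs(value) for value in column)
        scaled.append([value / peak if peak else value for value in column])
    return scaled


print(max_abs_scale([[1.0, -4.0, 2.0], [0.0, 0.0, 0.0], [10, 5, 0]]))
# [[0.25, -1.0, 0.5], [0.0, 0.0, 0.0], [1.0, 0.5, 0.0]]
```

Because it does not center the data, this kind of scaler preserves zeros, which suits sparse, one-hot encoded inputs such as yes/no votes.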
In the Models tab, you can see all the models trained in this AutoML run, as shown in the following figure.
The models are listed in descending order of the selected metric, so the best model appears at the top.
Data preprocessing is one of the important tasks in machine learning, and AutoML in Azure Machine Learning performs it automatically.
The results of these checks are shown under Data Guardrails, as in the following figure.
Three data guardrails apply to this classification problem, as shown in the above figure:
- Class balancing detection -> Checks whether each class of the target attribute is well represented in the training data. If one class has far fewer rows than the others, the model can become biased toward the majority class
- Missing feature values imputation -> Large datasets often contain missing values, which need to be filled in for better results. Missing numeric values are replaced with the column average, and missing nominal values with the most frequent value
- High cardinality feature detection -> Detects features with a very large number of distinct values, such as identifiers, which rarely help a model generalize
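The three checks can be illustrated in a few lines of Python. This is a sketch of the ideas only: Azure's documentation describes class balancing detection as checking whether each class is adequately represented, and imputation as using the mean for numeric features and the most frequent value otherwise, but the ratio measure and the `threshold=0.9` cardinality cutoff below are hypothetical choices, not the rules Azure applies.

```python
from collections import Counter
from statistics import mean


def class_balance_ratio(labels):
    """Smallest class count divided by the largest; values near 1.0 indicate balanced classes."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def impute(column):
    """Replace None with the mean (numeric columns) or the most frequent value (other columns).

    Assumes the column has at least one non-missing value."""
    present = [value for value in column if value is not None]
    if all(isinstance(value, (int, float)) for value in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if value is None else value for value in column]


def is_high_cardinality(column, threshold=0.9):
    """Flag a column whose share of distinct values exceeds a hypothetical threshold."""
    return len(set(column)) / len(column) > threshold


print(class_balance_ratio(["democrat"] * 6 + ["republican"] * 4))  # 0.6666666666666666
print(impute([1.0, None, 3.0]))                                     # [1.0, 2.0, 3.0]
print(impute(["y", None, "y", "n"]))                                # ['y', 'y', 'y', 'n']
print(is_high_cardinality(["id1", "id2", "id3", "id4"]))            # True
print(is_high_cardinality(["y", "n", "y", "n"]))                    # False
```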
As you can see, all the checks passed. The following figure shows the details of the best model for the selected dataset in AutoML in Azure Machine Learning.
Although AUC weighted was selected as the primary metric, it is worth checking the values of the other metrics as well, as shown in the following figure.
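The other classification metrics mentioned earlier (accuracy, precision, recall, F1 and MCC) can all be derived from a confusion matrix. The sketch below treats one class as the positive class; the labels and predictions are made up, and Azure Machine Learning also reports averaged variants of some of these metrics (for example, macro- and weighted-averaged precision) that are not computed here.

```python
import math


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, F1 and MCC, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical validation labels and predictions.
y_true = ["democrat", "republican", "democrat", "republican", "democrat"]
y_pred = ["democrat", "republican", "republican", "republican", "democrat"]
metrics = binary_metrics(y_true, y_pred, positive="democrat")
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.8, 'precision': 1.0, 'recall': 0.667, 'f1': 0.8, 'mcc': 0.667}
```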
You can deploy the selected model to either Azure Kubernetes Service (AKS) or Azure Container Instances (ACI). You can also download the model script.
Conclusion
AutoML is a relatively new approach that simplifies the machine learning process. With AutoML in Azure Machine Learning, you provide the dataset with minimal configuration. A few preprocessing tasks are executed automatically on the selected dataset, the best algorithm is selected for the given primary metric, and the resulting model can be deployed to production.
AutoML in Azure Machine Learning supports Classification, Regression and Time Series forecasting. This article discussed how classification is done with AutoML.