Azure Machine Learning Workspace
The workspace is the top-level centralized resource for your machine learning deployments. It holds all of the artifacts used to train and deploy your models.
The workspace keeps run configurations, which are the sets of instructions for executing your program on a compute target. It also stores standard run metrics as well as any custom metrics you choose to record during training.
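As a rough illustration of recording custom metrics during training, the sketch below logs a per-epoch loss through a pluggable callback. It assumes MLflow-style tracking, where `mlflow.log_metric(key, value, step=...)` could be passed as the callback. The names `train_and_log` and `record`, the metric name, and the halving "loss" are hypothetical and exist only to make the example runnable.

```python
from typing import Callable, List, Tuple

# The callback mirrors the shape of mlflow.log_metric(key, value, step=...),
# but any callable with that shape works.
MetricLogger = Callable[..., None]


def train_and_log(epochs: int, log_metric: MetricLogger) -> float:
    """Run a stand-in training loop and record one metric per epoch."""
    loss = 1.0
    for epoch in range(epochs):
        loss = loss / 2  # placeholder for a real training step
        log_metric("train_loss", loss, step=epoch)
    return loss


# Local stand-in logger so the sketch runs without any tracking backend.
recorded: List[Tuple[str, float, int]] = []


def record(key: str, value: float, step: int = 0) -> None:
    recorded.append((key, value, step))


final_loss = train_and_log(3, record)
```

Inside a training job, passing `mlflow.log_metric` instead of `record` should send the same values to the run history, assuming the `mlflow` package is installed and tracking is configured for the workspace.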
The workspace provides the tools, components and triggers to develop, train, deploy and monitor a machine learning project. It also includes security, cost management and access control features.
The workspace also provides a unified model registry and stores the snapshots, outputs, and logs from experiments. These artifacts are encrypted at rest, by default with Microsoft-managed keys. The workspace also has an associated key vault that protects the secrets and credentials used to communicate with other services.
It also tracks the run history of models, allowing you to easily compare them. In addition, it has a feature that allows you to trace a deployed model back to the experiment, dataset, and algorithm it came from. This is useful if your customers ever question the accuracy of a prediction they received. You can show them that the model was trained on turtles, not rifles, for example. You can also track what changes were made to the model in a deployment. This gives you complete visibility into your model’s evolution.
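The tracing idea described above can be pictured as a lookup from a deployment back to the experiment, dataset, and algorithm behind the model it serves. The following is a minimal conceptual sketch in plain Python, not the Azure Machine Learning API; `ModelLineage`, `LineageIndex`, and all of the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelLineage:
    """Where a registered model version came from (illustrative fields)."""
    model_name: str
    model_version: int
    experiment: str
    dataset: str
    algorithm: str


class LineageIndex:
    """Maps a deployment name to the lineage of the model it serves."""

    def __init__(self) -> None:
        self._by_deployment: Dict[str, ModelLineage] = {}

    def record_deployment(self, deployment: str, lineage: ModelLineage) -> None:
        self._by_deployment[deployment] = lineage

    def trace(self, deployment: str) -> Optional[ModelLineage]:
        # Returns None when the deployment is unknown.
        return self._by_deployment.get(deployment)


index = LineageIndex()
index.record_deployment(
    "image-classifier-prod",
    ModelLineage("image-classifier", 3, "exp-augmentation", "animals-v2", "resnet50"),
)
origin = index.trace("image-classifier-prod")
```

With a record like this, a question about a prediction can be answered by looking up `origin.dataset` and `origin.experiment` for the deployment that produced it.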
Azure Machine Learning offers a wide range of modules for Data Preparation, Feature Engineering and Machine Learning Algorithms. It also provides modules to easily evaluate a model’s performance using industry-standard metrics.
The workspace supports various datastores, such as premium Blob Storage, which enables high-performance access during training by mounting blob storage containers on compute targets with local caching. A datastore can also hold a dataset in tabular format, which makes it easier for developers to load the data into Python-based analytics structures such as pandas or Spark DataFrames.
A default datastore, workspaceblobstore, is set up when the workspace is created and is a good place to temporarily save files used for ML experiments. However, it has file and storage size limitations that can affect a large-scale project. For such projects, it is recommended to use a different datastore or a Databricks cluster as the machine learning repository.
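A workspace provides a fully managed environment for your machine learning workflows. It includes a Jupyter notebook service, the Azure ML Python SDK and CLI, and other tools. It also provides compute targets for ML training. A compute target is the compute resource on which a training script runs or a deployment is hosted. It can be managed by the workspace, such as a compute instance or compute cluster, or it can be an unmanaged resource, such as a VM you create outside the workspace and then attach to it.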
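The Python SDK needs to know which workspace to talk to, which comes down to three identifiers: subscription, resource group, and workspace name. The sketch below builds that configuration and shows one way to connect, assuming the v2 SDK packages `azure-ai-ml` and `azure-identity` are installed. The helper names `workspace_config` and `connect` and all identifier values are placeholders, and `connect` is defined but not called so the example runs without Azure credentials.

```python
import json
from typing import Dict


def workspace_config(subscription_id: str, resource_group: str, workspace_name: str) -> Dict[str, str]:
    """Collect the three identifiers that locate a workspace."""
    return {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }


def connect(config: Dict[str, str]):
    """Return an MLClient for the workspace (needs azure-ai-ml and azure-identity)."""
    # Imported lazily so the rest of the sketch works without the SDK installed.
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    return MLClient(
        DefaultAzureCredential(),
        subscription_id=config["subscription_id"],
        resource_group_name=config["resource_group"],
        workspace_name=config["workspace_name"],
    )


# Placeholder values; replace them with your own identifiers.
config = workspace_config("00000000-0000-0000-0000-000000000000", "my-resource-group", "my-workspace")
config_json = json.dumps(config, indent=2)
```

Saving `config_json` to a file named `config.json` is a common way to keep these identifiers out of the code itself.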
A model can be trained inside the workspace using any popular ML framework, including scikit-learn, XGBoost, PyTorch, TensorFlow, and Microsoft Cognitive Toolkit (CNTK). Once you’ve finished training a model, you can register it in the model registry. This makes it easier to find, version, and manage your models.
A single workspace allows teams to share one environment for experimentation. This can improve productivity and makes it easier to separate project-related costs from other work. It can also help you manage your Azure footprint by limiting the number of resources per team. This approach, however, can make it harder to identify and filter assets. It’s recommended to develop naming and tagging conventions for this purpose.
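One possible shape for such a naming and tagging convention is sketched below in plain Python. The `team-project-kind` pattern, the tag keys, and the helper names are assumptions chosen for illustration, not an Azure Machine Learning requirement.

```python
from typing import Dict, Iterable, List, NamedTuple


class Asset(NamedTuple):
    name: str
    tags: Dict[str, str]


def slug(text: str) -> str:
    """Lowercase a label and replace spaces with hyphens."""
    return text.strip().lower().replace(" ", "-")


def make_asset(team: str, project: str, kind: str) -> Asset:
    """Build a consistently named asset with matching tags."""
    name = "-".join(slug(part) for part in (team, project, kind))
    tags = {"team": slug(team), "project": slug(project), "kind": slug(kind)}
    return Asset(name, tags)


def filter_by_tags(assets: Iterable[Asset], **wanted: str) -> List[Asset]:
    """Keep only the assets whose tags match every requested key/value pair."""
    return [a for a in assets if all(a.tags.get(k) == v for k, v in wanted.items())]


assets = [
    make_asset("Vision", "Defect Detection", "model"),
    make_asset("NLP", "Chatbot", "model"),
    make_asset("Vision", "Defect Detection", "dataset"),
]
vision_assets = filter_by_tags(assets, team="vision")
```

Because the name and the tags are derived from the same inputs, assets in a shared workspace can be filtered either by name prefix or by tag.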
The Azure Machine Learning service provides a set of capabilities to deploy models for inference. Models in MLflow format can be deployed for both real-time inference and batch inference. Real-time inference is supported through managed online endpoints, which provide a turnkey experience for MLflow models without the need for you to specify an environment or scoring script.
Each online endpoint is exposed as an HTTPS endpoint with a REST API. Managed online endpoints are hosted on virtual machines that Azure manages for you, while Kubernetes online endpoints run on a Kubernetes cluster that you attach to the workspace. GPU-enabled compute choices are available for demanding workloads like natural language processing.
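Because an online endpoint is reached as a REST service over HTTP, a client only needs to send an authenticated JSON POST request. The sketch below builds such a request with the standard library and does not send it. The scoring URL, the key, and the `input_data` payload shape are placeholders: the real payload depends on the deployed model, and key-based bearer authentication is assumed.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for an online endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumes key-based auth
        },
        method="POST",
    )


request = build_scoring_request(
    "https://example-endpoint.example.com/score",  # hypothetical scoring URI
    "<endpoint-key>",                              # placeholder, not a real key
    {"input_data": [[1.0, 2.0, 3.0]]},             # hypothetical payload shape
)

# To call a real endpoint, pass the request to urllib.request.urlopen(request)
# and decode the JSON response body.
```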
In summary, the workspace is a top-level centralized resource that holds all the artifacts for the model development process, including training runs, scripts, and deployment images. It also holds the list of compute targets used to train a model and a log of each training run's execution, metrics, snapshots, and outputs. You can access the workspace through the UI, the Python SDK, and the CLI.