23.02.22
Deep learning is at the center of most artificial intelligence initiatives. It is based on the concept of a deep neural network, which passes inputs through multiple layers of connections. Neural networks can perform many complex cognitive tasks, improving performance dramatically compared to classical machine learning algorithms. However, they often require huge data volumes to train, and can be very computationally intensive.
Cloud computing services are helping make deep learning more accessible, making it easier to manage large datasets and train algorithms on distributed hardware.
Cloud services act as an enabler for deep learning in several respects, from elastic compute for training to managed services for data preparation and deployment.
Let’s briefly review the deep learning offerings of the major cloud providers: Amazon Web Services, Google Cloud, and Microsoft Azure.
In each of these clouds, it is possible to run deep learning workloads in a “do it yourself” model. This involves selecting machine images that come pre-installed with deep learning infrastructure, and running them in an IaaS model, for example as Amazon EC2 instances or Google Compute Engine VMs.
All the cloud providers we review below offer compute instances suitable for deep learning models, with specialized hardware such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and tensor processing units (TPUs). To learn about the compute options offered by each cloud provider, refer to our articles on each provider's compute offerings.
Below, we focus on the platform as a service (PaaS) offering each cloud provides for deep learning users. These PaaS offerings provide the hardware needed for deep learning workloads, as well as software services for managing deep learning pipelines, from data ingestion to production deployment and real-world inference.
Amazon Web Services provides SageMaker, a managed service that lets you build, train, and manage machine learning models in the cloud, including deep learning models.
Google’s set of machine learning services, collectively called Cloud AI, includes both general-purpose tools and dedicated services for specific use cases.
Azure Machine Learning is a complete environment for training, deploying, and managing machine learning models.
Here are a few key considerations when selecting your cloud-based deep learning service.
Data preparation can be one of the heaviest and most sensitive parts of a deep learning project. There are two common ways to prepare large volumes of data for analytics, which are also used to create deep learning datasets from raw data: extract-transform-load (ETL), which transforms data before loading it into the target store, and extract-load-transform (ELT), which loads raw data first and transforms it inside the target system.
Check which data services are provided by your cloud vendor and whether they support ETL, ELT, or both. Understand which data storage, database or data warehouse services you will use, and how they can make data preparation easier.
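To make the ETL/ELT distinction concrete, here is a minimal sketch. An in-memory SQLite database stands in for a cloud data warehouse, and all table and column names are invented for illustration:

```python
import sqlite3

# The same label cleanup done ETL-style (transform in application code
# before loading) vs. ELT-style (load raw rows, then transform inside the
# target store with SQL).
raw_rows = [("  cat ", 1), ("DOG", 2), ("cat", 3)]

# ETL: normalize labels in Python, then load the clean result.
etl = sqlite3.connect(":memory:")
etl.execute("CREATE TABLE samples (label TEXT, value INTEGER)")
clean = [(label.strip().lower(), value) for label, value in raw_rows]
etl.executemany("INSERT INTO samples VALUES (?, ?)", clean)

# ELT: load the raw rows first, then transform with SQL in the target store.
elt = sqlite3.connect(":memory:")
elt.execute("CREATE TABLE samples_raw (label TEXT, value INTEGER)")
elt.executemany("INSERT INTO samples_raw VALUES (?, ?)", raw_rows)
elt.execute("""
    CREATE TABLE samples AS
    SELECT lower(trim(label)) AS label, value FROM samples_raw
""")

# Both routes yield the same training-ready table.
etl_out = sorted(etl.execute("SELECT * FROM samples").fetchall())
elt_out = sorted(elt.execute("SELECT * FROM samples").fetchall())
print(etl_out == elt_out)  # True
```

The trade-off shown here scales up: ETL keeps transformation logic in your pipeline code, while ELT pushes it into the warehouse, where cloud data services can parallelize it over large volumes.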
Data scientists typically start by developing a model on a local notebook, but it is not feasible to train most deep learning models on a local workstation. A key capability of a cloud deep learning service is the ability to integrate with notebooks and push training jobs seamlessly to cloud-based compute instances.
Evaluate this process: how easy it is to run training jobs on hardware such as GPUs, TPUs, and FPGAs, to manage these jobs across data science teams, and to visualize and interpret their results.
Each cloud machine learning service supports different frameworks. You can typically get the broadest framework support in an IaaS model, when deploying deep learning directly on compute instances. However, if you use a full MLOps platform, you will be limited to the frameworks it supports.
Look for support for the frameworks your data science team may need to use now or in the future, such as TensorFlow and PyTorch.
Also evaluate the ability to integrate your own code and algorithms with the platform’s library of built-in algorithms. This can improve productivity, because you can draw on existing building blocks and only develop unique aspects of your model.
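The "integrate your own code" pattern usually works because platforms accept any estimator that follows the same fit/predict interface as their built-in algorithms. A hedged, toy-sized sketch of that convention (all names here are invented, not tied to any specific platform):

```python
# A minimal custom estimator following the scikit-learn-style fit/predict
# convention, so it could sit in a pipeline next to built-in components.
class MeanRegressor:
    """Toy baseline: predicts the mean of the training targets."""

    def fit(self, X, y):
        # Learn a single parameter from the training data.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Return the learned mean for every input row.
        return [self.mean_ for _ in X]

model = MeanRegressor().fit([[1], [2], [3]], [10.0, 20.0, 30.0])
print(model.predict([[4], [5]]))  # [20.0, 20.0]
```

Conforming to the platform's estimator interface is what lets you mix your unique model code with off-the-shelf building blocks instead of rebuilding the whole pipeline.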
Most cloud platforms provide pre-trained, pre-optimized AI services for many common applications, such as computer vision, speech recognition, and natural language processing.
The advantage of these types of services is that they have been trained on massive data volumes that are not available to individual companies. They can provide very high accuracy for general use cases, and provide excellent performance and low latency in production. Best of all, they are ready to use out of the box.
Deploying a model is only the start, not the end point, of your AI journey. Data changes and user requirements change, and it is essential to monitor a model’s performance over time, tune it, augment it, and if necessary, replace it. Evaluate the tools a cloud service provides for monitoring model performance when it is already in production, and how easy it is to release updates and improvements to live deep learning models.
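As a simple illustration of production monitoring (not tied to any specific cloud service), the sketch below tracks a model's live accuracy over a sliding window and flags when it degrades past a threshold, which would typically trigger retraining or rollback:

```python
from collections import deque

class AccuracyMonitor:
    """Rolling accuracy tracker for a deployed model (illustrative only)."""

    def __init__(self, window=100, threshold=0.8):
        self.results = deque(maxlen=window)  # recent hit/miss outcomes
        self.threshold = threshold

    def record(self, prediction, actual):
        self.results.append(prediction == actual)

    def accuracy(self):
        return sum(self.results) / len(self.results)

    def needs_attention(self):
        # Only alert once the window has enough samples to be meaningful.
        return (len(self.results) == self.results.maxlen
                and self.accuracy() < self.threshold)

monitor = AccuracyMonitor(window=4, threshold=0.8)
for pred, actual in [(1, 1), (0, 1), (0, 0), (1, 0)]:
    monitor.record(pred, actual)
print(monitor.accuracy(), monitor.needs_attention())  # 0.5 True
```

Cloud monitoring tools layer richer signals (data drift, latency, per-segment metrics) on top of this same basic loop of recording live outcomes and alerting on degradation.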
Some innovative MLOps solutions automate resource management and orchestration for machine learning infrastructure, letting you automatically run as many compute-intensive experiments as needed.
For example, an AI orchestration platform for GPU-based machines running AI/ML workloads can pool GPUs and schedule workloads dynamically across them. While such a solution is not primarily aimed at cost reduction (it won't necessarily decrease the number of on-prem GPUs), it helps utilize the full capacity of existing GPUs. This simplifies machine learning infrastructure orchestration, helping data scientists improve both their productivity and the quality of their models.
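The core idea behind utilizing the full capacity of existing GPUs can be sketched as a packing problem: place queued jobs, each demanding a fraction of a GPU, onto a fixed pool of devices so that partially used GPUs are not wasted. Real orchestration platforms do this dynamically; this greedy first-fit packer (all job names invented) only illustrates the principle:

```python
def pack_jobs(jobs, gpu_count):
    """jobs: list of (name, gpu_fraction); returns ({gpu_index: [names]}, pending)."""
    free = [1.0] * gpu_count                 # remaining capacity per GPU
    placement = {i: [] for i in range(gpu_count)}
    pending = []
    for name, demand in jobs:
        for i in range(gpu_count):           # first GPU with enough headroom
            if free[i] >= demand:
                free[i] -= demand
                placement[i].append(name)
                break
        else:
            pending.append(name)             # queue until capacity frees up
    return placement, pending

jobs = [("train-a", 0.5), ("train-b", 0.5), ("notebook", 0.25), ("train-c", 1.0)]
placement, pending = pack_jobs(jobs, gpu_count=2)
print(placement)  # {0: ['train-a', 'train-b'], 1: ['notebook']}
print(pending)    # ['train-c']
```

Note how two half-GPU training jobs share one device instead of each idling half a GPU; the full-GPU job simply waits rather than blocking the smaller workloads.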
Contact GlobalDots today to get the most out of your resources with today’s latest MLOps solutions.
Originally published by our friends at Run:AI