Shubhajit-Ming

Monday, May 02, 2022

Machine Learning: Development and Deployment Workflow

There is significant financial upside and business benefit in understanding how to avoid the potential pitfalls of an AI development initiative so that you can quickly capture positive ROI. To get a sense of the challenges in developing and deploying AI applications at scale—and why the right expertise, partners, and development platform are critical—let’s look at what’s involved in a machine learning development process. In this section, I outline the sequential workflow in developing and deploying a machine learning AI application. This process is well understood by machine learning experts.

1. Data Assembly and Preparation

The first step is to identify the required and relevant data sets, and then assemble the data in a unified image that is useful for machine learning. Because the data come from multiple disparate sources and software systems, there are often issues with data quality such as data duplication, gaps in data, unavailable data, and data out of sequence. The development platform must therefore provide tools to address those issues, including capabilities to automate the process of ingesting, integrating, normalizing, and federating data into a unified image suitable for machine learning.
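The ingest-normalize-deduplicate step described above can be sketched in a few lines of plain Python. The source systems, field names, and unit conversion here are illustrative assumptions, not from any specific product:

```python
from datetime import datetime

# Hypothetical raw records from two disparate source systems.
system_a = [
    {"asset_id": "P-101", "ts": "2021-03-01T08:00:00", "temp_f": 212.0},
    {"asset_id": "P-101", "ts": "2021-03-01T08:00:00", "temp_f": 212.0},  # duplicate
]
system_b = [
    {"asset": "P-101", "time": "2021-03-01 09:00", "temp_c": 95.0},
]

def normalize_a(r):
    # Map system A's schema and units (Fahrenheit) to a common form.
    return {"asset_id": r["asset_id"],
            "ts": datetime.fromisoformat(r["ts"]),
            "temp_c": round((r["temp_f"] - 32) * 5 / 9, 2)}

def normalize_b(r):
    # System B already reports Celsius but uses different field names.
    return {"asset_id": r["asset"],
            "ts": datetime.strptime(r["time"], "%Y-%m-%d %H:%M"),
            "temp_c": r["temp_c"]}

def unify(records):
    """Merge normalized records, drop exact duplicates, sort by time."""
    seen, unified = set(), []
    for rec in records:
        key = (rec["asset_id"], rec["ts"], rec["temp_c"])
        if key not in seen:
            seen.add(key)
            unified.append(rec)
    return sorted(unified, key=lambda r: r["ts"])

records = unify([normalize_a(r) for r in system_a] +
                [normalize_b(r) for r in system_b])
```

A production platform automates exactly these mappings at scale; the point of the sketch is that every source system needs its own normalizer before a unified image exists.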

2. Feature Engineering

The next step is feature engineering. This involves going through the data and crafting individual signals that the data scientist and domain experts think will be relevant to the problem being solved. In the case of AI-based predictive maintenance, signals could include the count of specific fault alarms over the trailing 7 days, 14 days, and 21 days; the sum of the specific alarms over the same trailing periods; and the maximum value of certain sensor signals over those trailing periods.
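The trailing-window signals mentioned above (counts, sums, and maxima over 7, 14, and 21 days) can be computed directly. The alarm history and the "vibration" sensor are hypothetical examples:

```python
from datetime import date, timedelta

# Hypothetical per-day alarm counts and a sensor reading for one asset.
history = [
    {"day": date(2021, 3, d), "fault_alarms": a, "vibration": v}
    for d, a, v in [(1, 0, 1.1), (5, 2, 1.4), (10, 1, 1.2),
                    (15, 3, 2.0), (20, 0, 1.3)]
]

def trailing_features(history, as_of, windows=(7, 14, 21)):
    """Sum, count, and max of signals over trailing windows ending at as_of."""
    feats = {}
    for w in windows:
        start = as_of - timedelta(days=w)
        rows = [r for r in history if start < r["day"] <= as_of]
        feats[f"alarm_sum_{w}d"] = sum(r["fault_alarms"] for r in rows)
        feats[f"alarm_days_{w}d"] = sum(1 for r in rows if r["fault_alarms"] > 0)
        feats[f"vib_max_{w}d"] = max((r["vibration"] for r in rows), default=0.0)
    return feats

feats = trailing_features(history, as_of=date(2021, 3, 21))
```

Each asset-date pair yields one feature vector like this; the model in a later step consumes those vectors rather than the raw data.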

3. Labeling the Outcomes
This step involves labeling the outcomes the model tries to predict (e.g., “engine failure”). Often the
specific outcomes are not clearly defined in the data since the original source data sets and business
processes were not originally defined with AI in mind. For example, in AI-based predictive
maintenance applications, source data sets rarely identify actual failure labels. Instead, practitioners
have to infer failure points based on combinations of factors such as fault codes and technician work
orders.
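Inferring failure labels from fault codes and work orders, as described above, amounts to a join with a time window. The event schema, the "critical" flag, and the 5-day gap are illustrative assumptions:

```python
from datetime import date

# Hypothetical event streams: fault codes and technician work orders.
fault_codes = [
    {"asset": "P-101", "day": date(2021, 3, 10), "code": "E42", "critical": True},
    {"asset": "P-101", "day": date(2021, 4, 2), "code": "E07", "critical": False},
]
work_orders = [
    {"asset": "P-101", "day": date(2021, 3, 12), "type": "corrective"},
    {"asset": "P-101", "day": date(2021, 4, 20), "type": "preventive"},
]

def infer_failures(fault_codes, work_orders, max_gap_days=5):
    """Label a failure where a critical fault is followed within a few
    days by a corrective work order on the same asset."""
    failures = []
    for f in fault_codes:
        if not f["critical"]:
            continue
        for w in work_orders:
            if (w["asset"] == f["asset"] and w["type"] == "corrective"
                    and 0 <= (w["day"] - f["day"]).days <= max_gap_days):
                failures.append({"asset": f["asset"], "failure_day": f["day"]})
                break
    return failures

labels = infer_failures(fault_codes, work_orders)
```

The rule itself is a domain-expert judgment; getting it wrong mislabels the training data, which is why this step needs both data scientists and maintenance engineers.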
4. Setting Up the Training Data
Now comes the process of setting up the data set for training the algorithm. There are a number of
nuances to this process that may require outside expertise. For classification tasks, data scientists need to ensure the training set contains a suitable balance of positive and negative examples for the classifier algorithm. Data scientists also need to ensure the classifier is not
biased by artificial patterns in the data. For example, in a recent fraud detection deployment for a
utility, a classifier trained on historical cases on a large country-wide data set incorrectly identified a
number of suspected fraud cases on a remote island. Further examination revealed that because the
island is so remote and hard to access, investigators traveled there only if they were certain of fraud.
All historical cases investigated on the island were therefore true positive labels. Consequently, the
classifier always correlated the island location with incidence of fraud, so the algorithm had to be
adjusted.
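One common way to address the class imbalance described above is to downsample the majority class before training. This sketch uses made-up counts; real pipelines would also consider oversampling or class weights:

```python
import random

# Hypothetical labeled examples: failures are rare, so the classes
# are heavily imbalanced (20 positives vs. 980 negatives).
examples = [{"label": 1}] * 20 + [{"label": 0}] * 980

def balance_by_downsampling(examples, seed=0):
    """Randomly downsample the majority class to match the minority."""
    pos = [e for e in examples if e["label"] == 1]
    neg = [e for e in examples if e["label"] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # fixed seed for reproducible training sets
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

train = balance_by_downsampling(examples)
```

Note that no resampling trick would have caught the island bias in the fraud example; that required a human noticing an artificial pattern in how the labels were collected.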
5. Choosing and Training the Algorithm
The next step is to choose the actual algorithm and then train it with the training data set.
Numerous algorithm libraries are available to data scientists today, created by companies,
universities, research organizations, government agencies, and individual contributors. Many are
available as open source software from repositories like GitHub and Apache Software Foundation.
AI practitioners typically run specialized searches across these libraries to identify the right
algorithm and build the best-trained model. Experienced data scientists know how to narrow their
searches to focus on the right classes of algorithms to test for a specific use case.
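The search across candidate algorithms described above reduces, at its core, to fitting each candidate on training data and comparing performance on held-out data. The two toy "algorithms" below (a majority-class baseline and a one-feature threshold rule) stand in for real library models:

```python
# Toy data: (feature, label) pairs, split into train and holdout sets.
train = [(0.2, 0), (0.3, 0), (0.4, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
holdout = [(0.1, 0), (0.6, 1), (0.95, 1), (0.35, 0)]

def fit_majority(data):
    # Baseline: always predict the most common class in the training data.
    majority = round(sum(y for _, y in data) / len(data))
    return lambda x: majority

def fit_threshold(data):
    # Decision boundary at the midpoint between the two class means.
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= cut else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

candidates = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: accuracy(fit(train), holdout) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
```

In practice the candidates come from the open source libraries named above, and the evaluation uses cross-validation and use-case-specific metrics rather than a single holdout accuracy.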
6. Deploying the Algorithm into Production
The machine learning algorithm then must be deployed to operate in a production environment: it needs to receive new data, generate outputs, and trigger actions or decisions based on those outputs. This may mean embedding the algorithm within an enterprise application used by
humans to make decisions—for example, a predictive maintenance application that identifies and
prioritizes equipment requiring maintenance to provide guidance for maintenance crews. This is
where the real value is created—by reducing equipment downtime and servicing costs through more
accurate failure prediction that enables proactive maintenance before the equipment actually fails.
In order for the machine learning algorithm to operate in production, the underlying compute
infrastructure needs to be set up and managed. This includes elastic scale-out and big data management capabilities, such as ingestion and integration, necessary for large data sets.
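In production, the deployed model's job is to score incoming feature vectors and turn the scores into a prioritized work queue, as in the predictive maintenance application described above. The model weights and feature names here are illustrative stand-ins for a trained model:

```python
def score(features):
    """Stand-in for a trained failure-risk model; weights are illustrative."""
    w = {"alarm_sum_7d": 0.1, "vib_max_7d": 0.25}
    raw = sum(w[k] * features.get(k, 0.0) for k in w)
    return min(raw, 1.0)  # clamp risk score to [0, 1]

def prioritize(assets):
    """Score each asset and return a work queue, highest risk first."""
    ranked = sorted(assets, key=lambda a: score(a["features"]), reverse=True)
    return [a["asset_id"] for a in ranked]

# Fresh feature vectors arriving from the production data pipeline.
incoming = [
    {"asset_id": "P-101", "features": {"alarm_sum_7d": 6, "vib_max_7d": 2.0}},
    {"asset_id": "P-102", "features": {"alarm_sum_7d": 0, "vib_max_7d": 1.1}},
]
queue = prioritize(incoming)
```

Everything around this loop (elastic compute, data ingestion, the application UI the maintenance crews see) is what makes production deployment the hard part.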
7. Closed-Loop Continuous Improvement
Once in production, the performance of the AI algorithm needs to be tracked and managed.
Algorithms typically require frequent retraining by data science teams as market conditions change,
business objectives and processes evolve, and new data sources are identified. Organizations need to
maintain technical agility so they can rapidly develop, retrain, and deploy new models as
circumstances change.
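A minimal version of the performance tracking described above compares live accuracy against the accuracy measured at training time and flags the model for retraining when it degrades. The tolerance and outcome data are illustrative assumptions:

```python
def needs_retraining(baseline_acc, live_outcomes, tolerance=0.05):
    """live_outcomes: (predicted, actual) pairs observed in production.
    Flag the model when live accuracy falls below baseline by more
    than the tolerance."""
    if not live_outcomes:
        return False
    live_acc = sum(p == a for p, a in live_outcomes) / len(live_outcomes)
    return (baseline_acc - live_acc) > tolerance

# Hypothetical recent production outcomes: 5 of 8 predictions correct.
recent = [(1, 1), (0, 0), (1, 0), (0, 1), (1, 1), (0, 0), (1, 0), (0, 0)]
flag = needs_retraining(baseline_acc=0.90, live_outcomes=recent)
```

Real monitoring also watches input-distribution drift, since in applications like predictive maintenance the true outcomes may arrive weeks after the prediction.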
The science of AI has evolved and matured over the last several decades. We are now at a point
where not only are the underlying technologies available, but also organizations now have access to
domain experts, data scientists, and professional services providers that can help them harness the
power of AI for competitive advantage.

Business Benefits of AI
AI technologies deliver real business benefits today. In particular, technology companies like
Google, LinkedIn, Netflix, and Amazon use AI at large scale. McKinsey Global Institute (MGI)
estimates that technology companies spent $20 billion to $30 billion on AI in 2016.24 Some of the
most established applications for AI delivering concrete business benefits are in online search,
advertising placement, and product or service recommendations.
In addition to technology companies, sophisticated industries advanced in digitization, such as
financial services and telecom, are starting to use AI technologies in meaningful ways. For example,
banks use AI to detect and intercept credit card fraud; to reduce customer churn by predicting when
customers are likely to switch; and to streamline new customer acquisition.
The health care industry is just starting to unlock value from AI. Significant opportunities exist for
health care companies to use machine learning to improve patient outcomes, predict chronic
diseases, prevent addiction to opioids and other drugs, and improve disease coding accuracy.
Industrial and manufacturing companies have also started to unlock value from AI applications, including using AI for predictive maintenance and advanced optimization across entire supply
chains.
Energy companies have transformed operations using AI. Utility companies use advanced AI
applications to identify and reduce fraud, forecast electricity consumption, and maintain their
generation, transmission, and distribution assets.
There are several emerging applications of AI in defense. Already, the U.S. military uses AI-based
predictive maintenance to improve military readiness and streamline operations. Other use cases
include logistics optimization, inventory optimization, and recruiting and personnel management
(e.g., matching new recruits to jobs).


