This is the first of a three-part series. This article covers the basics of machine learning. The second part will dive deeper into Microsoft Azure Machine Learning and how to access it via web services. Finally, the third part will go through some real-world examples.
Data science is one of the hottest jobs these days, and that’s no surprise given the wealth of information it provides us. Data science involves using and manipulating data to gain useful insight or knowledge. The kind of information we get from data science can be applied to several vital areas like fraud detection, sales forecasting, and image and language recognition, to name just a few.
Despite its growing popularity as a career choice, being a data scientist isn’t easy. It’s a role that draws from various disciplines, including mathematics, data visualization, programming and domain knowledge—skills that can take years to master.
Classical rule-based programming approaches struggle to process large, unstructured datasets. The machine learning models built by data scientists let companies automate decision-making by moving away from hand-written rules and allowing the algorithms to learn from the data itself.
What exactly is machine learning?
Machine learning (ML) is a subfield of artificial intelligence (AI). Instead of writing and relying on code to find and exploit patterns in data, ML makes it possible for us to simply supply the data and let the computer system find those patterns for us.
Contrary to popular belief, advanced degrees aren’t necessary for building good machine learning models and machine learning enabled applications. Azure Machine Learning (Azure ML) is a platform that allows developers and data scientists to build regression, classification, clustering and other types of machine learning models without the need to understand the complex math behind the algorithms.
How can a computer learn from data?
Before we tackle how Azure ML works, it’s important to first discuss machine learning basics. Put simply, a machine learning model uses features to predict a data point’s label.
Features are the attributes of the dataset used to make a prediction on a label. For example, when attempting to predict the price of a house, the algorithm could look at features such as the size in square meters, the number of bedrooms or the number of bathrooms.
The label, on the other hand, is the attribute the model is trying to predict. In the housing example, the label is simply the price of the house. During training, a machine learning model looks at the features of many houses and learns which labels go with which feature values. When the model later sees a new house, it can predict the label for that new piece of data.
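To make the feature and label idea concrete, here is a minimal sketch in plain Python. It assumes a single feature (size in square meters) and made-up prices, and fits a straight line with ordinary least squares. The data and the function names are illustrative only; this is not how Azure ML trains models internally.

```python
# Toy training data, invented for illustration: one feature and one label per house.
sizes_m2 = [50.0, 70.0, 90.0, 110.0]                    # feature: size in square meters
prices = [150_000.0, 210_000.0, 270_000.0, 330_000.0]   # label: price of the house

def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(size_m2, slope, intercept):
    """Predict the label (price) for a house the model has not seen."""
    return slope * size_m2 + intercept

# "Training" means learning slope and intercept from the labeled examples.
slope, intercept = fit_line(sizes_m2, prices)

# Prediction for a new 100 square meter house.
print(predict(100.0, slope, intercept))  # 300000.0
```

A real model would use several features at once (bedrooms, bathrooms and so on), but the split is the same: features go in, a predicted label comes out.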
There are two important types of machine learning algorithms: supervised learning and unsupervised learning.
Supervised learning
Supervised learning algorithms require pre-labeled data (sometimes called a priori data). Domain experts or internal business analysts typically label the data before it is fed into the machine learning model. Regression and classification are two examples of supervised learning.
Image classification is a common use of supervised learning. For example, if a developer wants to build a model that identifies whether a picture shows a cat or a dog, the algorithm will need many labeled pictures of cats and many labeled pictures of dogs so it can identify patterns in the dataset. This process is called training the machine learning model. Once the model is trained and sees a new picture of a cat or dog, it should be able to identify the picture accurately.
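Real image classifiers work on pixel data and are far more involved, but the supervised pattern can be sketched with a tiny stand-in. The example below assumes two hypothetical numeric features per animal (weight and height) and uses a 1-nearest-neighbor rule, where "training" simply means storing the labeled examples. The numbers are invented.

```python
import math

# Hypothetical labeled training data: (weight_kg, height_cm) -> label.
training_data = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((3.5, 24.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
    ((25.0, 58.0), "dog"),
]

def classify(features, examples):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    nearest = min(examples, key=lambda example: math.dist(features, example[0]))
    return nearest[1]

# New, unlabeled animals are assigned the label of their nearest neighbor.
print(classify((4.5, 24.0), training_data))   # cat
print(classify((22.0, 57.0), training_data))  # dog
```

The labels supplied up front are what make this supervised: without them, the algorithm would have no notion of "cat" or "dog" to predict.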
Unsupervised learning
Unsupervised learning does not require labeled data. Instead, the algorithm looks for structure in the data on its own. A clustering algorithm, for example, divides the dataset into smaller groups based on the distance between the data points. Anomaly detection and clustering are both examples of unsupervised learning.
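Distance-based grouping can be sketched with k-means, one widely used clustering algorithm. The version below is a minimal plain-Python sketch with made-up points and hand-picked starting centroids; a production library would choose starting points more carefully and stop once the centroids no longer move.

```python
import math

def k_means(points, centroids, iterations=10):
    """Repeatedly assign points to the nearest centroid, then re-center each centroid."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Unlabeled 2D points, invented for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]

# Starting centroids picked by hand from the data.
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(clusters[0])  # the three points near (1, 1)
print(clusters[1])  # the three points near (8.5, 9)
```

No labels are involved at any point; the groups come purely from which points sit close together.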
Fraud detection is a common use of unsupervised learning. Algorithms such as random cut forest can establish a baseline of “normal behavior” in a dataset, and when a transaction deviates from that baseline, it can be flagged as potentially fraudulent.
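Random cut forest itself is too involved for a short example, so the sketch below swaps in a much simpler technique, a z-score threshold, to illustrate the same baseline-then-flag idea. The transaction amounts and the threshold of three standard deviations are assumptions chosen for illustration.

```python
import statistics

# Made-up history of past transaction amounts, used as the "normal behavior" baseline.
history = [42.0, 38.5, 45.0, 40.0, 39.5, 44.0, 41.0, 43.5]
baseline_mean = statistics.mean(history)
baseline_stdev = statistics.stdev(history)

def is_unusual(amount, threshold=3.0):
    """Flag an amount that lies more than `threshold` standard deviations from the baseline mean."""
    z_score = abs(amount - baseline_mean) / baseline_stdev
    return z_score > threshold

# New transactions are compared against the baseline; no labels are needed.
new_transactions = [41.5, 950.0, 37.0]
flagged = [amount for amount in new_transactions if is_unusual(amount)]
print(flagged)  # [950.0]
```

A flagged transaction is only a candidate for review. Whether it is actually fraudulent still has to be confirmed some other way.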
Machine learning operations
Supervised and unsupervised learning cover the basics of machine learning. Azure ML provides not only a way to build and train a model, but also a way to put that model into production, where it serves predictions on new data. Serving predictions this way is often called inferencing.
When preparing a dataset for training, in most cases, data scientists cannot just give the dataset to an algorithm and have it take care of the processing. The dataset must first undergo a process called feature engineering to ensure all features are numerical, and even the numerical values are optimized for training the machine learning model. For example, in image classification, the algorithm will not understand what a picture is. The picture will need to be converted into a series of numbers before it’s fed to the algorithm to be trained.
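Three common feature engineering steps can be sketched in a few lines of plain Python: rescaling numbers to a common range, turning a category into numbers, and turning a picture into a flat list of numbers. The helper names and sample values below are hypothetical, and the picture is assumed to be a grayscale grid with intensities from 0 to 255.

```python
def min_max_scale(values):
    """Rescale numeric values to the range 0..1 so no feature dominates by magnitude."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # all values identical; nothing to spread out
    return [(value - low) / (high - low) for value in values]

def one_hot(value, categories):
    """Encode a categorical value as a list with a 1.0 in the matching position."""
    return [1.0 if value == category else 0.0 for category in categories]

def flatten_image(pixels, max_intensity=255):
    """Turn a grid of grayscale pixels into one flat list of numbers in 0..1."""
    return [pixel / max_intensity for row in pixels for pixel in row]

print(min_max_scale([50, 100, 150]))                    # [0.0, 0.5, 1.0]
print(one_hot("condo", ["house", "condo", "townhouse"]))  # [0.0, 1.0, 0.0]
print(flatten_image([[0, 255], [51, 204]]))
```

After steps like these, every feature is numeric and on a comparable scale, which is the form most training algorithms expect.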
Even after the model is trained, the process is not complete. The model also needs to be usable by an application. Azure ML provides managed infrastructure that allows data scientists to deploy their machine learning models as working endpoints. Any application can then call the endpoint remotely through its application programming interface (API).
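From the application's side, calling a deployed model is usually an ordinary HTTP request. The sketch below only builds such a request with Python's standard library and does not send it. The URL, the key, the bearer-token header and the shape of the JSON body are all placeholders and assumptions; the real values and payload format depend on how a given endpoint was deployed.

```python
import json
import urllib.request

def build_scoring_request(endpoint_url, api_key, rows):
    """Build (but do not send) a POST request carrying feature rows as JSON."""
    # Assumed payload shape: {"data": [[...features...], ...]}.
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed authentication scheme: a key passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Placeholder URL and key; one house described by size, bedrooms and bathrooms.
request = build_scoring_request(
    "https://example.invalid/score",
    "<your-key>",
    [[100.0, 3, 2]],
)
print(request.get_method(), request.full_url)

# Sending it would look like this (left commented out because the URL is a placeholder):
# with urllib.request.urlopen(request) as response:
#     prediction = json.loads(response.read())
```

The application never needs to know which algorithm sits behind the endpoint. It sends features and receives a prediction.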
Azure ML allows data scientists to prepare, train and deploy a machine learning model in a single pipeline. It provides experienced developers with methods of customizing every stage of this pipeline, while also allowing those who aren’t familiar with code to build the model with a simple visual interface.
Conclusion
Machine learning is a complex topic, so it is important to start with the basics. Understanding fundamentals such as “What is machine learning?” will take you far. Learning the basic paradigms of supervised and unsupervised learning, and familiarizing yourself with common tasks such as classification, regression and clustering, are essential first steps toward becoming a qualified data scientist.
In this article, we discussed machine learning basics, answered questions such as “What is machine learning?” and introduced some basic machine learning approaches and examples. Now that you understand what is going on when building, training and deploying machine learning models, you can start implementing them yourself. The next part of this series will be a hands-on machine learning tutorial using Azure ML.