Artificial Intelligence | Lesson 1.3

Machine Learning (ML)

Most of AI's success to date is owed to a specific type of AI called machine learning (ML). We hope this lesson gets you thinking about how ML might apply to your organization or industry.

First, we adopt a broad definition of AI, one that also covers less flashy applications.

This is our definition of AI:

Software that solves a problem without explicit human instruction.

As you can see, the definition focuses on the outcome of the technology rather than the specific techniques used to build it. Some people will disagree with it because they consider this to be the definition of ML. However, learning is a characteristic of an intelligent entity, and while ML is just a tool, it is the tool behind 99% of the successful applications we call AI today. This may change in the future, but no new approaches on the horizon hold the same promise as ML. Our definition is simply the most accurate picture of the AI landscape of today and the near future. We found the following explanation of AI by Tanmay Bakshi, an ML engineer at IBM, quite interesting:

“AI is really a complex series of layers of algorithms that do something with the information that’s coming into it. AI is a set of technologies that allows us to extract knowledge from data. So it’s any kind of system that learns or understands patterns within that data, and can identify them, and then reproduce them on new information. AI is not the kind of simulating human intelligence that people think it is. It’s really not about intelligence at all. But I think another word that describes AI more accurately today is ML. The reason I say that is because ML technology is all about using mathematics essentially on computers in order to find patterns in data. Now, this data can be structured or unstructured. So, AI is a set of mathematical algorithms that enable us to have computers find very deep patterns that we may not have known existed, without us having to hard code them manually.”


What is Machine Learning? by 365 Data Science

 

Supervised Learning: Input to Output

The most commonly used type of ML learns input-to-output mappings, that is, how to predict an output B from an input A. For example, the input A might be an email and the output B whether or not that email is spam. If it is spam, we label it with the value 1; if it is not, we label it with the value 0. This type of ML is called supervised learning, because the system learns from examples that already carry the correct output.
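To make the A-to-B idea concrete, here is a minimal sketch of a spam detector in Python. It is only an illustration: the four training emails, the function names `train` and `predict`, and the word-counting rule are all assumptions made for this example, not a method used in production spam filters.

```python
from collections import Counter

# Hypothetical labeled examples: input A is the email text,
# output B is 1 (spam) or 0 (not spam).
TRAINING_DATA = [
    ("win a free prize now", 1),
    ("free money claim your prize", 1),
    ("meeting agenda for monday", 0),
    ("please review the project report", 0),
]

def train(examples):
    """Count how often each word appears in spam and in non-spam emails."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Return 1 if the email's words were seen more often in spam, else 0."""
    words = text.lower().split()
    spam_score = sum(counts[1][w] for w in words)
    not_spam_score = sum(counts[0][w] for w in words)
    return 1 if spam_score > not_spam_score else 0

model = train(TRAINING_DATA)
print(predict(model, "claim your free prize"))      # prints 1
print(predict(model, "project meeting on monday"))  # prints 0
```

The important point is that nobody wrote a rule such as "the word free means spam". The program derived its behavior from the labeled examples, which is what makes this a case of supervised learning.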

There are two main categories of supervised learning, “Classification” and “Regression”. The spam detector is an example of classification.

  • In classification, we want to identify the class of a sample based on its features. In the spam detector, the possible classes are 0 (the email is not spam) and 1 (the email is spam). “The classifier” receives an email and outputs whether or not it is spam. The number of classes can be more than two, but it has to be finite.
  • “Regression” is an ML subset that estimates a real value based on the features of a sample. Later in the course, we talk about a price predictor for real estate. “The regressor” receives information about a house, such as the number of bedrooms, the location, and the square footage, and estimates its selling price.
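The regression case can be sketched in the same spirit. The example below fits a straight line to a handful of houses using ordinary least squares with a single feature, the square footage. The figures are made up for illustration, the names `fit_line` and `predict_price` are hypothetical, and a real price predictor would use many more features and far more data.

```python
# Hypothetical training data: square footage (input) and selling price (output).
SQUARE_FEET = [1000, 1500, 2000, 2500]
PRICES = [150_000, 200_000, 250_000, 300_000]

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_price(slope, intercept, square_feet):
    """Estimate a selling price; the result is a real number, not a class."""
    return slope * square_feet + intercept

slope, intercept = fit_line(SQUARE_FEET, PRICES)
print(round(predict_price(slope, intercept, 1800)))  # prints 230000
```

Unlike the classifier, whose answer must be one of a fixed set of labels, the regressor can return any value along the fitted line, including prices that never appeared in the training data.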

The difference between regression and classification is that the output of classification is limited to a finite set of classes, whereas the output of regression can be any real number within a continuous range, such as a house price. Most of the AI applications you see in real life are ML examples. For instance, if the input is an audio clip and the AI’s job is to output the text transcript, the application is speech recognition.

Online advertising, the most lucrative form of supervised learning, is where many tech companies make most of their profit. They have a piece of AI that takes some information about an ad and some information about you, and tries to figure out which ad you are most likely to click on. This application might not be the most inspiring, but it certainly has a huge economic impact today. The following table shows a few examples of AI applications in today’s world:

Table 1.1 Examples of Machine Learning in Everyday Work 
Input | Output | Application
Email | Spam? (0/1) | Spam Detection
Audio | Text Transcript | Speech Recognition
English Text | Chinese Text | Machine Translation
Ad, User Info | Did the User Click? (0/1) | Online Advertising
Image, Radar Info | Steering, Acceleration, Braking | Self-Driving Car
Sample Image | Defect? (0/1) | Visual Inspection

 

Why Now?

The idea of supervised learning has been around for many decades. However, it has only taken off in the last few years. To illustrate why, take a look at Figure 1.2. The horizontal axis represents the amount of data you have for a task. For example, for speech recognition, this is the amount of audio data (your input) paired with the transcripts you want to predict.

With the surge in the use of computers and the emergence of the internet, the amount of data in many industries has grown immensely. Much of what used to be recorded on paper is now stored digitally. As a result, more and more data is ready to be processed.

The vertical axis is the performance of an AI system. If you use traditional AI methods, performance improves steadily at first as you increase the amount of data, but at a certain point it stops getting significantly better (indicated by the red line in Figure 1.2). In other words, your AI does not become much more accurate even if you feed it more data.

AI has taken off recently due to the rise of certain types of ML called Neural Networks (NN) and Deep Learning (DL). We will define these terms more precisely later in this section. If you use a small NN, the performance looks like the blue line in Figure 1.2: as you feed it more data, the performance keeps improving for much longer. If you use a somewhat larger, medium-sized NN, the performance may look like the purple line. If you use a very large NN, the performance keeps improving (indicated by the green line in the figure).

For applications like speech recognition and online advertising, a high-performance AI module makes your system much better, and therefore more acceptable to users and more valuable to your company.

 

Figure 1.2 | The performance comparison of traditional AI models and Neural Networks of different sizes based on the amount of available data

 

Summary

  • If you are looking for the best performance, you first need a lot of data. That is why the concept of big data emerged.
  • Secondly, you want to be able to implement a very large neural network (NN), which requires a lot of computing power. With the advent of faster processing units, many companies, not just giant tech companies, can train large NNs on large enough datasets to get better performance and drive business value.
  • However, NNs are not perfect. Later in the course, we discuss why you may lean towards more traditional models rather than high-performing neural networks.