Artificial Intelligence | Lesson 2.5

Supervised Learning

Supervised learning is the area of machine learning that has the most applications in industry and research today. Our example so far belongs to this subset: it uses a set of techniques that allow machines to learn a mapping between a set of information called features and a target value called a label. Machine learning methods find this mapping automatically from the data, even when it is too complex for a person to write down as a formula. A successful model will make predictions that are consistent with the labels, thus transferring the experience embedded in the dataset to new cases. Figure 2.4 shows how features and labels interact in supervised learning applications.

The main difference between AI and traditional models is that traditional models relied on a formula that an expert had come up with. The core idea of machine learning is that we can use the features and the labels in the data to have the computer autonomously learn the relationship between them. Once the model has learned this relationship, the goal is for it to reproduce the same relationship on unseen examples (new homes), predicting a label (the house price) given a set of features (the house characteristics) without the need for human intervention.

Fig 2.4. Refer to text description.

Figure 2.4 | The core concept of supervised learning: finding a mapping between a set of features and a label.
Attribution: Zero to AI, Figure 2.4. Nicolò Valigi and Gianluca Mauro. Link to fig 2.4 source. All rights reserved.

Fig 2.4 Text Description

Core concept of Supervised Learning:

  • Features – A set of parameters affecting a phenomenon
  • Supervised Learning algorithm – learns how to map the features to the label in order to predict it
  • Label – A target parameter we want to predict, based on the features
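
To make the idea of learning a mapping concrete, here is a minimal sketch using the house-price example. It assumes a single feature (size in square meters) and fits a straight line by ordinary least squares; the data, the choice of feature, and the function names are invented for illustration and are not taken from the lesson.

```python
# Hypothetical training data: (size in square meters, sale price in dollars).
# The numbers are invented and perfectly linear to keep the example readable.
homes = [(50, 150_000), (80, 240_000), (100, 300_000), (120, 360_000)]


def fit_line(examples):
    """Learn price = slope * size + intercept by ordinary least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    variance = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, size):
    """Apply the learned mapping to a home the model has never seen."""
    slope, intercept = model
    return slope * size + intercept


model = fit_line(homes)               # training: features + labels -> mapping
predicted_price = predict(model, 90)  # inference: features -> predicted label
print(round(predicted_price))
```

Nobody wrote the pricing formula by hand here: the slope and intercept come from the examples. Real models use many more features and more flexible mappings, but the pattern is the same.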

AI for your organization

You will see several ways that an organization might approach AI problems.

This checklist is a useful set of questions to answer before starting to implement AI:

  • Meaningfulness: Is the problem meaningful, and does it serve what the organization cares about? (Does it serve the organization’s goals?)
  • Well-boundedness: Is everything that needs to be developed clear? Do we know what the inputs and outputs for the AI module are?
  • Data richness: Can we obtain the data (inputs and outputs) required within the time frame of development?
  • Complexity: Is the problem complex enough so that it is not addressable with traditional methods efficiently?
  • At scale: Does the project’s benefit outweigh the development cost?

Another strategy is to let others guide your first AI questions: what are other enterprises in your domain choosing as their AI questions? They can often be a very good lead for your own thinking about AI problems. You may also take a situation-based approach. All of us have sat in meetings or watched decisions being made and left thinking, “There’s got to be a smarter way to do this using the data.” That is a good point to start thinking about AI. We tend to make sure that the AI decisions we make are driven by core business processes. You may find that third-party vendors or modeling experts in other departments of the enterprise can readily help you pursue the target that you eventually pick.

Your organization may be awash in qualitative data, sometimes called “thick data.” It can be language data, video, audio, visual, IoT data, or any other stream that you might be able to get value out of. A famous idea in this area that has driven some of our thinking comes from Tricia Wang, the data ethnographer: big data needs thick data. You should be thinking about, for example, the data that you have around you and the benefit that analyzing it may bring. I highly recommend reading her work as you think about the broadest set of data that you have available and zero in on your strategic needs relative to AI. We dive deeper into this discussion in the next section.

One of the ways that Dr. Wang’s research and her company have us thinking about data is that problems initially require discovery rather than optimization.

A variety of tools exist for dealing with the various types of collected data. Consider using ethnographic methods to collect the voice of the customer. For instance, surveys, interviews, or similar methods might be a great start to better understand how your business values are changing over time. One way to jumpstart AI in a particular area is to use tools that have AI tasks embedded within them. Professional organizations such as the International Institute for Analytics (IIA) benchmark and carry out some of these discussions, as do the major consultancies in the IT world, such as Gartner. Remember, to use the information extracted by AI, you do not necessarily need to know the model and algorithms used. Just like a weather broadcaster, as long as you understand the output, you can get a lot of insight into how to use it in a way that matters to you and your organization.

AI for sales and marketing

At this point, we have explored the role of structured data in various business applications. We will cover various marketing problems and explore how you can use AI and data science to strengthen and improve the relationship between your organization and its customers.

Why AI for sales and marketing

One of the main goals of marketers is to find the best way to offer the right product to the right customer at the right time. We want to start by giving you a little insight into why AI changes everything. Every marketer knows that not all customers are alike and that they respond best when they are engaged with a personalized message. A common marketing strategy is to divide customers into segments according to demographics or similar aspects. A simple segment can be “wealthy women between 25 and 30 years old who spend more than $1,000 per year on entertainment.” A marketer can create a custom message to reach this category of people, which is different from what will be done for other segments. While this technique is as old as the marketplace, it really was the best we could do before AI came on the scene.

The problem with this approach is that no matter how specific you get with your segmentation (marketers talk about micro-segmentation), you will always end up in a situation where two customers get treated exactly the same even if they are fundamentally different, just because they fall into the same category. There is a limit to the number of categories a human brain can manage. Just think about how many of your friends have similar characteristics to you on paper (same age, neighborhood, education) but have completely different tastes.

As you can see in Figure 2.5, the traditional marketing segmentation approach cannot target Marc for being Marc. It will always target him as a “male between 25 and 30 years old who lives in a large city.” AI changes the rules of the game because it can process much more information. With AI, you can reach personalization at scale, learning about people from their specific actions and characteristics and targeting them for who they really are, not for the handcrafted bucket they fall into.

Fig 2.5. AI personalization versus traditional marketing segmentation as described in the content above.

Figure 2.5 | AI personalization versus traditional marketing segmentation.
Attribution: Zero to AI, Figure 3.1. Nicolò Valigi and Gianluca Mauro. Link to source. All rights reserved.

What does such fine-grained personalization mean for a business? Companies specializing in AI for marketing can show some astonishing metrics that would be every marketer’s dream. An example is Amplero, a US company specializing in AI-driven marketing. Here are some of the results it reports in its marketing material:

  • It helped a major European telco increase the first 30-day average revenue per user from 0.32% to 2.8%, an almost 800% increase.
  • It reduced the customer acquisition cost (CAC) of one of the top five North American mobile carriers by over 97%: from $40 per customer to just $1.
  • It managed to retarget the unhappiest customers of a major European mobile carrier three weeks before they canceled their plans, created a more meaningful customer experience to re-engage them, and increased retention rates from 2% to 10%.
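
As a quick sanity check on how such figures are computed, the sketch below recalculates the relative changes from the before and after values quoted in the list. The helper name `percent_change` is our own; the inputs are the numbers reported above.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100


arpu_lift = percent_change(0.32, 2.8)    # the "almost 800%" increase
cac_change = percent_change(40, 1)       # negative: acquisition cost went down
retention_lift = percent_change(2, 10)   # relative increase in retention rate
retention_points = 10 - 2                # same change in percentage points

print(f"{arpu_lift:.0f}%  {cac_change:.1f}%  {retention_lift:.0f}%  {retention_points} points")
```

Note the difference between a relative change (retention went up fivefold) and a change in percentage points (retention went up by 8 points). Both descriptions are correct, so it pays to check which one a vendor is quoting.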

Marketing is a complex function, so instead of listing all the possible applications, we will focus on three general problems that apply to most businesses:

  • Identifying which customers are likely to leave your service (churn)
  • Identifying which customers are likely to buy a new service (upselling)
  • Identifying similar customer groups (customer segmentation)

Predicting churning customers

One of the most important marketing metrics is customer churn (also known as attrition or customer turnover). Churn is defined as the percentage of customers leaving a business over a period of time. Imagine knowing in advance which customers are unhappiest and most likely to abandon a product or service in the near future. This is exactly how AI can help with the problem of customer churn. Using machine learning and the organization’s data assets, we can find the customers who are most likely to leave a service and reach out to them with personalized messages to bring their engagement up again. Next, we will show how a churn predictor works, giving you the confidence to see opportunities for this application in your organization.
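
The churn definition above translates directly into a small calculation. This is a sketch under one common convention, which is our assumption: the rate is measured against the number of customers at the start of the period. The figures are invented.

```python
def churn_rate(customers_at_start, customers_lost):
    """Percentage of the starting customers who left during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start * 100


# Hypothetical month: 2,000 customers on day one, 50 of them left by month end.
monthly_churn = churn_rate(customers_at_start=2_000, customers_lost=50)
print(f"Monthly churn: {monthly_churn:.1f}%")
```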

In this machine learning (ML) problem, we have two classes of customers: the ones who are likely to churn and the ones who are not. Therefore, the label that our ML model will have to learn to predict is whether the customer belongs to one class or the other (say, customers who are about to churn belong to class 1, and the others belong to class 0). For instance, a telephone company may label as “churned” all the customers who dropped out of their phone plan and as “not churned” all the others who are still on their plan.

What we have just described is a supervised learning problem. An ML algorithm is asked to learn a mapping between a set of features (customer characteristics) and a label (churned/not churned) based on historical data. Let us recap the necessary steps to solve it, as visualized in Figure 2.6:

  1. Define an ML task starting from a business one (identifying customers who are likely to leave our service).
  2. Clearly identify a label: churned or not churned.
  3. Identify the features: elements of a customer that are likely to influence the likelihood of churning. You can come up with possible examples by thinking about what you would look at if you had to do this job by yourself:
    • Age
    • How long the customer has used the service
    • Money spent on the service
    • Time spent using the service in the last two months
  4. Gather historical data of churned and active customers.
  5. Train the model: the ML model will learn how to predict the label, given the features.
  6. Perform inference: use the model on new data to identify which of your current customers are likely to churn.
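
The steps above can be sketched end to end in a few lines. The lesson does not name a specific algorithm, so this example assumes logistic regression trained with plain gradient descent, two of the features from the list, and a tiny invented dataset. Treat it as an illustration of the workflow, not as a production recipe.

```python
import math

# Steps 2-4: hypothetical historical data. Each row holds the features
# (months as a customer, hours of usage in the last two months) and the
# label (1 = churned, 0 = still active). All values are invented.
history = [
    ((2, 1.0), 1), ((3, 0.5), 1), ((5, 2.0), 1), ((4, 3.0), 1),
    ((24, 20.0), 0), ((36, 15.0), 0), ((18, 25.0), 0), ((30, 30.0), 0),
]


def scale(features):
    """Bring both features to a similar range so training behaves well."""
    months, hours = features
    return (months / 36, hours / 30)


def churn_probability(weights, bias, features):
    """Logistic model: squash a weighted sum of the features into 0..1."""
    z = sum(w * x for w, x in zip(weights, scale(features))) + bias
    return 1 / (1 + math.exp(-z))


def train(rows, epochs=2000, learning_rate=0.5):
    """Step 5: adjust weights so predictions move toward the labels."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in rows:
            error = churn_probability(weights, bias, features) - label
            scaled = scale(features)
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, scaled)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(history)

# Step 6: inference on current customers the model has not seen.
current_customers = {"new_low_usage": (3, 1.0), "loyal_heavy_user": (28, 22.0)}
at_risk = [name for name, features in current_customers.items()
           if churn_probability(weights, bias, features) > 0.5]
print(at_risk)
```

The 0.5 threshold is also an assumption. In practice you might lower it to reach more customers at the cost of contacting some who were never going to leave.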

Notice that the label must be found retroactively by looking at past customer records. Consider the easiest situation first. Assume you have a subscription-based business model, like Netflix or Spotify. Subscriptions are usually renewed automatically, so customers have to actively take a step to cancel. That means calling customer service in the case of a phone company or going to the website and turning off automatic renewal in the case of Netflix or Spotify. In these situations, finding your label is easy. There is no doubt about whether a customer is still on board, and a clear database table exists that can tell you exactly when they left. Sometimes the situation is not as clear as it is for Netflix. In those cases, you need to come up with a heuristic way of identifying churned customers (for example, a supermarket might give loyalty cards to customers and monitor whether they have stopped coming).
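
For the supermarket case, a heuristic label could be as simple as the sketch below. The 60-day threshold, the card identifiers, and the dates are all hypothetical; the right threshold depends on how often your customers normally shop.

```python
from datetime import date, timedelta


def has_churned(last_visit, today, max_gap_days=60):
    """Heuristic label: churned if the card has been unused for too long."""
    return (today - last_visit) > timedelta(days=max_gap_days)


today = date(2024, 6, 30)
# Hypothetical loyalty-card records: card id -> date of the last purchase.
last_visits = {
    "card_001": date(2024, 6, 21),  # shopped 9 days ago
    "card_002": date(2024, 2, 10),  # no purchase for over four months
}

labels = {card: has_churned(seen, today) for card, seen in last_visits.items()}
print(labels)
```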

Once you produce some labels to distinguish “happy customers” from churned ones, the situation becomes similar to the example of house-price prediction we have seen before. Luckily, the training data for churn prediction can usually be extracted from customer relationship management (CRM) data: you can take records from up to, say, 18 months ago and then label whether each customer has churned in the past 6 months.
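
One way to read the 18-month and 6-month windows is sketched below: features are computed only from activity between 18 and 6 months ago, and the label records whether the customer cancelled in the most recent 6 months. This split is our interpretation of the text, and the record layout and field names are invented.

```python
from datetime import date


def build_training_row(customer, window_start, cutoff, today):
    """Return one training example, or None if the customer left before the cutoff."""
    cancelled = customer["cancelled_on"]
    if cancelled is not None and cancelled < cutoff:
        return None  # already gone at the cutoff, so there is nothing to predict
    # Features may only use what was known before the cutoff.
    spend = sum(amount for day, amount in customer["payments"]
                if window_start <= day < cutoff)
    # The label looks at what happened after the cutoff.
    churned = cancelled is not None and cancelled <= today
    return {"id": customer["id"], "spend_in_window": spend, "churned": int(churned)}


today = date(2024, 7, 1)
cutoff = date(2024, 1, 1)        # 6 months before `today`
window_start = date(2023, 1, 1)  # 18 months before `today`

# Hypothetical CRM extract.
crm = [
    {"id": "A", "cancelled_on": date(2024, 3, 15),
     "payments": [(date(2023, 3, 1), 30), (date(2023, 9, 1), 30), (date(2024, 2, 1), 30)]},
    {"id": "B", "cancelled_on": None,
     "payments": [(date(2023, 5, 1), 20)]},
    {"id": "C", "cancelled_on": date(2023, 11, 1),
     "payments": [(date(2023, 2, 1), 10)]},
]

rows = [build_training_row(c, window_start, cutoff, today) for c in crm]
rows = [row for row in rows if row is not None]
print(rows)
```

Keeping the feature window strictly before the label window matters: if a feature could "see" the cancellation, the model would look excellent on historical data and fail on current customers.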

Figure 2.6 | The process of creating and using an ML model, from its definition to its usage in the inference phase.
Attribution: Zero to AI, Figure 3.2. Nicolò Valigi and Gianluca Mauro. Link to source. All rights reserved.

Fig. 2.6 Text Description

Step 1:

  1. Define the task
  2. Identify the label
  3. Identify the features

Step 2:

  1. Gather historical data
  2. Train the model
  3. Use the model (inference)

By now, you are already much more confident and effective in defining a label for a churn-prediction project than most business managers. Every data scientist will be thankful for that, but if you really want to help them, you need to put in the extra effort and help them select features (see Figure 2.6).

If this sounds to you like a technical detail, you are missing a great opportunity to let your experience and domain knowledge shine. In an ML problem, remember that a feature is an attribute of the phenomenon we are trying to model that affects its outcome. Assuming you are an expert in your business, no one in the world has better insights about the relevant features, and your expertise can help your data science team follow a path that leads to successful results.

To give you an idea of what your contribution may look like, ask yourself, “If I had to guess the likelihood of churn of just one customer, what parameters would I look at?” This can inform the conversation with an engineer:

Engineer: Do you know what is affecting the customer churn? I need to come up with some relevant features.

Marketer: Sure, we know that the payment setup is highly relevant to churn. Usually, someone who has a contract instead of a prepaid card is less likely to abandon the service because they have more lock-in. It’s also true that when we’re close to the expiration date of a contract, customers start looking at competitors, so that’s another factor.

Engineer: Interesting. For sure, I’ll use a feature in the model that expresses “contract” or “prepaid.” Another feature will be the number of days to the expiration of the contract. Anything else?

Marketer: Sure, we know that age plays a big role. These young millennials change companies all the time, while older people are more loyal. Also, if someone has been our client for a long time, that’s a good indicator of loyalty.

Engineer: Nice; we can look in the CRM and include a feature for “days since sign-up” and one for age. Is age the only interesting demographic attribute?

Marketer: I don’t think gender is; we never noticed any impact. The occupation is important: we know that the self-employed are less eager to change plans.

Engineer: OK, I’ll try to double-check whether gender has any correlation with churn. Regarding the occupation, that’s a good hint. Thanks!

Remember to keep communicating with your AI team. A conversation like this can go on for days, usually with a constant back-and-forth between the engineers and you. You will provide your experience and domain knowledge, and the engineer will translate that into something readable by a machine. Eventually, the engineer will come back with some insight or questions that came out of the data analysis and that require your help to interpret. As you can see, it is not a nerd exercise. It is a team effort between the business and the nerds.
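
To show what “readable by a machine” might look like, the sketch below turns the features from the conversation (payment setup, days to contract expiration, days since sign-up, age, and occupation) into numbers. The record layout, field names, and the choice to use 0 for prepaid customers’ contract days are assumptions made for illustration.

```python
from datetime import date


def encode_customer(record, today):
    """Translate one CRM record into numeric features a model can consume."""
    on_contract = record["payment_setup"] == "contract"
    return {
        # "contract" versus "prepaid" becomes a 1/0 flag.
        "is_contract": 1 if on_contract else 0,
        # Prepaid customers have no expiration date; 0 is a placeholder,
        # and the flag above lets a model tell the two cases apart.
        "days_to_contract_end": (record["contract_end"] - today).days if on_contract else 0,
        "days_since_signup": (today - record["signed_up"]).days,
        "age": record["age"],
        "is_self_employed": 1 if record["occupation"] == "self-employed" else 0,
    }


# Hypothetical customer record.
customer = {
    "payment_setup": "contract",
    "contract_end": date(2024, 9, 1),
    "signed_up": date(2022, 7, 1),
    "age": 31,
    "occupation": "self-employed",
}

features = encode_customer(customer, today=date(2024, 7, 1))
print(features)
```

Each line of this function corresponds to one sentence from the marketer, which is why the conversation is worth having before any model is trained.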

A graphical representation of the buying pattern behavior of churned and active customers as described in the content above.

Figure 2.7 | A graphical representation of the buying pattern behavior of churned and active customers.
Attribution: Zero to AI, Figure 3.3. Nicolò Valigi and Gianluca Mauro. Link to source. All rights reserved.

This scenario is common, and we would like to encourage you to look for such situations and think about whether there’s space to build an ML classifier for it.

To give you some inspiration, here are some other cases where you can apply this methodology:

    • You have a basic product and some upsells (accessories or additional services, which are common for telco companies). You can label customers with “has bought upsell X” or “hasn’t bought upsell X” and use their basic product usage to assess whether it may be worth proposing the upsell to your customer.
    • You have a newsletter and want to optimize its open rates. Your labels are “has opened the newsletter” or “hasn’t opened the newsletter.” The features you use for the classifier may be the time you sent the email (day of the week, hour, and so forth) and some user-related features, and you may also tag emails by their content (for example, “informative,” “product news,” or “whitepaper”).
    • You have a physical store with a loyalty card (to track which customer buys what). You can run marketing initiatives (newsletters again or also physical ads) and classify your users based on what brought them into your store and what did not.

As you can see, the method we just described of dividing users into two or more separate classes and building an ML classifier that can tell them apart is pretty flexible and can be used for a lot of problems. Pretty powerful, isn’t it?

Performing automated customer segmentation

Earlier in this section, we referenced one of the key activities that marketers have to perform when developing a marketing plan: customer segmentation. Segmenting a market means dividing customers into groups with similar characteristics and behaviors. The core idea behind this effort is that customers in the same group will be responsive to similar marketing actions. For example, a fashion retailer would likely benefit from having separate market segments for men versus women and teenagers versus young adults versus professionals.

Segments can be more or less specific and, therefore, more or less granular. Here are two examples:

    • Broad segment: Young males between 20 and 25 years old
    • Highly specific segment: Young males between 20 and 25 years old, studying in college, living in one of the top five largest US cities, and with a passion for first-person-shooter video games
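
Handcrafted segments like these two are, in effect, fixed rules. The sketch below writes them as simple filters over invented customer records; the field names are hypothetical. Notice that every threshold in the rules is a choice somebody had to make by hand.

```python
def in_broad_segment(customer):
    """Young males between 20 and 25 years old."""
    return customer["gender"] == "male" and 20 <= customer["age"] <= 25


def in_specific_segment(customer):
    """The broad segment, narrowed by college, city size, and gaming taste."""
    return (
        in_broad_segment(customer)
        and customer["in_college"]
        and customer["lives_in_top5_city"]
        and "fps_games" in customer["interests"]
    )


# Hypothetical customer records.
customers = [
    {"name": "Marc", "gender": "male", "age": 22, "in_college": True,
     "lives_in_top5_city": True, "interests": {"fps_games", "cycling"}},
    {"name": "Luis", "gender": "male", "age": 24, "in_college": False,
     "lives_in_top5_city": True, "interests": {"cooking"}},
    {"name": "Anna", "gender": "female", "age": 23, "in_college": True,
     "lives_in_top5_city": False, "interests": {"fps_games"}},
]

broad = [c["name"] for c in customers if in_broad_segment(c)]
specific = [c["name"] for c in customers if in_specific_segment(c)]
print(broad, specific)
```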

Many marketers can intuitively perform this segmentation task in their brains as long as the amount of data is limited, both in terms of examples (number of customers) and features. This usually produces generic customer segments like the first one, which can be limiting considering the amount of variation that exists among these groups. A marketer could attempt to define a more specific segment like the second one, but how do they produce it? Here are the questions that could be raised during a typical brainstorming session:

    • Is it a good idea to use the 20- to 25-year-old threshold, or is it better to use 20 to 28?
    • Are we sure that college students living in large cities are fundamentally different from the ones living in smaller ones? Can’t we put all of them into a single cluster?
    • Is there a fundamental difference between males and females? Do we really need to create two segments, or is this just a cliché?

Answering these questions can be done in three ways:

    • Follow your gut feeling. We are not in 1980, so do not do that.
    • Look at the data and use the marketer’s instinct to interpret it. This is better than a gut feeling, but marketers will likely project their biases into their analysis and see what they want to see. So, avoid this as well.
    • Let AI produce customer segments by itself, keeping a marketer in the loop to use their creativity and context knowledge.

The third option is most likely to outperform the others. Let us see why and how.