Supervised Learning: Theory & Practice

Supervised learning is a core branch of machine learning in which a model is trained on labeled data to make predictions. Every training example pairs a set of features with the expected output (the label), much as a teacher gives a student both the questions and the correct answers. From these pairs the model learns a mapping from inputs to outputs and tries to generalize it accurately to unseen data. Training is iterative: the algorithm adjusts the model's parameters at each iteration to minimize a loss function, which measures how far the predicted outputs are from the actual ones, as shown in Figure 1.

Figure 1
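The iterative loss minimization described above can be sketched with plain gradient descent. This is only an illustration under stated assumptions: the toy data, the learning rate, and the number of steps are hypothetical choices, and real libraries typically use more refined optimizers.

```python
import numpy as np

# Toy data (hypothetical): the true relationship is y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0        # initial parameters
learning_rate = 0.05   # assumed step size

for step in range(2000):
    y_hat = w * x + b                # current predictions
    error = y_hat - y                # predicted minus actual
    loss = np.mean(error ** 2)       # mean squared error
    # Gradients of the loss with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Move the parameters a small step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}, final loss = {loss:.6f}")
```

Each pass through the loop is one iteration: predict, measure the loss, and nudge the parameters so the next predictions are closer to the labels.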

Important parts of Supervised Learning

Features: Features are the data attributes used as inputs when training a machine learning model. They are also called variables or columns. Features are crucial to the learning process because they directly affect prediction performance.

Label: The label is the output value the model tries to predict, also called the target variable.

Training dataset: The portion of the dataset used to fit the model.

Test dataset: The portion of the dataset held out from training and used to evaluate how well the model performs on data it has not seen before.

Figure 2 illustrates a sample dataset that includes features and a label.

Figure 2

Key algorithms and their theories

This article discusses four widely used supervised learning algorithms: linear regression, decision trees, neural networks, and support vector machines. Which one performs best depends on the data and the task.

Linear Regression

Linear regression is one of the most straightforward techniques in supervised learning. It models the relationship between a dependent variable (the value we want to forecast) and one or more independent variables, assuming that relationship is approximately linear. Fitting the model to data yields a linear equation relating the variables, as shown in Figure 3. The equation can be written as:

𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ₙ𝑥ₙ

where 𝑦 is the predicted value, 𝛽₀ is the intercept, and each 𝛽ᵢ is a coefficient representing the weight or influence of the independent variable 𝑥ᵢ. This method is widely used for predictive analysis and trend forecasting, as it offers a simple yet effective way to model relationships within data.

Figure 3
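To make the equation concrete, here is a minimal sketch that estimates the intercept and coefficients by ordinary least squares with NumPy and then evaluates the linear equation for a new point. The data are hypothetical and generated without noise from y = 1 + 2·x1 + 3·x2, so the recovered coefficients should be close to 1, 2, and 3.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the intercept beta_0 is estimated too
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: choose beta to minimize ||X_design @ beta - y||^2
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("intercept and coefficients:", np.round(beta, 6))

# Prediction for a new point: y_hat = beta_0 + beta_1*x1 + beta_2*x2
x_new = np.array([6.0, 2.0])
y_hat = beta[0] + beta[1:] @ x_new
print(f"prediction for x1=6, x2=2: {y_hat:.2f}")
```

Least squares is one common way to fit the coefficients; it is used here as an assumption, since the text does not specify a fitting method at this point.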

Decision Trees

Decision trees are non-parametric supervised learning methods used for both classification and regression. At each step the algorithm picks the feature (and threshold) that best splits the data into groups whose items are as similar to each other as possible, so that the resulting groups differ clearly from one another. The root node holds the entire dataset and the first split; its branches lead to further decision nodes or to leaves, which hold the final predictions. Each split is chosen by a criterion that measures how much it reduces impurity or variance, as shown in Figure 4.

Figure 4
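A small sketch with scikit-learn's DecisionTreeClassifier shows how a tree picks a split. The dataset and the feature names hours_studied and hours_slept are hypothetical, and the Gini criterion and depth limit are illustrative choices rather than recommendations.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [hours_studied, hours_slept] -> passed (1) or failed (0)
X = [[1, 4], [2, 8], [3, 5], [4, 6], [6, 5], [7, 8], [8, 4], [9, 7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Gini impurity is the split criterion; max_depth caps how deep the tree may grow
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned rules: the root split first, then the leaves
print(export_text(tree, feature_names=["hours_studied", "hours_slept"]))

# Classify a new, unseen example
print(tree.predict([[5.5, 6]]))
```

In this toy data only hours_studied separates the two classes cleanly, so the tree needs a single split at the root and both branches end in pure leaves.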

Neural Networks

Neural networks, and deep learning models in particular, are loosely inspired by how the brain processes information: they learn by finding patterns in data. Each layer transforms the output of the previous layer and passes the result to the next. The units within a layer are called nodes (or neurons). A basic neural network has an input layer, one or more hidden layers, and an output layer. Neural networks are effective at modeling complex, non-linear relationships, as shown in Figure 5.

Figure 5
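The layer-by-layer flow can be sketched as a forward pass in NumPy. The layer sizes, the ReLU activation, and the random (untrained) weights are all assumptions made for illustration; a real network would learn its weights from labeled data.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Non-linear activation: negative values become zero
    return np.maximum(0.0, z)

# Assumed layer sizes: 3 inputs -> 4 hidden nodes -> 1 output
W1 = rng.normal(size=(3, 4))
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))
b2 = np.zeros(1)

def forward(x):
    hidden = relu(x @ W1 + b1)   # hidden layer transforms the input
    return hidden @ W2 + b2      # output layer transforms the hidden activations

# Two examples with three features each
batch = np.array([[0.5, -1.0, 2.0],
                  [1.5, 0.0, -0.5]])
output = forward(batch)
print(output.shape)  # one output value per example: (2, 1)
```

Training would repeat this forward pass, compare the outputs with the labels, and adjust W1, b1, W2, and b2 to reduce the loss.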

Support Vector Machines (SVM)

Support vector machines are robust methods for classification, most commonly into two groups. The SVM algorithm looks for the best dividing boundary (hyperplane) between the categories in feature space. The support vectors are the data points closest to the hyperplane, and the algorithm chooses the hyperplane that maximizes the gap (margin) between itself and those points. With a suitable kernel, the model also works well for groups that cannot be separated linearly. There are two types of SVM, linear and non-linear, described below.

If a straight line can separate the groups, a linear SVM is used. Figure 6 shows two kinds of points on a graph: red and blue. A linear SVM places the line so that it separates the red and blue points with the widest possible margin. This method suits many tasks, such as deciding whether an email is spam or recognizing handwritten digits. In real-world data, however, there is often no straight line that separates the points. In such cases, non-linear SVMs are used. They rely on the kernel trick, which implicitly maps the data into a higher-dimensional space where a hyperplane can separate the two groups. This technique can be applied to classify images (e.g., distinguishing bird photos from tiger photos) and to analyze biological data (e.g., protein classification).

Figure 6
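The difference between linear and non-linear SVMs can be sketched with scikit-learn's SVC on a synthetic dataset of two concentric rings, which no straight line can separate. The dataset parameters and the choice of the RBF kernel are assumptions for illustration only.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: an inner ring (class 1) surrounded by an outer ring (class 0)
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Linear SVM: searches for a straight separating line
linear_svm = SVC(kernel="linear").fit(X_train, y_train)

# Non-linear SVM: the RBF kernel applies the kernel trick
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)

linear_acc = linear_svm.score(X_test, y_test)
rbf_acc = rbf_svm.score(X_test, y_test)
print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
print(f"support vectors used by the RBF model: {len(rbf_svm.support_vectors_)}")
```

On data like this the RBF model should score far higher than the linear one, because the rings only become separable after the implicit mapping to a higher-dimensional space.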

How to perform supervised learning?

  1. Collecting data: Gather the data, including both the features and the labels.
  2. Preprocessing the data: Clean the data, for example by handling missing values and inconsistent variables.
  3. Choosing features: Decide which features are most relevant to the prediction.
  4. Training the model: Use the training dataset, which consists of features with labels, to train the model.
  5. Evaluating the model: Measure how accurate the model is on test data. This step is crucial, and the test data must be separate from the training dataset.
  6. Prediction: Apply the trained model to new data that follows the same patterns as the training data.
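Steps 2 and 3 can be sketched with pandas as follows. The rows and the missing values are hypothetical, and filling gaps with the column median is just one common strategy, assumed here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with two missing feature values
raw = pd.DataFrame({
    "Size": [1500, 1600, np.nan, 1800],
    "Bedrooms": [3, 3, 3, np.nan],
    "Price": [300000, 320000, 340000, 360000],
})

# Step 2: fill each missing value with the median of its column
clean = raw.fillna(raw.median(numeric_only=True))

# Step 3: keep the columns chosen as features; the label stays separate
features = clean[["Size", "Bedrooms"]]
label = clean["Price"]

print(clean)
```

Dropping incomplete rows with dropna() is an alternative when the dataset is large enough to spare them.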

Practical example:

Step-by-step:

  • Problem Definition: We aim to predict a house's price from its size and number of bedrooms. A sample dataset might look like Table 1.
  • Features: In our example, the features are Size (sq. ft) and Bedrooms.
  • Label: In this example, the label is Price.

Size (sq. ft)   Bedrooms   Price
1500            3          300000
1600            3          320000
1700            3          340000
1800            4          360000
1900            4          380000
2000            4          400000
2100            5          420000
2200            5          440000
2300            5          460000
2400            5          480000

Table 1

We will implement this using linear regression in Python with the scikit-learn library, a popular tool for simple and effective modeling.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Sample dataset
data = {
    'Size': [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400],
    'Bedrooms': [3, 3, 3, 4, 4, 4, 5, 5, 5, 5],
    'Price': [300000, 320000, 340000, 360000, 380000, 400000, 420000, 440000, 460000, 480000]
}

# Creating DataFrame
df = pd.DataFrame(data)

# Features and target
X = df[['Size', 'Bedrooms']]
y = df['Price']

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initializing and training the model
model = LinearRegression()
model.fit(X_train, y_train)

# Making predictions
y_pred = model.predict(X_test)

# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Displaying the results
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")

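# Sample prediction (a DataFrame with the training column names avoids
# scikit-learn's feature-name warning that a bare array would trigger)
sample_data = pd.DataFrame({'Size': [2100], 'Bedrooms': [5]})
predicted_price = model.predict(sample_data)
print(f"Predicted price for house with 2100 sq.ft and 5 bedrooms: ${predicted_price[0]:.2f}")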

Explanation:

  • Making a dataset: We build a small, simple dataset with house size, bedroom count, and house price as fields.
  • Preprocessing the data: A pandas DataFrame holds the data, and we select the features (Size, Bedrooms) and the label (Price) from it.
  • Model training: We use train_test_split to separate the dataset into training and testing sets, then fit the linear regression model on the training data.
  • Test set: The trained model predicts prices for the test set, which it did not see during training.
  • Evaluating the model: Two metrics, mean squared error (MSE) and R², measure how closely the predictions match the actual prices.
  • Prediction: Finally, we ask the model for the price of a house with 2100 sq. ft and 5 bedrooms.
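
As a sanity check on the result, the learned equation can be inspected directly. Every row of Table 1 satisfies Price = 200 × Size, so a linear model should recover a Size coefficient near 200, with an intercept and Bedrooms coefficient near zero. The sketch below is an illustration that fits on all ten rows (an assumption made only to inspect the equation, not an evaluation procedure).

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same rows as Table 1
df = pd.DataFrame({
    "Size": [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400],
    "Bedrooms": [3, 3, 3, 4, 4, 4, 5, 5, 5, 5],
    "Price": [300000, 320000, 340000, 360000, 380000,
              400000, 420000, 440000, 460000, 480000],
})

# Fit on every row purely to inspect the learned equation
model = LinearRegression().fit(df[["Size", "Bedrooms"]], df["Price"])

beta_0 = model.intercept_
beta_size, beta_bedrooms = model.coef_
print(f"intercept: {beta_0:.4f}")
print(f"Size coefficient: {beta_size:.4f}")
print(f"Bedrooms coefficient: {beta_bedrooms:.4f}")

# Predict the price for the question asked above
query = pd.DataFrame({"Size": [2100], "Bedrooms": [5]})
predicted = model.predict(query)[0]
print(f"predicted price: ${predicted:.2f}")
```

Because the sample data are perfectly linear, the fit is essentially exact here; real housing data would be noisier and the metrics less flattering.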