Unsupervised Learning: Theories & Practice

Unsupervised learning is a branch of machine learning that looks for structures or patterns in data without using labeled answers. It contrasts with supervised learning, where a model is trained on a labeled dataset and learns the mapping from input to desired output. Unsupervised learning algorithms instead draw conclusions from the input data alone, with no labels to guide them. This makes them useful for exploratory data analysis and for identifying anomalies in the data, as shown in Figure 1.

Figure 1

The Essence of Unsupervised Learning

The fundamental question that an unsupervised learning algorithm addresses is how the given data is structured. By linking related data points or aggregating information, these algorithms can surface patterns that are not easy to spot otherwise. The most commonly applied unsupervised approaches are clustering, dimensionality reduction, and association rule learning.

Clustering Techniques

Clustering begins with exploring the data to find points that resemble one another. Cluster analysis segments a collection of data into clusters such that the elements within a cluster are more similar to each other than to elements in other clusters, as shown in Figure 2.

In unsupervised learning, some popular clustering techniques are as follows:

  1. K-Means Clustering
  2. Hierarchical Clustering
  3. Gaussian Mixture Models (GMM)
  4. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  5. Self-Organizing Maps (SOM)
  6. Spectral Clustering

Figure 2
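
To make the idea of grouping similar points concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. The toy points and the deterministic farthest-point seeding are assumptions made for this illustration; libraries such as scikit-learn typically use randomized seeding (k-means++) instead.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        next_point = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(next_point)
    return centroids


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The first three points end up in one cluster and the last three in the other, which is the "similar elements share a cluster" property described above.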

Dimensionality Reduction

Dimensionality reduction in unsupervised learning means transforming high-dimensional data into a lower-dimensional form while retaining as much of the information in the data as possible. One technique for dimensionality reduction is shown in Figure 3. It makes models easier to work with, can help reduce overfitting, and simplifies data visualization. Below are popular methods of dimensionality reduction.

  1. Principal Component Analysis (PCA)
  2. t-Distributed Stochastic Neighbor Embedding (t-SNE)
  3. Autoencoders

Figure 3
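
As a rough illustration of what PCA does, the sketch below finds the first principal component of a tiny dataset in plain Python. It uses power iteration on the covariance matrix, which is a simplification chosen for readability (library implementations generally rely on SVD); the toy rows are hypothetical, and the sketch assumes the data has non-zero variance.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (unit direction, explained-variance ratio) of the first component."""
    n, d = len(rows), len(rows[0])
    # Center each feature so the covariance is measured around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the eigenvector with the largest eigenvalue, i.e. the
    # direction of maximum variance.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance


# Hypothetical toy data: the second feature is roughly twice the first.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, ratio = first_principal_component(rows)
print(direction, ratio)
```

Because the two features are almost perfectly correlated, a single component pointing roughly along (1, 2) retains nearly all of the variance, so the 2-D data could be replaced by one coordinate with little loss.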

Association Rule Learning

Association rule learning is a data mining approach that reveals relationships between the attributes in a large dataset. Its methodology differs from clustering and dimensionality reduction: rather than grouping samples or learning compressed features, it searches for items that frequently occur together. The core task of uncovering patterns in unlabeled data, however, is the same in all these techniques. Rules, support, and confidence are the essential concepts for discovering interdependencies among the items of a dataset.

Here is what each term means:

Rules

In association rule mining, rules are statements that define relationships between items in a dataset. A rule consists of two parts: an antecedent (the left-hand side) and a consequent (the right-hand side), which are linked by an arrow (➞).

Example: In the rule {bread} ➞ {butter}, ‘bread’ implies ‘butter’: ‘bread’ is the antecedent and ‘butter’ is the consequent. The rule means that butter is likely to appear in a transaction that contains bread.

Support

Support measures how frequently a particular itemset appears in the dataset. It is calculated as the number of transactions containing the itemset divided by the total number of transactions.

Example: If 100 out of 500 transactions contain both “bread” and “butter,” then the support for the itemset {bread, butter} is (100/500) * 100 = 20%.
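
The calculation above can be sketched in a few lines of Python. The transaction list is hypothetical, built only to reproduce the 100-out-of-500 figure; the "milk" filler transactions are an assumption.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical data: 500 transactions, 100 of which contain bread and butter.
transactions = [{"bread", "butter"}] * 100 + [{"milk"}] * 400
print(support(transactions, {"bread", "butter"}))  # 0.2
```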

Confidence

Confidence is the conditional probability of the consequent given the antecedent: it measures how reliably the rule predicts the presence of the consequent when the antecedent is present. It is calculated as the number of transactions in which both the antecedent and the consequent appear divided by the number of transactions in which the antecedent appears.

Example: If among 100 transactions containing ‘bread’, we also have ‘butter’ in 75 of them, then the confidence of the rule {bread} ➞ {butter} is 75/100 * 100 = 75%.
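
Confidence can be sketched the same way. The transactions below are hypothetical and chosen only to match the 75-out-of-100 example; returning 0.0 when the antecedent never occurs is a convention assumed for this sketch.

```python
def confidence(transactions, antecedent, consequent):
    """Share of antecedent-containing transactions that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0  # assumed convention: the rule never applies
    with_both = sum(1 for t in with_antecedent if consequent <= set(t))
    return with_both / len(with_antecedent)


# Hypothetical data: 100 transactions contain bread, 75 of them also butter.
transactions = [{"bread", "butter"}] * 75 + [{"bread"}] * 25 + [{"milk"}] * 50
print(confidence(transactions, {"bread"}, {"butter"}))  # 0.75
```

Note that the denominator counts every transaction containing the antecedent, including those that also contain the consequent.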

Sample rule table

Rule | Support | Confidence
If a customer buys bread, they are likely to buy butter. | 20% | 75%
If a customer buys a laptop, they are likely to buy a laptop bag. | 15% | 80%
If a customer views a smartphone, they are likely to view a smartphone case. | 10% | 60%
If a customer buys milk, they are likely to buy eggs. | 25% | 65%
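
In practice, rules like these are usually filtered by minimum support and confidence thresholds. The sketch below applies two hypothetical thresholds (15% support, 70% confidence) to the rows of the table; the shortened rule names and the threshold values are assumptions made for illustration.

```python
# The four rules from the sample table, with percentages written as fractions.
rules = [
    {"rule": "bread -> butter", "support": 0.20, "confidence": 0.75},
    {"rule": "laptop -> laptop bag", "support": 0.15, "confidence": 0.80},
    {"rule": "smartphone -> smartphone case", "support": 0.10, "confidence": 0.60},
    {"rule": "milk -> eggs", "support": 0.25, "confidence": 0.65},
]


def filter_rules(rules, min_support, min_confidence):
    """Keep only rules that meet both thresholds."""
    return [
        r for r in rules
        if r["support"] >= min_support and r["confidence"] >= min_confidence
    ]


kept = filter_rules(rules, min_support=0.15, min_confidence=0.70)
print([r["rule"] for r in kept])  # ['bread -> butter', 'laptop -> laptop bag']
```

The milk and eggs rule is frequent but not reliable enough under these thresholds, while the smartphone rule fails both.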

The sections above show how unsupervised learning, through clustering, dimensionality reduction, and association rule learning, supports comprehensive data analysis. Dimensionality reduction shrinks the dataset’s dimensions to make analysis easier. Clustering locates the data’s innate similarities or group structures. Association rule learning then identifies the relationships among the items that belong to these clusters, which provides actionable insight. Combining the three makes insights easier to obtain, can improve a model’s accuracy by lowering data noise, and makes it possible to spot patterns that matter for decision-making across a variety of applications.

Practical example 1:

Step-by-Step:

  • Dimensionality Reduction: Reduce a high-dimensional dataset using PCA or autoencoders to ease its analysis and visualization.
  • Clustering: Perform clustering (e.g., K-means) on the reduced data to identify distinct groups.
  • Association Rule Learning: Apply association rule mining to the clustered data to identify the relationships and rules describing it.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd

# Step 1: Dimensionality Reduction
np.random.seed(42)
X = np.random.rand(100, 10)  # synthetic data: 100 samples, 10 features
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Step 2: Clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X_reduced)

# Step 3: Association Rule Learning
# Build transactions cluster by cluster: each sample becomes the list of
# feature names whose value exceeds 0.5 (TransactionEncoder expects lists
# of item labels, not rows of 0/1 indicators)
transactions = []
for cluster in range(3):
    cluster_data = X[clusters == cluster]
    for row in cluster_data:
        transactions.append([f"feature_{j}" for j in np.flatnonzero(row > 0.5)])

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Mine frequent itemsets, then keep rules with confidence of at least 0.6
# (on uniformly random data few or no rules may pass this threshold)
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)

print(rules)

Explanation:

  • Dimensionality Reduction: PCA projects the 10-dimensional dataset onto its first two principal components.
  • Clustering: K-means is applied to the reduced data to obtain 3 clusters.
  • Association Rule Learning: The samples, grouped by cluster, are binarized at a 0.5 threshold and treated as transactions from which frequent itemsets and association rules are derived.

Practical example 2:

In the following example, I will use the Keras library to build an autoencoder and cluster the features it learns. Autoencoders are a popular choice for unsupervised learning because they can learn a compressed representation of the data that is well suited to clustering.

Step-by-Step:

  • Data Preparation: Import the dataset and prepare the training and test data. In this example, we will use the MNIST dataset.
  • Build the Autoencoder: Create an encoder that compresses the input data, and a decoder that reconstructs the original data from the compressed form.
  • Train the Autoencoder: Train the autoencoder to minimize the reconstruction error.
  • Extract Features: Use the encoder part of the autoencoder to extract features.
  • Clustering: Apply a clustering algorithm (e.g., KMeans) to the extracted features.
  • Evaluate the Clustering: Visualize the clusters to understand the clustering quality.

import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Dense
from keras.optimizers import Adam
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Step 1: Load and preprocess the dataset
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

# Step 2: Build the autoencoder
input_dim = x_train.shape[1]
encoding_dim = 64  # Size of the encoded representation

input_img = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='relu')(input_img)
decoded = Dense(input_dim, activation='sigmoid')(encoded)

autoencoder = Model(input_img, decoded)
encoder = Model(input_img, encoded)

encoded_input = Input(shape=(encoding_dim,))
decoder_layer = autoencoder.layers[-1]
decoder = Model(encoded_input, decoder_layer(encoded_input))

autoencoder.compile(optimizer=Adam(), loss='binary_crossentropy')

# Step 3: Train the autoencoder
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))

# Step 4: Extract features
encoded_imgs = encoder.predict(x_train)

# Step 5: Apply KMeans clustering
n_clusters = 10
kmeans = KMeans(n_clusters=n_clusters, random_state=42)
y_kmeans = kmeans.fit_predict(encoded_imgs)  # fit and assign labels in one call

# Step 6: Visualize the clusters using t-SNE
tsne = TSNE(n_components=2, random_state=42)
encoded_imgs_2d = tsne.fit_transform(encoded_imgs)

plt.figure(figsize=(8, 8))
for i in range(n_clusters):
    plt.scatter(encoded_imgs_2d[y_kmeans == i, 0], encoded_imgs_2d[y_kmeans == i, 1], label=f'Cluster {i}')
plt.legend()
plt.show()

Explanation:

  • Data Preparation: The data comes from MNIST (Modified National Institute of Standards and Technology), one of the most widely used datasets in the deep learning community, which contains images of handwritten digits from 0 to 9. The pixel values are scaled to the range 0 to 1 and each image is flattened into a vector.
  • Build the Autoencoder: The architecture comprises an encoder and a decoder network. The encoder compresses the input data into a representation with fewer dimensions than the input. The decoder takes the encoded data and reconstructs the input data.
  • Train the Autoencoder: The autoencoder is trained with the Adam optimizer to minimize the reconstruction error, measured here with binary cross-entropy.
  • Extract Features: After training, the encoder transforms the input data into the encoded representation.
  • Clustering: K-means clustering is performed on the encoded features to group the data into clusters.
  • Evaluate the Clustering: t-SNE projects the encoded features into 2D, and the clusters are plotted to check whether the clustering results look reasonable.
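
A plot gives only a visual impression. One possible numeric check, assuming true labels are available for evaluation only (for MNIST, the labels that the code above discards as `_`), is cluster purity. The sketch below is an illustration with hypothetical toy labels, not part of the pipeline above.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Fraction of samples that carry the majority label of their cluster."""
    by_cluster = {}
    for cluster_id, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1
    # For each cluster, count the samples that carry its most common label.
    majority_total = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy example: 8 samples, 3 clusters, digit labels.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
true_labels = [7, 7, 1, 3, 3, 3, 1, 1]
print(cluster_purity(cluster_ids, true_labels))  # 0.875
```

A purity of 1.0 means every cluster contains a single digit class. Purity tends to rise as the number of clusters grows, so it is best compared between runs that use the same number of clusters.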

This example shows how deep learning can be used for clustering: an autoencoder extracts a compressed representation of the dataset, and a clustering algorithm is then applied to the learned features.