Naive Bayes classification

1. Usage

• NBC is extremely fast for both training and prediction

• NBC is often very easily interpretable

• NBC has very few (if any) tunable parameters

• NBC works well when the data match the naive independence assumption (very rare in practice)

• NBC suits very well-separated categories, when a simple model is needed

• NBC suits very high-dimensional data, when a simple model is needed

2. Implementation

Classes c₁, c₂, c₃

Features x₁, x₂

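The result of a classifier is the posterior probability of each class c given the features, obtained from Bayes' rule:

$$p(c \mid x_1, x_2) = \frac{p(x_1, x_2 \mid c)\, p(c)}{p(x_1, x_2)}$$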

p(c) is the prior probability of class c, estimated as the relative frequency with which c is observed in the labeled dataset.
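
As a minimal sketch of that definition, the class priors can be estimated by counting labels. The helper name `class_priors` and the label list are made up for illustration and are not part of the post's dataset.

```python
from collections import Counter


def class_priors(labels):
    """Return {class: relative frequency} for a sequence of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}


# Hypothetical labels for three classes (illustrative only).
labels = [0, 0, 1, 2, 2, 2, 1, 0, 2, 2]
priors = class_priors(labels)
print(priors)
```

The frequencies sum to 1 by construction, so they can be used directly as p(c).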

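Under the naive assumption that x₁ and x₂ are conditionally independent given the class, the likelihood factorizes:

$$p(x_1, x_2 \mid c) = p(x_1 \mid c)\, p(x_2 \mid c)$$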
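
To make the independence assumption concrete, here is a small sketch that multiplies a prior by per-feature likelihoods and normalizes over the classes. All numbers and the function name `posterior` are hypothetical, chosen only to show the arithmetic.

```python
def posterior(priors, likelihoods):
    """Combine priors with per-feature likelihoods under the naive assumption.

    priors:      {class: p(c)}
    likelihoods: {class: [p(x1 | c), p(x2 | c), ...]}
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for p in likelihoods[c]:
            score *= p  # independence: multiply the per-feature terms
        scores[c] = score
    evidence = sum(scores.values())  # normalizing constant
    return {c: s / evidence for c, s in scores.items()}


# Hypothetical values for three classes and two features.
priors = {"c1": 0.5, "c2": 0.3, "c3": 0.2}
likelihoods = {"c1": [0.1, 0.2], "c2": [0.4, 0.5], "c3": [0.3, 0.1]}
post = posterior(priors, likelihoods)
best = max(post, key=post.get)
print(best, post)
```

Here c₂ wins even though c₁ has the larger prior, because its likelihood terms are much larger.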

How should we model p(x₁|c₁), p(x₂|c₁), p(x₁|c₂), p(x₂|c₂), p(x₁|c₃) and p(x₂|c₃)?

If the features are 0 and 1 only, you could use a Bernoulli distribution.

If the features are integers (such as counts), you could use a multinomial distribution.

If the features are real values, you could use a Gaussian distribution.
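
As an illustration of two of these choices, the sketch below evaluates a single-feature likelihood p(x|c) under a Bernoulli and a Gaussian model. The function names and parameter values are assumptions made for this example; the multinomial case is left out to keep it short.

```python
import math


def bernoulli_likelihood(x, theta):
    """p(x | c) for a binary feature x in {0, 1}, with theta = p(x = 1 | c)."""
    return theta if x == 1 else 1.0 - theta


def gaussian_likelihood(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at a real-valued x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical parameters, for illustration only.
p_binary = bernoulli_likelihood(1, 0.8)
p_real = gaussian_likelihood(0.0, 0.0, 1.0)
print(p_binary, p_real)
```

Note that the Gaussian value is a density, not a probability, so it can exceed 1 when sigma is small.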

In the Gaussian case, for each class cⱼ, estimate the mean μᵢ,ⱼ and the standard deviation σᵢ,ⱼ of each feature i from the training samples labeled cⱼ.
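
Written out, the estimates and the resulting per-feature density look as follows. The symbols N_j (the number of training samples in class cⱼ) and x_{n,i} (feature i of sample n) are notation introduced here; the 1/N_j variance matches what NumPy's `np.std` computes by default.

```latex
\hat{\mu}_{i,j} = \frac{1}{N_j} \sum_{n \,:\, y_n = c_j} x_{n,i}
\qquad
\hat{\sigma}_{i,j}^{2} = \frac{1}{N_j} \sum_{n \,:\, y_n = c_j} \left( x_{n,i} - \hat{\mu}_{i,j} \right)^{2}

p(x_i \mid c_j) = \frac{1}{\sqrt{2 \pi \hat{\sigma}_{i,j}^{2}}}
\exp\!\left( -\frac{\left( x_i - \hat{\mu}_{i,j} \right)^{2}}{2 \hat{\sigma}_{i,j}^{2}} \right)
```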

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs

sns.set()  # apply seaborn's default plot styling

# Generate 20 two-dimensional points around three class centers.
X, y = make_blobs(n_samples=20, centers=[(0, 0), (4, 4), (-4, 4)], random_state=2)
print(X.shape)  # (20, 2)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='RdBu')
plt.show()

class GNB:
    """Gaussian naive Bayes classifier."""

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        total = len(y)
        self.unique_y = np.unique(y)
        self.params = {}
        for j in self.unique_y:
            x_class_j = X[y == j]                      # samples labeled j
            prob_class_j = len(x_class_j) / total      # prior p(c_j)
            mean_class_j = np.mean(x_class_j, axis=0)  # per-feature mean
            std_class_j = np.std(x_class_j, axis=0)    # per-feature std
            self.params[j] = [prob_class_j, mean_class_j, std_class_j]

    def find_prob(self, X):
        """Return the posterior p(c_j | x) for every sample x in X."""
        probs = []
        for x in X:
            prob = []
            for j in self.unique_y:
                prob_class_j, mean_class_j, std_class_j = self.params[j]
                # Gaussian density of each feature under class j
                pij = (1 / np.sqrt(2 * np.pi * std_class_j ** 2)) * np.exp(
                    -0.5 * ((np.asarray(x) - mean_class_j) / std_class_j) ** 2)
                # Naive assumption: multiply per-feature densities, then the prior
                pij = np.prod(pij) * prob_class_j
                prob.append(pij)
            prob = np.array(prob)
            prob /= np.sum(prob)  # normalize so the posteriors sum to 1
            probs.append(prob)
        return np.array(probs)

my_gauss = GNB()
my_gauss.fit(X, y)
rrs = my_gauss.find_prob([[-2, 5], [0,0], [6, -0.3]])

for r in rrs:
    print(r)
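
One practical caveat: the GNB class multiplies raw densities, which can underflow to zero for a point far from every class mean, and the normalization then divides by zero. A common remedy, which is not part of the post's implementation, is to work with log-probabilities and subtract the maximum before exponentiating. The function `log_posterior` and the parameter arrays below are hypothetical stand-ins for the values that `fit` stores.

```python
import numpy as np


def log_posterior(x, priors, means, stds):
    """Normalized class posteriors for one sample, computed in log space.

    priors: shape (n_classes,); means, stds: shape (n_classes, n_features).
    """
    x = np.asarray(x, dtype=float)
    # Sum of per-feature Gaussian log-densities for each class.
    log_lik = np.sum(
        -0.5 * np.log(2.0 * np.pi * stds ** 2)
        - 0.5 * ((x - means) / stds) ** 2,
        axis=1,
    )
    log_joint = np.log(priors) + log_lik
    # Shift by the maximum so the largest term becomes exp(0) = 1.
    log_joint -= np.max(log_joint)
    post = np.exp(log_joint)
    return post / np.sum(post)


# Hypothetical parameters for three classes and two features.
priors = np.array([0.3, 0.4, 0.3])
means = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
stds = np.ones((3, 2))

far_point = [60.0, -60.0]  # far from every class mean
post = log_posterior(far_point, priors, means, stds)
print(post)
```

For points near the data, this gives the same posteriors as the direct product of densities; it only differs in staying finite for extreme inputs.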
