## Introduction

The goal of any supervised learning algorithm is to predict a value *c* given a vector *x* of input features. In classification problems, *c* is a discrete class label, whereas in regression problems it is a continuous variable.

From a probabilistic point of view, the goal is to find the *conditional probability distribution* p(c|x). The traditional approach is to represent this distribution with a parametric model, whose parameters are then estimated from a training set of input/output pairs.

The resulting *conditional distribution* can be used to predict the class *c* for new values of the input *x*. Because this distribution directly discriminates between the different classes *c*, this approach is known as **discriminative**. The alternative approach is to find the joint distribution p(x, c), use it to evaluate the conditional p(c|x), and then make predictions for new values of *x*. This is the **generative** approach.
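The generative route can be made concrete with a tiny sketch. Below is a hypothetical 1-D example (the class-conditional Gaussians and equal priors are illustrative assumptions, not from the text): we model p(x|c) and p(c), then recover the conditional p(c|x) via Bayes' rule.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# Assumed generative model: class 0 ~ N(0, 1), class 1 ~ N(3, 1), equal priors.
priors = {0: 0.5, 1: 0.5}
params = {0: (0.0, 1.0), 1: (3.0, 1.0)}

def posterior(x):
    # Bayes' rule: p(c|x) = p(x|c) p(c) / sum_c' p(x|c') p(c')
    joint = {c: gaussian_pdf(x, *params[c]) * priors[c] for c in priors}
    z = sum(joint.values())
    return {c: joint[c] / z for c in joint}
```

A discriminative model would skip the middle step and fit p(c|x) directly, never representing p(x|c) or p(x) at all.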

To understand the difference intuitively, let's take a simple example. Suppose we want to learn to distinguish between several languages. There are essentially two ways of doing this.

The first way is to learn the languages themselves. The next time we hear one, we can identify it because we know it.

The second way is to learn only the differences between languages, for example the specific sounds and musicality that set them apart, and to use those cues to classify a language when we hear it.

This second category of classifiers is called discriminative: they make fewer assumptions about the underlying distributions but depend heavily on the quality of the observations. For example, given a set of languages such as French and English, a discriminative model will match a new, unlabeled recording to the most similar learned language and output that class label, French or English.

Generative models, the first category, instead build a model of each class from the assumptions they make (for example, that all English sounds have a certain tone) and use that model to assign a class label to the unlabeled recording.

#### Generative classifiers

- Naïve Bayes
- Bayesian networks
- Markov random fields
- Hidden Markov Models (HMM)
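To make the first item concrete, here is a minimal naive Bayes sketch in the spirit of the language example above (the toy sentences and the Laplace-smoothing parameter are assumptions for illustration, not a full implementation): it estimates p(c) and p(word|c) from labeled text, then classifies by maximizing the joint p(x, c).

```python
import math
from collections import Counter, defaultdict

# Toy labeled data (assumed): a few French and English sentences.
train = [
    ("bonjour le monde", "fr"),
    ("le chat dort", "fr"),
    ("hello world", "en"),
    ("the cat sleeps", "en"),
]

class_counts = Counter(label for _, label in train)          # for p(c)
word_counts = defaultdict(Counter)                           # for p(word|c)
for text, label in train:
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text, alpha=1.0):
    """Return the class maximizing log p(c) + sum_w log p(w|c), with Laplace smoothing."""
    scores = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        score = math.log(class_counts[c] / len(train))
        for w in text.split():
            score += math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)
```

The "naive" assumption is that words are conditionally independent given the class, which is what lets the joint factorize into per-word terms.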

Typically, the generalization performance of generative models is found to be poorer than that of discriminative models, because of differences between the assumed model and the true distribution of the data.

#### Discriminative Classifiers

- Logistic regression
- Support Vector Machines (SVM)
- Traditional neural networks
- Nearest neighbours
- Conditional Random Fields (CRFs)
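As a counterpart to the generative sketch, here is a minimal discriminative sketch (the toy 1-D data, learning rate, and iteration count are assumptions for illustration): logistic regression models p(c|x) directly and never models how *x* itself is distributed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed toy 1-D data: class 1 tends to have larger x.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

# Fit w, b by stochastic gradient descent on the log-loss.
w, b = 0.0, 0.0
lr = 0.5
for _ in range(200):
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)     # current estimate of p(c=1|x)
        w -= lr * (p - y) * x      # gradient of the log-loss w.r.t. w
        b -= lr * (p - y)          # gradient of the log-loss w.r.t. b

def predict_proba(x):
    """Directly modeled conditional p(c=1|x)."""
    return sigmoid(w * x + b)
```

Nothing in the fit depends on a density over *x*; only the decision boundary between the classes is learned.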

#### How to get the best of both worlds