Classification is a technique used to categorize data into a distinct number of classes, where a label can be assigned to each class.
Applications of classification include speech recognition, handwriting recognition, biometric identification, and document classification.
Binary classifiers: classification with only two distinct classes, or two possible outcomes. Multi-class classifiers: classification with more than two distinct classes.
Naive Bayes is a probabilistic classifier inspired by Bayes' theorem, under the simple assumption that the attributes are conditionally independent given the class. Naive Bayes is very simple to implement, and good results are obtained in most cases. It scales easily to larger datasets since training takes linear time, rather than the expensive iterative approximation used by many other types of classifiers. Naive Bayes can suffer from the zero-probability problem: when the conditional probability is zero for a particular attribute value, the model fails to give a valid prediction. This needs to be fixed explicitly, for example with a Laplace estimator.
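To make the zero-probability problem and its Laplace fix concrete, here is a minimal pure-Python sketch of a categorical Naive Bayes classifier (the function names and the toy weather data are illustrative, not from any library):

```python
import math
from collections import defaultdict

def train_nb(samples, labels):
    """Count feature-value frequencies per class for categorical Naive Bayes."""
    classes = set(labels)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    # counts[c][i][v] = how often feature i takes value v within class c
    counts = {c: defaultdict(lambda: defaultdict(int)) for c in classes}
    values = defaultdict(set)  # distinct values observed per feature
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            counts[y][i][v] += 1
            values[i].add(v)
    return priors, counts, values

def predict_nb(model, x, alpha=1.0):
    """Pick the class with the highest log-posterior, Laplace-smoothed by alpha."""
    priors, counts, values = model
    best, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)
        for i, v in enumerate(x):
            num = counts[c][i][v] + alpha  # alpha > 0 => never exactly zero
            den = sum(counts[c][i].values()) + alpha * len(values[i])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best
```

With `alpha=0`, an attribute value never seen in a class would zero out that class's entire score; `alpha=1` (the Laplace estimator) keeps every conditional probability strictly positive.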
A support vector machine represents the training data as points in space, separated into categories by a clear gap that is as wide as possible. New examples are then mapped into that same space and assigned to a category based on which side of the gap they fall on.
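One way to find such a maximum-margin separator is stochastic sub-gradient descent on the hinge loss (the Pegasos approach). The sketch below is a simplified pure-Python linear SVM under that assumption; the function names and hyperparameters are illustrative:

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient descent for a linear SVM.
    Labels y must be in {-1, +1}; returns a weight vector w with the bias
    folded in as an extra constant feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0]) + 1  # +1 for the bias feature
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one random pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            xi = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            # shrink w (this widens the margin), then correct if the point
            # is inside the margin or misclassified
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w

def svm_predict(w, x):
    """Which side of the separating hyperplane does x fall on?"""
    return 1 if sum(wj * xj for wj, xj in zip(w, x + [1.0])) >= 0 else -1
```

The shrink step keeps the weights small, which corresponds to maximizing the width of the gap between the two classes.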
kNN classifies an object by a majority vote of the object's neighbours in the input-parameter space. The object is assigned to the class most common among its k (a user-specified integer) nearest neighbours.
It is a non-parametric, lazy algorithm. It is non-parametric because it makes no assumption about the data distribution (the data does not have to be normally distributed). It is lazy because it does not actually learn a model or generalize from the data (it does not train the parameters of some function where input X gives output y).
So this is not really a learning algorithm: it simply classifies objects based on feature similarity (features = input variables).
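The whole algorithm fits in a few lines, which shows why it is called lazy: all the work happens at prediction time. A minimal sketch (the function name is illustrative):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points,
    using Euclidean distance. No training step exists: the data IS the model."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Note that nothing is fitted in advance; every query rescans the stored training set, which is exactly the "no model, no generalization" property described above.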
A decision tree, as the name suggests, makes decisions with a tree-like model. It splits the sample into two or more homogeneous sets (leaves) based on the most significant differentiators among the input variables. To choose a differentiator (predictor), the algorithm considers all features and performs a binary split on each (for categorical data, split by category; for continuous data, pick a cut-off threshold). It then chooses the split with the least cost (i.e., highest accuracy) and repeats recursively until it has successfully split the data in all leaves (or reached the maximum depth).
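The "least cost" step for a continuous feature can be sketched with Gini impurity as the cost, one common choice (the function names are illustrative):

```python
def gini(labels):
    """Gini impurity of a set of class labels: 0 means perfectly pure."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Scan candidate cut-offs on one continuous feature and return the
    (threshold, cost) pair with the lowest weighted Gini impurity."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:  # the max value cannot split anything off
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if cost < best[1]:
            best = (t, cost)
    return best
```

A full tree would run this search over every feature, split on the winner, and recurse into each leaf until the leaves are pure or the maximum depth is reached.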
A random forest is an ensemble model that grows multiple trees and classifies objects based on the "votes" of all the trees: an object is assigned to the class that receives the most votes. By doing so, the problem of high variance (over-fitting) in individual trees can be alleviated.
A random forest classifier is a meta-estimator that fits a number of decision trees on various sub-samples of the dataset and averages their predictions to improve accuracy and control over-fitting. Each sub-sample is the same size as the original input sample, but the samples are drawn with replacement.
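The bootstrap-and-vote mechanism can be sketched in pure Python. For brevity this toy forest uses one-level trees (decision stumps) rather than full decision trees; all names here are illustrative:

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def bootstrap(X, y, rng):
    """Draw a sample the SAME size as the data, with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_stump(X, y):
    """One-level tree: exhaustive search for the single purest split."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            err = sum(lab != majority(left) for lab in left) \
                + sum(lab != majority(right) for lab in right)
            if best is None or err < best[0]:
                best = (err, f, t, majority(left), majority(right))
    if best is None:  # degenerate bootstrap sample: all rows identical
        label = majority(y)
        return lambda row: label
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label

def random_forest(X, y, n_trees=15, seed=0):
    """Fit each stump on its own bootstrap sample; predict by majority vote."""
    rng = random.Random(seed)
    trees = [fit_stump(*bootstrap(X, y, rng)) for _ in range(n_trees)]
    return lambda row: majority([tree(row) for tree in trees])
```

Because each tree sees a different resampled view of the data, their individual errors tend to disagree, and the majority vote averages those errors away.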