Ensemble Methods

Thomaskutty Reji
May 5, 2021

Ensemble Techniques

Ensemble techniques combine multiple models to obtain a more stable model. The two most common ensemble methods are bagging and boosting.

Note: Bias-Variance Trade-off

First, let us understand what bias and variance mean in terms of model complexity. In the following figure you can see three models. The first is highly complex, so it has high variance. The second has very low variance (which automatically leads to high bias), which means the model does not perform well. These are the two extremes of machine learning models. There is a trade-off between bias and variance, so we have to find a model with optimal complexity.

Bias and variance are two different sources of error in a machine learning model: bias measures the expected deviation of the estimator from the true value of the function or parameter, and variance measures the deviation from the expected estimator value that any particular sampling from the data-generating distribution is likely to cause.
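A minimal sketch of these two definitions, assuming NumPy and a simple illustrative setup (the biased divide-by-n variance estimator of a standard normal, with arbitrary sample size and trial count not taken from the article):

```python
import numpy as np

# True parameter: variance of a standard normal distribution
true_var = 1.0
rng = np.random.default_rng(0)

n = 20             # sample size (illustrative)
n_trials = 10_000  # number of simulated datasets (illustrative)

# Biased variance estimator: np.var divides by n (ddof=0) by default
estimates = np.array([
    np.var(rng.normal(size=n))
    for _ in range(n_trials)
])

bias = estimates.mean() - true_var                      # E[estimate] - true value
variance = np.mean((estimates - estimates.mean())**2)   # spread around E[estimate]

print(f"bias     ~ {bias:.4f}")      # roughly -1/n = -0.05
print(f"variance ~ {variance:.4f}")
```

The bias comes out negative because the divide-by-n estimator systematically underestimates the true variance, while the variance term measures how much the estimate fluctuates from one sampled dataset to the next.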

Bagging and Voting

  • Bagging, short for bootstrap aggregating, is a way to decrease the variance of your predictions by generating additional training data from your original dataset, sampling with replacement to produce multiple sets of the same cardinality/size as your original data.
  • Parallel ensemble: each model is built independently.
  • The aim is to decrease variance, not bias.
  • Suitable for high-variance, low-bias models (complex models).

Initially we have a training dataset, and since this is ensemble learning we have multiple models, say (M1, M2, M3, M4, …, Mn). Each model is given a different sample of the data and makes its own predictions. Here we use row sampling with replacement, so each base learner sees a different sample.

Finally, when we give the test data to the ensemble, each base learner predicts an output. Then we apply a voting classifier, meaning the majority of the votes given by the classifiers is taken as the final prediction. Why is this method known as bootstrap aggregation? The step where we use row sampling with replacement is the bootstrapping, and the step where we apply the voting classifier is the aggregation. An example of a tree-based bagging method is the random forest, which grows fully developed trees.
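A minimal sketch of this bootstrap-plus-voting procedure, assuming scikit-learn; the synthetic dataset, the 50-tree ensemble size, and the train/test split are illustrative choices, not from the article:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the training set (illustrative)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagging: each tree (M1 ... Mn) is fit on a bootstrap sample
# (row sampling with replacement); predictions are combined by voting.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # high-variance, low-bias base learner
    n_estimators=50,           # number of base learners (illustrative)
    bootstrap=True,            # row sampling with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, bagging.predict(X_test)))
```

Setting the base learner to a fully grown decision tree and averaging many of them is essentially what a random forest does, with the addition of random feature sampling at each split.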

Stacking or Stacked Generalisation

Stacking is a meta-learning approach in which an ensemble is used to extract features that will be used by another layer of the ensemble.

First we train several classifiers on the training data and use their outputs (probabilities) to train the next (middle) layer; finally, the outputs of the classifiers in the second layer are combined using the average.

We should not use stacking on small datasets, and we should use diverse classifiers so that they complement each other.

  • Stacking combines results from heterogeneous model types.
  • Unlike bagging, stacking uses the same training dataset for every model (instead of bootstrap samples of it).
  • Unlike boosting, in stacking a single model is used to learn how to best combine the predictions from the contributing models.
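A minimal sketch of this two-level setup, assuming scikit-learn's StackingClassifier; the particular base classifiers, the logistic-regression meta-learner, and the synthetic data are illustrative choices, not from the article:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic data standing in for the training set (illustrative)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Diverse (heterogeneous) base classifiers: their cross-validated
# predicted probabilities become the features for the next layer.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
    ("knn", KNeighborsClassifier()),
]

# A single meta-model learns how to best combine the base predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",  # feed class probabilities to the meta-model
    cv=5,
)
stack.fit(X_train, y_train)

print("test accuracy:", stack.score(X_test, y_test))
```

Using cross-validated predictions to build the meta-features (the cv argument above) is what keeps the meta-learner from simply memorising the base models' training-set outputs.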
