smote_cd.random_undersampling

smote_cd.random_undersampling(y, method='majority')

Perform a random undersampling on compositional data.

Parameters:

y array-like, shape (n, q): Array containing the compositional labels of the dataset to be undersampled.
method {‘majority’, ‘all’, int}: If ‘majority’, undersamples only the majority class. If ‘all’, undersamples all classes except the minority. If an int n, undersamples the first n classes (default is ‘majority’).

Returns:

list: The indexes of the rows of y to be removed.
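To give an idea of what an index list like this could look like, here is a minimal sketch of a 'majority'-style undersampling using only NumPy. It is not the smote_cd implementation. The function name `sketch_undersample_majority` and its `seed` argument are hypothetical. The sketch assumes that the majority class is the one with the largest total over all rows, that a row belongs to the class of its largest component, and that removal stops once the majority class no longer has the largest total.

```python
import numpy as np

def sketch_undersample_majority(y, seed=None):
    """Return row indexes to remove so that the majority class stops dominating.

    Illustrative only: the stopping rule and class assignment are assumptions,
    not the behaviour of smote_cd.random_undersampling.
    """
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)

    totals = y.sum(axis=0)             # total compositional mass per class
    majority = int(np.argmax(totals))  # class with the largest total

    # Rows whose dominant component is the majority class, in random order.
    dominated = np.flatnonzero(np.argmax(y, axis=1) == majority)
    candidates = rng.permutation(dominated)

    removed = []
    for idx in candidates:
        # Stop as soon as another class has at least as much total mass.
        if totals[majority] <= np.delete(totals, majority).max():
            break
        totals -= y[idx]
        removed.append(int(idx))
    return removed

# Toy labels: 8 rows dominated by class 0, 2 rows dominated by class 1.
y_demo = np.array([[0.9, 0.1]] * 8 + [[0.2, 0.8]] * 2)
removed = sketch_undersample_majority(y_demo, seed=0)
y_demo_us = np.delete(y_demo, removed, axis=0)
```

As with the real function, the returned list is meant to be passed to `np.delete` along `axis=0`, so the same indexes can be applied to both the features and the labels.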

Examples

The random undersampling algorithm can be tried on a synthetically generated dataset.

>>> import numpy as np
>>> import smote_cd

We first generate the synthetic dataset and keep only 20 points from one of the classes to make it imbalanced.

>>> X,y,_ = smote_cd.dataset_generation.generate_dataset(n_features=2,n_classes=2,size=500,random_state=1)
>>> y = np.concatenate((y[np.argmax(X,axis=1)==0][:20],y[np.argmax(X,axis=1)==1]))
>>> X = np.concatenate((X[np.argmax(X,axis=1)==0][:20],X[np.argmax(X,axis=1)==1]))
>>> print(sum(y)/np.sum(y))
[0.29337655 0.70662345]
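The printed vector is the share of the total label mass held by each class: the built-in `sum(y)` adds the rows of `y` together, giving one total per class, and `np.sum(y)` is the grand total. A small helper makes this explicit. The name `class_proportions` is hypothetical and is not part of smote_cd.

```python
import numpy as np

def class_proportions(y):
    """Share of the total compositional mass carried by each class."""
    y = np.asarray(y, dtype=float)
    # Column sums (one per class) divided by the sum of all entries.
    return y.sum(axis=0) / y.sum()

# Two toy compositional labels over two classes.
toy = np.array([[0.9, 0.1],
                [0.2, 0.8]])
props = class_proportions(toy)
```

The proportions always sum to 1, so a balanced two-class dataset gives values close to 0.5 each.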

We then apply the random undersampling to retrieve the indexes to remove, and remove them from the original dataset.

>>> indexes_to_remove = smote_cd.random_undersampling(y)  
>>> y_us=np.delete(y,indexes_to_remove,axis=0)
>>> X_us=np.delete(X,indexes_to_remove,axis=0)
>>> print(sum(y_us)/np.sum(y_us))
[0.48177862 0.51822138]

Your results will not be exactly the same, as no random seed is set here.