smote_cd.random_undersampling
- smote_cd.random_undersampling(y, method='majority')
Perform a random undersampling on compositional data.
- Parameters:
- y : array-like, shape (n, q)
Array containing the compositional labels of the dataset to be undersampled.
- method : {‘majority’, ‘all’, int}
If ‘majority’, undersamples only the majority class. If ‘all’, undersamples all classes except the minority. If an int n, undersamples the first n classes. Default is ‘majority’.
- Returns:
- list
The list containing the indexes of the elements to be removed.
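Because the function returns the indexes to remove rather than the resampled arrays, the same list has to be applied to both the features and the labels. The following is a minimal sketch of that pattern using NumPy only. The toy arrays and the hand-picked `indexes_to_remove` values are illustrative assumptions, not output from `smote_cd`.

```python
import numpy as np

# Toy data (assumed for illustration): 5 samples, 2 features,
# and 2-part compositional labels whose rows sum to 1.
X = np.arange(10).reshape(5, 2)
y = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.2, 0.8]])

# Hypothetical stand-in for the list returned by the undersampling call.
indexes_to_remove = [0, 2]

# Option 1: delete the same rows from both arrays.
X_us = np.delete(X, indexes_to_remove, axis=0)
y_us = np.delete(y, indexes_to_remove, axis=0)

# Option 2: build a boolean mask of the rows to keep.
keep = np.ones(len(y), dtype=bool)
keep[indexes_to_remove] = False
```

Both options select the same rows. The mask form can be convenient when the kept rows need to be reused for several arrays.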
Examples
The random undersampling algorithm can be tried on a synthetically generated dataset.
>>> import numpy as np
>>> import smote_cd
We first generate the synthetic dataset and keep only 20 points from one of the classes to make it imbalanced.
>>> X,y,_ = smote_cd.dataset_generation.generate_dataset(n_features=2,n_classes=2,size=500,random_state=1)
>>> y = np.concatenate((y[np.argmax(X,axis=1)==0][:20],y[np.argmax(X,axis=1)==1]))
>>> X = np.concatenate((X[np.argmax(X,axis=1)==0][:20],X[np.argmax(X,axis=1)==1]))
>>> print(sum(y)/np.sum(y))
[0.29337655 0.70662345]
We then apply the random undersampling to retrieve the indexes to remove, and remove them from the original dataset.
>>> indexes_to_remove = smote_cd.random_undersampling(y)
>>> y_us=np.delete(y,indexes_to_remove,axis=0)
>>> X_us=np.delete(X,indexes_to_remove,axis=0)
>>> print(sum(y_us)/np.sum(y_us))
[0.48177862 0.51822138]
Your results will not match these exactly, since no random seed is set here.
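To illustrate why a seed matters, here is a rough, self-contained sketch of a majority-class undersampler written with NumPy. It is not the `smote_cd` implementation: the function name `sketch_majority_undersampling`, the rule of assigning each row to its largest component, and the choice to shrink the majority class to the size of the second-largest class are all assumptions made for illustration. The source also does not say how `smote_cd` draws its random numbers, so the explicit `rng` argument is hypothetical as well.

```python
import numpy as np

def sketch_majority_undersampling(y, rng):
    """Hypothetical sketch: return indexes of majority-class rows to remove."""
    y = np.asarray(y, dtype=float)
    # Assumption: a row belongs to the class holding its largest share.
    dominant = np.argmax(y, axis=1)
    counts = np.bincount(dominant, minlength=y.shape[1])
    majority = int(np.argmax(counts))
    # Assumption: shrink the majority class to the second-largest class size.
    target = int(np.sort(counts)[-2])
    candidates = np.flatnonzero(dominant == majority)
    n_remove = int(counts[majority]) - target
    removed = rng.choice(candidates, size=n_remove, replace=False)
    return sorted(removed.tolist())

# Toy labels (assumed): 3 rows dominated by class 0, 7 rows by class 1.
y = np.array([[0.9, 0.1]] * 3 + [[0.2, 0.8]] * 7)
X = np.arange(20).reshape(10, 2)

# Two generators created with the same seed select the same rows.
idx_a = sketch_majority_undersampling(y, np.random.default_rng(0))
idx_b = sketch_majority_undersampling(y, np.random.default_rng(0))

y_us = np.delete(y, idx_a, axis=0)
X_us = np.delete(X, idx_a, axis=0)
```

In this sketch, passing a seeded generator makes the selected indexes repeatable across runs, while an unseeded one gives a different subset each time.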