Features

HyperGBM supports four running types:

Single node: runs on a single machine and uses Pandas and NumPy data types.
Single node with NVIDIA GPU devices: runs on a single machine with NVIDIA GPU devices and uses cuDF and CuPy data types.
Distributed with single node: runs on a single machine and uses Dask data types; Dask collections must be created before using HyperGBM.
Distributed with multi nodes: runs on multiple machines and uses Dask data types; Dask collections that manage resources across the machines must be created before using HyperGBM.
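
The mapping from data type to running type described above can be sketched as a small helper. This is only an illustration: `infer_running_type` and its `multi_node` flag are hypothetical names, not part of the HyperGBM API, and the sketch assumes the backend can be recognized from the top-level package of the data container's class.

```python
def infer_running_type(data, multi_node=False):
    """Map a data container to one of the four running types.

    Hypothetical helper for illustration only; not part of HyperGBM.
    """
    # The top-level package of the container's class identifies the backend,
    # e.g. "pandas.core.frame" -> "pandas".
    package = type(data).__module__.split(".")[0]

    if package in ("pandas", "numpy"):
        return "Single node"
    if package in ("cudf", "cupy"):
        return "Single node with GPU"
    if package in ("dask", "dask_expr"):
        # A Dask collection looks the same on one machine or many;
        # the caller has to say which kind of cluster backs it.
        if multi_node:
            return "Distributed with multi nodes"
        return "Distributed with single node"
    raise TypeError(f"unsupported data type: {type(data)!r}")
```

The helper inspects only the class's module name, so it does not need any of the optional backends to be installed.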

The following table gives an overview of the features supported by each running type:

	Features	Single node	Single node with GPU	Distributed with single node	Distributed with multi nodes
Data cleaning	Empty character handling	√	√	√	√
	Automatic column type recognition	√	√	√	√
	Column type correction	√	√	√	√
	Constant column cleaning	√	√	√	√
	Duplicate column cleaning	√	√	√	√
	Deleting examples without targets	√	√	√	√
	Illegal character replacement	√	√	√	√
	ID column cleaning	√	√	√	√
Dataset splitting	Splitting by ratio	√	√	√	√
	Adversarial validation	√	√	√	√
Feature engineering	Feature generation	√		√	√
	Feature dimension reduction	√	√	√	√
Data preprocessing	SimpleImputer	√	√	√	√
	SafeOrdinalEncoder	√	√	√	√
	TargetEncoder	√	√
	SafeOneHotEncoder	√	√	√	√
	TruncatedSVD	√	√	√	√
	StandardScaler	√	√	√	√
	MinMaxScaler	√	√	√	√
	MaxAbsScaler	√	√	√	√
	RobustScaler	√	√	√	√
Imbalanced data handling	ClassWeight	√	√	√	√
	UnderSampling (NearMiss, TomekLinks, Random)	√
	OverSampling (SMOTE, ADASYN, Random)	√
Search algorithms	MCTS	√	√	√	√
	Evolution	√	√	√	√
	Random search	√	√	√	√
	Play back	√	√	√	√
Early stopping	Time limit	√	√	√	√
	No improvement after n trials	√	√	√	√
	expected_reward	√	√	√	√
	Trial discriminator	√	√	√	√
Modeling algorithms	XGBoost	√	√	√	√
	LightGBM	√	√	√	√
	CatBoost	√	√	√
	HistGradientBoosting	√
Evaluation	Cross-Validation	√	√	√	√
	Train-Validation-Holdout	√	√	√	√
Advanced	Automatic task type inference	√	√	√	√
	Data adaption	√	√
	Collinearity detection	√		√	√
	Data drift detection	√	√	√	√
	Feature selection	√	√	√	√
	Feature selection (two-stage)	√	√	√	√
	Pseudo label (two-stage)	√	√	√	√
	Pre-searching with UnderSampling	√	√	√	√
	Model ensemble	√	√	√	√