Namespace: impurity
Functions
- getInformationGainRatioForSplit
- getInformationGainForSplit
- getGiniIndexForSplit
- getScoreForRegressionTreeSplit
Functions
getInformationGainRatioForSplit
getInformationGainRatioForSplit(parentSet, childrenSets, config, splitFn): number
Split quality scoring function for classification trees.
Information gain ratio is similar to information gain, but it penalizes splits on attributes that have many distinct values (such as dates, IDs, or names).
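To illustrate the penalty described above, here is a minimal standalone sketch of the textbook gain ratio (information gain divided by the entropy of the branch sizes). It is not tree-garden's implementation: the helper names (`labelCounts`, `entropyOfCounts`, `gainRatioSketch`) are hypothetical, and samples are reduced to plain class-label strings with children passed as an array of arrays instead of the library's data structures.

```typescript
// Illustrative sketch only; not tree-garden code. Samples are reduced to class labels.

// Count how often each distinct label occurs.
function labelCounts(labels: string[]): number[] {
  const unique = labels.filter((label, index) => labels.indexOf(label) === index);
  return unique.map((value) => labels.filter((label) => label === value).length);
}

// Entropy in bits of a distribution given as raw counts.
function entropyOfCounts(counts: number[]): number {
  let total = 0;
  for (const count of counts) total += count;
  let result = 0;
  for (const count of counts) {
    if (count === 0) continue;
    const p = count / total;
    result -= p * (Math.log(p) / Math.LN2);
  }
  return result;
}

// Gain ratio = information gain / split information.
function gainRatioSketch(parentLabels: string[], childLabelSets: string[][]): number {
  let weightedChildEntropy = 0;
  for (const child of childLabelSets) {
    const weight = child.length / parentLabels.length;
    weightedChildEntropy += weight * entropyOfCounts(labelCounts(child));
  }
  const gain = entropyOfCounts(labelCounts(parentLabels)) - weightedChildEntropy;
  // Split information is the entropy of the branch sizes; it grows with the number of branches.
  const splitInformation = entropyOfCounts(childLabelSets.map((child) => child.length));
  return splitInformation === 0 ? 0 : gain / splitInformation;
}

const allLabels = ['yes', 'yes', 'no', 'no'];
// A two-way split and an ID-like split that puts every sample in its own branch.
const binaryScore = gainRatioSketch(allLabels, [['yes', 'yes'], ['no', 'no']]);
const idLikeScore = gainRatioSketch(allLabels, [['yes'], ['yes'], ['no'], ['no']]);
console.log(binaryScore, idLikeScore);
```

Both splits produce perfectly pure children, so plain information gain cannot tell them apart. The ID-like split has four branches, its split information is larger, and its gain ratio comes out lower.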
Remarks
A higher score means a better split!
Parameters
Name | Type |
---|---|
parentSet | TreeGardenDataSample[] |
childrenSets | Object |
config | TreeGardenConfiguration |
splitFn | (currentSample: TreeGardenDataSample) => any |
Returns
number
Defined in
impurity/entropy.ts:63
getInformationGainForSplit
getInformationGainForSplit(parentSet, childrenSets, config, _splitFn): number
Split quality scoring function for classification trees.
It measures the decrease in entropy of the child data sets compared to the parent data set. Low entropy means a pure data set, so a decrease in entropy is an increase in purity: the larger the decrease, the better the split.
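The entropy decrease can be worked through on a tiny example. The sketch below uses the standard definition (parent entropy minus the size-weighted entropy of the children) and is not taken from tree-garden's source: `labelEntropy` and `informationGainSketch` are hypothetical helpers, and samples are reduced to plain class-label strings.

```typescript
// Illustrative sketch only; not tree-garden code. Samples are reduced to class labels.

// Shannon entropy in bits of a list of class labels.
function labelEntropy(labels: string[]): number {
  if (labels.length === 0) return 0;
  const unique = labels.filter((label, index) => labels.indexOf(label) === index);
  let result = 0;
  for (const value of unique) {
    const p = labels.filter((label) => label === value).length / labels.length;
    result -= p * (Math.log(p) / Math.LN2);
  }
  return result;
}

// Information gain: parent entropy minus the size-weighted entropy of the children.
function informationGainSketch(parentLabels: string[], childLabelSets: string[][]): number {
  let weightedChildEntropy = 0;
  for (const child of childLabelSets) {
    weightedChildEntropy += (child.length / parentLabels.length) * labelEntropy(child);
  }
  return labelEntropy(parentLabels) - weightedChildEntropy;
}

const parentLabels = ['yes', 'yes', 'no', 'no'];
// Children that are perfectly pure versus children as mixed as the parent.
const pureGain = informationGainSketch(parentLabels, [['yes', 'yes'], ['no', 'no']]);
const mixedGain = informationGainSketch(parentLabels, [['yes', 'no'], ['yes', 'no']]);
console.log(pureGain, mixedGain);
```

The split into pure children removes all of the parent's entropy (about 1 bit here), while the split that leaves each child as mixed as the parent gains about 0.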
Remarks
A higher score means a better split!
Parameters
Name | Type |
---|---|
parentSet | TreeGardenDataSample[] |
childrenSets | Object |
config | TreeGardenConfiguration |
_splitFn | (currentSample: TreeGardenDataSample) => any |
Returns
number
Defined in
impurity/entropy.ts:36
getGiniIndexForSplit
getGiniIndexForSplit(parentSet, childrenSets, config, _splitter): number
Split quality scoring function for classification trees.
See Gini impurity.
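For readers unfamiliar with Gini impurity, the sketch below computes the textbook quantity (one minus the sum of squared class proportions) and averages it over child sets weighted by size. Whether tree-garden weights or normalizes in exactly this way is an assumption; `giniImpurity` and `weightedGiniSketch` are hypothetical helpers working on plain class-label strings.

```typescript
// Illustrative sketch only; not tree-garden code. Samples are reduced to class labels.

// Gini impurity: 1 - sum of squared class proportions. 0 means a pure set.
function giniImpurity(labels: string[]): number {
  if (labels.length === 0) return 0;
  const unique = labels.filter((label, index) => labels.indexOf(label) === index);
  let sumOfSquares = 0;
  for (const value of unique) {
    const p = labels.filter((label) => label === value).length / labels.length;
    sumOfSquares += p * p;
  }
  return 1 - sumOfSquares;
}

// Size-weighted average impurity of the child sets produced by a split.
function weightedGiniSketch(childLabelSets: string[][]): number {
  let totalSize = 0;
  for (const child of childLabelSets) totalSize += child.length;
  let score = 0;
  for (const child of childLabelSets) {
    score += (child.length / totalSize) * giniImpurity(child);
  }
  return score;
}

const pureGini = weightedGiniSketch([['yes', 'yes'], ['no', 'no']]);
const mixedGini = weightedGiniSketch([['yes', 'no'], ['yes', 'no']]);
console.log(pureGini, mixedGini); // 0 0.5
```

The pure split scores lower than the mixed one, matching the convention that a lower Gini score is the better split.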
Remarks
A lower score means a better split!
Parameters
Name | Type |
---|---|
parentSet | TreeGardenDataSample[] |
childrenSets | Object |
config | TreeGardenConfiguration |
_splitter | (currentSample: TreeGardenDataSample) => any |
Returns
number
Defined in
impurity/gini.ts:42
getScoreForRegressionTreeSplit
getScoreForRegressionTreeSplit(parentDataSet, childDataSets, config, splitter): number
Split quality scoring function for regression trees.
It is based on the sum of residuals, where a residual is the distance of a particular value from the average value. tree-garden uses the absolute distance, not the squared distance. A lower sum means that the values are closer together, so the data set is more pure.
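The absolute-residual idea can be sketched in a few lines. The code below is an illustration under assumptions, not tree-garden's implementation: `sumOfAbsoluteResiduals` and `regressionSplitScoreSketch` are hypothetical helpers, samples are reduced to their numeric target values, and the per-child sums are simply added together.

```typescript
// Illustrative sketch only; not tree-garden code. Samples are reduced to target values.

// Sum of absolute distances of each value from the mean of the set.
function sumOfAbsoluteResiduals(values: number[]): number {
  if (values.length === 0) return 0;
  let sum = 0;
  for (const value of values) sum += value;
  const mean = sum / values.length;
  let residuals = 0;
  for (const value of values) residuals += Math.abs(value - mean);
  return residuals;
}

// Score of a split: residual sums of all child sets added together (assumed aggregation).
function regressionSplitScoreSketch(childValueSets: number[][]): number {
  let score = 0;
  for (const child of childValueSets) score += sumOfAbsoluteResiduals(child);
  return score;
}

// A split that groups similar values versus one that mixes small and large values.
const tightScore = regressionSplitScoreSketch([[1, 2, 3], [10, 11, 12]]);
const looseScore = regressionSplitScoreSketch([[1, 10, 3], [2, 11, 12]]);
console.log(tightScore, looseScore);
```

The split that keeps similar values together yields the smaller residual sum, which is why the lower score marks the better split.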
Remarks
A lower score means a better split!
Parameters
Name | Type |
---|---|
parentDataSet | TreeGardenDataSample[] |
childDataSets | Object |
config | TreeGardenConfiguration |
splitter | (currentSample: TreeGardenDataSample) => any |
Returns
number
Defined in
algorithmConfiguration/buildAlgorithmConfiguration.ts:49