Namespace: prune
This namespace holds implementations of pre-pruning and post-pruning see tree pruning docs
Functions
- getPrunedTreeByReducedErrorPruning
- getPrunedTreeByPessimisticPruning
- getPrunedTreeByCostComplexityPruning
- getPrunedTreeScore
- stopRules
- stopIfDepthIs
- stopIfMinimalNumberOfSamplesInNode
Functions
getPrunedTreeByReducedErrorPruning
getPrunedTreeByReducedErrorPruning(treeRoot
, pruningDataSet
, configuration
): TreeGardenNode
Reduced error pruning (REP) is simplest pre-pruning algorithm tree-garden implements. Even so simple there exists multiple variants of this algorithm.
tree-garden`s version is iterative algorithm.
- takes all inner nodes (all nodes except leaves) in each iteration
- tries to turn each inner node into leaf and then measures accuracy against validation data set
- keeps change, which leads to highest accuracy gain against validation data set, run new iteration
- keeps pruning until there is node after which removal accuracy against validation dataset raised
Pros:
- easily understandable
- high quality trees on output if validation set is big enough
- computationally effective
Cons:
- need for validation data set, you cannot use all your data for training model
- sometimes special pruning data set is used (trained on one part of your data set, pruned on another and validated on last one) this, if your data are expensive (like outcome of some expensive experiment) you probably do not want use this pruning method
Parameters
Name | Type |
---|---|
treeRoot |
TreeGardenNode |
pruningDataSet |
TreeGardenDataSample [] |
configuration |
TreeGardenConfiguration |
Returns
Defined in
pruneTree/reducedErrorPrunning.ts:48
getPrunedTreeByPessimisticPruning
getPrunedTreeByPessimisticPruning(tree
): TreeGardenNode
Pessimistic pruning is post pruning strategy employed by c4.5 algorithm. Based just on statistics of tree itself.
Pros:
- You do not need separate pruning data set - you do just need tree itself!
- Computationally cheap compared to cost complexity pruning
Cons:
- Sometimes under-prune tree - resulting tree is unnecessary large
Parameters
Name | Type |
---|---|
tree |
TreeGardenNode |
Returns
Defined in
pruneTree/pessimisticPrunning.ts:49
getPrunedTreeByCostComplexityPruning
getPrunedTreeByCostComplexityPruning(treeRoot
, fullTrainingData
, configuration
): TreeGardenNode
Cost complexity pruning (also known as weakest link pruning) is one of three post-pruning methods tree-garden implements.
Pros:
- You do not need separate pruning data set - optimal tree is calculated from whole data you have
- Quality of pruning is good
Cons:
- As it has internal parameter
alpha
which is found by cross validation - it is computationally expensive.
Parameters
Name | Type |
---|---|
treeRoot |
TreeGardenNode |
fullTrainingData |
TreeGardenDataSample [] |
configuration |
TreeGardenConfiguration |
Returns
Defined in
pruneTree/costComplexityPruning.ts:122
getPrunedTreeScore
getPrunedTreeScore(accuracyBeforePruning
, accuracyAfterPruning
, _numberOfNodesInPrunedTree
): number
Simple implementation of scoring trees before and after pruning. Used by reduced error pruning. Simply prefer trees with best accuracy - do not count in number of nodes in new tree (feel fre to implement custom solution)
Parameters
Name | Type |
---|---|
accuracyBeforePruning |
number |
accuracyAfterPruning |
number |
_numberOfNodesInPrunedTree |
number |
Returns
number
Defined in
pruneTree/reducedErrorPrunning.ts:16
stopRules
stopRules(...stopFunctions
): (currentNode
: TreeGardenNode
, configuration
: TreeGardenConfiguration
) => boolean
Implementation of pre-pruning.
This function will compose stopper functions into single one, see pre-pruning example.
Parameters
Name | Type |
---|---|
...stopFunctions |
StopperFn [] |
Returns
fn
(currentNode
, configuration
): boolean
Parameters
Name | Type |
---|---|
currentNode |
TreeGardenNode |
configuration |
TreeGardenConfiguration |
Returns
boolean
Defined in
pruneTree/prePrunning.ts:16
stopIfDepthIs
stopIfDepthIs(maxDepth
): (currentNode
: TreeGardenNode
, _configuration
: TreeGardenConfiguration
) => boolean
Implementation of pre-pruning.
Stop growth of tree if depth reaches given maxDepth
. see pre-pruning example
Parameters
Name | Type | Description |
---|---|---|
maxDepth |
number |
depth starts with zero - that means if you set depth 1 root node will have one row of children |
Returns
fn
(currentNode
, _configuration
): boolean
Parameters
Name | Type |
---|---|
currentNode |
TreeGardenNode |
_configuration |
TreeGardenConfiguration |
Returns
boolean
Defined in
pruneTree/prePrunning.ts:52
stopIfMinimalNumberOfSamplesInNode
stopIfMinimalNumberOfSamplesInNode(nSamples?
): (currentNode
: TreeGardenNode
, _configuration
: TreeGardenConfiguration
) => boolean
Implementation of pre-pruning.
Parameters
Name | Type | Default value | Description |
---|---|---|---|
nSamples |
number |
5 |
if number of samples in node is nSamples or lower do not allow further splitting |
Returns
fn
(currentNode
, _configuration
): boolean
Parameters
Name | Type |
---|---|
currentNode |
TreeGardenNode |
_configuration |
TreeGardenConfiguration |
Returns
boolean
Defined in
pruneTree/prePrunning.ts:38