Skip to content

Namespace: prune

This namespace holds implementations of pre-pruning and post-pruning see tree pruning docs

Functions

Functions

getPrunedTreeByReducedErrorPruning

getPrunedTreeByReducedErrorPruning(treeRoot, pruningDataSet, configuration): TreeGardenNode

Reduced error pruning (REP) is simplest pre-pruning algorithm tree-garden implements. Even so simple there exists multiple variants of this algorithm.

tree-garden`s version is iterative algorithm.

  • takes all inner nodes (all nodes except leaves) in each iteration
  • tries to turn each inner node into leaf and then measures accuracy against validation data set
  • keeps change, which leads to highest accuracy gain against validation data set, run new iteration
  • keeps pruning until there is node after which removal accuracy against validation dataset raised

Pros:

  • easily understandable
  • high quality trees on output if validation set is big enough
  • computationally effective

Cons:

  • need for validation data set, you cannot use all your data for training model
  • sometimes special pruning data set is used (trained on one part of your data set, pruned on another and validated on last one) this, if your data are expensive (like outcome of some expensive experiment) you probably do not want use this pruning method

Parameters

Name Type
treeRoot TreeGardenNode
pruningDataSet TreeGardenDataSample[]
configuration TreeGardenConfiguration

Returns

TreeGardenNode

Defined in

pruneTree/reducedErrorPrunning.ts:48


getPrunedTreeByPessimisticPruning

getPrunedTreeByPessimisticPruning(tree): TreeGardenNode

Pessimistic pruning is post pruning strategy employed by c4.5 algorithm. Based just on statistics of tree itself.

Pros:

  • You do not need separate pruning data set - you do just need tree itself!
  • Computationally cheap compared to cost complexity pruning

Cons:

  • Sometimes under-prune tree - resulting tree is unnecessary large

Parameters

Name Type
tree TreeGardenNode

Returns

TreeGardenNode

Defined in

pruneTree/pessimisticPrunning.ts:49


getPrunedTreeByCostComplexityPruning

getPrunedTreeByCostComplexityPruning(treeRoot, fullTrainingData, configuration): TreeGardenNode

Cost complexity pruning (also known as weakest link pruning) is one of three post-pruning methods tree-garden implements.

Pros:

  • You do not need separate pruning data set - optimal tree is calculated from whole data you have
  • Quality of pruning is good

Cons:

  • As it has internal parameter alpha which is found by cross validation - it is computationally expensive.

Parameters

Name Type
treeRoot TreeGardenNode
fullTrainingData TreeGardenDataSample[]
configuration TreeGardenConfiguration

Returns

TreeGardenNode

Defined in

pruneTree/costComplexityPruning.ts:122


getPrunedTreeScore

getPrunedTreeScore(accuracyBeforePruning, accuracyAfterPruning, _numberOfNodesInPrunedTree): number

Simple implementation of scoring trees before and after pruning. Used by reduced error pruning. Simply prefer trees with best accuracy - do not count in number of nodes in new tree (feel fre to implement custom solution)

Parameters

Name Type
accuracyBeforePruning number
accuracyAfterPruning number
_numberOfNodesInPrunedTree number

Returns

number

Defined in

pruneTree/reducedErrorPrunning.ts:16


stopRules

stopRules(...stopFunctions): (currentNode: TreeGardenNode, configuration: TreeGardenConfiguration) => boolean

Implementation of pre-pruning.

This function will compose stopper functions into single one, see pre-pruning example.

Parameters

Name Type
...stopFunctions StopperFn[]

Returns

fn

(currentNode, configuration): boolean

Parameters
Name Type
currentNode TreeGardenNode
configuration TreeGardenConfiguration
Returns

boolean

Defined in

pruneTree/prePrunning.ts:16


stopIfDepthIs

stopIfDepthIs(maxDepth): (currentNode: TreeGardenNode, _configuration: TreeGardenConfiguration) => boolean

Implementation of pre-pruning.

Stop growth of tree if depth reaches given maxDepth. see pre-pruning example

Parameters

Name Type Description
maxDepth number depth starts with zero - that means if you set depth 1 root node will have one row of children

Returns

fn

(currentNode, _configuration): boolean

Parameters
Name Type
currentNode TreeGardenNode
_configuration TreeGardenConfiguration
Returns

boolean

Defined in

pruneTree/prePrunning.ts:52


stopIfMinimalNumberOfSamplesInNode

stopIfMinimalNumberOfSamplesInNode(nSamples?): (currentNode: TreeGardenNode, _configuration: TreeGardenConfiguration) => boolean

Implementation of pre-pruning.

see pre-pruning example

Parameters

Name Type Default value Description
nSamples number 5 if number of samples in node is nSamples or lower do not allow further splitting

Returns

fn

(currentNode, _configuration): boolean

Parameters
Name Type
currentNode TreeGardenNode
_configuration TreeGardenConfiguration
Returns

boolean

Defined in

pruneTree/prePrunning.ts:38