index
Here you can browse all top-level functions, types, variables and namespaces of tree-garden.
Type Aliases
- TreeGardenConfiguration
- TreeGardenDataSample
- TreeGardenNode
- SplitCriteriaFn
- SplitCriteriaDefinition
- SplitOperator
Functions
- buildAlgorithmConfiguration
- growTree
- growRandomForest
- getTreePrediction
- getRandomForestPrediction
- getTreeAccuracy
- getDividedSet
Variables
- defaultConfiguration
- defaultAttributeConfiguration
Namespaces
- split
- configuration
- dataSet
- impurity
- prune
- sampleTrees
- sampleDataSets
- statistics
- tree
- predict
- constants
Type Aliases
TreeGardenConfiguration
TreeGardenConfiguration: Object
TreeGardenConfiguration is the central object of tree-garden. It holds every option for growing trees, growing forests, and using them on unknown data. It can also be used for dependency injection of custom implementations.
You should not need to write your configuration by hand; see buildAlgorithmConfiguration.
growMissingValueReplacement, evaluateMissingValueReplacement, missingValue, getAllPossibleSplitCriteriaForDiscreteAttribute and getAllPossibleSplitCriteriaForContinuousAttribute can be defined differently for a particular attribute.
Type declaration
Name | Type | Description |
---|---|---|
treeType | `"classification" \| "regression"` | tree-garden also supports regression trees and forests; you can switch here. Default value: classification |
attributes | `{ [key: string]: typeof defaultAttributeConfiguration }` | Key is the attribute id, value is the attribute meta object. Filled by buildAlgorithmConfiguration. |
includedAttributes | `string[]` | Only these attributes are considered when building the decision tree. |
excludedAttributes | `string[]` | These attributes are not considered when building the decision tree. |
getScoreForSplit | `(parentDataSet: TreeGardenDataSample[], childDataSets: { [key: string]: TreeGardenDataSample[] }, config: TreeGardenConfiguration, splitter: SplitCriteriaFn) => number` | Impurity scoring function. You can switch to gini, information gain or, for regression trees, the regression tree score. You can also implement your own. Default value: getInformationGainRatioForSplit |
biggerScoreBetterSplit | `boolean` | Depends on the split scoring function you choose: for entropy-based methods a higher score means a better split, but for the gini index a lower score means a better split. Default value: true |
shouldWeStopGrowth | `(node: TreeGardenNode, configuration: TreeGardenConfiguration) => boolean` | Use this to configure pre-pruning. |
numberOfSplitsKept | `number` | How many of the considered splits in each node should be stored. They can be seen in tree-garden-visualization by clicking on a node. Default value: 3 |
growMissingValueReplacement | `(dataSet: TreeGardenDataSample[], attributeId: string, configuration: TreeGardenConfiguration) => (sample: TreeGardenDataSample) => any` | How to deal with missing values during the growth phase. Default value: getMostCommonValueFF |
evaluateMissingValueReplacement | `(dataSet: TreeGardenDataSample[], attributeId: string, configuration: TreeGardenConfiguration) => (sample: TreeGardenDataSample) => any` | How to deal with missing values during the evaluation phase. Default value: getMostCommonValueFF |
getClassFromLeafNode | `(node: TreeGardenNode, sample?: TreeGardenDataSample) => string` | Function that retrieves the class from a node of a classification tree for a given sample. Default value: getMostCommonClassForNode |
getValueFromLeafNode | `(node: TreeGardenNode, sample?: TreeGardenDataSample) => number` | Function that retrieves the value from a node of a regression tree for a given sample. Default value: getValueForNode |
onlyBinarySplits | `boolean` | If true, only binary splits are allowed. This is a restriction implemented in the CART algorithm: possible splits are designed so that they always have a boolean outcome, so two child nodes lead from each parent. Default value: false. If true, it will perform very slowly on data sets with attributes that have plenty of possible discrete values, such as date or name. |
missingValue | `any` | Which value is considered a missing value. Default value: undefined |
keepFullLearningData | `boolean` | If true, all data partitions in each node are kept. The tree data will be huge, so this is suitable only for small training sets. Default value: false |
getAllPossibleSplitCriteriaForDiscreteAttribute | `(attributeId: string, dataSet: TreeGardenDataSample[], configuration: TreeGardenConfiguration) => SplitCriteriaDefinition[]` | Strategy for generating all possible splits for a given discrete attribute. Default value: getPossibleSpitCriteriaForDiscreteAttribute |
getAllPossibleSplitCriteriaForContinuousAttribute | `(attributeId: string, dataSet: TreeGardenDataSample[], configuration: TreeGardenConfiguration) => SplitCriteriaDefinition[]` | Strategy for generating all possible splits for a given continuous attribute. Default value: getPossibleSpitCriteriaForContinuousAttribute |
costComplexityPruningKFold | `number` | If you use cost complexity pruning, the alpha parameter is found internally by cross-validation; you can change how many data sets are used. Default value: 5 |
reducedErrorPruningGetScore | `(accuracyBeforePruning: number, accuracyAfterPruning: number, numberOfNodesInPrunedTree: number) => number` | Function used for scoring in reduced error pruning. Default value: getPrunedTreeScore |
getTreeAccuracy | `(treeRootNode: TreeGardenNode, dataSet: TreeGardenDataSample[], configuration: TreeGardenConfiguration) => number` | Function that calculates how accurate the tree is. Default value: getTreeAccuracy |
numberOfTrees | `number` | How many trees the random forest should contain. Default value: 27 |
getAttributesForTree | `(algorithmConfiguration: TreeGardenConfiguration, _dataSet: TreeGardenDataSample[]) => string[]` | Function for gathering the subset of attributes for a tree of the random forest. Default value: getSubsetOfAttributesForTreeOfRandomForest |
numberOfBootstrappedSamples | `number` | How many samples are bootstrapped for each tree of the random forest. Default value: 0, which means the same number as there are samples in the training data set. |
calculateOutOfTheBagError | `boolean` | Whether to calculate the out-of-the-bag error for the random forest. Default value: true |
majorityVoting | `(treeRoots: TreeGardenNode[], dataSample: TreeGardenDataSample, config: TreeGardenConfiguration) => SingleSamplePredictionResult` | Majority voting function for random forests. Default value: getResultFromMultipleTrees |
mergeClassificationResults | `(values: string[]) => string` | Function for merging classification results from multiple trees. Default value: getMostCommonValue |
mergeRegressionResults | `(values: number[]) => number` | Function for merging regression results from multiple trees. Default value: getMedian |
getTagOfSampleWithMissingValueWhileClassifying? | `(sample: TreeGardenDataSample, attributeId: string, nodeWhereWeeNeedValue: TreeGardenNode, config: TreeGardenConfiguration) => any` | If a value is missing while classifying (and no reference data set for replacement was provided), this function gathers the tag for the given node. See the default implementation. |
allClasses? | `string[]` | All classes of the training data set. Filled by buildAlgorithmConfiguration. |
buildTime? | `number` | Timestamp of when buildAlgorithmConfiguration was called. Filled automatically. |
Defined in
algorithmConfiguration/buildAlgorithmConfiguration.ts:21
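The relationship between getScoreForSplit and biggerScoreBetterSplit can be illustrated with a small, self-contained sketch. This is not tree-garden's implementation: `ScoredSample`, `giniImpurity` and `weightedGiniForSplit` are hypothetical names, and the function takes only the child data sets rather than the full documented signature. It computes a weighted Gini impurity, where a lower number means a purer split, which is why a score of this kind would be paired with biggerScoreBetterSplit set to false.

```typescript
// Hypothetical minimal sample shape; real samples are TreeGardenDataSample objects.
type ScoredSample = { _class?: string | number; [key: string]: unknown };

// Gini impurity of one data set: 1 minus the sum of squared class proportions.
function giniImpurity(dataSet: ScoredSample[]): number {
  if (dataSet.length === 0) return 0;
  const counts: { [className: string]: number } = Object.create(null);
  for (const sample of dataSet) {
    const key = String(sample._class);
    counts[key] = (counts[key] || 0) + 1;
  }
  let sumOfSquares = 0;
  for (const key of Object.keys(counts)) {
    const proportion = counts[key] / dataSet.length;
    sumOfSquares += proportion * proportion;
  }
  return 1 - sumOfSquares;
}

// Weighted average of the child impurities; a lower value means a purer split.
function weightedGiniForSplit(childDataSets: { [tag: string]: ScoredSample[] }): number {
  const tags = Object.keys(childDataSets);
  const total = tags.reduce((sum, tag) => sum + childDataSets[tag].length, 0);
  if (total === 0) return 0;
  return tags.reduce(
    (score, tag) => score + (childDataSets[tag].length / total) * giniImpurity(childDataSets[tag]),
    0,
  );
}

// Every child holds a single class, so the split is perfectly pure.
const pureSplit = weightedGiniForSplit({
  black: [{ _class: 'yes' }, { _class: 'yes' }],
  white: [{ _class: 'no' }],
});
// One child with both classes mixed evenly.
const mixedSplit = weightedGiniForSplit({
  black: [{ _class: 'yes' }, { _class: 'no' }],
});
console.log(pureSplit, mixedSplit);
```

A custom scoring function with the documented signature could wrap logic like this, ignoring the arguments it does not need.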
TreeGardenDataSample
TreeGardenDataSample: Object
For more information, see tree-garden data sample.
Index signature
▪ [key: string]: any
Type declaration
Name | Type |
---|---|
_class? | `string \| number` |
_label? | `string \| number` |
_id? | `string \| number` |
Defined in
dataSet/set.ts:8
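As a rough illustration of the shape described above, the sketch below declares a local mirror of the type and a tiny data set. `ExampleSample`, the attribute names `color` and `age`, and the reading of a sample without `_class` as "not yet classified" are assumptions made for the example, not statements about the library.

```typescript
// Local mirror of the documented shape: arbitrary attribute keys plus the
// three optional reserved fields.
type ExampleSample = {
  [key: string]: unknown;
  _class?: string | number;
  _label?: string | number;
  _id?: string | number;
};

const exampleDataSet: ExampleSample[] = [
  { _id: 1, _class: 'yes', color: 'black', age: 12 },
  { _id: 2, _class: 'no', color: 'white', age: 7 },
  { _id: 3, color: 'black', age: 15 }, // assumption: no _class means "to be classified"
];

// Keep only the samples that carry a class.
const labelled = exampleDataSet.filter((sample) => sample._class !== undefined);
console.log(labelled.length);
```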
TreeGardenNode
TreeGardenNode: Object
TreeGardenNode is an object representing one node of a tree. Under childNodes you can find the split tags and the child nodes they lead to.
Type declaration
Name | Type | Description |
---|---|---|
id | `string` | Every node has a unique identifier. |
isLeaf | `boolean` | Whether the node is a leaf. |
depth | `number` | Depth of the node in the tree. It starts from zero: the root node has depth = 0. |
alreadyUsedSplits | `SplitCriteriaDefinition[]` | Split definitions used from the root up to this node. |
chosenSplitCriteria | `SplitCriteriaDefinition` | Best scoring split criteria. |
bestSplits | `ReturnType<typeof getBestScoringSplits>` | Array of the best scoring splits and their respective scores. The number of kept splits can be set in the configuration. |
dataPartitionsCounts | `ReturnType<typeof dataPartitionsToDataPartitionCounts>` | Counts of samples behind each tag, divided by class. It should look like: `{tag:{classOne:3, classTwo:3}, anotherTag:{classOne:1, classTwo:6}}` |
classCounts | `ReturnType<typeof dataPartitionsToClassCounts>` | Count of samples by class. It should look like: `{classOne:8, classTwo:7}` |
parentId? | `string` | Unique identifier of the parent node. |
childNodes? | `{ [key: string]: TreeGardenNode }` | Object of split tags and child nodes. |
impurityScore? | `number` | Score of the chosen best split criteria. |
dataPartitions? | `ReturnType<typeof splitDataSet>` | Product of the split function: tags and samples. It is thrown away when no longer needed; to change this behaviour, see keepFullLearningData in the configuration. It should look like: `{'tag':[sample,anotherSample],'anotherTag':[sample,anotherSample,nextSample]}` |
regressionTreeAverageOutcome? | `number` | Average outcome of the samples of a regression tree in this node. |
regressionTreeStandardDeviation? | `number` | Standard deviation calculated from the values of the samples of a regression tree in this node. |
Defined in
treeNode.ts:23
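Because every node keeps its children under childNodes, a tree can be walked with plain recursion. The sketch below uses `MiniNode`, a hypothetical subset of the documented fields (id, isLeaf, depth, childNodes), and a hand-written tree; it is not taken from the library.

```typescript
// Hypothetical subset of the documented TreeGardenNode fields.
type MiniNode = {
  id: string;
  isLeaf: boolean;
  depth: number;
  childNodes?: { [tag: string]: MiniNode };
};

// Collects all leaf nodes by following childNodes recursively.
function collectLeaves(node: MiniNode): MiniNode[] {
  const children = node.childNodes;
  if (node.isLeaf || !children) return [node];
  return Object.keys(children).reduce<MiniNode[]>(
    (leaves, tag) => leaves.concat(collectLeaves(children[tag])),
    [],
  );
}

// Hand-written example tree; tags and ids are made up.
const exampleRoot: MiniNode = {
  id: 'root',
  isLeaf: false,
  depth: 0,
  childNodes: {
    black: { id: 'a', isLeaf: true, depth: 1 },
    white: {
      id: 'b',
      isLeaf: false,
      depth: 1,
      childNodes: {
        young: { id: 'b1', isLeaf: true, depth: 2 },
        old: { id: 'b2', isLeaf: true, depth: 2 },
      },
    },
  },
};

const leafIds = collectLeaves(exampleRoot).map((leaf) => leaf.id);
const maxDepth = Math.max(...collectLeaves(exampleRoot).map((leaf) => leaf.depth));
console.log(leafIds, maxDepth);
```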
SplitCriteriaFn
SplitCriteriaFn: ReturnType<typeof getSplitCriteriaFn>
See the return value of getSplitCriteriaFn.
Defined in
split.ts:22
SplitCriteriaDefinition
SplitCriteriaDefinition: [string, SplitOperator, any?]
This represents split criteria in a serializable way:
an array of [attributeId, operator, value?].
See the split module.
Example
['color', '==', 'black']
['age','>',10]
Defined in
split.ts:34
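To show why a plain tuple is convenient, the sketch below evaluates a definition against a sample and round-trips it through JSON. `MiniOperator`, `MiniSplit` and `matchesSplit` are hypothetical helpers, and only the two operators that appear in the example above (`==` and `>`) are handled; the library's own operator set and evaluation logic may differ.

```typescript
// Assumption: only the operators shown in the example above.
type MiniOperator = '==' | '>';
type MiniSplit = [string, MiniOperator, unknown?];

// Returns true when the sample satisfies the split definition.
function matchesSplit(
  sample: { [key: string]: unknown },
  [attributeId, operator, value]: MiniSplit,
): boolean {
  const actual = sample[attributeId];
  switch (operator) {
    case '==':
      return actual === value;
    case '>':
      return typeof actual === 'number' && typeof value === 'number' && actual > value;
    default:
      throw new Error('Unsupported operator');
  }
}

const exampleSample = { color: 'black', age: 12 };
const ageSplit: MiniSplit = ['age', '>', 10];

// The tuple contains only JSON-friendly values, so it survives serialization.
const serialized = JSON.stringify(ageSplit);
const restored: MiniSplit = JSON.parse(serialized);

console.log(matchesSplit(exampleSample, ['color', '==', 'black']));
console.log(matchesSplit(exampleSample, restored));
```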
SplitOperator
SplitOperator: typeof supportedMathOperators extends Set<infer K> ? K : never
A split operator is one of the supported mathematical operators; check the current code for the supported choices.
Defined in
split.ts:18
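The conditional type above extracts the element type of a Set. The sketch below reproduces the pattern with a stand-in set; `exampleOperators` and its three members are assumptions chosen for illustration and do not claim to match supportedMathOperators.

```typescript
// Hypothetical stand-in; the library's real set may hold different operators.
const exampleOperators = new Set(['==', '>', '<'] as const);

// Same pattern as SplitOperator: pull the element type K out of Set<K>.
// Here it resolves to the union '==' | '>' | '<'.
type ExampleOperator = typeof exampleOperators extends Set<infer K> ? K : never;

// Runtime guard that narrows an arbitrary string to the inferred union.
function isExampleOperator(candidate: string): candidate is ExampleOperator {
  return (exampleOperators as Set<string>).has(candidate);
}

const fromUser: string = '>';
if (isExampleOperator(fromUser)) {
  // Inside this branch fromUser has type ExampleOperator.
  const operator: ExampleOperator = fromUser;
  console.log('supported operator', operator);
}
```

Deriving the type from the runtime set keeps the two in sync: adding an operator to the set widens the union without a second edit.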
Functions
buildAlgorithmConfiguration
buildAlgorithmConfiguration(dataSet, configuration?): TreeGardenConfiguration
This function helps you create a configuration for your decision tree or forest. If you have at least part of a data set with all classes present, you can create the configuration automatically (see the examples; every training or evaluation run needs a configuration). If you have just one sample, check the example.
See defaultConfiguration for the default values.
Parameters
Name | Type | Description |
---|---|---|
dataSet | `TreeGardenDataSample[]` | Array of tree-garden samples. Be sure all classes are included. |
configuration | `Partial<Omit<TreeGardenConfiguration, "attributes"> & { attributes?: { [key: string]: Partial<typeof defaultAttributeConfiguration> } }>` | Override the default configuration with your own. |
Returns
TreeGardenConfiguration
Defined in
algorithmConfiguration/buildAlgorithmConfiguration.ts:214
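The requirement that the data set contains all classes matters because allClasses is filled from it. As a hedged illustration of that idea only, the sketch below collects the distinct `_class` values of a data set; `ClassSample` and `collectClasses` are hypothetical, and the library's actual logic is not shown in this reference.

```typescript
// Hypothetical minimal sample shape.
type ClassSample = { _class?: string | number; [key: string]: unknown };

// Returns the distinct classes found in the data set, sorted for stable output.
function collectClasses(dataSet: ClassSample[]): string[] {
  const seen: { [className: string]: true } = Object.create(null);
  for (const sample of dataSet) {
    if (sample._class !== undefined) {
      seen[String(sample._class)] = true;
    }
  }
  return Object.keys(seen).sort();
}

const classes = collectClasses([
  { _class: 'no', age: 7 },
  { _class: 'yes', age: 12 },
  { _class: 'no', age: 3 },
  { age: 40 }, // samples without a class are skipped
]);
console.log(classes);
```

A class that never appears in the data passed in cannot be discovered this way, which is the practical reason for the warning above.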
growTree
growTree(algorithmConfiguration, dataSet): TreeGardenNode
Grow (train) your decision tree on your configuration and data set. See the examples in getting started.
Parameters
Name | Type |
---|---|
algorithmConfiguration | `TreeGardenConfiguration` |
dataSet | `TreeGardenDataSample[]` |
Returns
TreeGardenNode
Defined in
growTree.ts:11
growRandomForest
growRandomForest(algorithmConfiguration, dataSet): Object
Grow (train) your random forest on your configuration and data set. See random forest example.
Parameters
Name | Type |
---|---|
algorithmConfiguration | `TreeGardenConfiguration` |
dataSet | `TreeGardenDataSample[]` |
Returns
Object
Name | Type |
---|---|
trees | `TreeGardenNode[]` |
oobError | `undefined \| number` |
treesAndOobSets | `readonly [TreeGardenNode, undefined \| Set<undefined \| string \| number>][]` |
Defined in
growRandomForest.ts:17
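The returned treesAndOobSets pairs each tree with the samples it did not see. The general idea of bootstrapping with an out-of-bag remainder can be sketched as follows. This is a generic illustration, not the library's code: `BagSample` and `bootstrapWithOutOfBag` are hypothetical, and the injectable `random` parameter exists only to make the example reproducible.

```typescript
// Hypothetical minimal sample shape.
type BagSample = { _id?: string | number; [key: string]: unknown };

// Draws samples with replacement and reports which samples were never drawn.
function bootstrapWithOutOfBag(
  dataSet: BagSample[],
  numberOfBootstrappedSamples: number,
  random: () => number = Math.random,
): { bootstrapped: BagSample[]; outOfBag: BagSample[] } {
  if (dataSet.length === 0) return { bootstrapped: [], outOfBag: [] };
  // 0 means "as many as the training set has", mirroring the documented default.
  const size = numberOfBootstrappedSamples === 0 ? dataSet.length : numberOfBootstrappedSamples;
  const picked: boolean[] = dataSet.map(() => false);
  const bootstrapped: BagSample[] = [];
  for (let i = 0; i < size; i += 1) {
    const index = Math.floor(random() * dataSet.length); // sampling with replacement
    picked[index] = true;
    bootstrapped.push(dataSet[index]);
  }
  const outOfBag = dataSet.filter((_sample, index) => !picked[index]);
  return { bootstrapped, outOfBag };
}

const bagData: BagSample[] = [{ _id: 1 }, { _id: 2 }, { _id: 3 }];
// A constant "random" source always picks the first sample, so the other two stay out of bag.
const bagResult = bootstrapWithOutOfBag(bagData, 0, () => 0);
console.log(bagResult.bootstrapped.length, bagResult.outOfBag.map((sample) => sample._id));
```

Out-of-bag samples can then be used to estimate error for the tree trained on the bootstrapped part, which is what an out-of-the-bag error builds on.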
getTreePrediction
getTreePrediction<T>(samplesToPredict, decisionTreeRoot, algorithmConfiguration, referenceDataSetForReplacing?): PredictionReturnValue<T>
Get outcome of your trained decision tree on unknown samples. See examples to see it in action.
Type parameters
Name | Type |
---|---|
T | extends `TreeGardenDataSample \| TreeGardenDataSample[]` |
Parameters
Name | Type | Description |
---|---|---|
samplesToPredict | `T` | - |
decisionTreeRoot | `TreeGardenNode` | - |
algorithmConfiguration | `TreeGardenConfiguration` | - |
referenceDataSetForReplacing? | `TreeGardenDataSample[]` | Provide a data set used to replace missing values in the unknown samples you want to classify. |
Returns
PredictionReturnValue<T>
Defined in
predict.ts:125
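The type parameter T lets the return type follow the input: a single sample or an array of samples. The definition of PredictionReturnValue is not shown in this reference, so the sketch below uses a hypothetical stand-in, `ExampleReturn`, together with a made-up `predictWith` helper, purely to illustrate how such a conditional return type can behave.

```typescript
type AnySample = { [key: string]: unknown };

// Hypothetical stand-in for PredictionReturnValue<T>:
// an array of samples yields an array of results, a single sample yields one result.
type ExampleReturn<T> = T extends AnySample[] ? string[] : string;

function predictWith<T extends AnySample | AnySample[]>(
  input: T,
  predictOne: (sample: AnySample) => string,
): ExampleReturn<T> {
  const result: string | string[] = Array.isArray(input)
    ? (input as AnySample[]).map((sample) => predictOne(sample))
    : predictOne(input as AnySample);
  // The compiler cannot verify a conditional return type here, hence the cast.
  return result as unknown as ExampleReturn<T>;
}

// Toy predictor, unrelated to any real tree.
const toyPredict = (sample: AnySample): string =>
  typeof sample.age === 'number' && sample.age > 10 ? 'adult' : 'child';

const single: string = predictWith({ age: 12 }, toyPredict);
const many: string[] = predictWith([{ age: 12 }, { age: 5 }], toyPredict);
console.log(single, many);
```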
getRandomForestPrediction
getRandomForestPrediction<T>(samplesToPredict, trees, algorithmConfiguration, referenceDataSetForReplacing?): PredictionReturnValue<T>
Get outcome of your trained random forest on unknown samples. See random forest example to see it in action.
Type parameters
Name | Type |
---|---|
T | extends `TreeGardenDataSample \| TreeGardenDataSample[]` |
Parameters
Name | Type | Description |
---|---|---|
samplesToPredict | `T` | - |
trees | `TreeGardenNode[]` | - |
algorithmConfiguration | `TreeGardenConfiguration` | - |
referenceDataSetForReplacing? | `TreeGardenDataSample[]` | Provide a data set used to replace missing values in the unknown samples you want to classify. |
Returns
PredictionReturnValue<T>
Defined in
predict.ts:146
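A forest prediction combines the outputs of many trees. The configuration names getMostCommonValue and getMedian as the default merge functions; the sketch below shows one plausible reading of those two ideas. `mostCommon` and `median` are hypothetical re-implementations, and details such as tie-breaking or the handling of an even number of values are assumptions.

```typescript
// Returns the most frequent value; on a tie, the value that reached the top count first wins.
function mostCommon(values: string[]): string {
  if (values.length === 0) throw new Error('No values to merge');
  const counts: { [value: string]: number } = Object.create(null);
  let best = values[0];
  for (const value of values) {
    counts[value] = (counts[value] || 0) + 1;
    if (counts[value] > counts[best]) best = value;
  }
  return best;
}

// Returns the median; for an even count, the mean of the two middle values (assumption).
function median(values: number[]): number {
  if (values.length === 0) throw new Error('No values to merge');
  const sorted = values.slice().sort((a, b) => a - b);
  const middle = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[middle] : (sorted[middle - 1] + sorted[middle]) / 2;
}

// Votes from three hypothetical classification trees.
const votedClass = mostCommon(['yes', 'no', 'yes']);
// Outputs from four hypothetical regression trees.
const mergedValue = median([4, 1, 3, 2]);
console.log(votedClass, mergedValue);
```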
getTreeAccuracy
getTreeAccuracy(treeRootNode, dataSet, configuration): number
Calculates the accuracy of a tree (classification or regression) on the given data set.
See getMissClassificationRateRaw and getRAbsErrorRaw for more information.
Parameters
Name | Type |
---|---|
treeRootNode | `TreeGardenNode` |
dataSet | `TreeGardenDataSample[]` |
configuration | `TreeGardenConfiguration` |
Returns
number
Defined in
statistic/treeStats.ts:93
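For classification, one common definition of accuracy is one minus the misclassification rate. The sketch below shows that definition on plain arrays of class names. It is an assumption about what the number means, not a description of how getTreeAccuracy or getMissClassificationRateRaw are implemented; `missClassificationRate` and `accuracyFromPredictions` are hypothetical names.

```typescript
// Share of positions where the predicted class differs from the actual class.
function missClassificationRate(actual: string[], predicted: string[]): number {
  if (actual.length !== predicted.length) throw new Error('Length mismatch');
  if (actual.length === 0) return 0;
  const wrong = actual.filter((value, index) => value !== predicted[index]).length;
  return wrong / actual.length;
}

// Accuracy as the complement of the misclassification rate.
function accuracyFromPredictions(actual: string[], predicted: string[]): number {
  return 1 - missClassificationRate(actual, predicted);
}

const actualClasses = ['a', 'b', 'a', 'b'];
const predictedClasses = ['a', 'b', 'b', 'b']; // one of four is wrong
console.log(accuracyFromPredictions(actualClasses, predictedClasses));
```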
getDividedSet
getDividedSet(dataSet, portionGoesToFirst?): TreeGardenDataSample[][]
Function that randomly distributes the samples of a data set into two data sets.
Example
// 70% goes to training, rest to validation
const [trainingDataSet,validationDataSet] = getDividedSet(originalDataSet,0.7)
Parameters
Name | Type | Default value | Description |
---|---|---|---|
dataSet | `TreeGardenDataSample[]` | `undefined` | - |
portionGoesToFirst | `number` | `0.5` | Portion of samples, between 0 and 1, that goes to the first data set; the rest goes to the second. |
Returns
TreeGardenDataSample[][]
Defined in
dataSet/dividingAndBootstrapping.ts:15
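A random division like this is typically a shuffle followed by a cut. The sketch below shows that approach with a Fisher-Yates shuffle; `divideRandomly` is a hypothetical helper, the rounding of the cut point is an assumption, and the library may implement the division differently.

```typescript
// Shuffles a copy of the data set and cuts it at the requested portion.
function divideRandomly<T>(
  dataSet: T[],
  portionGoesToFirst: number = 0.5,
  random: () => number = Math.random,
): [T[], T[]] {
  if (portionGoesToFirst < 0 || portionGoesToFirst > 1) {
    throw new Error('portionGoesToFirst must be between 0 and 1');
  }
  const shuffled = dataSet.slice();
  // Fisher-Yates shuffle: swap each position with a random earlier (or same) one.
  for (let i = shuffled.length - 1; i > 0; i -= 1) {
    const j = Math.floor(random() * (i + 1));
    const swapped = shuffled[i];
    shuffled[i] = shuffled[j];
    shuffled[j] = swapped;
  }
  const cut = Math.round(shuffled.length * portionGoesToFirst); // assumption: round to nearest
  return [shuffled.slice(0, cut), shuffled.slice(cut)];
}

const tenItems = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const [firstPart, secondPart] = divideRandomly(tenItems, 0.7);
console.log(firstPart.length, secondPart.length);
```

Copying before shuffling leaves the caller's array untouched, which keeps the original order available for later use.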
Variables
defaultConfiguration
Const
defaultConfiguration: TreeGardenConfiguration
Default configuration. See code for more information.
Defined in
algorithmConfiguration/algorithmDefaultConfiguration.ts:27
defaultAttributeConfiguration
Const
defaultAttributeConfiguration: Object
Default configuration for attribute.
Type declaration
Name | Type |
---|---|
dataType | `"discrete" \| "continuous" \| "automatic"` |
growMissingValueReplacement | `undefined \| (dataSet: TreeGardenDataSample[], attributeId: string, configuration: TreeGardenConfiguration) => (sampleWithMissingValue: TreeGardenDataSample) => string \| number` |
evaluateMissingValueReplacement | `undefined \| (dataSet: TreeGardenDataSample[], attributeId: string, configuration: TreeGardenConfiguration) => (sampleWithMissingValue: TreeGardenDataSample) => string \| number` |
missingValue | `any` |
getAllPossibleSplitCriteriaForDiscreteAttribute | `undefined \| (attributeId: string, dataSet: TreeGardenDataSample[], configuration: TreeGardenConfiguration) => SplitCriteriaDefinition[]` |
getAllPossibleSplitCriteriaForContinuousAttribute | `undefined \| (attributeId: string, dataSet: TreeGardenDataSample[], configuration: TreeGardenConfiguration) => SplitCriteriaDefinition[]` |
Defined in
algorithmConfiguration/attibuteDefaultConfiguration.ts:12
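Several attribute-level fields are typed as `undefined | function`, which suggests that an unset attribute-level value falls back to the global configuration. The sketch below illustrates that reading. It is an assumption rather than documented behaviour; `MiniConfig`, `ReplacementFactory` (with a deliberately simplified signature) and `resolveGrowReplacement` are hypothetical.

```typescript
// Simplified, hypothetical signature; the documented one takes more arguments.
type ReplacementFactory = (attributeId: string) => string | number;

type MiniAttributeConfig = { growMissingValueReplacement?: ReplacementFactory };

type MiniConfig = {
  growMissingValueReplacement: ReplacementFactory;
  attributes: { [attributeId: string]: MiniAttributeConfig };
};

// Prefers the attribute-level function and falls back to the global one.
function resolveGrowReplacement(config: MiniConfig, attributeId: string): ReplacementFactory {
  const attributeConfig = config.attributes[attributeId];
  const override = attributeConfig ? attributeConfig.growMissingValueReplacement : undefined;
  return override !== undefined ? override : config.growMissingValueReplacement;
}

const miniConfig: MiniConfig = {
  growMissingValueReplacement: () => 'global',
  attributes: {
    age: { growMissingValueReplacement: () => 0 }, // attribute-specific override
    color: {}, // nothing set, so the global function applies
  },
};

console.log(resolveGrowReplacement(miniConfig, 'age')('age'));
console.log(resolveGrowReplacement(miniConfig, 'color')('color'));
```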