Namespace: split
With the help of the functions defined in this namespace, you can define your own split criteria if those generated automatically from the attribute definitions are not enough for you. This should not generally be needed; for some advanced use cases, you can create pseudo fields in your data set instead.
// WARNING: this code only demonstrates how to use the functions in the split namespace.
// You can achieve the same effect by preprocessing the data set, which will probably be easier (recreate the data set to fit your needs).
import {
  growTree,
  buildAlgorithmConfiguration,
  split,
  impurity,
  prune,
  sampleDataSets,
  TreeGardenDataSample,
  TreeGardenConfiguration
} from 'tree-garden';

// let's use the built-in titanic data set
const { titanicSet } = sampleDataSets;

// let's speed up the training phase and use just two age criteria - by default, all age ranges are generated
const myOwnContinuousAttributeGatheringFn = (attributeId:string, dataSet:TreeGardenDataSample[], configuration:TreeGardenConfiguration) => {
  if (attributeId === 'age') {
    // note that we return an array of arrays, because a single attribute usually produces several criteria
    return [
      ['age', '<', 10], // child
      ['age', '>', 40] // elder
    ];
  }
  // every other continuous attribute falls back to the default implementation
  return split.getPossibleSpitCriteriaForContinuousAttribute(attributeId, dataSet, configuration);
};

const algorithmConfig = buildAlgorithmConfiguration(titanicSet, {
  // attributes with many values are excluded, so we can use information gain, which is cheaper to compute
  excludedAttributes: ['ticket', 'embarked', 'name', 'cabin'],
  getScoreForSplit: impurity.getInformationGainForSplit,
  getAllPossibleSplitCriteriaForContinuousAttribute: myOwnContinuousAttributeGatheringFn
});

const rawTree = growTree(algorithmConfig, titanicSet);
const prunedTree = prune.getPrunedTreeByPessimisticPruning(rawTree);

// output the result - paste it into the visualization tool
console.log(JSON.stringify(prunedTree));
If you put the output into the visualization tool, you can see that our split criteria were used a couple of times ;)
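The preprocessing alternative mentioned in the warning above could look roughly like the sketch below. It does not use tree-garden at all: the `Sample` type, the `ageGroup` pseudo field, and the `'adult'` bucket are assumptions made for illustration, and the real `TreeGardenDataSample` shape may differ.

```typescript
// Hypothetical sample shape; the real TreeGardenDataSample type may differ.
type Sample = { [attributeId: string]: string | number | undefined };

// Derive a discrete pseudo field from the continuous `age` attribute.
const ageGroupOf = (age: string | number | undefined): string | undefined => {
  if (typeof age !== 'number') return undefined; // keep missing values missing
  if (age < 10) return 'child';
  if (age > 40) return 'elder';
  return 'adult'; // assumed middle bucket, not part of the original example
};

// Returns a new data set; the original samples are not mutated.
const withAgeGroup = (dataSet: Sample[]): Sample[] =>
  dataSet.map((sample) => ({ ...sample, ageGroup: ageGroupOf(sample.age) }));

const preprocessed = withAgeGroup([
  { name: 'a', age: 4 },
  { name: 'b', age: 35 },
  { name: 'c', age: 62 },
  { name: 'd' }
]);
```

With a data set prepared this way, the original `age` attribute could presumably be listed in `excludedAttributes`, so that only the pseudo field is considered for splitting.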
Functions
- getPossibleSpitCriteriaForContinuousAttribute
- getPossibleSpitCriteriaForDiscreteAttribute
- getSplitCriteriaFn
Functions
getPossibleSpitCriteriaForContinuousAttribute
getPossibleSpitCriteriaForContinuousAttribute(attributeId, dataSet, configuration): SplitCriteriaDefinition[]
Parameters
Name | Type |
---|---|
attributeId | string |
dataSet | TreeGardenDataSample[] |
configuration | Object |
Returns
SplitCriteriaDefinition[]
Defined in
split.ts:181
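To give a feel for what a list of criteria for a continuous attribute can look like, here is a standalone sketch that proposes one `>` criterion at each midpoint between consecutive distinct values. This is a common decision-tree technique and only an assumption about the general idea; it is not taken from `split.ts`, and `midpointCriteria`, `Sample`, and `Criterion` are hypothetical names.

```typescript
// Hypothetical types for illustration; not the library's own definitions.
type Sample = { [attributeId: string]: string | number | undefined };
type Criterion = [string, '>', number];

const midpointCriteria = (attributeId: string, dataSet: Sample[]): Criterion[] => {
  // collect numeric values only, skipping missing or non-numeric ones
  const sorted = dataSet
    .map((sample) => sample[attributeId])
    .filter((value): value is number => typeof value === 'number')
    .sort((a, b) => a - b);
  // drop duplicates (equal values are adjacent after sorting)
  const distinct = sorted.filter((value, i) => i === 0 || value !== sorted[i - 1]);

  const criteria: Criterion[] = [];
  for (let i = 1; i < distinct.length; i += 1) {
    // threshold halfway between two neighbouring distinct values
    criteria.push([attributeId, '>', (distinct[i - 1] + distinct[i]) / 2]);
  }
  return criteria;
};

const ageCriteria = midpointCriteria('age', [
  { age: 10 }, { age: 20 }, { age: 20 }, { age: 40 }, {}
]);
```

The number of candidates grows with the number of distinct values, which is why replacing them with a short hand-written list can shorten training.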
getPossibleSpitCriteriaForDiscreteAttribute
getPossibleSpitCriteriaForDiscreteAttribute(attributeId, dataSet, configuration): SplitCriteriaDefinition[]
Parameters
Name | Type |
---|---|
attributeId | string |
dataSet | TreeGardenDataSample[] |
configuration | Object |
Returns
SplitCriteriaDefinition[]
Defined in
split.ts:167
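For a discrete attribute, one plausible way to build candidates is a single `==` criterion per distinct value found in the data set. The sketch below illustrates that idea in isolation; it is an assumption, not the code in `split.ts`, and `equalityCriteria`, `Sample`, and `Criterion` are hypothetical names.

```typescript
// Hypothetical types for illustration; not the library's own definitions.
type Sample = { [attributeId: string]: string | number | undefined };
type Criterion = [string, '==', string | number];

const equalityCriteria = (attributeId: string, dataSet: Sample[]): Criterion[] => {
  const seen: (string | number)[] = [];
  dataSet.forEach((sample) => {
    const value = sample[attributeId];
    // remember each defined value once, in order of first appearance
    if (value !== undefined && seen.indexOf(value) === -1) seen.push(value);
  });
  return seen.map((value): Criterion => [attributeId, '==', value]);
};

const sexCriteria = equalityCriteria('sex', [
  { sex: 'male' }, { sex: 'female' }, { sex: 'male' }, {}
]);
```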
getSplitCriteriaFn
getSplitCriteriaFn(attributeId, operator, value?): (currentSample: TreeGardenDataSample) => any
Factory function that returns a function which accepts a sample and decides its tag for splitting. Useful for unit tests.
Example
const splitDefinition = ['weight', '>', 30] as const;
const splitter = split.getSplitCriteriaFn(...splitDefinition);
console.log(splitter({ weight: 40 })); // should show true
Parameters
Name | Type |
---|---|
attributeId | string |
operator | "==" \| ">=" \| "<=" \| ">" \| "<" |
value? | string \| number \| Function \| (string \| number)[] |
Returns
fn
(currentSample): any
Parameters
Name | Type |
---|---|
currentSample | TreeGardenDataSample |
Returns
any
Defined in
split.ts:47
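The factory pattern described for `getSplitCriteriaFn` can be sketched without the library as follows. This is a simplified stand-in, assuming only scalar `string | number` values and assuming that a missing or non-numeric value fails every ordering comparison; the documented `Function` and array forms of `value` are not covered, and `makeSplitFn`, `Sample`, and `Operator` are hypothetical names.

```typescript
// Hypothetical types for illustration; not the library's own definitions.
type Sample = { [attributeId: string]: string | number | undefined };
type Operator = '==' | '>=' | '<=' | '>' | '<';

// Returns a function that evaluates one split criterion against a sample.
const makeSplitFn = (attributeId: string, operator: Operator, value: string | number) =>
  (sample: Sample): boolean => {
    const actual = sample[attributeId];
    if (operator === '==') return actual === value;
    // assumption: ordering comparisons are only meaningful for numbers
    if (typeof actual !== 'number' || typeof value !== 'number') return false;
    switch (operator) {
      case '>': return actual > value;
      case '>=': return actual >= value;
      case '<': return actual < value;
      case '<=': return actual <= value;
      default: return false;
    }
  };

const isHeavy = makeSplitFn('weight', '>', 30);
const isMale = makeSplitFn('sex', '==', 'male');
```

Because the returned function closes over the criterion, the same tuple that describes a split can be turned into a predicate once and then applied to every sample.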