
Namespace: split

With the functions defined in this namespace, you can define your own split criteria if the ones generated automatically from attribute definitions are not enough for you. This should not generally be needed; for some advanced use cases, you can instead create pseudo fields in your data set.

```typescript
// WARNING: this code only demonstrates how to use the functions in the split namespace.
// You can achieve the same effect by preprocessing the data set (recreating it to suit your needs), which is probably easier.

import {
  growTree,
  buildAlgorithmConfiguration,
  split,
  impurity,
  prune,
  sampleDataSets,
  TreeGardenDataSample,
  TreeGardenConfiguration
} from 'tree-garden';


// let's use the built-in titanic data set
const { titanicSet } = sampleDataSets;

// speed up the training phase by using just two age criteria; by default, criteria for all possible age ranges are generated
const myOwnContinuousAttributeGatheringFn = (attributeId:string, dataSet:TreeGardenDataSample[], configuration:TreeGardenConfiguration) => {
  if (attributeId === 'age') {
    // note: we return an array of arrays, because a single attribute usually produces several split criteria
    return [
      ['age', '<', 10], // child
      ['age', '>', 40] // elder
    ];
  }
  // fall back to the default criteria for every other continuous attribute
  return split.getPossibleSpitCriteriaForContinuousAttribute(attributeId, dataSet, configuration);
};


const algorithmConfig = buildAlgorithmConfiguration(titanicSet, {
  // attributes with many values are excluded, so we can use information gain, which is cheaper to calculate
  excludedAttributes: ['ticket', 'embarked', 'name', 'cabin'],
  getScoreForSplit: impurity.getInformationGainForSplit,
  getAllPossibleSplitCriteriaForContinuousAttribute: myOwnContinuousAttributeGatheringFn
});

const rawTree = growTree(algorithmConfig, titanicSet);
const prunedTree = prune.getPrunedTreeByPessimisticPruning(rawTree);

// print the result, then paste it into the visualization tool
console.log(JSON.stringify(prunedTree));
```

If you load the output into the visualization tool, you can see that our custom split criteria were used a couple of times ;)
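As the warning in the demo says, the same effect is usually easier to get by preprocessing the data set. This minimal sketch adds a discrete pseudo field derived from `age` and uses no tree-garden API; the `ageGroup` field, the local `Sample` type (a stand-in for `TreeGardenDataSample`) and the `addAgeGroup` helper are hypothetical. One difference worth noting: the custom criteria in the demo are two separate binary splits, whereas the pseudo field is a single attribute with several values.

```typescript
// Plain-object sample; this local type is an assumption standing in for TreeGardenDataSample.
type Sample = { [attributeId: string]: string | number | null | undefined };

// Hypothetical helper: derive a discrete "ageGroup" pseudo field from the numeric "age" attribute.
function addAgeGroup(dataSet: Sample[]): Sample[] {
  return dataSet.map((sample) => {
    const age = sample.age;
    let ageGroup: string;
    if (typeof age !== 'number') {
      ageGroup = 'unknown';
    } else if (age < 10) {
      ageGroup = 'child';
    } else if (age > 40) {
      ageGroup = 'elder';
    } else {
      ageGroup = 'adult';
    }
    // Copy the sample so the original data set stays untouched.
    return { ...sample, ageGroup };
  });
}

const preprocessed = addAgeGroup([{ age: 5 }, { age: 25 }, { age: 60 }, { age: null }]);
console.log(preprocessed.map((s) => s.ageGroup)); // [ 'child', 'adult', 'elder', 'unknown' ]
```

You would then build the configuration from the preprocessed set, optionally adding `age` to `excludedAttributes`; how tree-garden classifies the new attribute has not been verified here.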

Functions

getPossibleSpitCriteriaForContinuousAttribute

getPossibleSpitCriteriaForContinuousAttribute(attributeId, dataSet, configuration): SplitCriteriaDefinition[]

Parameters

| Name | Type |
| --- | --- |
| attributeId | string |
| dataSet | TreeGardenDataSample[] |
| configuration | Object |

Returns

SplitCriteriaDefinition[]

Defined in

split.ts:181
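A common way to generate candidate thresholds for a continuous attribute is to take the midpoints between neighbouring distinct values. The following is a minimal, self-contained sketch of that idea, assuming this is roughly what `getPossibleSpitCriteriaForContinuousAttribute` does; it is not the library's implementation, and `candidateThresholds`, `Sample` and `Criterion` are hypothetical local names. It also shows why the demo restricts `age` to two criteria: n distinct values yield n - 1 candidates, and each one has to be scored.

```typescript
// Local stand-ins for the library types (assumptions, not the tree-garden definitions).
type Sample = { [attributeId: string]: string | number | null | undefined };
type Criterion = [attributeId: string, operator: '>', threshold: number];

// Hypothetical sketch: one binary criterion per midpoint between neighbouring distinct values.
function candidateThresholds(attributeId: string, dataSet: Sample[]): Criterion[] {
  // Keep only real numbers; missing or non-numeric values are ignored.
  const values = dataSet
    .map((sample) => sample[attributeId])
    .filter((v): v is number => typeof v === 'number' && !Number.isNaN(v));
  // Drop duplicates, then sort numerically.
  const distinct = Array.from(new Set(values)).sort((a, b) => a - b);
  const criteria: Criterion[] = [];
  for (let i = 1; i < distinct.length; i++) {
    criteria.push([attributeId, '>', (distinct[i - 1] + distinct[i]) / 2]);
  }
  return criteria;
}

console.log(candidateThresholds('age', [{ age: 30 }, { age: 10 }, { age: 20 }, { age: 10 }, { age: null }]));
// [ [ 'age', '>', 15 ], [ 'age', '>', 25 ] ]
```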


getPossibleSpitCriteriaForDiscreteAttribute

getPossibleSpitCriteriaForDiscreteAttribute(attributeId, dataSet, configuration): SplitCriteriaDefinition[]

Parameters

| Name | Type |
| --- | --- |
| attributeId | string |
| dataSet | TreeGardenDataSample[] |
| configuration | Object |

Returns

SplitCriteriaDefinition[]

Defined in

split.ts:167
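For a discrete attribute, one plausible strategy is a binary `==` criterion for each distinct observed value. The sketch below assumes that strategy purely for illustration; tree-garden may instead emit a single multiway criterion per attribute (note that `value` is optional in `getSplitCriteriaFn`). `candidateEqualityCriteria` and the local types are hypothetical.

```typescript
// Local stand-ins for the library types (assumptions, not the tree-garden definitions).
type Sample = { [attributeId: string]: string | number | null | undefined };
type Criterion = [attributeId: string, operator: '==', value: string | number];

// Hypothetical sketch: one equality criterion per distinct value of the attribute.
function candidateEqualityCriteria(attributeId: string, dataSet: Sample[]): Criterion[] {
  const seen = new Set<string | number>();
  for (const sample of dataSet) {
    const value = sample[attributeId];
    // Skip missing values; they do not define a useful split.
    if (value !== null && value !== undefined) {
      seen.add(value);
    }
  }
  // A Set preserves insertion order, so criteria follow first appearance in the data.
  return Array.from(seen, (value): Criterion => [attributeId, '==', value]);
}

console.log(candidateEqualityCriteria('sex', [{ sex: 'male' }, { sex: 'female' }, { sex: 'male' }, {}]));
// [ [ 'sex', '==', 'male' ], [ 'sex', '==', 'female' ] ]
```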


getSplitCriteriaFn

getSplitCriteriaFn(attributeId, operator, value?): (currentSample: TreeGardenDataSample) => any

Factory function that returns a function which accepts a sample and decides its tag for splitting. Useful for unit tests.

Example

```typescript
import { split } from 'tree-garden';

const splitDefinition = ['weight', '>', 30] as const;
const splitter = split.getSplitCriteriaFn(...splitDefinition);
console.log(splitter({ weight: 40 })); // should show true
```

Parameters

| Name | Type |
| --- | --- |
| attributeId | string |
| operator | "==" \| ">=" \| "<=" \| ">" \| "<" |
| value? | string \| number \| Function \| (string \| number)[] |

Returns

fn

(currentSample): any

Parameters

| Name | Type |
| --- | --- |
| currentSample | TreeGardenDataSample |

Returns

any

Defined in

split.ts:47
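To make the semantics of the returned function concrete, here is a hypothetical re-implementation of a splitter factory, together with a helper that groups samples by the tag it returns. It assumes that the comparison operators yield a boolean and that `'=='` without a `value` tags the sample with the raw attribute value; `Function` and array values are left out. `makeSplitter` and `partition` are illustrative names, not tree-garden exports.

```typescript
type Sample = { [attributeId: string]: unknown };
type Operator = '==' | '>=' | '<=' | '>' | '<';

// Hypothetical splitter factory for illustration; this is not the tree-garden source.
function makeSplitter(attributeId: string, operator: Operator, value?: string | number) {
  return (sample: Sample): unknown => {
    const actual = sample[attributeId];
    switch (operator) {
      case '==':
        // Assumption: without a value, the tag is the raw attribute value (multiway split).
        return value === undefined ? actual : actual === value;
      case '>':
        return (actual as number) > (value as number);
      case '>=':
        return (actual as number) >= (value as number);
      case '<':
        return (actual as number) < (value as number);
      case '<=':
        return (actual as number) <= (value as number);
    }
  };
}

// Group samples by the tag the splitter assigns; each distinct tag becomes one group.
function partition(dataSet: Sample[], splitter: (sample: Sample) => unknown): Map<unknown, Sample[]> {
  const groups = new Map<unknown, Sample[]>();
  for (const sample of dataSet) {
    const tag = splitter(sample);
    const group = groups.get(tag);
    if (group) {
      group.push(sample);
    } else {
      groups.set(tag, [sample]);
    }
  }
  return groups;
}

const byWeight = partition([{ weight: 40 }, { weight: 20 }, { weight: 31 }], makeSplitter('weight', '>', 30));
console.log(byWeight.get(true)?.length, byWeight.get(false)?.length); // 2 1
```

In a decision tree, each distinct tag produced this way typically becomes one branch of the node being split.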