
Configuration from data sample

Let's obtain the algorithm configuration from just a single data sample.

This can be handy when we do not have access to the full training data set, for instance when we have built a classification service that uses a pretrained tree or forest.

code

// At prediction time we do not have the whole data set available, but we still need an algorithmConfiguration.
// We can derive the configuration from a single complete sample and, in the case of a classification tree,
// knowledge of all classes, so we do not need to write it by hand.

import {
  buildAlgorithmConfiguration,
  getTreePrediction,
  sampleTrees
} from 'tree-garden';

// Let's use a pretrained tree that is bundled with tree-garden
const { tennisTree } = sampleTrees;

// we need a configuration in order to predict unknown samples
// we will build it using just a single complete sample (without missing values) [sample for config]
// and knowledge of all classes
const singleSample = {
  _label: '5', outlook: 'Rain', temp: 'Cool', humidity: 'Normal', wind: 'Weak', _class: 'Yes'
};

// full configuration that can be used for predictions
const config = buildAlgorithmConfiguration(
  [singleSample],
  {
    allClasses: ['Yes', 'No'] // [important]
  }
);


// sample of interest - based on today's weather ;)
const shouldIGoToPlayTennisTodaySample = {
  outlook: 'Sunny',
  temp: 'Mild',
  humidity: 'Normal',
  wind: 'Weak'
};

// prediction from our imported tree
const shouldIStayOrShouldIGo = getTreePrediction(shouldIGoToPlayTennisTodaySample, tennisTree, config);

// let's see if I should go
console.log(`Hey mighty tree, should I go play tennis today?\nMighty tree says: ${shouldIStayOrShouldIGo}`);

comments

In this example we imported a bundled tree, took one data sample and used it to build the configuration.
This configuration is then used to predict our unknown sample.

[sample for config]
Because we used just a single data sample to create the configuration, that sample must not have any missing values: every field that was present in the full data set during the learning phase must be included in this single sample.
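
Before building a configuration from one sample, it may help to check that the sample really is complete. The following is a minimal sketch in plain JavaScript; `findMissingFields` and the `expectedFields` list are hypothetical names introduced here for illustration and are not part of tree-garden. It assumes a missing value shows up as an absent key, `undefined` or `null`.

```javascript
// Hypothetical helper, not part of tree-garden.
// Returns the names of expected fields that are absent, undefined or null in the sample.
function findMissingFields(sample, expectedFields) {
  return expectedFields.filter((field) => {
    const value = sample[field];
    return value === undefined || value === null;
  });
}

// Fields we assume were present in the full tennis data set during learning
const expectedFields = ['outlook', 'temp', 'humidity', 'wind'];

const completeSample = {
  _label: '5', outlook: 'Rain', temp: 'Cool', humidity: 'Normal', wind: 'Weak', _class: 'Yes'
};
const incompleteSample = {
  _label: '6', outlook: 'Rain', temp: 'Cool', _class: 'No'
};

console.log(findMissingFields(completeSample, expectedFields)); // []
console.log(findMissingFields(incompleteSample, expectedFields)); // [ 'humidity', 'wind' ]
```

A sample for which this check returns an empty array is a reasonable candidate for building the configuration.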

[important] We also need to provide all classes present in our data set; in our case these are 'Yes' and 'No'.
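
One possible way to have the class list at hand at prediction time is to collect it while the full data set is still available and store it next to the trained tree. The sketch below is plain JavaScript; `collectClasses` is a hypothetical helper, not a tree-garden function, and it assumes each sample keeps its class in the `_class` field as in the example above.

```javascript
// Hypothetical helper, not part of tree-garden.
// Collects the distinct values of the _class field across a data set.
function collectClasses(dataSet) {
  const classes = dataSet
    .map((sample) => sample._class)
    .filter((value) => value !== undefined && value !== null);
  return [...new Set(classes)].sort();
}

// A few illustrative samples in the shape used above
const trainingSamples = [
  { outlook: 'Sunny', temp: 'Hot', humidity: 'High', wind: 'Weak', _class: 'No' },
  { outlook: 'Overcast', temp: 'Hot', humidity: 'High', wind: 'Weak', _class: 'Yes' },
  { outlook: 'Rain', temp: 'Cool', humidity: 'Normal', wind: 'Strong', _class: 'No' }
];

const allClasses = collectClasses(trainingSamples);
console.log(allClasses); // [ 'No', 'Yes' ]
```

The resulting array could then be saved together with the tree and passed as `allClasses` when the configuration is built from a single sample.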