Namespace: dataSet

References

getDividedSet

Re-exports getDividedSet

Functions

getBootstrappedDataSet

getBootstrappedDataSet(dataSet, howMany?): TreeGardenDataSample[]

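Implements bootstrap aggregating (bagging) for random forests. Samples are drawn at random from the original data set with replacement, so the same sample can appear more than once. When as many samples are drawn as the data set contains, roughly 63.2% of the original samples appear in the result; the remaining entries are repeats.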

Parameters

Name Type Description
dataSet TreeGardenDataSample[] -
howMany? number If not defined, as many samples are returned as the original data set contains.

Returns

TreeGardenDataSample[]

Defined in

dataSet/dividingAndBootstrapping.ts:35
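
The 63.2% figure follows from sampling with replacement: a given sample is missed by a single draw with probability 1 - 1/n, so after n draws the share of samples drawn at least once is 1 - (1 - 1/n)^n, which approaches 1 - 1/e ≈ 0.632. The sketch below illustrates the idea only and is not the library's implementation; `bootstrapSketch`, `expectedUniqueShare` and the `Sample` type are hypothetical names, and the injectable `random` parameter is an assumption added to make the sketch testable.

```typescript
// Hypothetical stand-in for TreeGardenDataSample, used only in this sketch.
type Sample = { _id?: string | number; [attributeId: string]: unknown };

// Draws `howMany` samples with replacement; defaults to the data set length.
// `random` is injectable only so the sketch can be tested deterministically.
function bootstrapSketch(
  dataSet: Sample[],
  howMany: number = dataSet.length,
  random: () => number = Math.random,
): Sample[] {
  const drawn: Sample[] = [];
  for (let i = 0; i < howMany; i += 1) {
    drawn.push(dataSet[Math.floor(random() * dataSet.length)]);
  }
  return drawn;
}

// Expected share of distinct samples after n draws from n samples.
function expectedUniqueShare(n: number): number {
  return 1 - Math.pow(1 - 1 / n, n);
}

const originalSet: Sample[] = Array.from({ length: 1000 }, (_, i) => ({ _id: i }));
const bootstrappedSet = bootstrapSketch(originalSet);
const distinctShare = new Set(bootstrappedSet.map((s) => s._id)).size / originalSet.length;

// The observed share varies from run to run but should stay close to the expected one.
console.log(distinctShare, expectedUniqueShare(originalSet.length));
```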


getBootstrappedDataSetAndOutOfTheBagRest

getBootstrappedDataSetAndOutOfTheBagRest(dataSet, howMany?): readonly [TreeGardenDataSample[], Set<undefined | string | number>]

Returns a bootstrapped data set together with a Set of the ids of the out-of-the-bag samples, that is, the samples that were not drawn. If the samples in the data set do not have their own unique _id, ids are generated.

Parameters

Name Type Description
dataSet TreeGardenDataSample[] -
howMany? number If not defined, as many samples are returned as the original data set contains.

Returns

readonly [TreeGardenDataSample[], Set<undefined | string | number>]

Defined in

dataSet/dividingAndBootstrapping.ts:45
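
A rough sketch of how the bootstrapped set and the out-of-the-bag ids relate to each other. It is an illustration under stated assumptions, not the library's code: `bootstrapWithOutOfBagSketch`, `Sample` and `SampleWithId` are hypothetical names, a missing `_id` is assumed to be filled with the sample's index (the library may generate ids differently), and `random` is injectable only for testing.

```typescript
// Hypothetical stand-ins for TreeGardenDataSample, used only in this sketch.
type Sample = { _id?: string | number; [attributeId: string]: unknown };
type SampleWithId = Sample & { _id: string | number };

function bootstrapWithOutOfBagSketch(
  dataSet: Sample[],
  howMany: number = dataSet.length,
  random: () => number = Math.random,
): readonly [SampleWithId[], Set<string | number>] {
  // Assumption: a missing _id is filled with the sample's index.
  const withIds: SampleWithId[] = dataSet.map((sample, index) => ({
    ...sample,
    _id: sample._id ?? index,
  }));

  // Draw with replacement and remember which ids were picked.
  const bootstrapped: SampleWithId[] = [];
  const drawnIds = new Set<string | number>();
  for (let i = 0; i < howMany; i += 1) {
    const picked = withIds[Math.floor(random() * withIds.length)];
    bootstrapped.push(picked);
    drawnIds.add(picked._id);
  }

  // Out-of-the-bag ids: every id that was never drawn.
  const outOfBagIds = new Set<string | number>(
    withIds.map((sample) => sample._id).filter((id) => !drawnIds.has(id)),
  );
  return [bootstrapped, outOfBagIds];
}

const [bag, outOfBag] = bootstrapWithOutOfBagSketch([
  { color: 'red' },
  { color: 'blue' },
  { color: 'green' },
]);
console.log(bag.length, outOfBag);
```

The out-of-the-bag samples are the ones a tree never saw during training, which is why they can serve as a validation set for that tree.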


getKFoldCrossValidationDataSets

getKFoldCrossValidationDataSets(dataSet, kFold?): { validation: TreeGardenDataSample[] ; training: TreeGardenDataSample[] }[]

Returns the training and validation data sets for k-fold cross-validation. Setting kFold to the number of samples in the data set gives leave-one-out cross-validation.

Parameters

Name Type Default value Description
dataSet TreeGardenDataSample[] undefined -
kFold number 10 Number of training/validation data set pairs to generate.

Returns

{ validation: TreeGardenDataSample[] ; training: TreeGardenDataSample[] }[]

Defined in

dataSet/dividingAndBootstrapping.ts:61
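
In the usual k-fold scheme the data set is split into k parts, and each part serves once as the validation set while the remaining parts form the training set. The sketch below follows that scheme; it is not the library's implementation. `kFoldSketch`, `Sample` and `Fold` are hypothetical names, and samples are assigned to folds by index rather than shuffled (an assumption made to keep the example deterministic).

```typescript
// Hypothetical stand-in for TreeGardenDataSample, used only in this sketch.
type Sample = { [attributeId: string]: unknown };
type Fold = { validation: Sample[]; training: Sample[] };

function kFoldSketch(dataSet: Sample[], kFold: number = 10): Fold[] {
  const folds: Fold[] = [];
  for (let fold = 0; fold < kFold; fold += 1) {
    // Sample i is validated in fold (i % kFold) and used for training in all others.
    const validation = dataSet.filter((_, index) => index % kFold === fold);
    const training = dataSet.filter((_, index) => index % kFold !== fold);
    folds.push({ validation, training });
  }
  return folds;
}

const sixSamples: Sample[] = Array.from({ length: 6 }, (_, i) => ({ value: i }));

// 3 folds, each with 2 validation and 4 training samples.
const threeFold = kFoldSketch(sixSamples, 3);

// One fold per sample: each fold has 1 validation and 5 training samples.
const onePerSample = kFoldSketch(sixSamples, sixSamples.length);

console.log(threeFold.length, onePerSample.length);
```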


getMostCommonTagOfSamplesInNode

getMostCommonTagOfSamplesInNode(sample, attributeId, nodeWhereWeeNeedValue, _config): string

Gets the most common tag among the samples that landed in the given node. Note that this is the tag of the split, not the class.

Parameters

Name Type
sample TreeGardenDataSample
attributeId string
nodeWhereWeeNeedValue TreeGardenNode
_config TreeGardenConfiguration

Returns

string

Defined in

dataSet/replaceMissingValues.ts:133


getDataSetWithReplacedValues

getDataSetWithReplacedValues(__namedParameters): TreeGardenDataSample[]

Returns the data set with missing values replaced according to the reference data set, or according to the data set itself if referenceDataSet is not provided.

Parameters

Name Type
__namedParameters ReplaceOptions

Returns

TreeGardenDataSample[]

Defined in

dataSet/replaceMissingValues.ts:96


getMostCommonValueFF

getMostCommonValueFF(dataSet, attributeId, configuration): (sampleWithMissingValue: TreeGardenDataSample) => string | number

Closure warning: FF stands for Function Factory. It is called once when the algorithm starts and produces a replacer function. The replacer function takes only a sample and, per the signature, returns the value to use in place of the missing one.

What does it really do? If the replacer meets a sample with a missing attribute such as color, it checks the color of all samples in the data set, and the most common color is used as the replacement.

Parameters

Name Type
dataSet TreeGardenDataSample[]
attributeId string
configuration TreeGardenConfiguration

Returns

fn

(sampleWithMissingValue): string | number

Parameters
Name Type
sampleWithMissingValue TreeGardenDataSample
Returns

string | number

Defined in

dataSet/replaceMissingValues.ts:20
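
The function-factory pattern described above can be sketched as follows. This is an illustration, not the library's code: `mostCommonValueFFSketch` and `Sample` are hypothetical names, the configuration argument is left out, and any value that is not a string or number is assumed to count as missing.

```typescript
// Hypothetical stand-in for TreeGardenDataSample, used only in this sketch.
type Sample = { _class?: string; [attributeId: string]: unknown };

function mostCommonValueFFSketch(
  dataSet: Sample[],
  attributeId: string,
): (sampleWithMissingValue: Sample) => string | number {
  // Count values once, when the factory is called, and track the running winner.
  const counts = new Map<string | number, number>();
  let mostCommon: string | number | undefined;
  let bestCount = 0;
  for (const sample of dataSet) {
    const value = sample[attributeId];
    // Assumption: anything that is not a string or number is a missing value.
    if (typeof value !== 'string' && typeof value !== 'number') continue;
    const count = (counts.get(value) ?? 0) + 1;
    counts.set(value, count);
    if (count > bestCount) {
      bestCount = count;
      mostCommon = value;
    }
  }
  if (mostCommon === undefined) {
    throw new Error(`No known values for attribute "${attributeId}"`);
  }
  const replacement = mostCommon;

  // The returned closure remembers `replacement`; it does not rescan the data set.
  return (_sampleWithMissingValue: Sample) => replacement;
}

const fruits: Sample[] = [
  { color: 'red', size: 1 },
  { color: 'red', size: 2 },
  { color: 'blue', size: 3 },
  { size: 4 }, // color is missing
];
const replaceColor = mostCommonValueFFSketch(fruits, 'color');
console.log(replaceColor(fruits[3])); // 'red'
```

Doing the counting in the factory means the cost of scanning the data set is paid once, not once per sample with a missing value.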


getMostCommonValueAmongSameClassFF

getMostCommonValueAmongSameClassFF(dataSet, attributeId, configuration): (sampleWithMissingValue: TreeGardenDataSample) => string | number

Closure warning: FF stands for Function Factory, see getMostCommonValueFF. This is usable only at induction time, because evaluation samples do not have _class!

What does it really do? It works like getMostCommonValueFF but also takes the sample's class into account. If the class is, for instance, yes, it finds the most common value of the attribute only among samples with the same class.

Parameters

Name Type
dataSet TreeGardenDataSample[]
attributeId string
configuration TreeGardenConfiguration

Returns

fn

(sampleWithMissingValue): string | number

Parameters
Name Type
sampleWithMissingValue TreeGardenDataSample
Returns

string | number

Defined in

dataSet/replaceMissingValues.ts:58
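
A minimal sketch of the class-aware variant, under the same caveats: `mostCommonValueAmongSameClassFFSketch` and `Sample` are hypothetical names, the configuration argument is omitted, and non-string, non-number values are assumed to be missing. It shows why the replacer needs `_class` on the sample it receives.

```typescript
// Hypothetical stand-in for TreeGardenDataSample, used only in this sketch.
type Sample = { _class?: string; [attributeId: string]: unknown };

function mostCommonValueAmongSameClassFFSketch(
  dataSet: Sample[],
  attributeId: string,
): (sampleWithMissingValue: Sample) => string | number {
  // class -> (value -> count), plus the current winner for each class.
  const countsByClass = new Map<string, Map<string | number, number>>();
  const mostCommonByClass = new Map<string, string | number>();

  for (const sample of dataSet) {
    const value = sample[attributeId];
    const sampleClass = sample._class;
    if (sampleClass === undefined) continue;
    if (typeof value !== 'string' && typeof value !== 'number') continue;

    const counts = countsByClass.get(sampleClass) ?? new Map<string | number, number>();
    countsByClass.set(sampleClass, counts);
    const count = (counts.get(value) ?? 0) + 1;
    counts.set(value, count);

    const currentBest = mostCommonByClass.get(sampleClass);
    if (currentBest === undefined || count > (counts.get(currentBest) ?? 0)) {
      mostCommonByClass.set(sampleClass, value);
    }
  }

  return (sampleWithMissingValue: Sample): string | number => {
    const sampleClass = sampleWithMissingValue._class;
    const replacement = sampleClass === undefined ? undefined : mostCommonByClass.get(sampleClass);
    if (replacement === undefined) {
      // Evaluation samples carry no _class, so nothing can be looked up for them.
      throw new Error('Sample has no _class or its class has no known values');
    }
    return replacement;
  };
}

const labelled: Sample[] = [
  { color: 'red', _class: 'yes' },
  { color: 'red', _class: 'yes' },
  { color: 'green', _class: 'yes' },
  { color: 'blue', _class: 'no' },
  { color: 'blue', _class: 'no' },
  { color: 'blue', _class: 'no' },
];
const replaceColorByClass = mostCommonValueAmongSameClassFFSketch(labelled, 'color');
console.log(replaceColorByClass({ _class: 'yes' })); // 'red', although 'blue' is most common overall
```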


getClassesOfDataSet

getClassesOfDataSet(dataSet): any[]

Extracts all classes that occur in the data set.

Parameters

Name Type
dataSet TreeGardenDataSample[]

Returns

any[]

Defined in

dataSet/set.ts:18
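
As a small illustration, collecting the classes amounts to gathering the distinct `_class` values. `classesOfDataSetSketch` and `Sample` are hypothetical names; the sketch assumes string classes, whereas the documented return type is any[].

```typescript
// Hypothetical stand-in for TreeGardenDataSample, used only in this sketch.
type Sample = { _class?: string; [attributeId: string]: unknown };

function classesOfDataSetSketch(dataSet: Sample[]): string[] {
  const classes = new Set<string>();
  for (const sample of dataSet) {
    // Samples without a class (for example evaluation samples) are skipped.
    if (sample._class !== undefined) classes.add(sample._class);
  }
  return Array.from(classes);
}

const foundClasses = classesOfDataSetSketch([
  { color: 'red', _class: 'yes' },
  { color: 'blue', _class: 'no' },
  { color: 'red', _class: 'yes' },
  { color: 'green' },
]);
console.log(foundClasses); // ['yes', 'no']
```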


getTypeOfAttribute

getTypeOfAttribute(dataSet, attributeId, missingValue?): "discrete" | "continuous"

Decides whether the values stored under the given attributeId in the data set are continuous or discrete.

Parameters

Name Type Default value
dataSet TreeGardenDataSample[] undefined
attributeId string undefined
missingValue any undefined

Returns

"discrete" | "continuous"

Defined in

dataSet/set.ts:53
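
The decision rule itself is not documented here. One plausible rule, shown purely as an assumption, is to treat an attribute as continuous when every non-missing value is a number and as discrete otherwise. `typeOfAttributeSketch` and `Sample` are hypothetical names, and the library's actual rule may differ.

```typescript
// Hypothetical stand-in for TreeGardenDataSample, used only in this sketch.
type Sample = { [attributeId: string]: unknown };

function typeOfAttributeSketch(
  dataSet: Sample[],
  attributeId: string,
  missingValue: unknown = undefined,
): 'discrete' | 'continuous' {
  // Ignore missing values so that they do not influence the decision.
  const presentValues = dataSet
    .map((sample) => sample[attributeId])
    .filter((value) => value !== missingValue);

  // Assumed rule: continuous only if there is at least one value and all are numbers.
  const allNumbers = presentValues.length > 0
    && presentValues.every((value) => typeof value === 'number');
  return allNumbers ? 'continuous' : 'discrete';
}

const mixedSet: Sample[] = [
  { weight: 1.5, color: 'red' },
  { color: 'blue' }, // weight is missing
  { weight: 3, color: 'red' },
];
console.log(typeOfAttributeSketch(mixedSet, 'weight')); // 'continuous'
console.log(typeOfAttributeSketch(mixedSet, 'color')); // 'discrete'
```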


getAllAttributeIds

getAllAttributeIds(dataSet): string[]

Extracts all attribute ids that occur in the data set, excluding metadata fields that start with an underscore.

Parameters

Name Type
dataSet TreeGardenDataSample[]

Returns

string[]

Defined in

dataSet/set.ts:35
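
A short sketch of what excluding underscore-prefixed metadata could look like. `allAttributeIdsSketch` and `Sample` are hypothetical names, and the sketch assumes that attribute ids are simply the keys of each sample object.

```typescript
// Hypothetical stand-in for TreeGardenDataSample, used only in this sketch.
type Sample = { _id?: string | number; _class?: string; [attributeId: string]: unknown };

function allAttributeIdsSketch(dataSet: Sample[]): string[] {
  const attributeIds = new Set<string>();
  for (const sample of dataSet) {
    for (const key of Object.keys(sample)) {
      // Keys such as _class or _id are metadata, not attributes.
      if (key.charAt(0) !== '_') attributeIds.add(key);
    }
  }
  return Array.from(attributeIds);
}

const attributeIds = allAttributeIdsSketch([
  { color: 'red', size: 3, _class: 'yes' },
  { color: 'blue', weight: 2, _id: 7 },
]);
console.log(attributeIds); // ['color', 'size', 'weight']
```

Scanning every sample, not just the first one, also picks up attributes that are missing from some samples.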


getAllUniqueValuesOfAttribute

getAllUniqueValuesOfAttribute(attributeId, dataSet): any[]

Parameters

Name Type
attributeId string
dataSet TreeGardenDataSample[]

Returns

any[]

Defined in

dataSet/set.ts:62


getAllValuesOfAttribute

getAllValuesOfAttribute(attributeId, dataSet): any[]

Parameters

Name Type
attributeId string
dataSet TreeGardenDataSample[]

Returns

any[]

Defined in

dataSet/set.ts:61
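
Neither getAllValuesOfAttribute nor getAllUniqueValuesOfAttribute carries a description, so the following sketch rests only on their names and signatures. Note that both take attributeId first and dataSet second, unlike most functions in this namespace. `allValuesOfAttributeSketch`, `allUniqueValuesOfAttributeSketch` and `Sample` are hypothetical names, and keeping missing values in the result is an assumption.

```typescript
// Hypothetical stand-in for TreeGardenDataSample, used only in this sketch.
type Sample = { [attributeId: string]: unknown };

// One entry per sample, in data set order (missing values stay as undefined).
function allValuesOfAttributeSketch(attributeId: string, dataSet: Sample[]): unknown[] {
  return dataSet.map((sample) => sample[attributeId]);
}

// Distinct values only, in order of first appearance.
function allUniqueValuesOfAttributeSketch(attributeId: string, dataSet: Sample[]): unknown[] {
  return Array.from(new Set(allValuesOfAttributeSketch(attributeId, dataSet)));
}

const colored: Sample[] = [{ color: 'red' }, { color: 'blue' }, { color: 'red' }];
console.log(allValuesOfAttributeSketch('color', colored)); // ['red', 'blue', 'red']
console.log(allUniqueValuesOfAttributeSketch('color', colored)); // ['red', 'blue']
```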