Namespace: dataSet
References
Functions
- getBootstrappedDataSet
- getBootstrappedDataSetAndOutOfTheBagRest
- getKFoldCrossValidationDataSets
- getMostCommonTagOfSamplesInNode
- getDataSetWithReplacedValues
- getMostCommonValueFF
- getMostCommonValueAmongSameClassFF
- getClassesOfDataSet
- getTypeOfAttribute
- getAllAttributeIds
- getAllUniqueValuesOfAttribute
- getAllValuesOfAttribute
References
getDividedSet
Re-exports getDividedSet
Functions
getBootstrappedDataSet
getBootstrappedDataSet(dataSet, howMany?): TreeGardenDataSample[]
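Implements the sampling step of bootstrap aggregating (bagging) for random forests. Samples are drawn at random from the original data set with replacement, so the same sample can appear more than once. When as many samples are drawn as the original data set contains, around 63.2% of the original samples are expected to appear in the result (1 - 1/e for large data sets); the rest of the result consists of copies.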
Parameters
Name | Type | Description |
---|---|---|
dataSet | TreeGardenDataSample[] | - |
howMany? | number | if not defined, same amount as length of original data set is returned. |
Returns
TreeGardenDataSample[]
Defined in
dataSet/dividingAndBootstrapping.ts:35
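The sampling described for getBootstrappedDataSet can be sketched as follows. This is a minimal illustration and not the library's implementation: the `Sample` type, the name `bootstrapSketch` and the injectable `random` parameter are assumptions made for the example.

```typescript
// Hypothetical, simplified sample type; TreeGardenDataSample may carry more fields.
type Sample = { _id?: string | number; [attributeId: string]: unknown };

// Draws `howMany` samples with replacement.
// `random` is injectable (an assumption of this sketch) so the draw can be made deterministic.
function bootstrapSketch(
  dataSet: Sample[],
  howMany: number = dataSet.length,
  random: () => number = Math.random,
): Sample[] {
  const result: Sample[] = [];
  for (let i = 0; i < howMany; i += 1) {
    // Pick an index uniformly at random; the same index may be picked again.
    const index = Math.floor(random() * dataSet.length);
    result.push(dataSet[index]);
  }
  return result;
}

// Usage: measure which share of the original samples ended up in the bootstrap.
const original: Sample[] = Array.from({ length: 1000 }, (_, i) => ({ _id: i }));
const bootstrapped = bootstrapSketch(original);
const uniqueShare = new Set(bootstrapped.map((s) => s._id)).size / original.length;
console.log(uniqueShare.toFixed(3)); // varies per run, typically close to 0.632
```

The sketch assumes a non-empty data set; drawing from an empty array would only yield undefined entries.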
getBootstrappedDataSetAndOutOfTheBagRest
getBootstrappedDataSetAndOutOfTheBagRest(dataSet, howMany?): readonly [TreeGardenDataSample[], Set<undefined | string | number>]
Returns the bootstrapped data set together with a Set holding the ids of the out-of-the-bag samples, that is, the samples that were not drawn.
If the samples of the data set do not have their own unique _id, ids are generated.
Parameters
Name | Type | Description |
---|---|---|
dataSet | TreeGardenDataSample[] | - |
howMany? | number | if not defined, same amount as length of original data set is returned. |
Returns
readonly [TreeGardenDataSample[], Set<undefined | string | number>]
Defined in
dataSet/dividingAndBootstrapping.ts:45
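A minimal sketch of how a bootstrap and its out-of-the-bag ids can be collected together. It is an illustration under assumptions, not the library's code: the `Sample` type, the function name, the injectable `random` parameter and the choice of the array index as a generated id are all hypothetical.

```typescript
// Hypothetical, simplified sample type; TreeGardenDataSample may carry more fields.
type Sample = { _id?: string | number; [attributeId: string]: unknown };

function bootstrapWithOutOfBagSketch(
  dataSet: Sample[],
  howMany: number = dataSet.length,
  random: () => number = Math.random,
): readonly [Sample[], Set<string | number>] {
  // Give every sample an id if it has none (here: its index, an assumption of this sketch).
  const withIds = dataSet.map((sample, index) => ({ ...sample, _id: sample._id ?? index }));
  // Start with all ids and remove each id as soon as its sample is drawn.
  const outOfBagIds = new Set<string | number>(withIds.map((sample) => sample._id));
  const bootstrapped: Sample[] = [];
  for (let i = 0; i < howMany; i += 1) {
    const picked = withIds[Math.floor(random() * withIds.length)];
    bootstrapped.push(picked);
    outOfBagIds.delete(picked._id);
  }
  return [bootstrapped, outOfBagIds];
}

// Usage: the bag always has `howMany` entries, the out-of-the-bag set varies per run.
const [bag, outOfBag] = bootstrapWithOutOfBagSketch([
  { color: 'red' },
  { color: 'blue' },
  { color: 'green' },
]);
console.log(bag.length, outOfBag.size);
```

Out-of-the-bag samples are typically used to evaluate a tree on data it has not seen during training.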
getKFoldCrossValidationDataSets
getKFoldCrossValidationDataSets(dataSet, kFold?): { validation: TreeGardenDataSample[]; training: TreeGardenDataSample[] }[]
Returns the data sets for k-fold cross-validation. If you set kFold to the data set length minus 1, you run leave-one-out cross-validation.
Parameters
Name | Type | Default value | Description |
---|---|---|---|
dataSet | TreeGardenDataSample[] | undefined | - |
kFold | number | 10 | how many data sets should be generated. |
Returns
{ validation: TreeGardenDataSample[]; training: TreeGardenDataSample[] }[]
Defined in
dataSet/dividingAndBootstrapping.ts:61
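The shape of the returned value can be illustrated with a small k-fold split. This is a sketch under assumptions: `Sample`, `FoldSketch` and `kFoldSketch` are hypothetical names, and the samples are assigned to folds round-robin without shuffling, which may differ from what the library does.

```typescript
// Hypothetical, simplified types for the sketch.
type Sample = { [attributeId: string]: unknown };
type FoldSketch = { validation: Sample[]; training: Sample[] };

function kFoldSketch(dataSet: Sample[], kFold = 10): FoldSketch[] {
  const folds: FoldSketch[] = [];
  for (let fold = 0; fold < kFold; fold += 1) {
    // Sample i is validated in fold (i mod kFold) and used for training in all other folds.
    const validation = dataSet.filter((_, i) => i % kFold === fold);
    const training = dataSet.filter((_, i) => i % kFold !== fold);
    folds.push({ validation, training });
  }
  return folds;
}

// Usage: 20 samples in 5 folds give 4 validation and 16 training samples per fold.
const data: Sample[] = Array.from({ length: 20 }, (_, i) => ({ weight: i }));
const folds = kFoldSketch(data, 5);
console.log(folds.length, folds[0].validation.length, folds[0].training.length); // 5 4 16
```

Every sample lands in exactly one validation set, so each sample is used for validation once and for training in the remaining folds.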
getMostCommonTagOfSamplesInNode
getMostCommonTagOfSamplesInNode(sample, attributeId, nodeWhereWeeNeedValue, _config): string
Gets the most common tag among the samples that landed in the given node. Note that this is the tag of the split, not the class.
Parameters
Name | Type |
---|---|
sample | TreeGardenDataSample |
attributeId | string |
nodeWhereWeeNeedValue | TreeGardenNode |
_config | TreeGardenConfiguration |
Returns
string
Defined in
dataSet/replaceMissingValues.ts:133
getDataSetWithReplacedValues
getDataSetWithReplacedValues(__namedParameters): TreeGardenDataSample[]
Gets a data set with its missing values replaced according to a reference data set, or according to the data set itself if referenceDataSet is not provided.
Parameters
Name | Type |
---|---|
__namedParameters | ReplaceOptions |
Returns
TreeGardenDataSample[]
Defined in
dataSet/replaceMissingValues.ts:96
getMostCommonValueFF
getMostCommonValueFF(dataSet, attributeId, configuration): (sampleWithMissingValue: TreeGardenDataSample) => string | number
Closure warning: FF stands for Function Factory. It is called when the algorithm starts and produces a replacer function. The replacer function takes only a sample and, per the signature, returns the replacement value (string | number).
What does it really do? If the replacer meets a sample with a missing attribute, for example color, it checks the color of all samples in the data set and replaces the missing color with the most common one.
Parameters
Name | Type |
---|---|
dataSet | TreeGardenDataSample[] |
attributeId | string |
configuration | TreeGardenConfiguration |
Returns
fn
(sampleWithMissingValue): string | number
Parameters
Name | Type |
---|---|
sampleWithMissingValue | TreeGardenDataSample |
Returns
string | number
Defined in
dataSet/replaceMissingValues.ts:20
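The function factory pattern behind getMostCommonValueFF can be sketched like this. The code is an illustration under assumptions, not the library's implementation: `Sample`, `mostCommonSketch` and `mostCommonValueFactorySketch` are hypothetical names, and the sketch takes a hypothetical `missingValue` parameter where the real function takes a configuration object.

```typescript
// Hypothetical, simplified sample type.
type Sample = { [attributeId: string]: unknown };

// Returns the most frequent value; on ties the value seen first wins.
function mostCommonSketch(values: (string | number)[]): string | number {
  const counts = new Map<string | number, number>();
  let best = values[0];
  for (const value of values) {
    const count = (counts.get(value) ?? 0) + 1;
    counts.set(value, count);
    if (count > (counts.get(best) ?? 0)) best = value;
  }
  return best;
}

function mostCommonValueFactorySketch(
  dataSet: Sample[],
  attributeId: string,
  missingValue: unknown = undefined,
): (sampleWithMissingValue: Sample) => string | number {
  // The expensive part runs once, when the factory is called.
  const known = dataSet
    .map((sample) => sample[attributeId])
    .filter((value): value is string | number =>
      (typeof value === 'string' || typeof value === 'number') && value !== missingValue);
  if (known.length === 0) throw new Error(`No known values for attribute ${attributeId}`);
  const replacement = mostCommonSketch(known);
  // The returned closure only hands out the precomputed value.
  return () => replacement;
}

// Usage: the replacer is created once and can then be applied to many samples.
const replacer = mostCommonValueFactorySketch(
  [{ color: 'red' }, { color: 'blue' }, { color: 'red' }, { size: 1 }],
  'color',
);
console.log(replacer({ size: 2 })); // red
```

Computing the replacement inside the factory means the data set is scanned once rather than on every sample that needs a value.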
getMostCommonValueAmongSameClassFF
getMostCommonValueAmongSameClassFF(dataSet, attributeId, configuration): (sampleWithMissingValue: TreeGardenDataSample) => string | number
Closure warning: FF stands for Function Factory, see getMostCommonValueFF. This is usable only at induction time, because evaluation samples do not have _class!
What does it really do?
It works like getMostCommonValueFF but also takes the class of the sample into account. So if the class is, for instance, yes, it finds the most common value of the attribute just among the samples with the same class.
Parameters
Name | Type |
---|---|
dataSet | TreeGardenDataSample[] |
attributeId | string |
configuration | TreeGardenConfiguration |
Returns
fn
(sampleWithMissingValue): string | number
Parameters
Name | Type |
---|---|
sampleWithMissingValue | TreeGardenDataSample |
Returns
string | number
Defined in
dataSet/replaceMissingValues.ts:58
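A class-aware variant of the same factory idea might look as follows. This is a sketch under assumptions, not the library's code: the `Sample` type and the function name are hypothetical, classes are assumed to be strings, and the configuration parameter of the real function is left out.

```typescript
// Hypothetical, simplified sample type with a string class.
type Sample = { _class?: string; [attributeId: string]: unknown };

function mostCommonValueAmongSameClassFactorySketch(
  dataSet: Sample[],
  attributeId: string,
): (sampleWithMissingValue: Sample) => string | number {
  // class -> value -> count, built once when the factory is called.
  const countsByClass = new Map<string, Map<string | number, number>>();
  for (const sample of dataSet) {
    const sampleClass = sample._class;
    const value = sample[attributeId];
    if (sampleClass !== undefined && (typeof value === 'string' || typeof value === 'number')) {
      const counts = countsByClass.get(sampleClass) ?? new Map<string | number, number>();
      counts.set(value, (counts.get(value) ?? 0) + 1);
      countsByClass.set(sampleClass, counts);
    }
  }
  return (sampleWithMissingValue) => {
    const sampleClass = sampleWithMissingValue._class;
    const counts = sampleClass === undefined ? undefined : countsByClass.get(sampleClass);
    if (counts === undefined) {
      throw new Error(`No known values of ${attributeId} for class ${String(sampleClass)}`);
    }
    // Pick the entry with the highest count; the first one wins on ties.
    return Array.from(counts).reduce((best, entry) => (entry[1] > best[1] ? entry : best))[0];
  };
}

// Usage: the replacement depends on the class of the sample with the missing value.
const byClass = mostCommonValueAmongSameClassFactorySketch(
  [
    { _class: 'yes', color: 'red' },
    { _class: 'yes', color: 'red' },
    { _class: 'no', color: 'blue' },
  ],
  'color',
);
console.log(byClass({ _class: 'no' })); // blue
```

The sketch throws for a sample without a class, which mirrors the note above that this approach only works while classes are known.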
getClassesOfDataSet
getClassesOfDataSet(dataSet): any[]
Extracts all possible classes from data set.
Parameters
Name | Type |
---|---|
dataSet | TreeGardenDataSample[] |
Returns
any[]
Defined in
dataSet/set.ts:18
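Extracting the classes of a data set amounts to collecting the distinct _class values. A minimal sketch, assuming string classes; `Sample` and `classesSketch` are hypothetical names and the order of the result in the library may differ.

```typescript
// Hypothetical, simplified sample type with a string class.
type Sample = { _class?: string; [attributeId: string]: unknown };

function classesSketch(dataSet: Sample[]): string[] {
  // A Set keeps each class once, in order of first appearance.
  const classes = new Set<string>();
  for (const sample of dataSet) {
    if (sample._class !== undefined) classes.add(sample._class);
  }
  return Array.from(classes);
}

const classes = classesSketch([{ _class: 'yes' }, { _class: 'no' }, { _class: 'yes' }]);
console.log(classes.join(', ')); // yes, no
```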
getTypeOfAttribute
getTypeOfAttribute(dataSet, attributeId, missingValue?): "discrete" | "continuous"
Decides whether the values under the given attributeId of the data set are continuous or discrete.
Parameters
Name | Type | Default value |
---|---|---|
dataSet | TreeGardenDataSample[] | undefined |
attributeId | string | undefined |
missingValue | any | undefined |
Returns
"discrete" | "continuous"
Defined in
dataSet/set.ts:53
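One plausible decision rule is sketched below. The rule itself is an assumption, since the page does not state how the library decides: here an attribute counts as continuous when all of its non-missing values are numbers. `Sample` and `typeOfAttributeSketch` are hypothetical names.

```typescript
// Hypothetical, simplified sample type.
type Sample = { [attributeId: string]: unknown };

function typeOfAttributeSketch(
  dataSet: Sample[],
  attributeId: string,
  missingValue: unknown = undefined,
): 'discrete' | 'continuous' {
  // Ignore missing values, then check whether everything that is left is numeric.
  const known = dataSet
    .map((sample) => sample[attributeId])
    .filter((value) => value !== missingValue);
  const allNumbers = known.length > 0 && known.every((value) => typeof value === 'number');
  return allNumbers ? 'continuous' : 'discrete';
}

const people: Sample[] = [
  { weight: 71.5, color: 'red' },
  { weight: undefined, color: 'blue' },
  { weight: 64 },
];
console.log(typeOfAttributeSketch(people, 'weight')); // continuous
console.log(typeOfAttributeSketch(people, 'color')); // discrete
```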
getAllAttributeIds
getAllAttributeIds(dataSet): string[]
Extracts all attribute ids that occur in the data set (except metadata fields starting with an underscore).
Parameters
Name | Type |
---|---|
dataSet | TreeGardenDataSample[] |
Returns
string[]
Defined in
dataSet/set.ts:35
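The underscore rule for attribute ids can be illustrated with a short sketch. `Sample` and `attributeIdsSketch` are hypothetical names, and the order of the returned ids is an assumption of the example.

```typescript
// Hypothetical, simplified sample type.
type Sample = { [attributeId: string]: unknown };

function attributeIdsSketch(dataSet: Sample[]): string[] {
  const ids = new Set<string>();
  for (const sample of dataSet) {
    for (const key of Object.keys(sample)) {
      // Keys such as _id or _class are metadata, not attributes.
      if (!key.startsWith('_')) ids.add(key);
    }
  }
  return Array.from(ids);
}

const attributeIds = attributeIdsSketch([
  { _id: 1, _class: 'yes', color: 'red' },
  { _id: 2, size: 3, color: 'blue' },
]);
console.log(attributeIds.join(', ')); // color, size
```

Samples do not need to share the same keys: an attribute is listed as soon as one sample carries it.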
getAllUniqueValuesOfAttribute
getAllUniqueValuesOfAttribute(attributeId, dataSet): any[]
Parameters
Name | Type |
---|---|
attributeId | string |
dataSet | TreeGardenDataSample[] |
Returns
any[]
Defined in
dataSet/set.ts:62
getAllValuesOfAttribute
getAllValuesOfAttribute(attributeId, dataSet): any[]
Parameters
Name | Type |
---|---|
attributeId | string |
dataSet | TreeGardenDataSample[] |
Returns
any[]
Defined in
dataSet/set.ts:61
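getAllValuesOfAttribute and getAllUniqueValuesOfAttribute are listed without a description. Going only by their names and signatures, their behavior could be sketched as below. This is an assumption, not documented behavior: `Sample` and both function names are hypothetical, and whether the library keeps or drops missing values is not stated here. Note that attributeId comes first in both signatures.

```typescript
// Hypothetical, simplified sample type.
type Sample = { [attributeId: string]: unknown };

// One value per sample, in data set order (missing values are kept in this sketch).
function allValuesSketch(attributeId: string, dataSet: Sample[]): unknown[] {
  return dataSet.map((sample) => sample[attributeId]);
}

// The same values with duplicates removed, in order of first appearance.
function allUniqueValuesSketch(attributeId: string, dataSet: Sample[]): unknown[] {
  return Array.from(new Set(allValuesSketch(attributeId, dataSet)));
}

const colored: Sample[] = [{ color: 'red' }, { color: 'blue' }, { color: 'red' }];
console.log(allValuesSketch('color', colored).join(', ')); // red, blue, red
console.log(allUniqueValuesSketch('color', colored).join(', ')); // red, blue
```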