TODO:
* Current implementation grows full tree based on the minimum number of
observations per tree leaf setting and generates representation based on
the maximum depth which is not reasonable computationally. Depth level
is planned to be dropped. This way tree size will be controlled by single
parameter.
* Patterns are learned using ensemble of regression trees. Approach is
embarrassingly parallel. A combine routine that can combine multiple trees
in an ensemble is required to be implemented to benefit from parallelism.
* The probability of each observation being used by the model is not
equal because of the sampling scheme. The observations towards the start
and end of the time series are less likely to appear in the segments
A fair sampling strategy is needed for segment selection.
* Identifying common patterns and plotting them should be improved for
interpretability purposes.
* Currently the package requires UCR time series database format (each time
series is a row and columns are the observations over time). Therefore,
time series are assumed to be the same length although LPS can handle
time series of different length which requires changes in input format.
* LPS can work for multivariate time series similarity. Current version
needs modifications for multivariate time series.
* LPS can work for categorical time series (i.e. DNA sequences). Current version
requires modification for categorical variables.
========================================================================
Changes in 1.0-3:
* Error (mean square error) for each individual tree is generated.
This is useful if some trees do not generate valuable information.
Filtering based on the error rates may improve the results.
* Trees to be used for representation generation and similarity computation
can be selected by the modification of 'which.tree' argument in the related
functions.
* Introduced totally random splitting strategy when building trees through
argument 'random.split' in function 'learnPattern'. This is computationally
faster compared to regression splits. Totally random splitting generates a
split from uniform distribution between minimum and maximum observations.
Also kd-tree type split is introduced (random.split=2) for which split value
is the median of the observations at each node.
* Introduced prediction capabilities. When "nodes" argument of
"predict.learnPattern" is set to FALSE, average predictions over all trees
for each time point are returned. Maybe useful for denoising.
Changes in 1.0-2:
* learnPattern 'replace' argument was returning a segmentation fault, fixed
* learnPattern now uses segment.factor=c(0.1,0.9) as default for random
segment generation and ntree=200 as default for number of trees in the
ensemble
* Prediction of observed values is now enabled. This can be used for
different purposes such as time series modeling and denoising
* If replace is set to TRUE, randomly selected time series are left
out-of-bag during training of each tree and predictions are made over
the segments. OOB predictions are now returned by learnPattern and
also OOB error is computed over the trees. Mean square error (MSE) is
returned. (added the option oob.pred to determine whether predictions
are returned.
* MSE based on OOB predictions can be plotted by simply running plot function
on learnPattern objects (if sampling is done). This estimate can be used
to set the number of trees.
* getTreeInfo is introduced for extracting the tree structures from the
ensemble (learnPattern object)
* Transformation to matrix is done internally for representing single time
series
* plotMDS is introduced for transforming similarity information to latent
variables using traditional multidimensional scaling for plotting purposes
* Identifying common patterns and plotting them is enabled for interpretability
purposes.
Change in 1.0-1:
* Fixed some minor problems with memory allocations.