Title: | Machine Learning Algorithms for Multivariate Time Series |
---|---|
Description: | An implementation of several machine learning algorithms for multivariate time series. The package includes functions for clustering, classification and outlier detection, among other tasks. It also incorporates a collection of multivariate time series datasets which can be used to analyse the performance of newly proposed algorithms. Some of these datasets are stored in the GitHub data packages 'ueadata1' to 'ueadata8'. To access these data packages, run install.packages(c("ueadata1", "ueadata2", "ueadata3", "ueadata4", "ueadata5", "ueadata6", "ueadata7", "ueadata8"), repos = "https://anloor7.github.io/drat/"). The installation takes a couple of minutes, but users who want all the datasets of 'mlmts' available are strongly encouraged to do it. Practitioners from a broad variety of fields could benefit from the general framework provided by 'mlmts'. |
Authors: | Angel Lopez-Oriona [aut, cre], Jose A. Vilar [aut] |
Maintainer: | Angel Lopez-Oriona <[email protected]> |
License: | GPL-2 |
Version: | 1.1.2 |
Built: | 2025-02-15 03:57:45 UTC |
Source: | https://github.com/cran/mlmts |
Multivariate time series (MTS) of movements of tongue and lips during speech. The data were collected from multiple native English speakers producing 25 words.
data(ArticularlyWordRecognition)
A list with two elements, which are:
data: A list with 575 MTS.
classes: A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 144 rows (time points) indicating movement and 9 columns (variables) indicating sensors. The first 275 elements correspond to the training set, whereas the last 300 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 25, indicating that there are 25 different classes in the database. Each class is associated with a different word produced by the speaker. For more information, see Bagnall et al. (2018).
Run install.packages("ueadata1", repos="https://anloor7.github.io/drat") to access this dataset and use the syntax ueadata1::ArticularyWordRecognition.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) of two-channel ECG recordings of atrial fibrillation. The database has been created from data used in the Computers in Cardiology Challenge 2004.
data(AtrialFibrillation)
A list with two elements, which are:
data: A list with 30 MTS.
classes: A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 640 rows (time points) indicating ECG measures and 2 columns (variables) indicating ECG leads. The first 15 elements correspond to the training set, whereas the last 15 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 3, indicating that there are 3 different classes in the database. Each class is associated with a different type of atrial fibrillation. For more information, see Bagnall et al. (2018).
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) of four students performing four activities while wearing a smart watch.
data(BasicMotions)
A list with two elements, which are:
data: A list with 80 MTS.
classes: A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 100 rows (time points) indicating movement and 6 columns (variables). The first 40 elements correspond to the training set, whereas the last 40 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 4, indicating that there are 4 different classes in the database. Each class is associated with a different physical activity. For more information, see Bagnall et al. (2018).
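The train/test layout described above can be recovered by position. The following is a minimal sketch that assumes a stand-in object with the same shape as BasicMotions; the name `mts_set`, its random contents and the ordering of its class labels are hypothetical and only serve to make the snippet run without the package.

```r
# Stand-in with the same layout as BasicMotions: 80 matrices of 100 x 6
# (hypothetical random data, not the real recordings)
set.seed(1)
mts_set <- list(
  data = replicate(80, matrix(rnorm(100 * 6), nrow = 100, ncol = 6),
                   simplify = FALSE),
  classes = rep(rep(1:4, each = 10), times = 2)
)

train_idx <- 1:40    # first 40 elements: training set
test_idx <- 41:80    # last 40 elements: test set

train_data <- mts_set$data[train_idx]
train_classes <- mts_set$classes[train_idx]
test_data <- mts_set$data[test_idx]
test_classes <- mts_set$classes[test_idx]
```

With the real dataset loaded, the same indexing would apply to `BasicMotions$data` and `BasicMotions$classes`.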
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) of character samples, captured using a WACOM tablet. Data were recorded at 200 Hz.
data(CharacterTrajectories)
A list with two elements, which are:
data: A list with 2858 MTS.
classes: A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 182 rows (time points) indicating velocity trajectory and 3 columns (variables) indicating spatial dimension. The first 1422 elements correspond to the training set, whereas the last 1436 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 20, indicating that there are 20 different classes in the database. Each class is associated with a different alphabetical character. For more information, see Bagnall et al. (2018).
Run install.packages("ueadata1", repos="https://anloor7.github.io/drat") to access this dataset and use the syntax ueadata1::CharacterTrajectories.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) of four cricket umpires performing twelve signals, each with ten repetitions.
data(Cricket)
A list with two elements, which are:
data: A list with 180 MTS.
classes: A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 1197 rows (time points) indicating acceleration and 6 columns (variables) indicating spatial dimension with regard to two accelerometers. The first 108 elements correspond to the training set, whereas the last 72 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 12, indicating that there are 12 different classes in the database. Each class is associated with a different event signaled by the umpire. For more information, see Bagnall et al. (2018).
Run install.packages("ueadata1", repos="https://anloor7.github.io/drat") to access this dataset and use the syntax ueadata1::Cricket.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
dis_2dsvd returns a pairwise distance matrix based on the 2dSVD distance measure proposed by Weng and Shen (2008).
dis_2dsvd(X, var_u = 0.9, var_v = 0.9, features = FALSE)
X | A list of MTS (numerical matrices). |
var_u | Rate of retained variability concerning the row-row covariance matrix. |
var_v | Rate of retained variability concerning the column-column covariance matrix. |
features | Logical. If FALSE (default), a distance matrix is returned; otherwise, a dataset of feature vectors is returned. |
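Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined as
$$d_{2dSVD}(\boldsymbol X_T, \boldsymbol Y_T) = \sum_{i=1}^{s} \left\lVert \boldsymbol M^{\boldsymbol X_T}_{\bullet, i} - \boldsymbol M^{\boldsymbol Y_T}_{\bullet, i} \right\rVert,$$
where $\boldsymbol M^{\boldsymbol X_T}_{\bullet, i}$ and $\boldsymbol M^{\boldsymbol Y_T}_{\bullet, i}$ are the $i$th columns of matrices $\boldsymbol M^{\boldsymbol X_T}$ and $\boldsymbol M^{\boldsymbol Y_T}$, which are obtained by decomposing the time series $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, by means of the 2dSVD procedure (average row-row and column-column covariance matrices are taken into account), and $s$ is the number of first retained eigenvectors concerning the average column-column covariance matrices.
If features = FALSE (default), returns a distance matrix based on the distance $d_{2dSVD}$. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute the distance $d_{2dSVD}$.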
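The 2dSVD procedure mentioned above can be sketched in base R. This is an illustrative sketch rather than the package's implementation: the helper names (`n_retained`, `two_d_svd_features`, `dist_2dsvd`) are hypothetical, the series are assumed to share the same dimensions, centring by the mean matrix is an assumption, and the retained variability is interpreted here as the cumulative share of the eigenvalues.

```r
# Smallest number of leading eigenvalues explaining at least `rate` of the total
n_retained <- function(values, rate) {
  values <- pmax(values, 0)
  which(cumsum(values) / sum(values) >= rate)[1]
}

# Reduced matrices t(U) %*% X %*% V for every MTS in the list X
two_d_svd_features <- function(X, var_u = 0.9, var_v = 0.9) {
  centre <- Reduce(`+`, X) / length(X)
  Xc <- lapply(X, function(x) x - centre)
  # Average row-row and column-column covariance matrices
  row_cov <- Reduce(`+`, lapply(Xc, tcrossprod)) / length(X)
  col_cov <- Reduce(`+`, lapply(Xc, crossprod)) / length(X)
  e_row <- eigen(row_cov, symmetric = TRUE)
  e_col <- eigen(col_cov, symmetric = TRUE)
  U <- e_row$vectors[, seq_len(n_retained(e_row$values, var_u)), drop = FALSE]
  V <- e_col$vectors[, seq_len(n_retained(e_col$values, var_v)), drop = FALSE]
  lapply(X, function(x) t(U) %*% x %*% V)
}

# Sum over columns of the Euclidean norms of the column differences
dist_2dsvd <- function(Mx, My) sum(sqrt(colSums((Mx - My)^2)))

set.seed(1)
toy <- replicate(5, matrix(rnorm(20 * 3), ncol = 3), simplify = FALSE)
feats <- two_d_svd_features(toy)
d_12 <- dist_2dsvd(feats[[1]], feats[[2]])
```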
Ángel López-Oriona, José A. Vilar
Weng X, Shen J (2008). “Classification of multivariate time series using two-dimensional singular value decomposition.” Knowledge-Based Systems, 21(7), 535–539.
toy_dataset <- BasicMotions$data[1 : 10] # Selecting the first 10 MTS from the
# dataset BasicMotions
distance_matrix <- dis_2dsvd(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_2dsvd
feature_dataset <- dis_2dsvd(toy_dataset, features = TRUE) # Computing
# the corresponding dataset of features
dis_cor returns a pairwise distance matrix based on a generalization of the dissimilarity introduced by D'Urso and Maharaj (2009).
dis_cor(X, lag_max = 1, features = FALSE)
X | A list of MTS (numerical matrices). |
lag_max | The maximum lag considered to compute the auto and cross-correlations. |
features | Logical. If FALSE (default), a distance matrix is returned; otherwise, a dataset of feature vectors is returned. |
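Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$, denoted $d_{COR}(\boldsymbol X_T, \boldsymbol Y_T)$, compares $\hat{\boldsymbol \theta}_{AC}^{\boldsymbol X_T}$ with $\hat{\boldsymbol \theta}_{AC}^{\boldsymbol Y_T}$ and $\hat{\boldsymbol \theta}_{CC}^{\boldsymbol X_T}$ with $\hat{\boldsymbol \theta}_{CC}^{\boldsymbol Y_T}$, where $\hat{\boldsymbol \theta}_{AC}^{\boldsymbol X_T}$ and $\hat{\boldsymbol \theta}_{AC}^{\boldsymbol Y_T}$ are vectors containing the estimated autocorrelations within $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, and $\hat{\boldsymbol \theta}_{CC}^{\boldsymbol X_T}$ and $\hat{\boldsymbol \theta}_{CC}^{\boldsymbol Y_T}$ are vectors containing the estimated cross-correlations within $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively.
If features = FALSE (default), returns a distance matrix based on the distance $d_{COR}$. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute the distance $d_{COR}$.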
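The feature vectors described above can be approximated with `stats::acf`, which returns auto and cross-correlations for a matrix input. This is a sketch under stated assumptions, not the package's code: the helper `cor_features` is hypothetical, lag 0 is dropped throughout, and the exact set and ordering of correlations used by dis_cor may differ.

```r
# Auto and cross-correlations of one MTS up to lag_max, as a single vector
cor_features <- function(x, lag_max = 1) {
  # Array of dimension (lag_max + 1) x d x d; entry [k + 1, i, j] is the
  # estimated correlation between variable i at time t + k and variable j at t
  a <- stats::acf(x, lag.max = lag_max, plot = FALSE)$acf
  d <- ncol(x)
  lags <- 2:(lag_max + 1)   # positions of lags 1, ..., lag_max (lag 0 dropped)
  auto <- unlist(lapply(seq_len(d), function(j) a[lags, j, j]))
  m <- diag(d)
  pairs <- which(row(m) != col(m), arr.ind = TRUE)   # ordered pairs i != j
  cross <- unlist(lapply(seq_len(nrow(pairs)), function(p) {
    a[lags, pairs[p, 1], pairs[p, 2]]
  }))
  c(auto, cross)
}

set.seed(1)
toy <- replicate(3, matrix(rnorm(100 * 2), ncol = 2), simplify = FALSE)

# One row of features per MTS
feats <- do.call(rbind, lapply(toy, cor_features, lag_max = 2))

# Assumption: Euclidean distance between the concatenated feature vectors
D <- as.matrix(dist(feats))
```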
Ángel López-Oriona, José A. Vilar
D'Urso P, Maharaj EA (2009). “Autocorrelation-based fuzzy clustering of time series.” Fuzzy Sets and Systems, 160(24), 3565–3589.
toy_dataset <- AtrialFibrillation$data[1 : 10] # Selecting the first 10 MTS from the
# dataset AtrialFibrillation
distance_matrix <- dis_cor(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_cor
distance_matrix <- dis_cor(toy_dataset, lag_max = 5) # Considering
# auto and cross-correlations up to lag 5 in the computation of the distance
feature_dataset <- dis_cor(toy_dataset, features = TRUE) # Computing
# the corresponding dataset of features
dis_dtw_1 returns a pairwise distance matrix based on one of the multivariate extensions of the well-known dynamic time warping distance (Shokoohi-Yekta et al. 2017).
dis_dtw_1(X, normalization = FALSE, ...)
X | A list of MTS (numerical matrices). |
normalization | Logical indicating whether the normalized distance is computed (default is FALSE). |
... | Additional parameters for the function. See |
Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS is defined as the sum of the standard dynamic time warping distances between each corresponding pair of dimensions (univariate time series).
The computed pairwise distance matrix.
Ángel López-Oriona, José A. Vilar
Shokoohi-Yekta M, Hu B, Jin H, Wang J, Keogh E (2017). “Generalizing DTW to the multi-dimensional case requires an adaptive approach.” Data mining and knowledge discovery, 31(1), 1–31.
dis_dtw_2, dis_mahalanobis_dtw
toy_dataset <- AtrialFibrillation$data[1 : 5] # Selecting the first 5 MTS from the
# dataset AtrialFibrillation
distance_matrix <- dis_dtw_1(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_dtw_1 without normalization
distance_matrix_normalized <- dis_dtw_1(toy_dataset, normalization = TRUE)
# Computing the pairwise distance matrix based
# on the distance dis_dtw_1 with normalization
dis_dtw_2 returns a pairwise distance matrix based on one of the multivariate extensions of the well-known dynamic time warping distance (Shokoohi-Yekta et al. 2017).
dis_dtw_2(X, normalization = FALSE, ...)
X | A list of MTS (numerical matrices). |
normalization | Logical indicating whether the normalized distance is computed (default is FALSE). |
... | Additional parameters for the function. See |
Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS is defined as the multivariate extension of the dynamic time warping distance which forces all dimensions to warp identically, in a single warping matrix.
The computed pairwise distance matrix.
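The two multivariate extensions of dynamic time warping, one summing per-dimension distances and one forcing a single warping path, can be contrasted with a small base R sketch. The helper names are hypothetical, and the step pattern (the classic three-neighbour recursion) and the local costs (absolute difference per dimension, Euclidean distance between time points) are assumptions; the package's defaults may differ.

```r
# Classic DTW recursion on a precomputed local cost matrix
dtw_from_cost <- function(cost) {
  n <- nrow(cost)
  m <- ncol(cost)
  acc <- matrix(Inf, n + 1, m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      acc[i + 1, j + 1] <- cost[i, j] +
        min(acc[i, j], acc[i, j + 1], acc[i + 1, j])
    }
  }
  acc[n + 1, m + 1]
}

# Independent version: each variable is warped separately, distances are summed
dtw_independent <- function(x, y) {
  sum(vapply(seq_len(ncol(x)), function(k) {
    dtw_from_cost(abs(outer(x[, k], y[, k], "-")))
  }, numeric(1)))
}

# Dependent version: one warping path shared by all variables
dtw_dependent <- function(x, y) {
  nx <- nrow(x)
  ny <- nrow(y)
  # Local cost: Euclidean distance between time point i of x and j of y
  cost <- as.matrix(dist(rbind(x, y)))[seq_len(nx), nx + seq_len(ny),
                                       drop = FALSE]
  dtw_from_cost(cost)
}

set.seed(1)
x <- matrix(rnorm(10 * 2), ncol = 2)
y <- matrix(rnorm(15 * 2), ncol = 2)
d_ind <- dtw_independent(x, y)
d_dep <- dtw_dependent(x, y)
```

For univariate series the two versions coincide, since the Euclidean distance between two scalars is their absolute difference.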
Ángel López-Oriona, José A. Vilar
Shokoohi-Yekta M, Hu B, Jin H, Wang J, Keogh E (2017). “Generalizing DTW to the multi-dimensional case requires an adaptive approach.” Data mining and knowledge discovery, 31(1), 1–31.
dis_dtw_1, dis_mahalanobis_dtw
toy_dataset <- AtrialFibrillation$data[1 : 10] # Selecting the first 10 MTS from the
# dataset AtrialFibrillation
distance_matrix <- dis_dtw_2(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_dtw_2 without normalization
distance_matrix_normalized <- dis_dtw_2(toy_dataset, normalization = TRUE)
# Computing the pairwise distance matrix based
# on the distance dis_dtw_2 with normalization
dis_eros returns a pairwise distance matrix based on the Eros distance proposed by Yang and Shahabi (2004).
dis_eros(X, method = "mean", normalization = FALSE, cor = TRUE)
X | A list of MTS (numerical matrices). |
method | The aggregation function used to compute the weights. |
normalization | Logical indicating whether the raw eigenvalues or the normalized eigenvalues should be used to compute the weights. Default is FALSE. |
cor | Logical indicating whether the Singular Value Decomposition is applied over the covariance matrix or over the correlation matrix. Default is TRUE. |
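Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined as
$$d_{Eros}(\boldsymbol X_T, \boldsymbol Y_T) = \sqrt{2 - 2\,Eros(\boldsymbol X_T, \boldsymbol Y_T)},$$
where
$$Eros(\boldsymbol X_T, \boldsymbol Y_T) = \sum_{i=1}^{d} w_i \left|\langle \boldsymbol x_i, \boldsymbol y_i \rangle\right| = \sum_{i=1}^{d} w_i \left|\cos \theta_i\right|,$$
where $\boldsymbol V^{\boldsymbol X_T} = [\boldsymbol x_1, \ldots, \boldsymbol x_d]$ and $\boldsymbol V^{\boldsymbol Y_T} = [\boldsymbol y_1, \ldots, \boldsymbol y_d]$ are sets of eigenvectors concerning the covariance or correlation matrix of series $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, $\langle \boldsymbol x_i, \boldsymbol y_i \rangle$ is the inner product of $\boldsymbol x_i$ and $\boldsymbol y_i$, $\boldsymbol w = (w_1, \ldots, w_d)$ is a vector of weights which is based on the eigenvalues of the MTS dataset with $\sum_{i=1}^{d} w_i = 1$, and $\theta_i$ is the angle between $\boldsymbol x_i$ and $\boldsymbol y_i$.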
The computed pairwise distance matrix.
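The Eros computation of Yang and Shahabi (2004) can be sketched in base R as follows. The helper names are hypothetical, and the weighting shown (mean of the raw eigenvalues over the dataset, rescaled to sum to one) is only one of the options the arguments above allow; it is not claimed to match the package's internals.

```r
# Eigen-decomposition of the correlation (or covariance) matrix of one MTS
eigen_mts <- function(x, use_cor = TRUE) {
  s <- if (use_cor) stats::cor(x) else stats::cov(x)
  eigen(s, symmetric = TRUE)
}

# Eros similarity: weighted sum of |<x_i, y_i>| over matching eigenvectors
eros_similarity <- function(ex, ey, w) {
  sum(w * abs(colSums(ex$vectors * ey$vectors)))
}

# Distance derived from the similarity; max() guards against rounding error
eros_distance <- function(ex, ey, w) {
  sqrt(max(0, 2 - 2 * eros_similarity(ex, ey, w)))
}

set.seed(1)
toy <- replicate(3, matrix(rnorm(100 * 4), ncol = 4), simplify = FALSE)
decomp <- lapply(toy, eigen_mts)

# Weights: aggregate the eigenvalues over the dataset (here with the mean),
# then rescale so that they sum to one
eigvals <- sapply(decomp, function(e) e$values)   # d x n matrix
w <- rowMeans(eigvals)
w <- w / sum(w)

eros_12 <- eros_similarity(decomp[[1]], decomp[[2]], w)
d_12 <- eros_distance(decomp[[1]], decomp[[2]], w)
```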
Ángel López-Oriona, José A. Vilar
Yang K, Shahabi C (2004). “A PCA-based similarity measure for multivariate time series.” In Proceedings of the 2nd ACM international workshop on Multimedia databases, 65–74.
toy_dataset <- BasicMotions$data[1 : 10] # Selecting the first 10 MTS from the
# dataset BasicMotions
distance_matrix <- dis_eros(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_eros
distance_matrix <- dis_eros(toy_dataset, method = 'max', normalization = TRUE)
# Considering the function max as aggregation function and the normalized
# eigenvalues for the computation of the weights
dis_eucl returns a pairwise distance matrix based on the Euclidean distance between MTS.
dis_eucl(X)
X | A list of MTS (numerical matrices). |
Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS is defined as the sum of the standard Euclidean distances between each corresponding pair of dimensions (univariate time series).
The computed pairwise distance matrix.
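The definition above translates directly into a few lines of base R. This sketch uses hypothetical helper names (`eucl_mts`, `pairwise_dist`) and assumes all series have the same dimensions; the class of object returned by the package itself is not assumed here.

```r
# Sum over variables of the Euclidean distance between matching columns
eucl_mts <- function(x, y) {
  stopifnot(all(dim(x) == dim(y)))
  sum(sqrt(colSums((x - y)^2)))
}

# Pairwise distance matrix for a list of MTS and a distance function d
pairwise_dist <- function(X, d) {
  n <- length(X)
  D <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      D[i, j] <- D[j, i] <- d(X[[i]], X[[j]])
    }
  }
  D
}

set.seed(1)
toy <- replicate(4, matrix(rnorm(50 * 2), ncol = 2), simplify = FALSE)
D <- pairwise_dist(toy, eucl_mts)
```

The same `pairwise_dist` wrapper works with any other symmetric distance between two matrices.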
Ángel López-Oriona, José A. Vilar
toy_dataset <- AtrialFibrillation$data[1 : 10] # Selecting the first 10 MTS from the
# dataset AtrialFibrillation
distance_matrix <- dis_eucl(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_eucl
dis_frechet returns a pairwise distance matrix based on the Frechet distance between MTS.
dis_frechet(X, ...)
X | A list of MTS (numerical matrices). |
... | Additional parameters for the function. See |
Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS is defined as the sum of the standard Frechet distances between each corresponding pair of dimensions (univariate time series).
The computed pairwise distance matrix.
Ángel López-Oriona, José A. Vilar
toy_dataset <- Libras$data[1 : 5] # Selecting the first 5 MTS from the
# dataset Libras
distance_matrix <- dis_frechet(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_frechet
dis_gcc returns a pairwise distance matrix based on the generalized cross-correlation measure introduced by Alonso and Pena (2019).
dis_gcc(X, lag_max = 1, features = FALSE)
X | A list of MTS (numerical matrices). |
lag_max | The maximum lag considered to compute the generalized cross-correlation. |
features | Logical. If FALSE (default), a distance matrix is returned; otherwise, a dataset of feature vectors is returned. |
Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$, denoted $d_{GCC}(\boldsymbol X_T, \boldsymbol Y_T)$, is computed from the values of $\widehat{GCC}$ between the dimensions of each series, where $\boldsymbol X_{T,j}$ and $\boldsymbol Y_{T,j}$ are the $j$th dimensions (univariate time series) of $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, and $\widehat{GCC}$ is the estimated generalized cross-correlation measure between univariate series proposed by Alonso and Pena (2019).
If features = FALSE (default), returns a distance matrix based on the distance $d_{GCC}$. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute the distance $d_{GCC}$.
Ángel López-Oriona, José A. Vilar
Alonso AM, Pena D (2019). “Clustering time series by linear dependency.” Statistics and Computing, 29(4), 655–676.
toy_dataset <- AtrialFibrillation$data[1 : 10] # Selecting the first 10 MTS from the
# dataset AtrialFibrillation
distance_matrix <- dis_gcc(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_gcc
feature_dataset <- dis_gcc(toy_dataset, features = TRUE) # Computing
# the corresponding dataset of features
dis_hwl returns a pairwise distance matrix based on the feature extraction procedure proposed by Hyndman et al. (2015).
dis_hwl(X, features = FALSE)
X | A list of MTS (numerical matrices). |
features | Logical. If FALSE (default), a distance matrix is returned; otherwise, a dataset of feature vectors is returned. |
Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS is defined as the Euclidean distance between the corresponding feature vectors.
If features = FALSE (default), returns a distance matrix based on this distance. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute the distance.
Ángel López-Oriona, José A. Vilar
Hyndman RJ, Wang E, Laptev N (2015). “Large-scale unusual time series detection.” In 2015 IEEE international conference on data mining workshop (ICDMW), 1616–1619. IEEE.
toy_dataset <- AtrialFibrillation$data[1 : 10] # Selecting the first 10 MTS from the
# dataset AtrialFibrillation
distance_matrix <- dis_hwl(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_hwl
feature_dataset <- dis_hwl(toy_dataset, features = TRUE) # Computing
# the corresponding dataset of features
dis_lpp returns a pairwise distance matrix based on the dissimilarity introduced by Weng and Shen (2008).
dis_lpp(X, approach = 1, k = 2, t = 1, features = FALSE)
X | A list of MTS (numerical matrices). |
approach | Parameter indicating whether the feature vector representing each MTS is constructed by means of Li's first (approach = 1) or Li's second (approach = 2) approach. |
k | Number of neighbors determining the construction of the local structure matrix. |
t | Parameter determining the construction of the local structure matrix. |
features | Logical. If FALSE (default), a distance matrix is returned; otherwise, a dataset of feature vectors is returned. |
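Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$, denoted $d_{LPP}(\boldsymbol X_T, \boldsymbol Y_T)$, is computed from $\boldsymbol \theta^{\boldsymbol X_T}$, $\boldsymbol \theta^{\boldsymbol Y_T}$ and $\boldsymbol A_{LPP}$, where $\boldsymbol \theta^{\boldsymbol X_T}$ and $\boldsymbol \theta^{\boldsymbol Y_T}$ are the feature vectors constructed from Li's first (approach=1) or Li's second (approach=2) approach with respect to series $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, and $\boldsymbol A_{LPP}$ is the matrix of locality preserving projections whose columns are eigenvectors solving the generalized eigenvalue problem defined by the local structure matrix.
If features = FALSE (default), returns a distance matrix based on the distance $d_{LPP}$. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features resulting from applying Li's first (approach=1) or Li's second (approach=2) approach.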
Ángel López-Oriona, José A. Vilar
Weng X, Shen J (2008). “Classification of multivariate time series using locality preserving projections.” Knowledge-Based Systems, 21(7), 581–587.
toy_dataset <- AtrialFibrillation$data[1 : 10] # Selecting the first 10 MTS from the
# dataset AtrialFibrillation
distance_matrix <- dis_lpp(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_lpp
feature_dataset <- dis_lpp(toy_dataset, features = TRUE) # Computing
# the corresponding dataset of features
dis_mahalanobis returns a pairwise distance matrix based on the Mahalanobis divergence introduced by Singhal and Seborg (2005).
dis_mahalanobis(X)
X | A list of MTS (numerical matrices). |
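Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$, denoted $d_{MD}(\boldsymbol X_T, \boldsymbol Y_T)$, is built from the quantity
$$MD^*(\boldsymbol X_T, \boldsymbol Y_T) = \sqrt{(\overline{\boldsymbol X}_T - \overline{\boldsymbol Y}_T)\, \boldsymbol \Sigma^{*-1}_{\boldsymbol X_T}\, (\overline{\boldsymbol X}_T - \overline{\boldsymbol Y}_T)^\top},$$
where $\overline{\boldsymbol X}_T$ and $\overline{\boldsymbol Y}_T$ are vectors containing the column-wise means concerning series $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, $\boldsymbol \Sigma_{\boldsymbol X_T}$ is the covariance matrix of $\boldsymbol X_T$ and $\boldsymbol \Sigma^{*-1}_{\boldsymbol X_T}$ is the pseudo-inverse of $\boldsymbol \Sigma_{\boldsymbol X_T}$ calculated using SVD.
In the computation of $MD^*(\boldsymbol X_T, \boldsymbol Y_T)$, MTS $\boldsymbol X_T$ is assumed to be the reference series.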
The computed pairwise distance matrix.
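The ingredients named above (column-wise means, covariance matrix of the reference series, pseudo-inverse via SVD) can be sketched in base R. The helper names `pinv_svd` and `md_reference` are hypothetical, the tolerance used to discard small singular values is an assumption, and the way the package combines the two directed values into a single distance is not covered here.

```r
# Moore-Penrose pseudo-inverse computed from the SVD
pinv_svd <- function(s, tol = 1e-10) {
  dec <- svd(s)
  keep <- dec$d > tol * max(dec$d)
  dec$v[, keep, drop = FALSE] %*%
    diag(1 / dec$d[keep], nrow = sum(keep)) %*%
    t(dec$u[, keep, drop = FALSE])
}

# Mahalanobis distance between column means, with x as the reference series
md_reference <- function(x, y) {
  delta <- colMeans(x) - colMeans(y)
  drop(sqrt(t(delta) %*% pinv_svd(stats::cov(x)) %*% delta))
}

set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)
y <- matrix(rnorm(200 * 3, mean = 1), ncol = 3)

md_xy <- md_reference(x, y)   # x is the reference series
md_yx <- md_reference(y, x)   # y is the reference series
```

The two directed values generally differ, because each one uses the covariance matrix of its own reference series.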
Ángel López-Oriona, José A. Vilar
Singhal A, Seborg DE (2005). “Clustering multivariate time-series data.” Journal of Chemometrics: A Journal of the Chemometrics Society, 19(8), 427–438.
toy_dataset <- AtrialFibrillation$data[1 : 10] # Selecting the first 10 MTS from the
# dataset AtrialFibrillation
distance_matrix <- dis_mahalanobis(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_mahalanobis.
dis_mahalanobis_dtw returns a pairwise distance matrix based on a dynamic time warping distance in which the local cost matrix is computed by using the Mahalanobis distance (Mei et al. 2015).
dis_mahalanobis_dtw(X, M = NULL, ...)
X | A list of MTS (numerical matrices). |
M | The matrix with respect to which the Mahalanobis distance is computed (default is the covariance matrix of the row-wise concatenation of all MTS objects). |
... | Additional parameters for the function. See |
Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined as a dynamic time warping-type distance in which the local cost matrix is constructed by using the Mahalanobis distance.
The computed pairwise distance matrix.
Ángel López-Oriona, José A. Vilar
Mei J, Liu M, Wang Y, Gao H (2015). “Learning a mahalanobis distance-based dynamic time warping measure for multivariate time series classification.” IEEE transactions on Cybernetics, 46(6), 1363–1374.
dis_dtw_1, dis_dtw_2
toy_dataset <- Libras$data[1 : 10] # Selecting the first 10 MTS from the
# dataset Libras
distance_matrix <- dis_mahalanobis_dtw(toy_dataset) # Computing the pairwise
# distance matrix based on the distance dis_mahalanobis_dtw
dis_mcc returns a pairwise distance matrix based on an extension of the procedure proposed by Egri et al. (2017). The function can also be used for dimensionality reduction purposes.
dis_mcc(X, max_lag = 20, delta = 0.7, features = F)
X | A list of MTS (numerical matrices). |
max_lag | The maximum number of lags for the computation of the cross-correlations (default is 20). |
delta | The threshold value concerning the maximal cross-correlations (default is 0.7). |
features | Logical. If FALSE (default), a distance matrix is returned; otherwise, the function performs dimensionality reduction. |
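Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$, denoted $d_{MCC}(\boldsymbol X_T, \boldsymbol Y_T)$, is defined through the difference between $vec(\hat{\boldsymbol \Theta}^{\boldsymbol X_T})$ and $vec(\hat{\boldsymbol \Theta}^{\boldsymbol Y_T})$, where $\hat{\boldsymbol \Theta}^{\boldsymbol X_T}$ and $\hat{\boldsymbol \Theta}^{\boldsymbol Y_T}$ are matrices containing pairwise estimated maximal cross-correlations (in absolute value) for series $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, and the operator $vec(\cdot)$ creates a vector by concatenating the columns of the matrix received as input.
If we use the function to perform dimensionality reduction (features = TRUE), then for a given series $\boldsymbol X_T$, a new matrix $\hat{\boldsymbol \Theta}^{\boldsymbol X_T}_{\delta}$ is constructed by keeping the entries of matrix $\hat{\boldsymbol \Theta}^{\boldsymbol X_T}$ which are above $\delta$ (and setting all the remaining entries to zero). The connected components of the graph defined by matrix $\hat{\boldsymbol \Theta}^{\boldsymbol X_T}_{\delta}$ are computed along with their corresponding centers (variables). Function dis_mcc returns the reduced counterpart of $\boldsymbol X_T$, which is constructed from $\boldsymbol X_T$ by removing all the variables which were not selected as centers of the corresponding components.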
The computed pairwise distance matrix.
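The thresholding and connected-components steps described above can be sketched in base R. The helper names are hypothetical, the maximal cross-correlation is taken over lags from -max_lag to max_lag as an assumption, and the rule used by the package to choose the center of each component is not reproduced.

```r
# Matrix of maximal absolute cross-correlations between the variables of x
max_abs_ccf <- function(x, max_lag = 20) {
  d <- ncol(x)
  a <- stats::acf(x, lag.max = max_lag, plot = FALSE)$acf
  m <- matrix(0, d, d)
  for (i in seq_len(d)) {
    for (j in seq_len(d)) {
      # a[, i, j] covers non-negative lags, a[, j, i] the negative ones
      m[i, j] <- max(abs(a[, i, j]), abs(a[, j, i]))
    }
  }
  m
}

# Component label of each node of the graph with logical adjacency `adj`
components <- function(adj) {
  d <- nrow(adj)
  label <- rep(NA_integer_, d)
  current <- 0L
  for (start in seq_len(d)) {
    if (!is.na(label[start])) next
    current <- current + 1L
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      if (!is.na(label[node])) next
      label[node] <- current
      queue <- c(queue, which(adj[node, ] & is.na(label)))
    }
  }
  label
}

set.seed(1)
z <- rnorm(300)
# Variables 1 and 2 are nearly identical, variable 3 is independent noise
x <- unname(cbind(z, z + rnorm(300, sd = 0.1), rnorm(300)))

theta <- max_abs_ccf(x, max_lag = 20)
theta_delta <- ifelse(theta > 0.7, theta, 0)   # keep entries above delta
comp <- components(theta_delta > 0)
```

Here the first two variables fall in the same component, so a reduced series would keep one of them together with the third variable.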
Ángel López-Oriona, José A. Vilar
Egri A, Horváth I, Kovács F, Molontay R, Varga K (2017). “Cross-correlation based clustering and dimension reduction of multivariate time series.” In 2017 IEEE 21st International Conference on Intelligent Engineering Systems (INES), 000241–000246. IEEE.
reduced_dataset <- dis_mcc(RacketSports$data[1], features = TRUE) # Reducing
# the dimensionality of the first MTS in dataset RacketSports
reduced_dataset
distance_matrix <- dis_mcc(Libras$data) # Computing the
# corresponding distance matrix for all MTS in dataset Libras
# (by default, features = F)
dis_modwt returns a pairwise distance matrix based on the dissimilarity introduced by D'Urso and Maharaj (2012).
dis_modwt(X, wf = "d4", J = floor(log(nrow(X[[1]]))) - 1, features = FALSE)
X | A list of MTS (numerical matrices). |
wf | The wavelet filter (default is 'd4'). |
J | The maximum allowable number of scales. |
features | Logical. If FALSE (default), a distance matrix is returned; otherwise, a dataset of feature vectors is returned. |
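Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$, denoted $d_{MODWT}(\boldsymbol X_T, \boldsymbol Y_T)$, compares $\hat{\boldsymbol \theta}_{WV}^{\boldsymbol X_T}$ with $\hat{\boldsymbol \theta}_{WV}^{\boldsymbol Y_T}$ and $\hat{\boldsymbol \theta}_{WC}^{\boldsymbol X_T}$ with $\hat{\boldsymbol \theta}_{WC}^{\boldsymbol Y_T}$, where $\hat{\boldsymbol \theta}_{WV}^{\boldsymbol X_T}$ and $\hat{\boldsymbol \theta}_{WV}^{\boldsymbol Y_T}$ are vectors containing the estimated wavelet variances within $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, and $\hat{\boldsymbol \theta}_{WC}^{\boldsymbol X_T}$ and $\hat{\boldsymbol \theta}_{WC}^{\boldsymbol Y_T}$ are vectors containing the estimated wavelet correlations within $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively.
If features = FALSE (default), returns a distance matrix based on the distance $d_{MODWT}$. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute the distance $d_{MODWT}$.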
Ángel López-Oriona, José A. Vilar
D'Urso P, Maharaj EA (2012). “Wavelets-based clustering of multivariate time series.” Fuzzy Sets and Systems, 193, 33–61.
# Selecting the first 10 MTS from the dataset AtrialFibrillation
toy_dataset <- AtrialFibrillation$data[1 : 10]
# Computing the pairwise distance matrix based on the distance dis_modwt
distance_matrix <- dis_modwt(toy_dataset)
# Computing the corresponding dataset of features
feature_dataset <- dis_modwt(toy_dataset, features = TRUE)
dis_pca
returns a pairwise distance matrix based on the
PCA similarity factor proposed by Singhal and Seborg (2005).
dis_pca(X, retained_components = 3)
X |
A list of MTS (numerical matrices). |
retained_components |
Number of retained principal components. |
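Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined as $d_{PCA}(\boldsymbol X_T, \boldsymbol Y_T)=1-S_{PCA}(\boldsymbol X_T, \boldsymbol Y_T)$, with
$$S_{PCA}(\boldsymbol X_T, \boldsymbol Y_T)=\frac{\sum_{i=1}^{k}\sum_{j=1}^{k}\big(\lambda_{\boldsymbol X_T}^{i}\lambda_{\boldsymbol Y_T}^{j}\big)\cos^2\theta_{ij}}{\sum_{i=1}^{k}\lambda_{\boldsymbol X_T}^{i}\lambda_{\boldsymbol Y_T}^{i}},$$
where $k$ is the number of retained principal components (retained_components), $\theta_{ij}$ is the angle between the $i$th eigenvector of $\boldsymbol X_T$ and the $j$th eigenvector of series $\boldsymbol Y_T$, and $\lambda_{\boldsymbol X_T}^{i}$ and $\lambda_{\boldsymbol Y_T}^{j}$ are the $i$th eigenvalue of $\boldsymbol X_T$ and the $j$th eigenvalue of series $\boldsymbol Y_T$, respectively.
The computed pairwise distance matrix.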
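The eigenvalue-weighted PCA similarity factor described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by dis_pca: the function name pca_distance is hypothetical, and the eigenvectors and eigenvalues are assumed to come from the sample covariance matrix of each series.

```r
# Sketch of a distance of the form 1 - S_PCA (base R only).
# Assumption: eigen-decomposition of the sample covariance matrix of each MTS.
pca_distance <- function(x, y, k = 3) {
  ex <- eigen(cov(x), symmetric = TRUE)
  ey <- eigen(cov(y), symmetric = TRUE)
  lx <- ex$values[seq_len(k)]
  ly <- ey$values[seq_len(k)]
  # cos2[i, j]: squared cosine of the angle between the i-th eigenvector of x
  # and the j-th eigenvector of y (eigenvectors have unit length)
  cos2 <- crossprod(ex$vectors[, seq_len(k), drop = FALSE],
                    ey$vectors[, seq_len(k), drop = FALSE])^2
  s_pca <- sum(outer(lx, ly) * cos2) / sum(lx * ly)
  1 - s_pca
}

set.seed(1)
x <- matrix(rnorm(400), nrow = 100, ncol = 4)                # simulated MTS
y <- matrix(rnorm(400), nrow = 100, ncol = 4) %*% diag(1:4)  # rescaled MTS
d_xy <- pca_distance(x, y)
d_yx <- pca_distance(y, x)
d_xx <- pca_distance(x, x)  # a series compared with itself
```

In this sketch a series compared with itself gives a distance of zero (up to rounding), because its eigenvectors are orthonormal and only the matching pairs contribute to the numerator.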
Ángel López-Oriona, José A. Vilar
Singhal A, Seborg DE (2005). “Clustering multivariate time-series data.” Journal of Chemometrics: A Journal of the Chemometrics Society, 19(8), 427–438.
# Selecting the first 10 MTS from the dataset BasicMotions
toy_dataset <- BasicMotions$data[1 : 10]
# Computing the pairwise distance matrix based on the distance dis_pca
distance_matrix <- dis_pca(toy_dataset)
dis_ppca
returns a pairwise distance matrix based on an extension of
the procedure proposed by Wan et al. (2022). The
function can also be used for dimensionality reduction purposes.
dis_ppca(X, w = 2, var_rate = 0.9, features = F)
X |
A list of MTS (numerical matrices). |
w |
The number of segments (in the time dimension) in which we want to divide the MTS (default is 2). |
var_rate |
Rate of retained variability concerning the dimensionality-reduced MTS samples (default is 0.90). |
features |
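Logical. If features = FALSE (default), a distance matrix is returned. Otherwise, the function returns the dimensionality-reduced MTS. |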
Given a collection of MTS, the function returns the pairwise distance matrix, where the distance $d_{PPCA}(\boldsymbol X_T, \boldsymbol Y_T)$ between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined in terms of $\widehat{\boldsymbol \Sigma}^{\boldsymbol X_T}$ and $\widehat{\boldsymbol \Sigma}^{\boldsymbol Y_T}$, estimates of the covariance matrices based on a piecewise representation for which the original MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, are divided into a number of w local segments (in the time dimension).
If we use the function to perform dimensionality reduction (features = TRUE), then for a given series $\boldsymbol X_T$, matrix $\widehat{\boldsymbol \Sigma}^{\boldsymbol X_T}$ is decomposed by executing the standard PCA and a certain number of principal components are retained (according to the parameter var_rate). Function dis_ppca returns the reduced counterpart of $\boldsymbol X_T$, which is constructed from $\boldsymbol X_T$ by considering the matrix of scores with respect to the retained principal components.
The computed pairwise distance matrix.
Ángel López-Oriona, José A. Vilar
Wan X, Li H, Zhang L, Wu YJ (2022). “Dimensionality reduction for multivariate time-series data mining.” The Journal of Supercomputing, 78(7), 9862–9878.
# Reducing the dimensionality of the first MTS in dataset RacketSports
reduced_dataset <- dis_ppca(RacketSports$data[1], features = TRUE)
reduced_dataset
# Computing the corresponding distance matrix for all MTS in dataset RacketSports
# (by default, features = F)
distance_matrix <- dis_ppca(RacketSports$data)
dis_qcd
returns a pairwise distance matrix based on the
dissimilarity introduced by Lopez-Oriona and Vilar (2021).
dis_qcd(X, levels = c(0.1, 0.5, 0.9), freq = NULL, features = FALSE, ...)
X |
A list of MTS (numerical matrices). |
levels |
The set of probability levels. |
freq |
Vector of frequencies in which the smoothed CCR-periodograms
must be computed. If |
features |
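Logical. If features = FALSE (default), a distance matrix is returned. Otherwise, the function returns a dataset of feature vectors. |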
... |
Additional parameters for the function. See |
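Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined as
$$d_{QCD}(\boldsymbol X_T, \boldsymbol Y_T)=\Bigg[\sum_{j_1=1}^{d}\sum_{j_2=1}^{d}\sum_{i=1}^{r}\sum_{i'=1}^{r}\sum_{k=1}^{K}\Big(\Re\big(\widehat G^{\boldsymbol X_T}_{j_1,j_2}(\omega_k,\tau_i,\tau_{i'})\big)-\Re\big(\widehat G^{\boldsymbol Y_T}_{j_1,j_2}(\omega_k,\tau_i,\tau_{i'})\big)\Big)^2+\sum_{j_1=1}^{d}\sum_{j_2=1}^{d}\sum_{i=1}^{r}\sum_{i'=1}^{r}\sum_{k=1}^{K}\Big(\Im\big(\widehat G^{\boldsymbol X_T}_{j_1,j_2}(\omega_k,\tau_i,\tau_{i'})\big)-\Im\big(\widehat G^{\boldsymbol Y_T}_{j_1,j_2}(\omega_k,\tau_i,\tau_{i'})\big)\Big)^2\Bigg]^{1/2},$$
where $\widehat G^{\boldsymbol X_T}_{j_1,j_2}(\omega_k,\tau_i,\tau_{i'})$ and $\widehat G^{\boldsymbol Y_T}_{j_1,j_2}(\omega_k,\tau_i,\tau_{i'})$ are estimates of the quantile cross-spectral densities (so-called smoothed CCR-periodograms) at frequency $\omega_k$ with respect to the variables $j_1$ and $j_2$ and probability levels $\tau_i$ and $\tau_{i'}$ for series $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, and $\Re(\cdot)$ and $\Im(\cdot)$ denote the real part and imaginary part operators, respectively.
If features = FALSE (default), returns a distance matrix based on the distance $d_{QCD}$. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute the distance $d_{QCD}$.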
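The way real and imaginary parts enter a distance of this kind can be sketched in base R. The function name qcd_like_distance and the array layout are assumptions made for illustration; the complex arrays below are random numbers, not smoothed CCR-periodograms.

```r
# Sketch: distance between two arrays of complex spectral estimates.
# Assumption: gx and gy are complex arrays with identical dimensions, e.g.
# d x d x r x r x K (variables, variables, levels, levels, frequencies).
qcd_like_distance <- function(gx, gy) {
  stopifnot(identical(dim(gx), dim(gy)))
  sqrt(sum((Re(gx) - Re(gy))^2) + sum((Im(gx) - Im(gy))^2))
}

set.seed(1)
dims <- c(2, 2, 3, 3, 5)  # hypothetical sizes: d = 2, r = 3, K = 5
n_values <- prod(dims)
gx <- array(complex(real = rnorm(n_values), imaginary = rnorm(n_values)), dim = dims)
gy <- array(complex(real = rnorm(n_values), imaginary = rnorm(n_values)), dim = dims)
d_xy <- qcd_like_distance(gx, gy)
d_xx <- qcd_like_distance(gx, gx)
```

Summing squared differences of the real and imaginary parts separately is the same as summing the squared moduli of the complex differences, which is why the feature vectors can simply stack both parts.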
Ángel López-Oriona, José A. Vilar
Lopez-Oriona A, Vilar JA (2021). “Quantile cross-spectral density: A novel and effective tool for clustering multivariate time series.” Expert Systems with Applications, 185, 115677.
# Selecting the first 4 MTS from the dataset AtrialFibrillation
toy_dataset <- AtrialFibrillation$data[1 : 4]
# Computing the pairwise distance matrix based on the distance dis_qcd
distance_matrix <- dis_qcd(toy_dataset)
# Changing the probability levels to compute the QCD-based estimators
distance_matrix <- dis_qcd(toy_dataset, levels = c(0.4, 0.8))
# Considering only a single frequency for the computation of d_qcd
distance_matrix <- dis_qcd(toy_dataset, freq = 0.5)
# Computing the corresponding dataset of features
feature_dataset <- dis_qcd(toy_dataset, features = TRUE)
dis_qcf
returns a pairwise distance matrix based on a generalization of the
dissimilarity introduced by Lafuente-Rego and Vilar (2016).
dis_qcf(X, levels = c(0.1, 0.5, 0.9), max_lag = 1, features = FALSE)
X |
A list of MTS (numerical matrices). |
levels |
The set of probability levels. |
max_lag |
The maximum lag considered to compute the cross-covariances. |
features |
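Logical. If features = FALSE (default), a distance matrix is returned. Otherwise, the function returns a dataset of feature vectors. |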
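Given a collection of MTS, the function returns the pairwise distance matrix, where the distance $d_{QCF}(\boldsymbol X_T, \boldsymbol Y_T)$ between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined in terms of $\widehat \gamma^{\boldsymbol X_T}_{j_1,j_2}(l,\tau_i,\tau_{i'})$ and $\widehat \gamma^{\boldsymbol Y_T}_{j_1,j_2}(l,\tau_i,\tau_{i'})$, estimates of the quantile cross-covariances at lag $l$ with respect to the variables $j_1$ and $j_2$ and probability levels $\tau_i$ and $\tau_{i'}$ for series $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively.
If features = FALSE (default), returns a distance matrix based on the distance $d_{QCF}$. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute the distance $d_{QCF}$.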
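A quantile cross-covariance estimate and a feature-based distance built from it can be sketched in base R. The names qc_cov and qcf_features are hypothetical, the estimator is the usual sample version based on indicator functions, and the use of lags 1 to max_lag with a Euclidean distance between feature vectors is an assumption of this sketch rather than a description of dis_qcf.

```r
# Sample quantile cross-covariance between variables j1 and j2 at a given lag:
# mean of I(x[t, j1] <= q1) * I(x[t + lag, j2] <= q2), minus tau1 * tau2
qc_cov <- function(x, j1, j2, lag, tau1, tau2) {
  n <- nrow(x)
  q1 <- quantile(x[, j1], probs = tau1, names = FALSE)
  q2 <- quantile(x[, j2], probs = tau2, names = FALSE)
  first <- x[seq_len(n - lag), j1] <= q1
  second <- x[(1 + lag):n, j2] <= q2
  mean(first & second) - tau1 * tau2
}

# Feature vector over all variable pairs, lags 1..max_lag and level pairs
qcf_features <- function(x, levels = c(0.1, 0.5, 0.9), max_lag = 1) {
  grid <- expand.grid(j1 = seq_len(ncol(x)), j2 = seq_len(ncol(x)),
                      lag = seq_len(max_lag), tau1 = levels, tau2 = levels)
  mapply(qc_cov, j1 = grid$j1, j2 = grid$j2, lag = grid$lag,
         tau1 = grid$tau1, tau2 = grid$tau2, MoreArgs = list(x = x))
}

set.seed(1)
x <- matrix(rnorm(400), nrow = 200, ncol = 2)  # simulated MTS
y <- matrix(rnorm(400), nrow = 200, ncol = 2)
features_x <- qcf_features(x)
features_y <- qcf_features(y)
d_xy <- sqrt(sum((features_x - features_y)^2))  # assumed Euclidean distance
```

With two variables, three probability levels and a single lag, each series is summarised by 2 x 2 x 3 x 3 = 36 features in this sketch.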
Ángel López-Oriona, José A. Vilar
Lafuente-Rego B, Vilar JA (2016). “Clustering of time series using quantile autocovariances.” Advances in Data Analysis and classification, 10(3), 391–415.
# Selecting the first 10 MTS from the dataset AtrialFibrillation
toy_dataset <- AtrialFibrillation$data[1 : 10]
# Computing the pairwise distance matrix based on the distance dis_qcf
distance_matrix <- dis_qcf(toy_dataset)
# Computing the corresponding dataset of features
feature_dataset <- dis_qcf(toy_dataset, features = TRUE)
dis_spectral
returns a pairwise distance matrix based on the
dissimilarities introduced by Kakizawa et al. (1998).
dis_spectral(X, method = "j_divergence", alpha = 0.5, features = FALSE)
X |
A list of MTS (numerical matrices). |
method |
Parameter indicating the method to be used for the computation
of the distance. If |
alpha |
If |
features |
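Logical. If features = FALSE (default), a distance matrix is returned. Otherwise, the function returns a dataset of feature vectors. |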
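Given a collection of MTS, the function returns a pairwise distance matrix. If method="j_divergence" then the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined as
$$d_{JSPEC}(\boldsymbol X_T, \boldsymbol Y_T)=\frac{1}{2T}\sum_{k=1}^{K}\bigg(tr\Big(\widehat{\boldsymbol f}_{\boldsymbol X_T}(\omega_k)\widehat{\boldsymbol f}_{\boldsymbol Y_T}^{-1}(\omega_k)\Big)+tr\Big(\widehat{\boldsymbol f}_{\boldsymbol Y_T}(\omega_k)\widehat{\boldsymbol f}_{\boldsymbol X_T}^{-1}(\omega_k)\Big)-2d\bigg),$$
where $\widehat{\boldsymbol f}_{\boldsymbol X_T}(\omega_k)$ and $\widehat{\boldsymbol f}_{\boldsymbol Y_T}(\omega_k)$ are the estimated spectral density matrices from the series $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, evaluated at frequency $\omega_k$, and $tr(\cdot)$ denotes the trace of a square matrix. If method="chernoff_divergence", then the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined as
$$d_{CSPEC}(\boldsymbol X_T, \boldsymbol Y_T)=\frac{1}{2T}\sum_{k=1}^{K}\bigg(\log\frac{\big|\alpha\widehat{\boldsymbol f}_{\boldsymbol X_T}(\omega_k)+(1-\alpha)\widehat{\boldsymbol f}_{\boldsymbol Y_T}(\omega_k)\big|}{\big|\widehat{\boldsymbol f}_{\boldsymbol Y_T}(\omega_k)\big|}+\log\frac{\big|\alpha\widehat{\boldsymbol f}_{\boldsymbol Y_T}(\omega_k)+(1-\alpha)\widehat{\boldsymbol f}_{\boldsymbol X_T}(\omega_k)\big|}{\big|\widehat{\boldsymbol f}_{\boldsymbol X_T}(\omega_k)\big|}\bigg),$$
where $\alpha \in (0,1)$.
If features = FALSE (default), returns a distance matrix based on the distance $d_{JSPEC}$ as long as we set method="j_divergence", and based on the alternative distance $d_{CSPEC}$ as long as we set method="chernoff_divergence".
Otherwise, if features = TRUE, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute either $d_{JSPEC}$ or $d_{CSPEC}$. These vectors are vectorized versions of the estimated spectral matrices.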
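A J-divergence between estimated spectral density matrices can be sketched in base R. Everything here is an assumption made for illustration and not the estimator used by dis_spectral: the function names are hypothetical, the spectral matrices are obtained by averaging periodogram matrices over neighbouring Fourier frequencies, and the result is normalised by twice the series length.

```r
# Smoothed periodogram matrices at the Fourier frequencies k / n, k = 1..K.
# Assumption: plain moving average over 2 * m + 1 neighbouring frequencies.
smoothed_spectrum <- function(x, m = 3) {
  n <- nrow(x)
  dft <- mvfft(sweep(x, 2, colMeans(x))) / sqrt(n)  # centred series
  K <- floor((n - 1) / 2)
  lapply(seq_len(K), function(k) {
    rows <- ((k - m):(k + m)) %% n + 1  # row j + 1 holds frequency j / n
    z <- dft[rows, , drop = FALSE]
    t(z) %*% Conj(z) / length(rows)     # d x d Hermitian matrix
  })
}

# J-divergence between two lists of spectral matrices (one per frequency)
j_divergence <- function(fx, fy, n) {
  d <- nrow(fx[[1]])
  terms <- mapply(function(a, b) {
    Re(sum(diag(a %*% solve(b))) + sum(diag(b %*% solve(a)))) - 2 * d
  }, fx, fy)
  sum(terms) / (2 * n)
}

set.seed(1)
n <- 200
x <- matrix(rnorm(2 * n), nrow = n, ncol = 2)  # simulated MTS
y <- matrix(rnorm(2 * n), nrow = n, ncol = 2) %*% matrix(c(1, 0.5, 0, 1), 2, 2)
fx <- smoothed_spectrum(x)
fy <- smoothed_spectrum(y)
d_xy <- j_divergence(fx, fy, n)
d_yx <- j_divergence(fy, fx, n)
d_xx <- j_divergence(fx, fx, n)
```

Each term of the sum is non-negative for positive definite matrices and vanishes when both matrices coincide, so the sketch returns zero for identical series and a symmetric, positive value otherwise.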
Ángel López-Oriona, José A. Vilar
Kakizawa Y, Shumway RH, Taniguchi M (1998). “Discrimination and clustering for multivariate time series.” Journal of the American Statistical Association, 93(441), 328–340.
# Selecting the first 10 MTS from the dataset Libras
toy_dataset <- Libras$data[1 : 10]
# Computing the pairwise distance matrix based on the distance dis_jspec
distance_matrix_j <- dis_spectral(toy_dataset)
# Computing the pairwise distance matrix based on the distance dis_cspec
distance_matrix_c <- dis_spectral(toy_dataset, method = 'chernoff_divergence')
# Computing the corresponding dataset of features for d_cspec
feature_dataset <- dis_spectral(toy_dataset, method = 'chernoff_divergence', features = TRUE)
dis_swmd
returns a pairwise distance matrix based on variable-based
principal component analysis (VPCA) and a spatial weighted matrix distance
(SWMD) (He and Tan 2018).
dis_swmd(X, var_rate = 0.9, features = FALSE)
X |
A list of MTS (numerical matrices). |
var_rate |
Rate of retained variability concerning the dimensionality-reduced MTS samples (default is 0.90). |
features |
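Logical. If features = FALSE (default), a distance matrix is returned. Otherwise, the function returns a dataset of feature vectors. |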
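Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined as
$$d_{SWMD}(\boldsymbol X_T, \boldsymbol Y_T)=\Big[vec\big(\boldsymbol Z^{\boldsymbol X_T}-\boldsymbol Z^{\boldsymbol Y_T}\big)\,\boldsymbol S\,vec\big(\boldsymbol Z^{\boldsymbol X_T}-\boldsymbol Z^{\boldsymbol Y_T}\big)^\top\Big]^{1/2},$$
where $\boldsymbol Z^{\boldsymbol X_T}$ and $\boldsymbol Z^{\boldsymbol Y_T}$ are the dimensionality-reduced MTS samples associated with $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively, the operator $vec(\cdot)$ creates a vector by concatenating the columns of the matrix received as input and $\boldsymbol S$ is a matrix integrating the spatial dimensionality difference between the corresponding elements.
If features = FALSE (default), returns a distance matrix based on the distance $d_{SWMD}$. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute the distance $d_{SWMD}$.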
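The role of the vectorisation operator and of the weight matrix in a spatial weighted matrix distance can be sketched in base R. The function name swmd is hypothetical and the weight matrix is taken as given; how dis_swmd constructs it is not shown here.

```r
# Sketch of a weighted matrix distance between two reduced MTS samples.
# Assumption: zx and zy have identical dimensions and S is a square weight
# matrix with one row and one column per matrix element.
swmd <- function(zx, zy, S) {
  v <- as.vector(zx - zy)       # vec(): concatenates the columns
  sqrt(drop(t(v) %*% S %*% v))  # quadratic form weighted by S
}

set.seed(1)
zx <- matrix(rnorm(12), nrow = 4, ncol = 3)  # hypothetical reduced samples
zy <- matrix(rnorm(12), nrow = 4, ncol = 3)
d_identity <- swmd(zx, zy, diag(12))
```

With the identity matrix as S the quadratic form reduces to the ordinary Euclidean (Frobenius) distance between the two matrices; off-diagonal weights are what let neighbouring elements influence each other.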
Ángel López-Oriona, José A. Vilar
He H, Tan Y (2018). “Unsupervised classification of multivariate time series using VPCA and fuzzy clustering with spatial weighted matrix distance.” IEEE transactions on cybernetics, 50(3), 1096–1105.
# Selecting the first 10 MTS from the dataset AtrialFibrillation
toy_dataset <- AtrialFibrillation$data[1 : 10]
# Computing the pairwise distance matrix based on the distance dis_swmd
distance_matrix <- dis_swmd(toy_dataset)
# Computing the corresponding dataset of features
feature_dataset <- dis_swmd(toy_dataset, features = TRUE)
dis_var_1
returns a pairwise distance matrix based on a generalization of the
dissimilarity introduced by Piccolo (1990).
dis_var_1(X, max_p = 1, criterion = "AIC", features = FALSE)
X |
A list of MTS (numerical matrices). |
max_p |
The maximum order considered with respect to the fitting of VAR models. |
criterion |
The criterion used to determine the VAR order. |
features |
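Logical. If features = FALSE (default), a distance matrix is returned. Otherwise, the function returns a dataset of feature vectors. |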
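Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined as
$$d_{VAR}(\boldsymbol X_T, \boldsymbol Y_T)=\Big\|\widehat{\boldsymbol \theta}^{\boldsymbol X_T}_{VAR}-\widehat{\boldsymbol \theta}^{\boldsymbol Y_T}_{VAR}\Big\|,$$
where $\widehat{\boldsymbol \theta}^{\boldsymbol X_T}_{VAR}$ and $\widehat{\boldsymbol \theta}^{\boldsymbol Y_T}_{VAR}$ are vectors containing the estimated VAR parameters for $\boldsymbol X_T$ and $\boldsymbol Y_T$, respectively. If VAR models of different orders are fitted to $\boldsymbol X_T$ and $\boldsymbol Y_T$, then the shortest vector is padded with zeros until it reaches the length of the longest vector.
If features = FALSE (default), returns a distance matrix based on the distance $d_{VAR}$. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute the distance $d_{VAR}$.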
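The zero-padding step described above can be sketched in base R. The function name piccolo_distance and the coefficient values are hypothetical; the vectors stand for estimated VAR parameters of two bivariate series fitted with orders 1 and 2.

```r
# Euclidean distance between two coefficient vectors of different lengths,
# padding the shorter one with zeros
piccolo_distance <- function(theta_x, theta_y) {
  n <- max(length(theta_x), length(theta_y))
  padded_x <- c(theta_x, rep(0, n - length(theta_x)))
  padded_y <- c(theta_y, rep(0, n - length(theta_y)))
  sqrt(sum((padded_x - padded_y)^2))
}

# Hypothetical coefficients: VAR(1) with 2 variables has 4 parameters,
# VAR(2) with 2 variables has 8 parameters
theta_x <- c(0.5, 0.1, -0.2, 0.3)
theta_y <- c(0.4, 0.0, 0.1, 0.3, 0.2, -0.1, 0.0, 0.1)
d_xy <- piccolo_distance(theta_x, theta_y)
```

Padding with zeros treats the missing higher-order coefficients of the lower-order model as exactly zero, which is consistent with viewing a VAR(1) model as a VAR(2) model whose second coefficient matrix vanishes.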
Ángel López-Oriona, José A. Vilar
Piccolo D (1990). “A distance measure for classifying ARIMA models.” Journal of time series analysis, 11(2), 153–164.
# Selecting the first 2 MTS from the dataset Libras
toy_dataset <- Libras$data[1 : 2]
# Computing the pairwise distance matrix based on the distance dis_var_1
distance_matrix <- dis_var_1(toy_dataset)
# Computing the corresponding dataset of features
feature_dataset <- dis_var_1(toy_dataset, features = TRUE)
dis_var_2
returns a pairwise distance matrix based on testing whether
each pair of series is generated from the same VARMA model
(Maharaj 1999).
dis_var_2(X, max_p = 2, criterion = "BIC")
X |
A list of MTS (numerical matrices). |
max_p |
The maximum order considered with respect to the fitting of VAR models. |
criterion |
The criterion used to determine the VAR order. |
Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS $\boldsymbol X_T$ and $\boldsymbol Y_T$ is defined as $d^{*}_{VAR}(\boldsymbol X_T, \boldsymbol Y_T)=1-p$, where $p$ is the $p$-value of the test of hypothesis proposed by Maharaj (1999). This test is based on checking the equality of the underlying VARMA models of both series. The VARMA structures are approximated by truncated VAR($k$) models with a common order $k=\max(k_x, k_y)$, where $k_x$ and $k_y$ are determined by the BIC or AIC criterion. The VAR coefficients are automatically fitted. The dissimilarity between both series is given by $1-p$ because this quantity is expected to take larger values the more different both generating processes are. The procedure is able to compare two dependent MTS.
The computed pairwise distance matrix.
Ángel López-Oriona, José A. Vilar
Maharaj EA (1999). “Comparison and classification of stationary multivariate time series.” Pattern Recognition, 32(7), 1129–1138.
# Selecting the first two MTS from the dataset Libras
toy_dataset <- Libras$data[c(1, 2)]
# Computing the pairwise distance matrix based on the distance dis_var_2
distance_matrix <- dis_var_2(toy_dataset, max_p = 1)
dis_www
returns a pairwise distance matrix based on the feature
extraction procedure proposed by Wang et al. (2007).
dis_www(X, h = 20, features = FALSE)
X |
A list of MTS (numerical matrices). |
h |
Maximum lag for the computation of the Box-Pierce statistic. |
features |
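Logical. If features = FALSE (default), a distance matrix is returned. Otherwise, the function returns a dataset of feature vectors. |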
Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS is defined as the Euclidean distance between the corresponding feature vectors.
If features = FALSE (default), returns a distance matrix based on the distance $d_{WWW}$. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute the distance $d_{WWW}$.
Ángel López-Oriona, José A. Vilar
Wang X, Wirth A, Wang L (2007). “Structure-based statistical features and multivariate time series clustering.” In Seventh IEEE international conference on data mining (ICDM 2007), 351–360. IEEE.
# Selecting the first 10 MTS from the dataset AtrialFibrillation
toy_dataset <- AtrialFibrillation$data[1 : 10]
# Computing the pairwise distance matrix based on the distance dis_www
distance_matrix <- dis_www(toy_dataset)
# Computing the corresponding dataset of features
feature_dataset <- dis_www(toy_dataset, features = TRUE)
dis_zagorecki
returns a pairwise distance matrix based on the feature
extraction procedure proposed by Zagorecki (2015).
dis_zagorecki(set, features = FALSE)
set |
A list of MTS (numerical matrices). |
features |
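Logical. If features = FALSE (default), a distance matrix is returned. Otherwise, the function returns a dataset of feature vectors. |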
Given a collection of MTS, the function returns the pairwise distance matrix, where the distance between two MTS is defined as the Euclidean distance between the corresponding feature vectors.
If features = FALSE (default), returns a distance matrix based on the distance $d_{ZAGORECKI}$. Otherwise, the function returns a dataset of feature vectors, i.e., each row in the dataset contains the features employed to compute the distance $d_{ZAGORECKI}$.
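The relation between a dataset of feature vectors and the resulting distance matrix can be sketched in base R. The extractor toy_features is a hypothetical placeholder (per-variable mean and standard deviation) and does not reproduce the features used by dis_zagorecki or dis_www; the series are simulated.

```r
# Hypothetical feature extractor: per-variable mean and standard deviation.
# These features are placeholders chosen only for illustration.
toy_features <- function(x) c(colMeans(x), apply(x, 2, sd))

set.seed(1)
mts_list <- replicate(5, matrix(rnorm(150), nrow = 50, ncol = 3),
                      simplify = FALSE)               # 5 simulated MTS
feature_dataset <- t(sapply(mts_list, toy_features))  # one row per MTS
distance_matrix <- dist(feature_dataset)              # Euclidean by default
```

Under this sketch, calling a function with features = TRUE and then applying dist() to the result would play the same role as requesting the distance matrix directly, which is convenient when the features are also needed for other purposes.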
Ángel López-Oriona, José A. Vilar
Zagorecki A (2015). “A versatile approach to classification of multivariate time series data.” In 2015 Federated Conference on Computer Science and Information Systems (FedCSIS), 407–410. IEEE.
# Selecting the first 10 MTS from the dataset AtrialFibrillation
toy_dataset <- AtrialFibrillation$data[1 : 10]
# Computing the pairwise distance matrix based on the distance dis_zagorecki
distance_matrix <- dis_zagorecki(toy_dataset)
# Computing the corresponding dataset of features
feature_dataset <- dis_zagorecki(toy_dataset, features = TRUE)
Multivariate time series (MTS) of five species of geese.
data(DuckDuckGeese_1)
A list with two elements, which are:
data
A list with 50 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 270 rows (time points) indicating frequency and 1345 columns (variables) indicating recording. The first 50 elements of the whole dataset are stored here. All these elements pertain to the training set. The numeric vector classes is formed by integers from 1 to 5, indicating that there are 5 different classes in the database. Each class is associated with a different species of geese. For more information, see Bagnall et al. (2018).
Run install.packages("ueadata3", repos = "https://anloor7.github.io/drat") to access this dataset, and use the syntax ueadata3::DuckDuckGeese_1.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) of five species of geese.
data(DuckDuckGeese_2)
A list with two elements, which are:
data
A list with 50 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 270 rows (time points) indicating frequency and 1345 columns (variables) indicating recording. The last 50 elements of the whole dataset are stored here. All these elements pertain to the test set. The numeric vector classes is formed by integers from 1 to 5, indicating that there are 5 different classes in the database. Each class is associated with a different species of geese. For more information, see Bagnall et al. (2018).
Run install.packages("ueadata4", repos = "https://anloor7.github.io/drat") to access this dataset, and use the syntax ueadata4::DuckDuckGeese_2.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) indicating the movement of the worm Caenorhabditis elegans. The motion of worms in an agar plate is recorded as a combination of six base shapes.
data(EigenWorms_1)
A list with two elements, which are:
data
A list with 130 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 17984 rows (time points) indicating velocity trajectory and 3 columns (variables) indicating spatial dimension. In the whole dataset (259 MTS), the first 128 elements correspond to the training set, whereas the last 131 elements correspond to the test set. The first 130 elements of the whole dataset are stored here. All these elements but the last two pertain to the training set. The numeric vector classes is formed by integers from 1 to 20, indicating that there are 20 different classes in the database. Each class is associated with a different alphabetical character. For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata5", repos = "https://anloor7.github.io/drat") and use the syntax ueadata5::EigenWorms_1.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) indicating the movement of the worm Caenorhabditis elegans. The motion of worms in an agar plate is recorded as a combination of six base shapes.
data(EigenWorms_2)
A list with two elements, which are:
data
A list with 129 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 17984 rows (time points) indicating velocity trajectory and 3 columns (variables) indicating spatial dimension. In the whole dataset (259 MTS), the first 128 elements correspond to the training set, whereas the last 131 elements correspond to the test set. The last 129 elements of the whole dataset are stored here. All these elements pertain to the test set. The numeric vector classes is formed by integers from 1 to 20, indicating that there are 20 different classes in the database. Each class is associated with a different alphabetical character. For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata6", repos = "https://anloor7.github.io/drat") and use the syntax ueadata6::EigenWorms_2.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) of some participants simulating several activities. In particular, data were collected from 6 participants using a tri-axial accelerometer on the dominant wrist while conducting 4 different activities.
data(Epilepsy)
A list with two elements, which are:
data
A list with 275 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 206 rows (time points) indicating acceleration trajectory and 3 columns (variables) indicating the axis in the accelerometer. The first 137 elements correspond to the training set, whereas the last 138 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 4, indicating that there are 4 different classes in the database. Each class is associated with a different activity. For more information, see Bagnall et al. (2018).
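The training/test split described above can be recovered by indexing both list elements with the same positions. The sketch below uses a mock object with the same layout (a list with elements data and classes) so that it runs without the package; the object mock and its contents are placeholders, not the real measurements.

```r
# Mock object mimicking the structure described above (not the real data):
# 275 matrices with 206 rows and 3 columns, plus a vector of class labels.
mock <- list(
  data = replicate(275, matrix(0, nrow = 206, ncol = 3), simplify = FALSE),
  classes = rep(1:4, length.out = 275)
)

train_idx <- 1:137   # first 137 elements: training set
test_idx <- 138:275  # last 138 elements: test set

training_data <- mock$data[train_idx]
training_classes <- mock$classes[train_idx]
test_data <- mock$data[test_idx]
test_classes <- mock$classes[test_idx]
```

The same pattern applies to the other datasets documented here, with the index ranges replaced by the training and test sizes stated in each description.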
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) of electrode measurements associated with different postures of the hand.
data(ERing)
A list with two elements, which are:
data
A list with 300 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 65 rows (time points) indicating time measurements and 4 columns (variables) indicating electrodes. The first 30 elements correspond to the training set, whereas the last 270 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 6, indicating that there are 6 different classes in the database. Each class is associated with a different posture of the hand. For more information, see Bagnall et al. (2018).
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) indicating the concentration of ethanol of several water-and-ethanol solutions in 44 distinct, real-whisky bottles.
data(EthanolConcentration)
A list with two elements, which are:
data
A list with 524 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 1751 rows (time points) indicating time measurements and 3 columns (variables) indicating recording. The first 261 elements correspond to the training set, whereas the last 263 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 4, indicating that there are 4 different classes in the database. Each class is associated with a different concentration of ethanol. For more information, see Bagnall et al. (2018).
Run install.packages("ueadata1", repos = "https://anloor7.github.io/drat") to access this dataset, and use the syntax ueadata1::EthanolConcentration.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
f4_classifier
computes the F4 classifier for MTS proposed
by Lopez-Oriona and Vilar (2021).
f4_classifier( training_data, new_data = NULL, classes, levels = c(0.1, 0.5, 0.9), cv_folds = 5, var_rate = 0.9 )
training_data |
A list of MTS constituting the training set to fit classifier F4. |
new_data |
A list of MTS for which the class labels have to be predicted. |
classes |
A vector containing the class labels associated with the
elements in training_data. |
levels |
The set of probability levels to compute the QCD-estimates. |
cv_folds |
The number of folds concerning the cross-validation
procedure used to fit F4 with respect to |
var_rate |
Rate of desired variability to select the principal components associated with the QCD-based features. |
This function constructs the classifier F4 of Lopez-Oriona and Vilar (2021). Given a set of MTS with associated class labels, estimates of the quantile cross-spectral density (QCD) and the maximum overlap discrete wavelet transform (MODWT) are first computed for each series. Then Principal Components Analysis (PCA) is applied to the dataset of QCD-based features and a given number of principal components are retained according to a criterion of explained variability. Next, each series is described by means of the concatenation of the QCD-based transformed features and the MODWT-based features. Finally, a traditional random forest classifier is fitted to the resulting dataset.
If new_data = NULL (default), the function returns a fitted model of class train (see train in the caret package). Otherwise, the function returns the predicted class labels for the elements in new_data.
Ángel López-Oriona, José A. Vilar
Lopez-Oriona A, Vilar JA (2021). “F4: An All-Purpose Tool for Multivariate Time Series Classification.” Mathematics, 9(23), 3051.
predictions <- f4_classifier(training_data = Libras$data[1 : 20], new_data = Libras$data[181 : 200], classes = Libras$classes[1 : 20]) # Computing the predictions for the first 20 elements of the test set of dataset Libras
Dataset containing 50 financial MTS associated with companies in the S&P 500 index.
data(FinancialData)
A list with two elements, which are:
data
A list with 50 MTS.
classes
A character vector indicating the abbreviations associated with the series (companies) in data.
Each element in data is a matrix formed by 654 rows (series length) and 2 columns (dimensions). Each MTS represents a company in the top 50 of the S&P 500 index according to market capitalization. One dimension measures the daily returns of the company, whereas the other measures the daily change in trading volume. The sample period spans from 6th July 2015 to 7th February 2018.
Multivariate time series (MTS) indicating the finger movements of a subject while typing at a computer keyboard.
data(FingerMovements)
A list with two elements, which are:
data
A list with 416 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 50 rows (time points) containing EEG observations and 28 columns (variables) corresponding to EEG channels. The first 316 elements correspond to the training set, whereas the last 100 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 2, indicating that there are 2 different classes in the database. Each class is associated with a different side (left or right). For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata1", repos="https://anloor7.github.io/drat") and use the syntax ueadata1::FingerMovements.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
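The training/test layout described for the UEA datasets (first elements for training, last elements for testing) can be handled with plain list indexing. The sketch below uses a random toy object with the same structure (a list with elements data and classes); the sizes and values are hypothetical and do not come from FingerMovements.

```r
# Toy stand-in with the same layout as the package datasets:
# a list with elements `data` (list of matrices) and `classes` (numeric vector).
set.seed(1)
n_series <- 10  # hypothetical number of series
n_train <- 6    # hypothetical size of the training part
toy <- list(
  data = replicate(n_series, matrix(rnorm(50 * 28), nrow = 50, ncol = 28), simplify = FALSE),
  classes = rep(1:2, length.out = n_series)
)

# The first n_train elements form the training set, the remaining ones the test set.
train_idx <- seq_len(n_train)
test_idx <- setdiff(seq_along(toy$data), train_idx)

train_data <- toy$data[train_idx]
train_classes <- toy$classes[train_idx]
test_data <- toy$data[test_idx]
test_classes <- toy$classes[test_idx]

dim(train_data[[1]])  # rows are time points, columns are variables
table(train_classes)  # class counts in the training part
```

For a real dataset, n_train would be replaced by the training-set size stated in its description.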
Multivariate time series (MTS) indicating the movement of a joystick by two subjects with their hand and wrist.
data(HandMovementDirection)
A list with two elements, which are:
data
A list with 234 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 400 rows (time points) containing MEG observations and 10 columns (variables) corresponding to MEG channels. The first 160 elements correspond to the training set, whereas the last 74 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 4, indicating that there are 4 different classes in the database. Each class is associated with a different direction (right, up, down and left). For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata1", repos="https://anloor7.github.io/drat") and use the syntax ueadata1::HandMovementDirection.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) indicating writing from a subject wearing a smartwatch.
data(Handwriting)
A list with two elements, which are:
data
A list with 1000 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 152 rows (time points) describing the acceleration trajectory and 3 columns (variables) containing accelerometer values. The first 150 elements correspond to the training set, whereas the last 850 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 26, indicating that there are 26 different classes in the database. Each class is associated with a different alphabetical character. For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata1", repos="https://anloor7.github.io/drat") and use the syntax ueadata1::Handwriting.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) indicating heart sound from healthy patients and pathological patients (with a confirmed cardiac diagnosis).
data(Heartbeat)
A list with two elements, which are:
data
A list with 409 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 405 rows (time points) indicating readings in a spectrogram and 61 columns (variables) corresponding to frequency bands from the spectrogram. The first 204 elements correspond to the training set, whereas the last 205 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 2, indicating that there are 2 different classes in the database. Each class is associated with a different group of patients (healthy or pathological). For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata1", repos="https://anloor7.github.io/drat") and use the syntax ueadata1::Heartbeat.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) indicating voice recordings of nine Japanese male speakers saying the vowels 'a' and 'e'.
data(JapaneseVowels)
A list with two elements, which are:
data
A list with 640 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 29 rows (time points) indicating time recordings and 12 columns (variables) indicating modified raw recordings. The first 270 elements correspond to the training set, whereas the last 370 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 9, indicating that there are 9 different classes in the database. Each class is associated with a different speaker. For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata1", repos="https://anloor7.github.io/drat") and use the syntax ueadata1::JapaneseVowels.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
knn_classifier returns the predictions for a test set obtained with a nearest neighbours-based classifier.
knn_classifier(dataset, classes, index_test, distance, k, ...)
dataset | A list of MTS (numerical matrices). |
classes | A vector containing the class labels associated with the elements in dataset. |
index_test | The indexes associated with the test elements in dataset. |
distance | The distance measure used by the nearest neighbours-based classifier (must be one of the functions implemented in mlmts, given as a string). |
k | The number of neighbours. |
... | Additional parameters passed to the function computing the considered distance. |
Given a collection of MTS containing the training and test set, the function constructs a nearest neighbours-based classifier based on a given dissimilarity measure. The corresponding predictions for the elements in the test set are returned.
The class labels for the elements in the test set.
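The idea of a nearest neighbours classifier driven by a dissimilarity matrix can be sketched in base R. This is an illustration under stated assumptions, not the code of knn_classifier: the Euclidean distance between toy feature vectors stands in for any of the package's dissimilarity measures, ties are broken by taking the first most-voted class, and the helper predict_one is hypothetical.

```r
# Toy data: 6 "series" summarised by 2 hypothetical features each.
features <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
classes <- c(1, 1, 1, 2, 2, 2)
index_test <- c(3, 6)  # elements treated as the test set
index_train <- setdiff(seq_len(nrow(features)), index_test)
k <- 1

# Pairwise dissimilarity matrix (Euclidean here, standing in for any dissimilarity).
d <- as.matrix(dist(features))

# For each test element, take a majority vote among its k nearest training elements.
predict_one <- function(i) {
  nearest <- index_train[order(d[i, index_train])[seq_len(k)]]
  votes <- table(classes[nearest])
  as.numeric(names(votes)[which.max(votes)])
}
predictions <- vapply(index_test, predict_one, numeric(1))
predictions
```

Only distances between test and training elements are used for the vote, so test elements never act as neighbours of each other.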
Ángel López-Oriona, José A. Vilar
predictions_1_nn <- knn_classifier(BasicMotions$data[1 : 10], BasicMotions$classes[1 : 10], index_test = 6 : 10, distance = 'dis_modwt', k = 1)
# Computing the predictions for the test elements in dataset BasicMotions according to
# a 1-nearest neighbour classifier based on dis_modwt.
predictions_1_nn
Multivariate time series (MTS) indicating hand movements in the official Brazilian sign language, performed by 4 different people during 2 sessions.
data(Libras)
A list with two elements, which are:
data
A list with 360 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 45 rows (time points) indicating time points in video recordings and 2 columns (variables) indicating video sessions. The first 180 elements correspond to the training set, whereas the last 180 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 15, indicating that there are 15 different classes in the database. Each class is associated with a hand movement type. For more information, see Bagnall et al. (2018).
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) of simulated light curves imitating astronomical time series from the Large Synoptic Survey Telescope (LSST). The simulated series are measurements of an object's brightness as a function of time.
data(LSST)
A list with two elements, which are:
data
A list with 4925 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 36 rows (time points) indicating time recordings and 6 columns (variables) indicating different astronomical filters. The first 2459 elements correspond to the training set, whereas the last 2466 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 14, indicating that there are 14 different classes in the database. Each class is associated with a different astronomical object. For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata1", repos="https://anloor7.github.io/drat") and use the syntax ueadata1::LSST.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
mc2pca_clustering performs the clustering algorithm proposed by Li (2019), which is based on common principal component analysis (CPCA).
mc2pca_clustering(X, k, var_rate = 0.9, max_it = 1000, tol = 1e-05)
X | A list of MTS (numerical matrices). |
k | The number of clusters. |
var_rate | Rate of retained variability concerning the reconstructed MTS samples (default is 0.90). |
max_it | The maximum number of iterations (default is 1000). |
tol | The tolerance (default is 1e-5). |
This function executes the crisp clustering method proposed by Li (2019). The algorithm is a k-means-type procedure where the distance between a given MTS and a centroid is given by the reconstruction error that arises when the series is reconstructed from the common space obtained by considering all the series in the cluster associated with that centroid (the common space is the centroid).
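The distance between a series and a centroid described above can be sketched in base R. This is a simplified illustration on random toy data and not the code of mc2pca_clustering: it assumes that the common space is spanned by the leading eigenvectors of the average covariance matrix of the cluster and that the error is the Frobenius norm of the residual. The helpers common_space and reconstruction_error are hypothetical.

```r
set.seed(1)
# Toy cluster: 5 series with 40 time points and 3 variables each (random stand-ins).
cluster_series <- replicate(5, matrix(rnorm(40 * 3), nrow = 40), simplify = FALSE)

# Common space of a cluster: leading eigenvectors of the average covariance matrix,
# keeping enough components to reach the variability rate var_rate.
common_space <- function(series_list, var_rate = 0.9) {
  avg_cov <- Reduce(`+`, lapply(series_list, cov)) / length(series_list)
  eig <- eigen(avg_cov, symmetric = TRUE)
  n_comp <- which(cumsum(eig$values) / sum(eig$values) >= var_rate)[1]
  eig$vectors[, seq_len(n_comp), drop = FALSE]
}

# Reconstruction error of one series after projecting it onto a common space S.
reconstruction_error <- function(x, S) {
  x_hat <- x %*% S %*% t(S)
  sqrt(sum((x - x_hat)^2))  # Frobenius norm of the residual
}

S <- common_space(cluster_series, var_rate = 0.5)
errors <- vapply(cluster_series, reconstruction_error, numeric(1), S = S)
errors
```

In a k-means-type loop, each series would be reassigned to the cluster whose common space yields the smallest error, and the common spaces would then be recomputed until the partition stabilises.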
A list with two elements:
cluster. A vector defining the clustering solution.
iterations. The number of iterations before the algorithm stopped.
Ángel López-Oriona, José A. Vilar
Li H (2019). “Multivariate time series clustering based on common principal component analysis.” Neurocomputing, 349, 239–247.
clustering_algorithm <- mc2pca_clustering(BasicMotions$data, k = 4, var_rate = 0.30)
# Executing the clustering algorithm in the dataset BasicMotions (var_rate = 0.30,
# i.e., we keep only a few principal components for computing the reconstructed series)
clustering_algorithm$cluster # The clustering solution
clustering_algorithm$iterations # The number of iterations before the algorithm stopped
library(ClusterR)
external_validation(clustering_algorithm$cluster, BasicMotions$classes, summary_stats = TRUE)
# Evaluating the clustering algorithm vs the true partition
mlmts provides an implementation of several machine learning algorithms for multivariate time series. The package includes functions allowing the execution of clustering, classification or outlier detection methods, among others. It also incorporates a collection of multivariate time series datasets which can be used to analyse the performance of new proposed algorithms. Practitioners from a broad variety of fields could benefit from the general framework provided by mlmts.
Multivariate time series (MTS) involving imagined movements performed by a subject with either the left small finger or the tongue. The time series of the electrical brain activity were recorded during the corresponding trials.
data(MotorImagery)
A list with two elements, which are:
data
A list with 378 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 3000 rows (time points) indicating time recordings in EEG and 64 columns (variables) corresponding to EEG electrodes. The first 278 elements correspond to the training set, whereas the last 100 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 2, indicating that there are 2 different classes in the database. Each class is associated with the label 'finger' or 'tongue' (the imagined movements). For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata2", repos="https://anloor7.github.io/drat") and use the syntax ueadata2::MotorImagery.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
mts_forecasting computes a general forecasting method for MTS based on fitting standard regression models to lag-embedding matrices.
mts_forecasting(X, max_lag = 1, model_caret = "lm", h = 1)
X | A list of MTS (numerical matrices). |
max_lag | The maximum lag considered to construct the lag-embedding matrices. |
model_caret | The regression model to fit, given as a string (e.g., 'lm' or 'rf'). |
h | The prediction horizon. |
This function performs a forecasting procedure based on lag-embedding matrices. Given a list of MTS, it returns the corresponding list of h-step ahead forecasts. Consider an MTS with several univariate components, a forecasting horizon h and a maximum number of lags max_lag. For each component, the corresponding lag-embedded matrix is constructed by considering the past information about that component and all the remaining ones. The selected regression model is fitted to all the constructed matrices (considering the last column as the response variable), and the fitted models are used to construct the h-step ahead forecasts in a recursive manner.
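The lag-embedding and recursive forecasting steps can be sketched with base R alone. The example below is a rough illustration on a random toy series and does not reproduce the internals of mts_forecasting: it uses embed and lm instead of a caret model, and all object names (emb, lags, fits, x_ext) are hypothetical.

```r
set.seed(1)
# Toy bivariate series with 50 time points (random walks standing in for a real MTS).
x <- apply(matrix(rnorm(50 * 2), ncol = 2), 2, cumsum)
colnames(x) <- c("v1", "v2")
max_lag <- 2
h <- 3

# embed() puts the values at time t in the first ncol(x) columns of each row,
# followed by the values at t-1, ..., t-max_lag.
emb <- embed(x, max_lag + 1)
lags <- emb[, -seq_len(ncol(x)), drop = FALSE]
colnames(lags) <- paste0(rep(colnames(x), times = max_lag), "_lag",
                         rep(seq_len(max_lag), each = ncol(x)))

# One linear model per component: value at time t regressed on all lagged values.
fits <- lapply(seq_len(ncol(x)), function(j) {
  lm(y ~ ., data = data.frame(y = emb[, j], lags))
})

# Recursive forecasting: each new forecast is appended and reused as a lagged input.
x_ext <- x
for (step in seq_len(h)) {
  n <- nrow(x_ext)
  recent <- x_ext[n:(n - max_lag + 1), , drop = FALSE]  # rows at lag 1, ..., max_lag
  newest <- as.data.frame(matrix(c(t(recent)), nrow = 1,
                                 dimnames = list(NULL, colnames(lags))))
  next_value <- vapply(fits, function(f) unname(predict(f, newdata = newest)), numeric(1))
  x_ext <- rbind(x_ext, next_value)
}
forecasts <- x_ext[(nrow(x) + 1):nrow(x_ext), , drop = FALSE]
forecasts
```

The resulting matrix has one row per forecasting step and one column per component, which matches the shape of a forecast for a single MTS.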
A list containing the h-step ahead forecast (matrix) for each one of the MTS.
Ángel López-Oriona, José A. Vilar
predictions <- mts_forecasting(RacketSports$data[1], model_caret = 'lm', h = 1)
# Obtaining the predictions for the first series in dataset RacketSports
# by using standard linear regression and a forecasting horizon of 1
predictions <- mts_forecasting(RacketSports$data[1], model_caret = 'rf', h = 3)
# Obtaining the predictions for the first series in dataset RacketSports
# by using the random forest and a forecasting horizon of 3
mts_plot constructs a plot of an MTS. Each univariate series comprising the MTS object is displayed in a different colour.
mts_plot(series, title = "")
series | An MTS (numerical matrix). |
title | Title for the plot (string). Default corresponds to no title. |
Given an MTS, the function constructs the corresponding plot, in which a different colour is used for each univariate series comprising the MTS object. The MTS is therefore represented as a collection of univariate series in a single graph.
The corresponding plot.
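A comparable picture can be drawn with base graphics. The sketch below is not the implementation of mts_plot; it uses matplot on a random toy matrix, and the variable names and colours are arbitrary choices.

```r
set.seed(1)
# Toy MTS: 100 time points (rows) and 3 variables (columns).
series <- apply(matrix(rnorm(100 * 3), ncol = 3), 2, cumsum)
colnames(series) <- paste0("variable_", 1:3)

# One colour per univariate component.
colours <- seq_len(ncol(series))

# All components are drawn as lines in a single graph.
matplot(series, type = "l", lty = 1, col = colours,
        xlab = "Time", ylab = "Value", main = "Toy MTS")
legend("topleft", legend = colnames(series), col = colours, lty = 1, bty = "n")
```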
Ángel López-Oriona, José A. Vilar
mts_plot(BasicMotions$data[[1]]) # Represents the first MTS in dataset BasicMotions
Multivariate time series (MTS) related to several Naval Air Training and Operating Procedures Standardization-type motions used to control plane movements.
data(NATOPS)
A list with two elements, which are:
data
A list with 360 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 51 rows (time points) indicating time recordings and 24 columns (variables) indicating sensors placed on a particular part of the body and associated with a particular coordinate. The first 180 elements correspond to the training set, whereas the last 180 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 6, indicating that there are 6 different classes in the database. Each class is associated with a separate action performed by the subjects. For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata2", repos="https://anloor7.github.io/drat") and use the syntax ueadata2::NATOPS.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
outlier_detection computes the outlier detection method for MTS proposed by Lopez-Oriona and Vilar (2021).
outlier_detection(X, levels = c(0.1, 0.5, 0.9), alpha = NULL)
X | A list of MTS (numerical matrices). |
levels | The set of probability levels used to compute the QCD estimates. |
alpha | The desired rate of outliers to detect (a real number between 0 and 1). |
This function performs outlier detection according to the procedure proposed by Lopez-Oriona and Vilar (2021). Specifically, each MTS in the original set is described by means of a multivariate functional datum by using an estimate of its quantile cross-spectral density. Given the corresponding set of multivariate functional data, the functional depth of each object is computed. The outlying elements are the objects with the lowest depth values.
A list with two elements:
Depths. The functional depths associated with elements in X, sorted in increasing order.
Indexes. The corresponding indexes associated with the vector Depths.
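The relation between depths, indexes and a rate of outliers can be sketched as follows. The depth values are invented for illustration and are not output of outlier_detection, and the way alpha is turned into a number of outliers (rounding up) is an assumption about one plausible reading of that parameter.

```r
# Hypothetical depth values for 8 series (illustrative numbers, not package output).
depths <- c(0.42, 0.35, 0.08, 0.51, 0.47, 0.12, 0.39, 0.44)
alpha <- 0.25  # desired rate of outliers

indexes <- order(depths)          # positions of the series, from lowest to highest depth
sorted_depths <- depths[indexes]  # depths sorted in increasing order

# Flag the alpha fraction of series with the lowest depths as outliers (assumed rule).
n_outliers <- ceiling(alpha * length(depths))
outlier_indexes <- indexes[seq_len(n_outliers)]
outlier_indexes
```

With this ordering, the first entry of the index vector always points to the series with the lowest depth, that is, the most outlying one.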
Ángel López-Oriona, José A. Vilar
Lopez-Oriona A, Vilar JA (2021). “Outlier detection for multivariate time series: A functional data approach.” Knowledge-Based Systems, 233, 107527.
outliers <- outlier_detection(SyntheticData2$data[c(1 : 3, 65)])
outliers$Indexes[1] # The first outlying MTS in dataset SyntheticData2
outliers$Depths[1] # The corresponding value for the depths
Multivariate time series (MTS) indicating occupancy rate of different car lanes.
data(PEMS_SF_1)
A list with two elements, which are:
data
A list with 220 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 144 rows (time points) indicating minutes and 3 columns (variables) indicating sensors. The first 220 elements of the whole dataset are stored here. All these elements pertain to the training set. The numeric vector classes is formed by integers from 1 to 7, indicating that there are 7 different classes in the database. Each class is associated with a different day of the week. For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata7", repos="https://anloor7.github.io/drat") and use the syntax ueadata7::PEMS_SF_1.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) indicating occupancy rate of different car lanes.
data(PEMS_SF_2)
A list with two elements, which are:
data
A list with 220 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 144 rows (time points) indicating minutes and 3 columns (variables) indicating sensors. The last 220 elements of the whole dataset are stored here. The last 173 elements of this dataset pertain to the test set. The numeric vector classes is formed by integers from 1 to 7, indicating that there are 7 different classes in the database. Each class is associated with a different day of the week. For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata8", repos="https://anloor7.github.io/drat") and use the syntax ueadata8::PEMS_SF_2.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) indicating the handwriting of 44 people drawing the digits from 0 to 9. Each instance is made up of the x and y coordinates of the pen tip traced across a digital screen.
data(PenDigits)
A list with two elements, which are:
data
A list with 10992 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 8 rows (time points) indicating spatial points and 2 columns (variables) indicating coordinates. The first 7494 elements correspond to the training set, whereas the last 3498 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 10, indicating that there are 10 different classes in the database. Each class is associated with a different digit. For more information, see Bagnall et al. (2018).
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) involving segmented audios of male and female speakers collected from Google Translate.
data(Phoneme)
A list with two elements, which are:
data
A list with 6668 MTS.
classes
A numeric vector indicating the classes associated with the elements in data.
Each element in data is a matrix formed by 217 rows (time points) indicating readings in a spectrogram and 11 columns (variables) corresponding to frequency bands from the spectrogram. The first 3315 elements correspond to the training set, whereas the last 3353 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 39, indicating that there are 39 different classes in the database. Each class is associated with a different phoneme. For more information, see Bagnall et al. (2018).
To access this dataset, run install.packages("ueadata2", repos="https://anloor7.github.io/drat") and use the syntax ueadata2::Phoneme.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
plot_2d_scaling represents a 2-dimensional scaling plane starting from a dissimilarity matrix.
plot_2d_scaling(distance_matrix, cluster_labels = NULL, title = "")
distance_matrix | A distance matrix. |
cluster_labels | The labels associated with the elements involved in the entries of distance_matrix. |
title | The title of the graph (default is no title). |
Given a distance matrix, the function constructs the corresponding 2-dimensional scaling, which is a 2d plane in which the distances between the points represent the original distances as faithfully as possible. If the vector cluster_labels is provided to the function, points in the 2d plane are coloured according to the given class labels.
The 2-dimensional scaling plane.
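A 2-dimensional scaling of this kind can be obtained with classical multidimensional scaling in base R. The sketch below is an illustration on toy data; whether plot_2d_scaling relies on cmdscale or on another scaling variant is not stated here, so the choice of cmdscale is an assumption.

```r
set.seed(1)
# Toy objects: two well-separated groups of 5 points in 4 dimensions.
points_4d <- rbind(matrix(rnorm(5 * 4), ncol = 4),
                   matrix(rnorm(5 * 4, mean = 6), ncol = 4))
cluster_labels <- rep(1:2, each = 5)

# Pairwise distance matrix between the 10 objects.
distance_matrix <- as.matrix(dist(points_4d))

# Classical (metric) multidimensional scaling into 2 dimensions.
coords <- cmdscale(as.dist(distance_matrix), k = 2)

# Points are coloured according to the given labels.
plot(coords, col = cluster_labels, pch = 19,
     xlab = "Coordinate 1", ylab = "Coordinate 2", main = "2d scaling (toy data)")
```

The same call works for any symmetric dissimilarity matrix, for instance one computed between MTS.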
Ángel López-Oriona, José A. Vilar
distance_matrix_qcd <- dis_qcd(SyntheticData1$data[1 : 30])
# Computing the pairwise distance matrix for the first 30 elements in dataset SyntheticData1 based on dis_qcd
plot_2d_scaling(distance_matrix_qcd, cluster_labels = SyntheticData1$classes[1 : 30])
# Constructing the corresponding 2d-scaling plot. Each class is represented in a different colour
Multivariate time series (MTS) collected from university students playing badminton or squash while wearing a smartwatch. The watch recorded the x, y, z coordinates for both a gyroscope and an accelerometer to an android phone.
data(RacketSports)
A list with two elements, which are:
data
A list with 303 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 30 rows (time points), indicating time recordings over an interval of 3 seconds, and 6 columns (variables), indicating gyroscope or accelerometer and the corresponding coordinate. The first 151 elements correspond to the training set, whereas the last 152 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 4, indicating that there are 4 different classes in the database. Each class is associated with a sport and the stroke the player is making.
For more information, see Bagnall et al. (2018).
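Since the training and test sets are stored consecutively in the same list, they can be separated by index. The following is a minimal sketch: split_train_test is a hypothetical helper (it is not part of mlmts), and toy is a random stand-in with the same layout and sizes as RacketSports, not the real data.

```r
# Hypothetical helper (not part of mlmts): split a dataset stored as
# list(data = ..., classes = ...) into training and test parts, assuming
# the first n_train elements form the training set.
split_train_test <- function(dataset, n_train) {
  n <- length(dataset$data)
  stopifnot(n_train >= 1, n_train < n, length(dataset$classes) == n)
  train_idx <- seq_len(n_train)
  test_idx <- (n_train + 1):n
  list(
    train = list(data = dataset$data[train_idx],
                 classes = dataset$classes[train_idx]),
    test = list(data = dataset$data[test_idx],
                classes = dataset$classes[test_idx])
  )
}

# Toy stand-in with the layout described above (random values, NOT the
# real recordings): 303 matrices with 30 rows and 6 columns, 4 classes.
set.seed(123)
toy <- list(
  data = replicate(303, matrix(rnorm(30 * 6), nrow = 30, ncol = 6),
                   simplify = FALSE),
  classes = sample(1:4, 303, replace = TRUE)
)

parts <- split_train_test(toy, n_train = 151)
length(parts$train$data) # Number of training series
length(parts$test$data)  # Number of test series
```

With the package loaded, the same helper could presumably be applied to the real object, for example split_train_test(RacketSports, n_train = 151).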
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) taken from a healthy subject asked to move a cursor up and down on a computer screen while his cortical potentials were recorded.
data(SelfRegulationSCP1)
A list with two elements, which are:
data
A list with 561 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 896 rows (time points), indicating time recordings over an interval of 3.5 seconds, and 6 columns (variables), indicating EEG channel. The first 268 elements correspond to the training set, whereas the last 293 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 2, indicating that there are 2 different classes in the database. Each class is associated with the label 'negativity' (downward movement of the cursor) or 'positivity' (upward movement of the cursor). For more information, see Bagnall et al. (2018).
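The number of rows and the length of the recording interval together imply the sampling rate of each series. The short sketch below carries out that arithmetic; the variable names are hypothetical, and the time axis assumes that the first row corresponds to time 0, which the description above does not state.

```r
# Implied sampling rate: number of time points divided by the duration
n_points <- 896
duration_s <- 3.5
sampling_rate_hz <- n_points / duration_s
sampling_rate_hz # 256 observations per second

# Time stamp (in seconds) of each row, assuming the first row is at time 0
time_axis <- (seq_len(n_points) - 1) / sampling_rate_hz
range(time_axis)
```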
Run install.packages("ueadata2", repos = "https://anloor7.github.io/drat") to access this dataset and use the syntax ueadata2::SelfRegulationSCP1.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) taken from an Amyotrophic Lateral Sclerosis (ALS) subject asked to move a cursor up and down on a computer screen while his cortical potentials were recorded.
data(SelfRegulationSCP2)
A list with two elements, which are:
data
A list with 380 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 1152 rows (time points), indicating time recordings over an interval of 4.5 seconds, and 7 columns (variables), indicating EEG channel. The first 200 elements correspond to the training set, whereas the last 180 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 2, indicating that there are 2 different classes in the database. Each class is associated with the label 'negativity' (downward movement of the cursor) or 'positivity' (upward movement of the cursor). For more information, see Bagnall et al. (2018).
Run install.packages("ueadata2", repos = "https://anloor7.github.io/drat") to access this dataset and use the syntax ueadata2::SelfRegulationSCP2.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) involving sound recordings of 44 male and 44 female native Arabic speakers between the ages of 18 and 40. For each recording, 13 Mel Frequency Cepstral Coefficients (MFCCs) were computed.
data(SpokenArabicDigits)
A list with two elements, which are:
data
A list with 8798 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 93 rows (time points), indicating time recordings, and 13 columns (variables), indicating different MFCCs. The first 6599 elements correspond to the training set, whereas the last 2199 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 10, indicating that there are 10 different classes in the database. Each class is associated with a different spoken Arabic digit.
For more information, see Bagnall et al. (2018).
Run install.packages("ueadata2", repos = "https://anloor7.github.io/drat") to access this dataset and use the syntax ueadata2::SpokenArabicDigits.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Multivariate time series (MTS) involving short duration ECG signals recorded from a healthy 25-year-old male performing different physical activities.
data(StandWalkJump)
A list with two elements, which are:
data
A list with 27 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 2500 rows (time points), indicating readings in a spectrogram, and 4 columns (variables), indicating frequency band from the spectrogram. The first 12 elements correspond to the training set, whereas the last 15 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 3, indicating that there are 3 different classes in the database. Each class is associated with the label 'standing', 'walking' or 'jumping'.
For more information, see Bagnall et al. (2018).
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
Synthetic dataset containing 60 MTS generated from four different generating processes.
data(SyntheticData1)
A list with two elements, which are:
data
A list with 60 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 400 rows (series length) and 2 columns (dimensions). Series 1-15 were generated from a VAR(1) process and series 16-30 were generated from a VMA(1) process. Series 31-45 were generated from a QVAR(1) process and series 46-60 were generated from a different QVAR(1) process. Therefore, there are 4 different classes in the dataset.
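To give an idea of what one of these generating processes looks like, the sketch below simulates a bivariate VAR(1) series of length 400 with base R. It is only an illustration: simulate_var1 is a hypothetical helper, and the coefficient matrix A, the standard Gaussian innovations and the burn-in period are assumptions, not the settings used to build SyntheticData1.

```r
# Hypothetical helper: simulate a VAR(1) process X_t = A X_{t-1} + e_t
# with independent standard Gaussian innovations. A burn-in period is
# discarded so that the returned series is close to stationarity.
simulate_var1 <- function(n, A, burn_in = 100) {
  d <- nrow(A)
  x <- matrix(0, nrow = n + burn_in, ncol = d)
  for (t in 2:(n + burn_in)) {
    x[t, ] <- A %*% x[t - 1, ] + rnorm(d)
  }
  x[(burn_in + 1):(n + burn_in), , drop = FALSE]
}

set.seed(42)
# Assumed coefficient matrix (eigenvalues inside the unit circle);
# NOT the matrix used to generate SyntheticData1.
A <- matrix(c(0.5, 0.1,
              0.2, 0.3), nrow = 2, byrow = TRUE)

series <- simulate_var1(400, A)
dim(series) # 400 rows (series length) and 2 columns (dimensions)
```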
Synthetic dataset containing 65 MTS generated from five different generating processes.
data(SyntheticData2)
A list with two elements, which are:
data
A list with 65 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 400 rows (series length) and 2 columns (dimensions). Series 1-15 were generated from a VAR(1) process and series 16-30 were generated from a VMA(1) process. Series 31-45 were generated from a QVAR(1) process and series 46-60 were generated from a different QVAR(1) process. Finally, series 61-65 were generated from a VAR(1) model different from the one associated with series 1-15. Note that series 61-65 can be seen as anomalous elements in the dataset.
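The layout described above can be encoded as a vector of process labels, which is convenient when evaluating an outlier detection method. In the sketch below, the label strings and the object names are hypothetical; only the group sizes and their order come from the description.

```r
# One label per series, following the order given in the description.
# The label strings are hypothetical names chosen for illustration.
process <- rep(c("VAR(1)", "VMA(1)", "QVAR(1) a", "QVAR(1) b",
                 "VAR(1) anomalous"),
               times = c(15, 15, 15, 15, 5))

# Positions of the series that can be regarded as anomalous
anomalous_idx <- which(process == "VAR(1) anomalous")
anomalous_idx

# Number of series per generating process
table(process)
```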
Multivariate time series (MTS) including gestures from certain subjects measured with an accelerometer.
data(UWaveGestureLibrary)
A list with two elements, which are:
data
A list with 440 MTS.
classes
A numeric vector indicating the corresponding classes associated with the elements in data.
Each element in data is a matrix formed by 315 rows (time points), indicating time recordings, and 3 columns (variables), indicating coordinate (x, y or z) of each motion. The first 120 elements correspond to the training set, whereas the last 320 elements correspond to the test set. The numeric vector classes is formed by integers from 1 to 8, indicating that there are 8 different classes in the database. Each class is associated with a different gesture.
For more information, see Bagnall et al. (2018).
Run install.packages("ueadata2", repos = "https://anloor7.github.io/drat") to access this dataset and use the syntax ueadata2::UWaveGestureLibrary.
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018). “The UEA multivariate time series classification archive, 2018.” arXiv preprint arXiv:1811.00075.
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021). “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances.” Data Mining and Knowledge Discovery, 35(2), 401–449.
Bagnall A, Lines J, Vickers W, Keogh E (2022). “The UEA & UCR Time Series Classification Repository.” www.timeseriesclassification.com.
vpca_clustering
performs the fuzzy clustering algorithm proposed
by He and Tan (2018).
vpca_clustering( X, k, m, var_rate = 0.9, max_it = 1000, tol = 1e-05, crisp = FALSE )
X |
A list of MTS (numerical matrices). |
k |
The number of clusters. |
m |
The fuzziness coefficient (a real number greater than one). |
var_rate |
Rate of retained variability concerning the dimensionality-reduced MTS samples (default is 0.90). |
max_it |
The maximum number of iterations (default is 1000). |
tol |
The tolerance (default is 1e-5). |
crisp |
Logical. If TRUE, a crisp partition is returned instead of the membership matrix (default is FALSE). |
This function executes the fuzzy clustering procedure proposed by
He and Tan (2018). The algorithm represents each MTS in the original collection by means of
a dimensionality-reduced MTS constructed through variable-based principal
component analysis (VPCA). Then, a fuzzy C-means-type procedure is applied
to the set of dimensionality-reduced samples. A spatial weighted matrix
dissimilarity is used to compute the distances between the reduced
MTS and the centroids.
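The role of the fuzziness coefficient m and of the crisp option can be illustrated with the standard fuzzy C-means membership update. The sketch below is generic and is not the code of vpca_clustering: fuzzy_memberships is a hypothetical helper, toy_dist is a made-up matrix of distances from 3 objects to 3 centroids, and the way the crisp partition is derived (largest membership per object) is an assumption about what crisp = TRUE returns.

```r
# Standard fuzzy C-means membership update (generic sketch, not the
# package code). dist_to_centroids is an n x k matrix of positive
# distances between objects (rows) and centroids (columns); m > 1 is
# the fuzziness coefficient.
fuzzy_memberships <- function(dist_to_centroids, m) {
  stopifnot(m > 1, all(dist_to_centroids > 0))
  w <- dist_to_centroids^(-2 / (m - 1))
  w / rowSums(w) # Each row sums to one
}

# Made-up distances from 3 objects to 3 centroids
toy_dist <- matrix(c(0.5, 2.0, 3.0,
                     2.5, 0.4, 1.5,
                     1.0, 1.0, 1.0), nrow = 3, byrow = TRUE)

U <- fuzzy_memberships(toy_dist, m = 1.5)
round(U, 3) # Membership matrix
rowSums(U)  # Memberships of each object add up to one

# Assumed crisp partition: cluster with the largest membership per object
crisp <- max.col(U, ties.method = "first")
crisp
```

Values of m close to one push the memberships towards 0 or 1, whereas larger values spread them more evenly across clusters. The third object is equidistant from all centroids, so its memberships are equal and the tie is resolved in favour of the first cluster.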
A list with three elements:
U. If crisp = FALSE (default), the membership matrix. Otherwise, a vector defining the corresponding crisp partition.
centroids. If crisp = FALSE (default), a list containing the series playing the role of centroids, which are dimensionality-reduced averaged MTS. Otherwise, this element is not returned.
iterations. The number of iterations before the algorithm stopped.
Ángel López-Oriona, José A. Vilar
He H, Tan Y (2018). “Unsupervised classification of multivariate time series using VPCA and fuzzy clustering with spatial weighted matrix distance.” IEEE transactions on cybernetics, 50(3), 1096–1105.
fuzzy_clustering <- vpca_clustering(AtrialFibrillation$data, k = 3, m = 1.5)
# Executing the fuzzy clustering algorithm in the dataset AtrialFibrillation
# by considering 3 clusters and a value of 1.5 for the fuzziness parameter
fuzzy_clustering$U # The membership matrix
crisp_clustering <- vpca_clustering(AtrialFibrillation$data, k = 3, m = 1.5, crisp = TRUE)
# The same as before, but we are interested in the corresponding crisp partition
crisp_clustering$U # The crisp partition
crisp_clustering$iterations # The number of iterations before the algorithm
# stopped