Title: | Analyzing Categorical Time Series |
---|---|
Description: | An implementation of several functions for feature extraction in categorical time series datasets. Specifically, some features related to marginal distributions and serial dependence patterns can be computed. These features can be used to feed clustering and classification algorithms for categorical time series, among others. The package also includes some interesting datasets containing biological sequences. Practitioners from a broad variety of fields could benefit from the general framework provided by 'ctsfeatures'. |
Authors: | Angel Lopez-Oriona [aut, cre], Jose A. Vilar [aut] |
Maintainer: | Angel Lopez-Oriona |
License: | GPL-2 |
Version: | 1.2.2 |
Built: | 2025-02-23 03:01:36 UTC |
Source: | https://github.com/cran/ctsfeatures |
binarization
constructs the binarized time series associated with a given
categorical time series.
binarization(series)
series |
An object of type |
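Given a CTS of length $T$ with range $\mathcal{V} = \{1, 2, \ldots, r\}$, $\overline{X}_t = \{X_1, \ldots, X_T\}$, the function
constructs the binarized time series, which is defined as $\overline{\boldsymbol{Y}}_t = \{\boldsymbol{Y}_1, \ldots, \boldsymbol{Y}_T\}$,
with $\boldsymbol{Y}_k = (Y_{k1}, \ldots, Y_{kr})^\top$ such that $Y_{ki} = 1$ if $X_k = i$ and $Y_{ki} = 0$ otherwise
($k = 1, \ldots, T$, $i = 1, \ldots, r$). The binarized series is constructed in the form of a matrix
whose rows represent time observations and whose columns represent the
categories in the original series.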
The binarized time series.
Ángel López-Oriona, José A. Vilar
López-Oriona Á, Vilar JA, D’Urso P (2023). “Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences.” Information Sciences, 624, 467–492.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Constructing the binarized time series for the first CTS in dataset GeneticSequences
binarized_series <- binarization(sequence_1)
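The definition above can be illustrated with a short base R sketch that one-hot encodes a character vector into a $T \times r$ 0/1 matrix. The helper `binarize_sketch` is hypothetical and is not part of ctsfeatures; it takes a plain vector rather than the dataset objects used in the examples, and ordering the columns alphabetically by category is an assumption.

```r
# binarize_sketch(): hypothetical helper, not part of ctsfeatures.
# Returns a T x r 0/1 matrix: row k is the indicator vector of X_k.
binarize_sketch <- function(x, levels = sort(unique(x))) {
  x <- factor(x, levels = levels)
  Y <- matrix(0L, nrow = length(x), ncol = length(levels),
              dimnames = list(NULL, levels))
  # Put a 1 in column i of row k whenever X_k equals the i-th category
  Y[cbind(seq_along(x), as.integer(x))] <- 1L
  Y
}

x <- c("a", "c", "g", "t", "a", "a", "c")
Y <- binarize_sketch(x)
print(Y)
print(rowSums(Y))  # each row contains exactly one 1
```

Indexing the matrix with a two-column matrix of (row, column) positions sets all the ones in a single vectorized assignment.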
calculate_features
computes several features associated with a
categorical time series or between a categorical and a real-valued time series
calculate_features(series, n_series = NULL, lag = 1, type = NULL)
series |
An object of type |
n_series |
A real-valued time series. |
lag |
The considered lag (default is 1). |
type |
String indicating the feature one wishes to compute. |
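Assume we have a CTS of length $T$ with range $\mathcal{V} = \{1, 2, \ldots, r\}$, $\overline{X}_t = \{X_1, \ldots, X_T\}$, with
$\widehat{p}_i$ being the natural estimate of the marginal probability of the $i$th category, and
$\widehat{p}_{ij}(l)$ being the natural estimate of the joint probability for categories $i$ and $j$ at lag $l$,
$i, j = 1, \ldots, r$. Assume also that we have a real-valued time series of length $T$,
$\overline{Z}_t = \{Z_1, \ldots, Z_T\}$.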
The function computes the following quantities depending on the argument type:
If type = 'gini_index', the function computes the estimated Gini index.
If type = 'entropy', the function computes the estimated entropy.
If type = 'chebycheff_dispersion', the function computes the estimated Chebycheff dispersion.
If type = 'gk_tau', the function computes the estimated Goodman and Kruskal's tau.
If type = 'gk_lambda', the function computes the estimated Goodman and Kruskal's lambda.
If type = 'uncertainty_coefficient', the function computes the estimated uncertainty coefficient.
If type = 'pearson_measure', the function computes the estimated Pearson measure.
If type = 'phi2_measure', the function computes the estimated Phi2 measure.
If type = 'sakoda_measure', the function computes the estimated Sakoda measure.
If type = 'cramers_vi', the function computes the estimated Cramér's v.
If type = 'cohens_kappa', the function computes the estimated Cohen's kappa.
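If type = 'total_correlation', the function computes an estimated sum over $i, j = 1, \ldots, r$ built from
$\widehat{\psi}_{ij}(l)$, the estimated correlation at lag $l$ between the $i$th and $j$th components of the
binarized time series $\overline{\boldsymbol{Y}}_t = \{\boldsymbol{Y}_1, \ldots, \boldsymbol{Y}_T\}$ of the CTS.
If type = 'spectral_envelope', the function computes the estimated spectral envelope.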
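If type = 'total_mixed_correlation_1', the function computes the estimated total mixed l-correlation, which is
built from the estimated correlations at lag $l$ between each component of the binarized time series of the CTS
and the real-valued series $Z_t$.
If type = 'total_mixed_correlation_2', the function computes the estimated total mixed q-correlation, which is
built from the estimated correlations at lag $l$ between each component of the binarized time series of the CTS
and the indicator series $I(Z_t \le q_Z(\tau))$, where $Z_t$ is the real-valued series, $\tau \in (0, 1)$ is a
probability level, $I(\cdot)$ is the indicator function and $q_Z(\cdot)$ is the quantile function of the
corresponding real-valued process.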
The corresponding feature.
Ángel López-Oriona, José A. Vilar
Weiß CH, Göb R (2008). “Measuring serial dependence in categorical time series.” AStA Advances in Statistical Analysis, 92, 71–89.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Computing the uncertainty coefficient for the first series in dataset GeneticSequences
uc <- calculate_features(series = sequence_1, type = 'uncertainty_coefficient')
# Computing the spectral envelope for the first series in dataset GeneticSequences
se <- calculate_features(series = sequence_1, type = 'spectral_envelope')
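The marginal features selected through type = 'gini_index', 'entropy' and 'chebycheff_dispersion' are commonly defined in the literature (e.g. Weiß, 2018) as normalized dispersion measures of the marginal probabilities: Gini index $\frac{r}{r-1}\big(1 - \sum_{i=1}^r \widehat{p}_i^2\big)$, entropy $-\frac{1}{\ln r}\sum_{i=1}^r \widehat{p}_i \ln \widehat{p}_i$ and Chebycheff dispersion $\frac{r}{r-1}\big(1 - \max_i \widehat{p}_i\big)$. Each equals 0 for a one-point distribution and 1 for the uniform distribution. The base R sketch below implements these textbook definitions; the helper names are hypothetical, and it is an assumption that calculate_features uses exactly these normalizations.

```r
# Hypothetical helpers (not part of ctsfeatures) implementing the textbook
# dispersion measures for a vector p of marginal probabilities (r categories).
gini_index <- function(p) {
  r <- length(p)
  r / (r - 1) * (1 - sum(p^2))
}
entropy_measure <- function(p) {
  r <- length(p)          # count categories before dropping zeros
  p <- p[p > 0]           # convention: 0 * log(0) = 0
  -sum(p * log(p)) / log(r)
}
chebycheff_dispersion <- function(p) {
  r <- length(p)
  r / (r - 1) * (1 - max(p))
}

x <- c("a", "c", "g", "t", "a", "a", "c", "g")
p_hat <- as.vector(table(x)) / length(x)   # natural estimates N_i / T
print(c(gini = gini_index(p_hat),
        entropy = entropy_measure(p_hat),
        chebycheff = chebycheff_dispersion(p_hat)))
```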
calculate_motifs
computes the motifs of a categorical time series
calculate_motifs(series, motif_length)
series |
An object of type |
motif_length |
The length of the motif. |
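Given a CTS of length $T$ with range $\mathcal{V} = \{1, 2, \ldots, r\}$, $\overline{X}_t = \{X_1, \ldots, X_T\}$, and a motif length $m$,
the function returns an array of $r^m$ elements, with the element in the position $(i_1, i_2, \ldots, i_m)$
being the relative frequency of the motif “$i_1 i_2 \ldots i_m$” in the corresponding time series.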
Returns an array with the relative frequency of motifs in a categorical time series.
Ángel López-Oriona, José A. Vilar
Lin J, Keogh E, Lonardi S, Patel P (2002). “Finding motifs in time series.” In Proc. of the 2nd Workshop on Temporal Data Mining, 53–68.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Computing the relative frequencies of motifs of length 3 for the first
# series in dataset GeneticSequences
calculate_motifs(sequence_1, motif_length = 3)
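The relative motif frequencies described above can be sketched in base R by sliding a window of length $m$ over the series and cross-tabulating the elements of each window. The helper `motif_frequencies` is hypothetical, and counting the $T - m + 1$ overlapping windows (so that the frequencies sum to one) is an assumption about how calculate_motifs normalizes.

```r
# motif_frequencies(): hypothetical helper, not part of ctsfeatures.
# Returns an r x r x ... x r array (m dimensions) with the relative
# frequencies of the overlapping length-m windows ("motifs") of x.
motif_frequencies <- function(x, m, levels = sort(unique(x))) {
  x <- factor(x, levels = levels)
  n_windows <- length(x) - m + 1
  # The k-th list element holds the k-th symbol of every window
  windows <- lapply(seq_len(m), function(k) x[k:(k + n_windows - 1)])
  table(windows) / n_windows
}

x <- c("a", "c", "g", "a", "c", "g", "t")
f <- motif_frequencies(x, m = 3)
print(f["a", "c", "g"])  # "acg" is 2 of the 5 windows
```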
calculate_subfeatures
computes several subfeatures associated with a
categorical time series or between a categorical and a real-valued time series
calculate_subfeatures(series, n_series, lag = 1, type = NULL)
series |
An object of type |
n_series |
A real-valued time series. |
lag |
The considered lag (default is 1). |
type |
String indicating the subfeature one wishes to compute. |
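Assume we have a CTS of length $T$ with range $\mathcal{V} = \{1, 2, \ldots, r\}$, $\overline{X}_t = \{X_1, \ldots, X_T\}$, with
$\widehat{p}_i$ being the natural estimate of the marginal probability of the $i$th category, and
$\widehat{p}_{ij}(l)$ being the natural estimate of the joint probability for categories $i$ and $j$ at lag $l$,
$i, j = 1, \ldots, r$. Assume also that we have a real-valued time series of length $T$,
$\overline{Z}_t = \{Z_1, \ldots, Z_T\}$.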
The function computes the following subfeatures depending on the argument type:
If type = 'entropy', the function computes the subfeatures associated with the estimated entropy.
If type = 'gk_tau', the function computes the subfeatures associated with the estimated Goodman and Kruskal's tau.
If type = 'gk_lambda', the function computes the subfeatures associated with the estimated Goodman and Kruskal's lambda.
If type = 'uncertainty_coefficient', the function computes the subfeatures associated with the estimated uncertainty coefficient.
If type = 'pearson_measure', the function computes the subfeatures associated with the estimated Pearson measure.
If type = 'phi2_measure', the function computes the subfeatures associated with the estimated Phi2 measure.
If type = 'sakoda_measure', the function computes the subfeatures associated with the estimated Sakoda measure.
If type = 'cramers_vi', the function computes the subfeatures associated with the estimated Cramér's v.
If type = 'cohens_kappa', the function computes the subfeatures associated with the estimated Cohen's kappa.
If type = 'total_correlation', the function computes the subfeatures associated with the total correlation
(see type = 'total_correlation' in the function calculate_features).
If type = 'total_mixed_correlation_1', the function computes the subfeatures associated with the total mixed
l-correlation (see type = 'total_mixed_correlation_1' in the function calculate_features).
If type = 'total_mixed_correlation_2', the function computes the subfeatures associated with the total mixed
q-correlation (see type = 'total_mixed_correlation_2' in the function calculate_features).
The corresponding subfeature
Ángel López-Oriona, José A. Vilar
Weiß CH, Göb R (2008). “Measuring serial dependence in categorical time series.” AStA Advances in Statistical Analysis, 92, 71–89.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Computing the subfeatures associated with the uncertainty coefficient
# for the first series in dataset GeneticSequences
suc <- calculate_subfeatures(series = sequence_1, type = 'uncertainty_coefficient')
# Computing the subfeatures associated with Cramér's v
# for the first series in dataset GeneticSequences
scv <- calculate_subfeatures(series = sequence_1, type = 'cramers_vi')
conditional_probabilities
returns a matrix with the conditional
probabilities of a categorical time series
conditional_probabilities(series, lag = 1)
series |
An object of type |
lag |
The considered lag (default is 1). |
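Given a CTS of length $T$ with range $\mathcal{V} = \{1, 2, \ldots, r\}$, $\overline{X}_t = \{X_1, \ldots, X_T\}$, the function
computes the $r \times r$ matrix of estimated conditional probabilities at lag $l$. The estimates are based on the
number of elements equal to each category in the realization $\overline{X}_t$ and on $N_{ij}(l)$, the number of
pairs $(X_t, X_{t-l}) = (i, j)$ in the realization $\overline{X}_t$.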
A matrix with the conditional probabilities.
Ángel López-Oriona, José A. Vilar
Weiß CH, Göb R (2008). “Measuring serial dependence in categorical time series.” AStA Advances in Statistical Analysis, 92, 71–89.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Computing the matrix of conditional probabilities for the first series
# in dataset GeneticSequences
matrix_cp <- conditional_probabilities(series = sequence_1)
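For intuition, the sketch below estimates lag-$l$ conditional probabilities $P(X_t = i \mid X_{t-l} = j)$ by normalizing the columns of the matrix of pair counts $N_{ij}(l)$. The helper name is hypothetical, and both the orientation (rows for $X_t$, columns for $X_{t-l}$) and the normalization are assumptions; conditional_probabilities may arrange or normalize its output differently.

```r
# conditional_probs_sketch(): hypothetical helper, not part of ctsfeatures.
conditional_probs_sketch <- function(x, lag = 1, levels = sort(unique(x))) {
  x <- factor(x, levels = levels)
  n <- length(x)
  current <- x[(lag + 1):n]   # X_t
  lagged  <- x[1:(n - lag)]   # X_{t-l}
  counts <- table(current, lagged)  # N_ij(l): row i = X_t, column j = X_{t-l}
  # Divide each column by its total; a column of zeros would give NaN
  sweep(counts, 2, colSums(counts), "/")
}

x <- c("a", "b", "a", "b", "b", "a")
P_cond <- conditional_probs_sketch(x, lag = 1)
print(P_cond)
```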
Categorical time series (CTS) of DNA sequences from different viruses
data(GeneticSequences)
A tsibble
with four columns, which are:
Value
The categorical values of the time series in the dataset.
Series
Integer values indicating the considered time series (there are 32 time series in the dataset).
Time
Integer values indicating the temporal indexes of the observations.
Class
Integer values indicating the class of each time series.
The column Value
is the concatenation of 32 time series
taking four categorical values (DNA bases). The column Class
is formed
by integers from 1 to 4, indicating that there are 4 different classes in the database. Each class is associated with a different
family of viruses. For more information, see López-Oriona et al. (2023).
López-Oriona Á, Vilar JA, D’Urso P (2023). “Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences.” Information Sciences, 624, 467–492.
joint_probabilities
returns a matrix with the joint
probabilities of a categorical time series
joint_probabilities(series, lag = 1)
series |
An object of type |
lag |
The considered lag (default is 1). |
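Given a CTS of length $T$ with range $\mathcal{V} = \{1, 2, \ldots, r\}$, $\overline{X}_t = \{X_1, \ldots, X_T\}$, the function
computes the matrix $\widehat{\boldsymbol{P}}(l) = \big(\widehat{p}_{ij}(l)\big)_{1 \le i, j \le r}$,
with $\widehat{p}_{ij}(l) = N_{ij}(l)/(T - l)$, where $N_{ij}(l)$ is the number of pairs $(X_t, X_{t-l}) = (i, j)$
in the realization $\overline{X}_t$.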
A matrix with the joint probabilities.
Ángel López-Oriona, José A. Vilar
Weiß CH, Göb R (2008). “Measuring serial dependence in categorical time series.” AStA Advances in Statistical Analysis, 92, 71–89.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Computing the matrix of joint probabilities for the first series
# in dataset GeneticSequences
matrix_jp <- joint_probabilities(series = sequence_1)
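A base R sketch of the joint probabilities, assuming the natural estimate $\widehat{p}_{ij}(l) = N_{ij}(l)/(T - l)$ with rows indexing $X_t$ and columns indexing $X_{t-l}$. The helper name is hypothetical and the orientation used by joint_probabilities may differ.

```r
# joint_probs_sketch(): hypothetical helper, not part of ctsfeatures.
joint_probs_sketch <- function(x, lag = 1, levels = sort(unique(x))) {
  x <- factor(x, levels = levels)
  n <- length(x)
  # N_ij(l): counts of pairs (X_t, X_{t-l}) = (i, j), t = l + 1, ..., T
  counts <- table(x[(lag + 1):n], x[1:(n - lag)])
  counts / (n - lag)  # there are T - l such pairs
}

x <- c("a", "b", "a", "b", "b", "a")
P_joint <- joint_probs_sketch(x, lag = 1)
print(P_joint)
```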
marginal_probabilities
returns a vector with the marginal
probabilities of a categorical time series
marginal_probabilities(series)
series |
An object of type |
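Given a CTS of length $T$ with range $\mathcal{V} = \{1, 2, \ldots, r\}$, $\overline{X}_t = \{X_1, \ldots, X_T\}$, the function
computes the vector $\widehat{\boldsymbol{p}} = (\widehat{p}_1, \ldots, \widehat{p}_r)$, with $\widehat{p}_i = N_i/T$,
where $N_i$ is the number of elements equal to $i$ in the realization $\overline{X}_t$.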
A vector with the marginal probabilities.
Ángel López-Oriona, José A. Vilar
Weiß CH, Göb R (2008). “Measuring serial dependence in categorical time series.” AStA Advances in Statistical Analysis, 92, 71–89.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Computing the vector of marginal probabilities for the first series
# in dataset GeneticSequences
vector_mp <- marginal_probabilities(series = sequence_1)
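The natural estimates $\widehat{p}_i = N_i/T$ can be sketched in base R as follows. Passing an explicit set of levels keeps categories that never occur, with an estimated probability of zero. The helper name is hypothetical and is not part of ctsfeatures.

```r
# marginal_probs_sketch(): hypothetical helper, not part of ctsfeatures.
marginal_probs_sketch <- function(x, levels = sort(unique(x))) {
  counts <- table(factor(x, levels = levels))  # N_i for every category i
  counts / length(x)                            # N_i / T
}

p_hat <- marginal_probs_sketch(c("a", "b", "a", "c"),
                               levels = c("a", "b", "c", "d"))
print(p_hat)  # category "d" never occurs, so its estimate is 0
```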
plot_ccc
constructs a control chart for the cycle lengths of a categorical series
plot_ccc( series, mu_t, lcl_t, ucl_t, plot = TRUE, title = "Control chart (cycles)", ... )
series |
An object of type |
mu_t |
The mean of the process measuring the cycle lengths. |
lcl_t |
The lower control limit. |
ucl_t |
The upper control limit. |
plot |
Logical. If |
title |
The title of the graph. |
... |
Additional parameters for the function. |
Constructs a control chart of a CTS based on cycle lengths. The chart is based on a standardized statistic
computed at each time $t$ from $C_t$, the length of a cycle ending with a specific category, its mean $\mu_t$
(argument mu_t), and the lower and upper individual control limits $\text{LCL}_t$ and $\text{UCL}_t$
(arguments lcl_t and ucl_t), respectively. An out-of-control alarm is signalled if the standardized
statistic falls below $-1$ or rises above $1$.
If plot = TRUE
(default), represents the control chart for the cycle lengths. Otherwise, the function
returns a matrix with the values of the standardized statistic for each time t
Ángel López-Oriona, José A. Vilar
Weiß CH (2008). “Visual analysis of categorical time series.” Statistical Methodology, 5(1), 56–71.
sequence_1 <- SyntheticData1[which(SyntheticData1$Series == 1), ]
# Representing a control chart for the cycle lengths
cycle_cc <- plot_ccc(series = sequence_1, mu_t = c(1, 1.5, 1),
                     lcl_t = rep(10, 600), ucl_t = rep(10, 600))
# Computing the corresponding standardized statistic
cycle_cc <- plot_ccc(series = sequence_1, mu_t = c(1, 1.5, 1),
                     lcl_t = rep(10, 600), ucl_t = rep(10, 600), plot = FALSE)
plot_cohen
constructs a serial dependence plot of a categorical
time series based on Cohen's kappa
plot_cohen( series, max_lag = 10, alpha = 0.05, plot = TRUE, title = "Serial dependence plot", bar_width = 0.12, ... )
series |
An object of type |
max_lag |
The maximum lag represented in the plot (default is 10). |
alpha |
The significance level for the corresponding hypothesis test (default is 0.05). |
plot |
Logical. If |
title |
The title of the graph. |
bar_width |
The width of the corresponding bars. |
... |
Additional parameters for the function. |
Constructs a serial dependence plot based on Cohen's kappa, $\widehat{\kappa}(l)$, for several lags $l$.
A dashed line indicates the critical value of the test based on an asymptotic approximation for
$\widehat{\kappa}(l)$ (under the i.i.d. assumption), which depends on the series length $T$ and on the
vector $\widehat{\boldsymbol{p}}$ of estimated marginal probabilities for the categories of the series.
If plot = TRUE
(default), returns the serial dependence plot based on Cohen's kappa. Otherwise, the function
returns a list with the values of Cohen's kappa, the critical
value and the corresponding p-values.
Ángel López-Oriona, José A. Vilar
Weiß CH (2011). “Empirical measures of signed serial dependence in categorical time series.” Journal of Statistical Computation and Simulation, 81(4), 411–429.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Representing the serial dependence plot
plot_ck <- plot_cohen(series = sequence_1, max_lag = 3)
# Obtaining the values of Cohen's kappa, the critical value and the p-values
list_ck <- plot_cohen(series = sequence_1, max_lag = 3, plot = FALSE)
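The point estimates shown in the plot follow the textbook definition of Cohen's kappa at lag $l$, $\widehat{\kappa}(l) = \big(\sum_j \widehat{p}_{jj}(l) - \sum_j \widehat{p}_j^2\big) / \big(1 - \sum_j \widehat{p}_j^2\big)$ (Weiß and Göb, 2008). The base R sketch below uses that definition; the helper name is hypothetical, it does not reproduce the critical value or p-values returned by plot_cohen, and whether plot_cohen computes the estimates in exactly this way is an assumption.

```r
# cohens_kappa_sketch(): hypothetical helper, not part of ctsfeatures.
cohens_kappa_sketch <- function(x, max_lag = 3) {
  x <- factor(x)
  n <- length(x)
  p <- as.vector(table(x)) / n   # marginal estimates p_j
  s2 <- sum(p^2)                 # sum_j p_j^2
  sapply(seq_len(max_lag), function(l) {
    # sum_j p_jj(l): share of pairs (X_t, X_{t-l}) with equal categories
    same <- mean(x[(l + 1):n] == x[1:(n - l)])
    (same - s2) / (1 - s2)
  })
}

# An alternating series: perfect disagreement at lag 1, agreement at lag 2
print(cohens_kappa_sketch(rep(c("a", "b"), 10), max_lag = 2))
```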
plot_cramer
constructs a serial dependence plot of a categorical
time series based on Cramér's v
plot_cramer( series, max_lag = 10, alpha = 0.05, plot = TRUE, title = "Serial dependence plot", bar_width = 0.12, ... )
series |
An object of type |
max_lag |
The maximum lag represented in the plot (default is 10). |
alpha |
The significance level for the corresponding hypothesis test (default is 0.05). |
plot |
Logical. If |
title |
The title of the graph. |
bar_width |
The width of the corresponding bars. |
... |
Additional parameters for the function. |
Constructs a serial dependence plot based on Cramér's v, $\widehat{v}(l)$, for several lags $l$.
A dashed line indicates the critical value of the test based on the following asymptotic approximation
(under the i.i.d. assumption):
$$T(r-1)\,\widehat{v}(l)^2 \overset{a}{\sim} \chi^2_{(r-1)^2},$$
where $T$ is the series length and $r$ is the number of categories in the time series.
If plot = TRUE
(default), returns the serial dependence plot based on Cramér's v. Otherwise, the function
returns a list with the values of Cramér's v, the critical
value and the corresponding p-values.
Ángel López-Oriona, José A. Vilar
Weiß CH (2013). “Serial dependence of NDARMA processes.” Computational Statistics and Data Analysis, 68, 213–238.
sequence_1 <- SyntheticData1[which(SyntheticData1$Series == 1), ]
# Representing the serial dependence plot
plot_cv <- plot_cramer(series = sequence_1, max_lag = 3)
# Obtaining the values of Cramér's v, the critical value and the p-values
list_cv <- plot_cramer(series = sequence_1, max_lag = 3, plot = FALSE)
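A base R sketch of the lag-$l$ Cramér's v, $\widehat{v}(l) = \sqrt{\frac{1}{r-1}\sum_{i,j}\frac{(\widehat{p}_{ij}(l) - \widehat{p}_i\widehat{p}_j)^2}{\widehat{p}_i\widehat{p}_j}}$ (Weiß and Göb, 2008), together with a critical value derived from the chi-square approximation $T(r-1)\,\widehat{v}(l)^2 \approx \chi^2_{(r-1)^2}$ under the i.i.d. assumption. The helper names are hypothetical and the exact computations inside plot_cramer may differ.

```r
# cramers_v_sketch() and cramers_v_critical(): hypothetical helpers,
# not part of ctsfeatures.
cramers_v_sketch <- function(x, lag = 1) {
  x <- factor(x)
  n <- length(x)
  r <- nlevels(x)
  p <- as.vector(table(x)) / n                               # p_i
  pij <- table(x[(lag + 1):n], x[1:(n - lag)]) / (n - lag)   # p_ij(l)
  expected <- outer(p, p)                                    # p_i * p_j
  sqrt(sum((pij - expected)^2 / expected) / (r - 1))
}

# Value that v(l) must exceed to be significant at level alpha, obtained from
# T * (r - 1) * v(l)^2 ~ chi-square with (r - 1)^2 degrees of freedom
cramers_v_critical <- function(n, r, alpha = 0.05) {
  sqrt(qchisq(1 - alpha, df = (r - 1)^2) / (n * (r - 1)))
}

x <- rep(c("a", "b"), 10)
print(c(v_lag1 = cramers_v_sketch(x, lag = 1),
        critical = cramers_v_critical(n = length(x), r = 2)))
```

Because the marginal estimates use the whole series while the pairs use only $T - l$ observations, $\widehat{v}(l)$ can slightly exceed 1 in short series such as this one.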
plot_cts
constructs a categorical time series plot
plot_cts(series, title = "Time series plot")
series |
An object of type |
title |
The title of the graph. |
Constructs a categorical time series plot for a given CTS.
The categorical time series plot.
Ángel López-Oriona, José A. Vilar
Weiß CH (2018). An introduction to discrete-valued time series. John Wiley and Sons.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Constructing a categorical time series plot for the first time series
# in dataset GeneticSequences
time_series_plot <- plot_cts(series = sequence_1)
plot_ifsct
constructs the IFS circle transformation of
a categorical time series.
plot_ifsct(series, alpha, beta, title = "IFS circle transformation", ...)
series |
An object of type |
alpha |
Parameter alpha in the circle transformation. |
beta |
Parameter beta in the circle transformation. |
title |
The title of the graph. |
... |
Additional parameters for the function. |
Constructs the IFS circle transformation for a given CTS, which is useful to identify cycles of arbitrary length.
The IFS circle transformation.
Ángel López-Oriona, José A. Vilar
Weiß CH (2008). “Visual analysis of categorical time series.” Statistical Methodology, 5(1), 56–71.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Constructing the IFS circle transformation for the first CTS in dataset GeneticSequences
ct <- plot_ifsct(sequence_1, alpha = 0.1, beta = 0.1)
plot_mcc
constructs a control chart for the marginal distribution
of a categorical series
plot_mcc( series, c, sigma, lambda = 0.99, k = 3.3, min_max = FALSE, plot = TRUE, title = "Control chart (marginal)", ... )
series |
An object of type |
c |
The hypothetical marginal distribution. |
sigma |
A matrix containing the variances for each category (columns) and each time t (rows). |
lambda |
The constant lambda to construct the EWMA estimator. |
k |
The constant k to construct the k sigma limits. |
min_max |
Logical. If |
plot |
Logical. If |
title |
The title of the graph. |
... |
Additional parameters for the function. |
Constructs a control chart of a CTS with range $\mathcal{V} = \{1, 2, \ldots, r\}$ based on the marginal distribution.
The chart relies on standardized statistics $Z_{t,i}$, $i = 1, \ldots, r$, computed from the components
$\widehat{p}_{t,i}$ of the EWMA estimator of the marginal distribution, the marginal probability $c_i$ of
category $i$ under the hypothetical distribution c, the variance $\sigma_{t,i}$ of $\widehat{p}_{t,i}$
(argument sigma) and a constant $k$ set by the user. If min_max = FALSE, then only the statistics
$\min_i Z_{t,i}$ and $\max_i Z_{t,i}$ are plotted.
An out-of-control alarm is signalled if the statistics are below $-1$ or above $1$.
If plot = TRUE
(default), represents the control chart for the marginal distribution. Otherwise, the function
returns a matrix with the values of the standardized statistics for each time t
Ángel López-Oriona, José A. Vilar
Weiß CH (2008). “Visual analysis of categorical time series.” Statistical Methodology, 5(1), 56–71.
sequence_1 <- SyntheticData1[which(SyntheticData1$Series == 1), ]
# Representing a control chart for the marginal distribution
cycle_md <- plot_mcc(series = sequence_1, c = c(0.3, 0.3, 0.4),
                     sigma = matrix(rep(c(1, 1, 1), 600), nrow = 600))
# Computing the corresponding standardized statistics
cycle_md <- plot_mcc(series = sequence_1, c = c(0.3, 0.3, 0.4),
                     sigma = matrix(rep(c(1, 1, 1), 600), nrow = 600), plot = FALSE)
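As a rough illustration, an EWMA estimator of the marginal distribution can be sketched as the recursion $\widehat{\boldsymbol{p}}_t = \lambda\,\widehat{\boldsymbol{p}}_{t-1} + (1 - \lambda)\,\boldsymbol{Y}_t$ applied to the binarized series $\boldsymbol{Y}_t$ and started at a distribution $\widehat{\boldsymbol{p}}_0$ (here playing the role of c). This weighting convention, the starting value and the helper name `ewma_marginal_sketch` are assumptions; plot_mcc may define the estimator differently, and it also uses the variances and the constant k to build its statistics, which this sketch does not reproduce.

```r
# ewma_marginal_sketch(): hypothetical helper, not part of ctsfeatures.
# Assumed recursion: p_t = lambda * p_{t-1} + (1 - lambda) * Y_t, with p_0 = p0.
ewma_marginal_sketch <- function(x, p0, lambda = 0.99,
                                 levels = sort(unique(x))) {
  x <- factor(x, levels = levels)
  r <- length(levels)
  Y <- diag(r)[as.integer(x), , drop = FALSE]  # binarized series, T x r
  P <- matrix(NA_real_, nrow = length(x), ncol = r,
              dimnames = list(NULL, levels))
  current <- p0
  for (t in seq_along(x)) {
    current <- lambda * current + (1 - lambda) * Y[t, ]
    P[t, ] <- current
  }
  P
}

P_ewma <- ewma_marginal_sketch(c("a", "b", "c", "a"), p0 = c(0.3, 0.3, 0.4),
                               lambda = 0.9)
print(P_ewma)  # each row is a probability vector
```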
plot_ph
constructs the pattern histogram associated with a given category of a
categorical time series.
plot_ph( series, category, plot = TRUE, title = paste0("Pattern histogram (", category, ")"), ... )
series |
An object of type |
category |
The selected category. |
plot |
Logical. If |
title |
The title of the graph. |
... |
Additional parameters for the function. |
Constructs the pattern histogram for a specific category of a CTS. This graph represents the frequencies of the cycles for the corresponding category according to their length.
The pattern histogram.
Ángel López-Oriona, José A. Vilar
Weiß CH (2008). “Visual analysis of categorical time series.” Statistical Methodology, 5(1), 56–71.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Constructing the pattern histogram for the first CTS in dataset
# GeneticSequences concerning the category 'a'
ph <- plot_ph(sequence_1, category = 'a')
# Obtaining the frequencies of cycle lengths
cycle_lengths <- plot_ph(sequence_1, category = 'a', plot = FALSE)
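The frequencies returned with plot = FALSE can be approximated in base R under the assumption that a cycle of a category is the distance between two consecutive occurrences of that category; the exact definition of cycle length used by plot_ph (following Weiß, 2008) may differ, and the helper name is hypothetical.

```r
# cycle_lengths_sketch(): hypothetical helper, not part of ctsfeatures.
# Assumes a cycle length is the gap between consecutive occurrences of category.
cycle_lengths_sketch <- function(x, category) {
  positions <- which(x == category)
  table(diff(positions))  # frequency of each cycle length
}

x <- c("a", "c", "a", "g", "t", "a", "a", "c", "a")
cl <- cycle_lengths_sketch(x, category = "a")
print(cl)      # lengths 1, 2 and 3 occur 1, 2 and 1 times
# barplot(cl)  # a rough pattern histogram
```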
plot_reg
constructs the rate evolution graph
proposed by Ribler (1997).
plot_reg( series, title = "Rate evolution graph", linear_fit = FALSE, cat_res = NULL, ... )
series |
An object of type |
title |
The title of the graph. |
linear_fit |
Logical. If |
cat_res |
If this parameter is set to any of the categories of the series, then the function returns a graph of residuals for the linear model associated with the corresponding category |
... |
Additional parameters for the function. |
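Given a CTS of length $T$ with range $\mathcal{V} = \{1, 2, \ldots, r\}$, $\overline{X}_t = \{X_1, \ldots, X_T\}$, and the
corresponding binarized time series, $\overline{\boldsymbol{Y}}_t = \{\boldsymbol{Y}_1, \ldots, \boldsymbol{Y}_T\}$,
the function constructs the rate evolution graph. Specifically, consider the
series of cumulated sums given by $\overline{\boldsymbol{C}}_t = \{\boldsymbol{C}_1, \ldots, \boldsymbol{C}_T\}$,
with $\boldsymbol{C}_k = \sum_{t=1}^{k} \boldsymbol{Y}_t$, $k = 1, \ldots, T$. The rate evolution graph displays a
standard time series plot for each one of the components of $\overline{\boldsymbol{C}}_t$
simultaneously in one graph.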
The rate evolution graph.
Ángel López-Oriona, José A. Vilar
Ribler RL (1997). Visualizing categorical time series data with applications to computer and communications network traces. Ph.D. thesis, Virginia Polytechnic Institute and State University.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Constructing the rate evolution graph for the first time series in dataset GeneticSequences
reg <- plot_reg(sequence_1)
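The cumulated sums $\boldsymbol{C}_k = \sum_{t=1}^{k} \boldsymbol{Y}_t$ behind the rate evolution graph can be sketched in base R as column-wise cumulative sums of the binarized series. The helper name is hypothetical, and the commented matplot call only draws a rough version of the graph.

```r
# rate_evolution_sketch(): hypothetical helper, not part of ctsfeatures.
rate_evolution_sketch <- function(x, levels = sort(unique(x))) {
  x <- factor(x, levels = levels)
  Y <- diag(length(levels))[as.integer(x), , drop = FALSE]  # binarized series
  colnames(Y) <- levels
  apply(Y, 2, cumsum)  # row k holds C_k = Y_1 + ... + Y_k
}

C <- rate_evolution_sketch(c("a", "c", "a", "g", "a"))
print(C)
# One line per category, as in a rate evolution graph:
# matplot(C, type = "l", lty = 1, xlab = "t", ylab = "cumulated count")
```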
plot_se
represents the spectral envelope of
a categorical time series
plot_se(series)
series |
An object of type |
The function represents the spectral envelope of a categorical time series.
A plot of the spectral envelope.
Ángel López-Oriona, José A. Vilar
Stoffer DS, Tyler DE, McDougall AJ (1993). “Spectral analysis for categorical time series: Scaling and the spectral envelope.” Biometrika, 80(3), 611–622.
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
# Representing the spectral envelope for the first series in dataset GeneticSequences
plot_se(sequence_1)
Categorical time series (CTS) of protein sequences from different species
data(ProteinSequences)
A tsibble
with four columns, which are:
Value
The categorical values of the time series in the dataset.
Series
Integer values indicating the considered time series (there are 40 time series in the dataset).
Time
Integer values indicating the temporal indexes of the observations.
Class
Integer values indicating the class of each time series.
The column Value
is the concatenation of 40 time series
taking four categorical values (amino-acids). The column Class
is formed
by integers from 1 to 4, indicating that there are 4 different classes in the database. Each class is associated with a different
family of viruses. For more information, see López-Oriona et al. (2023).
López-Oriona Á, Vilar JA, D’Urso P (2023). “Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences.” Information Sciences, 624, 467–492.
Categorical time series (CTS) of sleep stages from different subjects
data(SleepStages)
A tsibble
with four columns, which are:
Value
The categorical values of the time series in the dataset.
Series
Integer values indicating the considered time series (there are 62 time series in the dataset).
Time
Integer values indicating the temporal indexes of the observations.
Class
Integer values indicating the class of each time series.
The column Value
is the concatenation of 62 time series
taking six categorical values (sleep stages). The column Class
is formed
by the integers 1 and 2, indicating that there are 2 different classes in the database. Each class is associated with a sleep
disorder (class 1 refers to nocturnal frontal lobe epilepsy, while class 2 refers to REM behavior disorder).
For more information, see López-Oriona et al. (2023).
López-Oriona Á, Vilar JA, D’Urso P (2023). “Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences.” Information Sciences, 624, 467–492.
Synthetic dataset containing 80 CTS generated from four different generating processes.
data(SyntheticData1)
A tsibble
with four columns, which are:
Value
The categorical values of the time series in the dataset.
Series
Integer values indicating the considered time series (there are 80 time series in the dataset).
Time
Integer values indicating the temporal indexes of the observations.
Class
Integer values indicating the class of each time series.
The column Value
is the concatenation of 80 time series of length 600
taking three categorical values. Series 1-20, 21-40, 41-60 and 61-80 were generated from
Markov Chains with different matrices of transition probabilities (see Scenario 1 in López-Oriona et al. (2023)).
Therefore, there are 4 different classes in the dataset.
López-Oriona Á, Vilar JA, D’Urso P (2023). “Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences.” Information Sciences, 624, 467–492.
Synthetic dataset containing 80 CTS generated from four different generating processes.
data(SyntheticData2)
A tsibble
with four columns, which are:
Value
The categorical values of the time series in the dataset.
Series
Integer values indicating the considered time series (there are 80 time series in the dataset).
Time
Integer values indicating the temporal indexes of the observations.
Class
Integer values indicating the class of each time series.
The column Value
is the concatenation of 80 time series of length 600
taking three categorical values. Series 1-20, 21-40, 41-60 and 61-80 were generated from
Hidden Markov Models with different matrices of transition and emission probabilities (see Scenario 2 in López-Oriona et al. (2023)).
Therefore, there are 4 different classes in the dataset.
López-Oriona Á, Vilar JA, D’Urso P (2023). “Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences.” Information Sciences, 624, 467–492.
Synthetic dataset containing 80 CTS generated from four different generating processes.
data(SyntheticData3)
A tsibble
with four columns, which are:
Value
The categorical values of the time series in the dataset.
Series
Integer values indicating the considered time series (there are 80 time series in the dataset).
Time
Integer values indicating the temporal indexes of the observations.
Class
Integer values indicating the class of each time series.
The column Value
is the concatenation of 80 time series of length 600
taking three categorical values. Series 1-20, 21-40, 41-60 and 61-80 were generated from
NDARMA processes with different orders and vectors of coefficients (see Scenario 3 in López-Oriona et al. (2023)).
Therefore, there are 4 different classes in the dataset.
López-Oriona Á, Vilar JA, D’Urso P (2023). “Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences.” Information Sciences, 624, 467–492.