SIR optimally thresholded
SIR_threshold_opt.Rd
Apply a single-index \(SIR\) on \((X,Y)\) with \(H\) slices, with a soft/hard thresholding
of the interest matrix \(\widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n\) by an optimal
parameter \(\lambda_{opt}\). The \(\lambda_{opt}\) is found automatically among a vector
of n_lambda values of \(\lambda\), ranging from 0 to the maximum value of
\(\widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n\). For each feature of \(X\),
the number of \(\lambda\) values for which this feature is selected is stored
(in a vector of size \(p\)). This vector is sorted in increasing order. Then, using
strucchange::breakpoints, a breakpoint is found in this sorted vector. The coefficients
of the variables to the left of the breakpoint tend to be set to 0 by the
thresholding operation based on \(\lambda_{opt}\), so these variables should be removed (useless
variables). Finally, \(\lambda_{opt}\) is the first \(\lambda\) such that the
associated \(\hat{b}\) has as many zeros as the breakpoint's value.
For example, for \(X \in R^{10}\) and n_lambda=100, this sorted vector can look like this:
| Variable | X10 | X3 | X8 | X5 | X7 | X9 | X4 | X6 | X2 | X1 |
| Number of \(\lambda\) selecting it | 2 | 3 | 3 | 4 | 4 | 4 | 6 | 10 | 95 | 100 |
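Here, the breakpoint would be 8: the eight leftmost variables (X10 to X6) are selected for only a few values of \(\lambda\), whereas X2 and X1 are selected for almost all of them.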
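The procedure described above can be sketched in base R. This is only an illustration under stated assumptions: the thresholding functions use the textbook hard and soft definitions, the matrix `M` is a made-up toy, the grid uses the largest absolute entry as its upper bound, and a plain least-squares mean-shift search stands in for strucchange::breakpoints. None of it is taken from the package source.

```r
# Textbook hard and soft thresholding of a matrix M at level lambda
# (assumed definitions, not copied from the package).
hard_threshold <- function(M, lambda) M * (abs(M) > lambda)
soft_threshold <- function(M, lambda) sign(M) * pmax(abs(M) - lambda, 0)

# Hypothetical 2 x 2 interest matrix, and a grid of n_lambda values from 0
# to its largest absolute entry (taking the absolute value is an assumption).
M <- matrix(c(0.9, -0.2, 0.1, -0.6), nrow = 2)
n_lambda <- 5
lambdas <- seq(0, max(abs(M)), length.out = n_lambda)

M_hard <- hard_threshold(M, 0.5)  # small entries set to 0, others unchanged
M_soft <- soft_threshold(M, 0.5)  # small entries set to 0, others shrunk by 0.5

# Selection counts from the example above, in the order shown there.
counts <- c(X10 = 2, X3 = 3, X8 = 3, X5 = 4, X7 = 4,
            X9 = 4, X4 = 6, X6 = 10, X2 = 95, X1 = 100)

# Stand-in for strucchange::breakpoints: choose the split position k that
# minimises the residual sum of squares of a two-segment constant fit.
rss <- function(v) sum((v - mean(v))^2)
p <- length(counts)
split_cost <- sapply(seq_len(p - 1), function(k) {
  rss(counts[1:k]) + rss(counts[(k + 1):p])
})
breakpoint <- which.min(split_cost)

# Variables to the left of the breakpoint are the candidates for removal.
useless <- names(counts)[seq_len(breakpoint)]
```

On the example counts this search places the split after the eighth variable, which matches the breakpoint quoted above.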
Usage
SIR_threshold_opt(
Y,
X,
H = 10,
n_lambda = 100,
thresholding = "hard",
graph = TRUE,
output = TRUE,
choice = ""
)
Arguments
- Y
A numeric vector representing the dependent variable (a response vector).
- X
A matrix representing the quantitative explanatory variables (one variable per column).
- H
The chosen number of slices (default is 10).
- n_lambda
The number of lambdas to test. The n_lambda tested lambdas are uniformly spaced between 0 and the maximum value of the interest matrix (default is 100).
- thresholding
The thresholding method, either "hard" or "soft" (default is "hard").
- graph
A boolean, set to TRUE to plot graphs (default is TRUE).
- output
A boolean, set to TRUE to print information (default is TRUE).
- choice
The graph to plot:
"estim_ind" Plot the index estimated by the SIR model versus Y.
"opt_lambda" Plot the choice of the optimal lambda.
"cos2_selec" Plot the evolution of cos^2 and of the variable selection according to lambda.
"regul_path" Plot the regularization path of b.
"" Plot every graph (default).
Value
An object of class SIR_threshold_opt, with attributes:
- b
This is the optimal estimated EDR direction, which is the principal eigenvector of the interest matrix.
- lambdas
A vector that contains the tested lambdas.
- lambda_opt
The optimal lambda.
- mat_b
A matrix of size p*n_lambda whose columns contain the estimate of beta for each lambda.
- n_lambda
The number of lambdas tested.
- vect_nb_zeros
The number of zeros in b for each lambda.
- list_relevant_variables
A list that contains the variables selected by the model.
- fit_bp
An object of class breakpoints from the strucchange package, containing information about the breakpoint from which the optimal lambda is deduced.
- indices_useless_var
A vector of p items: each variable is associated with the number of lambdas for which this variable is selected.
- vect_cos_squared
A vector that contains, for each lambda, the squared cosine between the direction estimated by vanilla SIR and the one estimated by thresholded SIR.
- Y
The response vector.
- n
Sample size.
- p
The number of variables in X.
- H
The chosen number of slices.
- M1
The interest matrix thresholded with the optimal lambda.
- thresholding
The thresholding method used.
- call
Unevaluated call to the function.
- X_reduced
The X data restricted to the variables selected by the model. It can be used to estimate a new SIR model on the relevant variables to improve the estimation of b.
- index_pred
The index Xb' estimated by SIR.
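As a rough illustration of how some of these components relate to one another, the sketch below derives the per-lambda zero counts, the per-variable selection counts and an optimal lambda from a regularization path. The matrix `mat_b`, the grid `lambdas` and the value of `breakpoint` are hypothetical toy values chosen for readability, and the code follows the description on this page rather than the package's actual implementation.

```r
# Hypothetical regularization path: p = 3 variables (rows) and
# n_lambda = 4 tested lambdas (columns), filled column by column.
mat_b <- matrix(c(0.7, 0.7, 0.1,
                  0.7, 0.7, 0.0,
                  0.8, 0.6, 0.0,
                  1.0, 0.0, 0.0),
                nrow = 3,
                dimnames = list(c("X1", "X2", "X3"), NULL))
lambdas <- c(0, 0.1, 0.2, 0.3)

# Number of zero coefficients in b for each lambda (cf. vect_nb_zeros).
vect_nb_zeros <- colSums(mat_b == 0)

# Number of lambdas for which each variable is selected, i.e. has a
# non-zero coefficient (cf. indices_useless_var), sorted increasingly.
n_selected <- sort(rowSums(mat_b != 0))

# Assumed breakpoint value: one variable is judged useless.
breakpoint <- 1

# Optimal lambda: the first lambda whose b has exactly `breakpoint` zeros.
lambda_opt <- lambdas[which(vect_nb_zeros == breakpoint)[1]]

# Variables kept at lambda_opt (cf. list_relevant_variables).
b_opt <- mat_b[, which(lambdas == lambda_opt)]
relevant <- names(b_opt)[b_opt != 0]
```

With these toy values the zero counts are 0, 1, 1, 2, so the first lambda giving exactly one zero is 0.1 and the retained variables are X1 and X2.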
Examples
# Generate data: Y depends on X only through the index X %*% beta
set.seed(2)
n <- 200
beta <- c(1, 1, rep(0, 8))
X <- mvtnorm::rmvnorm(n, sigma = diag(1, 10))
eps <- rnorm(n)
Y <- (X %*% beta)^3 + eps
# Apply SIR with soft thresholding
SIR_threshold_opt(Y, X, H = 10, n_lambda = 300, thresholding = "soft")
#>
#> Call:
#> SIR_threshold_opt(Y = Y, X = X, H = 10, n_lambda = 300, thresholding = "soft")
#>
#> Results of EDR directions estimation:
#>
#> Estimated direction
#> X1 -0.705
#> X2 -0.709
#> X3 0.000
#> X4 0.000
#> X5 0.000
#> X6 0.000
#> X7 0.000
#> X8 0.000
#> X9 0.000
#> X10 0.000
#>
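One way to read this output is to compare the printed direction with the true beta used to simulate the data. The sketch below does so with a squared cosine, which does not depend on the sign or the scale of either vector. The helper `cos_squared` is a hypothetical name, not a function of the package, and `b_hat` is copied by hand from the rounded values printed above.

```r
# Squared cosine of the angle between two vectors; invariant to the sign
# and to the scale of a and b (hypothetical helper, not part of the package).
cos_squared <- function(a, b) {
  sum(a * b)^2 / (sum(a^2) * sum(b^2))
}

# True direction used in the simulation above.
beta <- c(1, 1, rep(0, 8))

# Estimated direction, copied from the rounded printed output.
b_hat <- c(-0.705, -0.709, rep(0, 8))

# A value close to 1 means the two directions are nearly collinear,
# even though b_hat has the opposite sign.
cos2 <- cos_squared(b_hat, beta)

# Variables with a non-zero estimated coefficient.
selected <- which(b_hat != 0)
```

The opposite sign of the estimate is not a concern, since an EDR direction is only identified up to sign and scale.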