
Apply a single-index SIR on \((X,Y)\) with \(H\) slices, with soft or hard thresholding of the interest matrix \(\widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n\) by an optimal parameter \(\lambda_{opt}\). \(\lambda_{opt}\) is chosen automatically from a vector of n_lambda values of \(\lambda\), ranging from 0 to the maximum value of \(\widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n\). For each feature of \(X\), the number of \(\lambda\) values for which this feature is selected is stored in a vector of size \(p\). This vector is sorted in increasing order, and strucchange::breakpoints is then used to find a breakpoint in the sorted vector. The variables to the left of the breakpoint are those whose coefficients tend to be set to 0 by the thresholding based on \(\lambda_{opt}\), so they are considered useless and should be removed. Finally, \(\lambda_{opt}\) is the first \(\lambda\) for which the associated \(\hat{b}\) has as many zeros as the breakpoint's value.

For example, for \(X \in \mathbb{R}^{10}\) and n_lambda=100, the sorted vector can look like this:

X10  X3  X8  X5  X7  X9  X4  X6  X2  X1
  2   3   3   4   4   4   6  10  95  100

Here, the breakpoint would be 8.
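The last two steps of the description can be illustrated with the example values above. This is only a minimal sketch: the breakpoint is taken as given (8) rather than recomputed with strucchange::breakpoints, and the names counts, nb_zeros and the small grid of lambdas are hypothetical, chosen for illustration.

```r
# Selection counts from the example above (n_lambda = 100),
# in the order displayed in the table.
counts <- c(X10 = 2, X3 = 3, X8 = 3, X5 = 4, X7 = 4,
            X9 = 4, X4 = 6, X6 = 10, X2 = 95, X1 = 100)

# Breakpoint value stated in the text (not recomputed here).
breakpoint <- 8

# Variables to the left of the breakpoint are treated as useless,
# the remaining ones as relevant.
useless_vars  <- names(counts)[seq_len(breakpoint)]
relevant_vars <- names(counts)[-seq_len(breakpoint)]
print(useless_vars)
print(relevant_vars)

# Hypothetical number of zeros in the estimated direction for a toy grid
# of 6 lambdas (illustrative values only, not produced by the package).
lambdas  <- seq(0, 0.5, length.out = 6)
nb_zeros <- c(0, 3, 7, 8, 8, 10)

# The optimal lambda is the first one giving as many zeros as the breakpoint.
lambda_opt <- lambdas[which(nb_zeros == breakpoint)[1]]
print(lambda_opt)
```

With these counts, only X2 and X1 remain on the right of the breakpoint, which matches the large gap between 10 and 95 in the table.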

Usage

SIR_threshold_opt(
  Y,
  X,
  H = 10,
  n_lambda = 100,
  thresholding = "hard",
  graph = TRUE,
  output = TRUE,
  choice = ""
)

Arguments

Y

A numeric vector representing the dependent variable (a response vector).

X

A matrix representing the quantitative explanatory variables (one variable per column).

H

The chosen number of slices (default is 10).

n_lambda

The number of lambdas to test. The n_lambda tested lambdas are uniformly distributed between 0 and the maximum value of the interest matrix (default is 100).

thresholding

The thresholding method, either "hard" or "soft" (default is "hard").

graph

A boolean, set to TRUE to plot graphs (default is TRUE).

output

A boolean, set to TRUE to print information (default is TRUE).

choice

The graph to plot:

  • "estim_ind" Plot the estimated index by the SIR model versus Y.

  • "opt_lambda" Plot the choice of the optimal lambda.

  • "cos2_selec" Plot the evolution of cos^2 and variable selection according to lambda.

  • "regul_path" Plot the regularization path of b.

  • "" Plot all graphs (default).
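To make the thresholding and n_lambda arguments more concrete, here is a minimal sketch of hard and soft thresholding applied to a small matrix, together with a uniform grid of lambdas. It uses the usual textbook definitions of the two operators; the helper names hard_threshold and soft_threshold, the toy matrix M, and the use of the largest absolute entry as the upper end of the grid are assumptions made for illustration and may differ from what the package does internally.

```r
# Hard thresholding: entries with absolute value <= lambda are set to 0,
# the others are kept unchanged.
hard_threshold <- function(M, lambda) {
  M * (abs(M) > lambda)
}

# Soft thresholding: entries are shrunk towards 0 by lambda,
# and set to 0 when their absolute value is <= lambda.
soft_threshold <- function(M, lambda) {
  sign(M) * pmax(abs(M) - lambda, 0)
}

# Toy stand-in for the interest matrix (hypothetical values).
M <- matrix(c(0.9, -0.05, 0.2, -0.6), nrow = 2)

# Uniform grid of n_lambda values between 0 and the largest absolute entry.
n_lambda <- 5
lambdas <- seq(0, max(abs(M)), length.out = n_lambda)
print(lambdas)

# Compare both operators for one value of lambda.
print(hard_threshold(M, 0.1))
print(soft_threshold(M, 0.1))
```

Hard thresholding leaves the surviving entries untouched, whereas soft thresholding also shrinks them, so the two methods can select the same variables while giving different coefficient values.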

Value

An object of class SIR_threshold_opt, with attributes:

b

The optimal estimated EDR direction, which is the principal eigenvector of the interest matrix.

lambdas

A vector that contains the tested lambdas.

lambda_opt

The optimal lambda.

mat_b

A matrix of size p*n_lambda whose columns contain the estimate of beta for each lambda.

n_lambda

The number of lambdas tested.

vect_nb_zeros

The number of zeros in b for each lambda.

list_relevant_variables

A list that contains the variables selected by the model.

fit_bp

An object of class breakpoints from the strucchange package, which contains information about the breakpoint from which the optimal lambda is deduced.

indices_useless_var

A vector that contains p items: each variable is associated with the number of lambdas that select this variable.

vect_cos_squared

A vector that contains, for each lambda, the squared cosine between the direction estimated by vanilla SIR and the one estimated by thresholded SIR.

Y

The response vector.

n

Sample size.

p

The number of variables in X.

H

The chosen number of slices.

M1

The interest matrix thresholded with the optimal lambda.

thresholding

The thresholding method used.

call

Unevaluated call to the function.

X_reduced

The X data restricted to the variables selected by the model. It can be used to estimate a new SIR model on the relevant variables to improve the estimation of b.

index_pred

The index Xb' estimated by SIR.
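As an illustration of the quantity stored in vect_cos_squared, the squared cosine between two direction vectors can be computed as follows. This is a sketch under assumptions: the helper cos_squared and the two example vectors are hypothetical and are not taken from the package.

```r
# Squared cosine of the angle between two vectors u and v.
# It lies between 0 and 1 and is unchanged if either vector is
# rescaled or multiplied by -1.
cos_squared <- function(u, v) {
  sum(u * v)^2 / (sum(u^2) * sum(v^2))
}

# Hypothetical directions: an unthresholded estimate and a thresholded one
# that points the opposite way and has exact zeros.
b_sir <- c(0.70, 0.71, 0.02, -0.03)
b_thr <- c(-0.705, -0.709, 0, 0)

print(cos_squared(b_sir, b_thr))
```

Because an EDR direction is only identified up to sign and scale, a sign-invariant measure such as the squared cosine is a natural way to compare two estimates.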

Examples

# Generate Data
set.seed(2)
n <- 200
beta <- c(1,1,rep(0,8))
X <- mvtnorm::rmvnorm(n,sigma=diag(1,10))
eps <- rnorm(n)
Y <- (X %*% beta)^3 + eps

# Apply SIR with soft thresholding
SIR_threshold_opt(Y,X,H=10,n_lambda=300,thresholding="soft")
#> 
#> Call:
#> SIR_threshold_opt(Y = Y, X = X, H = 10, n_lambda = 300, thresholding = "soft")
#> 
#> Results of EDR directions estimation:
#> 
#>     Estimated direction
#> X1               -0.705
#> X2               -0.709
#> X3                0.000
#> X4                0.000
#> X5                0.000
#> X6                0.000
#> X7                0.000
#> X8                0.000
#> X9                0.000
#> X10               0.000
#>