In this first example, we have a matrix, A, with 100 columns of data, but the data in B depends on only the first 4 of those columns.
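A rough NumPy sketch of this setup might look like the following. The greedy select_columns routine is only a stand-in for the routine that actually produces p and LSP, and the sample count, seed, and coefficient values are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # 200 observations of 100 candidate columns; B is built from the first 4 only.
    A = rng.normal(size=(200, 100))
    true_coeffs = np.array([2.0, -1.0, 0.5, 3.0])
    B = A[:, :4] @ true_coeffs

    def select_columns(A, B, k):
        """Greedily pick the k columns of A that best explain B.
        Returns the chosen column indices (in selection order) and the
        least-squares coefficients restricted to those columns."""
        remaining = list(range(A.shape[1]))
        chosen = []
        for _ in range(k):
            best_j, best_res = None, np.inf
            for j in remaining:
                cols = chosen + [j]
                coef, *_ = np.linalg.lstsq(A[:, cols], B, rcond=None)
                res = np.linalg.norm(A[:, cols] @ coef - B)
                if res < best_res:
                    best_j, best_res = j, res
            chosen.append(best_j)
            remaining.remove(best_j)
        LSP, *_ = np.linalg.lstsq(A[:, chosen], B, rcond=None)
        return np.array(chosen), LSP

    p, LSP = select_columns(A, B, k=4)
    print(p)    # indices of the selected columns
    print(LSP)  # coefficients on those columns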
The computed permutation vector shows that the first 4 entries are the relevant columns, and the coefficient vector, LSP, exactly matches the terms used to build B. Any column not referenced by p can be discarded.
B = A[..,p].LSP    (1)
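Continuing the sketch, relation (1) can be checked numerically (p and LSP here come from the stand-in routine above):

    # B is recovered (to rounding error) from the selected columns alone.
    print(np.allclose(B, A[:, p] @ LSP))        # expect True
    print(np.corrcoef(B, A[:, p] @ LSP)[0, 1])  # expect ~1.0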
In this second example, we will create a result vector that depends on 10 variables, of which only 5 are measured in the matrix A (along with 95 other measurements of irrelevant/random properties).
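A sketch of this setup under the same assumptions: ten hidden variables drive the response, only the first five of them appear as columns of A, and the remaining 95 columns are unrelated random measurements. The stand-in routine is told to keep 5 columns, whereas the real routine presumably determines that cut-off itself.

    # 10 underlying variables drive B, but only the first 5 are measured.
    hidden = rng.normal(size=(200, 10))
    true_coeffs = rng.normal(size=10)
    B = hidden @ true_coeffs

    # A holds the 5 measured variables plus 95 irrelevant random columns.
    A = np.hstack([hidden[:, :5], rng.normal(size=(200, 95))])

    p, LSP = select_columns(A, B, k=5)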
The notation A[..,p] selects all of the rows of A and only the columns whose indices appear in the list p; this is the reduced matrix. Note the correlation of B with (A[..,p].LSP).
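In NumPy, the equivalent of A[..,p] is A[:, p]. The correlation noted above can be computed like this (the exact value depends on the random data):

    reduced = A[:, p]             # the reduced matrix
    pred_reduced = reduced @ LSP  # prediction from the selected columns only
    print(np.corrcoef(B, pred_reduced)[0, 1])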
Compare this with the standard least squares fit.
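For the comparison, an ordinary least-squares fit over all 100 columns, again only a sketch of the step being described:

    # Standard least squares using every column of A.
    LS, *_ = np.linalg.lstsq(A, B, rcond=None)
    pred_full = A @ LS
    print(np.corrcoef(B, pred_full)[0, 1])  # correlation on the training data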
The standard least squares fit matches the training data more closely, but let's see what happens when we use these models to predict results from new data.
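To exercise both models away from the training data, fresh samples can be drawn from the same underlying process. The generation step below mirrors the earlier sketch and is an assumption, not the article's own listing.

    # New data from the same process: new hidden variables, same coefficients.
    hidden_new = rng.normal(size=(200, 10))
    B_new = hidden_new @ true_coeffs
    A_new = np.hstack([hidden_new[:, :5], rng.normal(size=(200, 95))])

    # Predict with the reduced (predictive) model and with standard least squares.
    pred_reduced_new = A_new[:, p] @ LSP
    pred_full_new = A_new @ LS
    print(np.corrcoef(B_new, pred_reduced_new)[0, 1])
    print(np.corrcoef(B_new, pred_full_new)[0, 1])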
Note how the correlation on the new data is much better using the predictive model. The standard model suffers from overfitting.