Biscuit

From Stikir


Biscuit Data

The task is to use near-infrared (NIR) spectroscopy results to infer the composition of biscuit dough. There are 40 samples, of which the 23rd is of dubious quality. For each sample, the file Biscuit_nir.csv contains NIR spectroscopy measurements at 700 equally spaced wavelengths from 1100 to 2498 nm (a spacing of 2 nm). The file Biscuit_characteristics.csv contains the corresponding percentages of fat, sucrose, flour, and water, which are the output values to be inferred.
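The wavelength grid and the handling of the dubious sample can be sketched as follows. This is only an illustration of the arithmetic in the description: the variable names are hypothetical, and it assumes that the row order in the files matches the sample numbering and that one would simply drop sample 23, which the text does not state.

```python
# Wavelength grid implied by the description: 700 equally spaced
# points from 1100 nm to 2498 nm inclusive.
N_WAVELENGTHS = 700
START_NM, END_NM = 1100, 2498

# Spacing between adjacent wavelengths.
step_nm = (END_NM - START_NM) / (N_WAVELENGTHS - 1)
wavelengths_nm = [START_NM + i * step_nm for i in range(N_WAVELENGTHS)]

# Assumption: rows appear in sample order, so the dubious 23rd sample
# is row index 22 (0-based). Dropping it is one possible treatment.
N_SAMPLES = 40
DUBIOUS_SAMPLE = 23  # 1-based sample number from the description
keep_rows = [i for i in range(N_SAMPLES) if i + 1 != DUBIOUS_SAMPLE]
```

The list keep_rows could then be used to select the same rows from both the NIR matrix and the characteristics table, so that inputs and outputs stay aligned.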

References that have studied this dataset include:

P. J. Brown, T. Fearn, and M. Vannucci. Bayesian wavelet regression on curves with application to a spectroscopic calibration problem. Journal of the American Statistical Association, 96(454):398–408, June 2001.

M. Stone and R. J. Brooks. Continuum regression: Crossvalidated sequentially constructed prediction embracing ordinary least squares, partial least squares and principal components regression. Journal of the Royal Statistical Society, Series B, 52(2):237–269, 1990.

In Brown et al., the regression outputs considered are the logarithms of the ratios of the percentages of fat, sucrose, and water to the percentage of flour. We focus here on the log of the ratio of the percentage of fat to the percentage of flour.
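As a minimal sketch of this transformation, the following computes the log-ratio responses for a single sample. The function name and the sample values are hypothetical (the numbers are illustrative, not taken from the dataset), and the natural logarithm is an assumption, though it appears consistent with the magnitude of the values listed on this page.

```python
import math

def log_ratio(ingredient_pct, flour_pct):
    """Natural log of an ingredient percentage relative to the flour percentage."""
    if ingredient_pct <= 0 or flour_pct <= 0:
        raise ValueError("percentages must be positive")
    return math.log(ingredient_pct / flour_pct)

# Hypothetical sample with illustrative percentages (not from the dataset).
sample = {"Fat": 18.0, "Sucrose": 16.0, "Flour": 49.0, "Water": 14.0}

# The response this page focuses on: log(fat / flour).
y_fat = log_ratio(sample["Fat"], sample["Flour"])

# The three responses considered in Brown et al.
responses = {
    name: log_ratio(sample[name], sample["Flour"])
    for name in ("Fat", "Sucrose", "Water")
}
```

Because flour is the largest component in every sample, each of these log ratios is negative.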

Dimensions of the NIR data (samples by wavelengths), followed by the first five rows and columns:

[1]  40 700
       V1      V2      V3      V4      V5
1 0.24985 0.24974 0.24944 0.24904 0.24889
2 0.25610 0.25610 0.25635 0.25656 0.25687
3 0.27477 0.27472 0.27482 0.27477 0.27482
4 0.24374 0.24374 0.24378 0.24369 0.24345
5 0.24341 0.24326 0.24326 0.24336 0.24307

Dimensions and summary of the dough characteristics (percentages):

[1] 40  4
      Fat           Sucrose          Flour           Water      
 Min.   :15.01   Min.   : 9.95   Min.   :43.53   Min.   :11.03  
 1st Qu.:16.66   1st Qu.:13.32   1st Qu.:46.36   1st Qu.:13.27  
 Median :18.46   Median :16.36   Median :49.50   Median :14.28  
 Mean   :18.35   Mean   :16.54   Mean   :48.99   Mean   :14.19  
 3rd Qu.:19.94   3rd Qu.:19.82   3rd Qu.:50.89   3rd Qu.:15.16  
 Max.   :21.59   Max.   :23.11   Max.   :54.61   Max.   :17.41  

Log of the ratio of fat to flour for each of the 40 samples, followed by its summary:

 [1] -0.8654967 -0.9938766 -1.1531252 -0.7470194 -1.0301386 -0.8689771
 [7] -1.2158232 -1.0624238 -0.7852275 -0.9815029 -0.8666239 -1.1281347
[13] -0.9698971 -1.0327689 -0.9716003 -1.1938426 -0.9025386 -1.1949456
[19] -0.7286892 -0.9740855 -0.9646495 -0.7824205 -0.8895562 -1.2806377
[25] -1.0530696 -0.9023045 -0.8782760 -0.9600620 -0.9934475 -1.1541705
[31] -0.9081274 -1.0106675 -1.1467369 -1.0512046 -0.9602223 -1.1506185
[37] -0.7574254 -1.0310951 -0.9244735 -0.9710432
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1.2810 -1.0550 -0.9728 -0.9859 -0.8991 -0.7287 

Figure: Heatmap and dendrogram combination


Of all the NIR wavelengths, only a few contain useful information about the real-valued output. We would like to identify these. One traditional method for this sort of problem is Partial Least Squares (PLS).
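To make the idea concrete, here is a minimal sketch of the first component of single-response PLS (PLS1), written with the standard library only. It is not the analysis used on this page, which is not shown; the function name and the toy data are hypothetical. The weight vector is proportional to the covariance between each centered input column and the centered response, so large absolute weights point to wavelengths that vary with the output.

```python
import math

def first_pls_component(X, y):
    """Return (weights, scores, slope) for the first PLS1 component."""
    n, p = len(X), len(X[0])
    # Center each column of X and the response y.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weights: direction of X^T y, scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("inputs have no covariance with the response")
    w = [v / norm for v in w]
    # Scores: projection of each centered sample onto the weights.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Slope of the least-squares regression of y on the scores.
    slope = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return w, t, slope

# Synthetic toy data (not the biscuit data): only the middle column varies with y.
X_toy = [[1.0, 0.0, 5.0], [1.0, 1.0, 5.0], [1.0, 2.0, 5.0], [1.0, 3.0, 5.0]]
y_toy = [0.0, 2.0, 4.0, 6.0]
weights, scores, slope = first_pls_component(X_toy, y_toy)
```

On the toy data all of the weight falls on the middle column, the only one that carries information about the response. With the biscuit data, X would be the 40 by 700 NIR matrix and y the log fat-to-flour ratio, and further components would be extracted after deflating X.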
