23rd Jul, 2008

Future CACs

Originally, the Chemometrics in Analytical Chemistry (CAC) conferences were held every 4 years. That changed with CAC-2002, which was held in Seattle (USA), only two years after CAC-2000, in Antwerp (Belgium). Since then, the schedule has been conferences in Europe every four years, with conferences outside of Europe in the even years in-between. CAC-2004 was in Lisbon (Portugal), CAC-2006 in Águas de Lindóia-Guarulhos (Brazil), and the just completed CAC-2008 was in Montpellier (France).

In 2010, CAC moves back to North America, to New Orleans (USA). The host committee is headed by Steve Brown and Karl Booksh. They invite you to come to New Orleans, the birthplace of Jazz, in the first two weeks of June 2010, for chemometrics plus southern hospitality, Louisiana Cajun-Creole cooking and the continuous party that is the French Quarter.

CAC-2012 will move back to Europe. Several groups have expressed an interest in hosting the meeting, so it’s possible it could be in Poland, Germany, or Hungary. The CAC series is alive and well!

BMW

22nd Jul, 2008

Software as Automobile

If your chemometrics software package were an automobile, what would it look like?

Sorry, couldn’t resist! :)

BMW

Some time ago I asked Rasmus Bro if he would be interested in teaching a short course with me in Europe this fall. He said, “Yes, and I really want to go to Rome!” Fortunately, I’d been in contact lately with Dr. Giovanni Visco of Rome University Chemistry Department regarding the CMA4CH meeting.

Dr. Visco has been kind enough to put us in touch with CASPUR, the nearby “Interuniversity Consortium for Supercomputing and Research,” which has good facilities for teaching a computer based course. We’re currently planning on teaching an introductory 3-day course October 27-29, 2008. The course will include:

Obviously, we’ll have to do a little editing of these courses to fit 4.5 days’ worth of material into 3 days! If you have questions, please drop me a line (bmw@eigenvector.com).

See you in Rome this fall!

BMW

16th Jul, 2008

Sabbatical at EPFL

In September of 2007 I received an offer I couldn’t refuse. When Professor Dominique Bonvin of Ecole Polytechnique Fédérale de Lausanne (EPFL) wrote and asked if there was a chance to have me spend some time there in 2008, it didn’t take long for me to send back a positive response.

Besides the fact that EPFL is in a beautiful location (on Lake Geneva in Switzerland), Dominique’s group there at the Automatic Control Laboratory has research interests that are well aligned with Eigenvector’s and, of course, those of our users. For instance, they have worked on issues surrounding MSPC and curve resolution for investigating reacting systems.

I’ll be working with Michael Amrhein on model updating schemes for batch processes. This is rather timely as we work more and more with monitoring batch processes, including semiconductor manufacturing and bioreactors. Model updating is important because these processes exhibit considerable drift with time.

I’ll be spending a little more than 2 months at EPFL, arriving in the last couple of days of August and leaving in the first few days of November. My wife (and assistant!) Jill will be along, as will daughters Clare and Mattie. It should be pretty much business as usual at EVRI, though it may take a few days on either end to set up operations to process orders, etc.

I look forward to this time to “sharpen the saw.”

BMW

Nomenclature has been a subject of some discussion within the chemometrics community, such as on the list ICS-L. I recall exchanges dealing with the definition of various terms such as “factor,” “latent variable,” and “principal component.” It’s clear that we don’t all use these terms in exactly the same way. For the most part, this doesn’t bother me. Authors should be free to use terms as they wish provided that they define them unambiguously in their text.

However, it would be useful for the community to have a set of generally agreed upon definitions for commonly used terms and concepts. Enter IUPAC, the International Union of Pure and Applied Chemistry. I remember first hearing of IUPAC when I was an undergrad learning organic chemistry. Learning the IUPAC names for compounds was always straightforward as they were very systematic. This was in contrast to learning common names, which, it seemed at times, were pretty much random.

Professor D. Brynn Hibbert, of the University of New South Wales, has received funding for a small IUPAC project to develop a glossary of concepts and terms in chemometrics. He presented a brief introduction to this project at CAC-2008. His collaborators on this include Professor Pentti Minkkinen, Lappeenranta University of Technology, Dr. Klaas Faber, Chemometry Consultancy, and myself.

The initial project goal is to establish the scope of the problem, and to develop a draft glossary and a consultation process. To do this we plan to set up a “wiki” where members of the community could edit terms or add new ones. We’ve had several offers of existing glossaries which could be used to populate the wiki initially. We’ll do that and then let everybody have at it. The wiki software will keep track of all the edits submitted, so we’ll know what terms are particularly contentious. Once it has settled down, the project team will create a consensus list for eventual presentation to IUPAC.

An IUPAC glossary would make it easier for authors as they could simply state that they will adhere to IUPAC definitions, and thus not have to define terms further. But perhaps more importantly, it would make things easier for students of chemometrics, who could learn a common set of terms and then only have to worry about the exceptions as they come up. Ultimately, it should be good for the field of chemometrics.

It’s Eigenvector’s job to get the wiki set up. I’ll let you know when it becomes available.

BMW

Eigenvector was pleased to sponsor the “Best Poster” prize at CAC-2008. The top three poster presenters all received a certificate good for a copy of PLS_Toolbox or Solo (well, OK, it wasn’t exactly a certificate, it was one of my business cards with “Good for one PLS_Toolbox” written on the back!). The top poster also got $500USD, which equates to 320€.

There were 160 posters presented at CAC, so this was quite a contest! The winners, selected by the CAC scientific committee, represent some exceptional efforts selected from a very large body of good work.

The third place poster was “Drift compensation of gas sensor array data by Orthogonal Signal Correction” by M. Padilla, A. Perera, I. Montoliu, A. Chaudry, K. Persaud and S. Marco. This is a nice application of OSC. We’ve used it for spectroscopic instrument standardization and found it to work well in that application. It makes sense that it would work well for electronic noses as well.

Second place went to Pat Wiegand, Randy Pell and Enric Comas, all of Dow, for “Simultaneous Variable and Sample Selection for PLS Calibrations Using a Robust Genetic Algorithm.” This work addressed the problem where one has both samples and variables that are irrelevant for building a predictive model for a given property. Most previous work addresses either the variable selection or the sample selection problem, but not both. The robustness of their algorithm comes, in part, from a robust PLS algorithm from the LIBRA Toolbox, developed by Sabine Verboven and Mia Hubert. This toolbox is what provides the robust options for PCA and PLS in PLS_Toolbox, so of course we think that was a very good choice!

Emma Peré-Trepat accepted the first place prize on behalf of herself and co-workers I. Montoliu, F.P. Martin, S. Rezzi and S. Kochhar, all of Nestlé Research Center. They presented “Data fusion strategies for nutrimetabonomics.” Nutrimetabonomics, the application of metabonomics to nutritional sciences, is the study of metabolic responses to the consumption of specific foods and ingredients. Their approach used hierarchical modeling to fuse NMR and meta-data.

Congratulations again to the winners!

BMW

4th Jul, 2008

More from CAC-2008

It’s been a long week, absolutely packed. I haven’t gotten to every session, but I thought I’d include a few notes about several more talks I really enjoyed.

Selena Richards presented “Self-Modeling Curve Resolution: a new approach to recovering temporal metabolite signal modulation in NMR spectroscopic data: Application to a life-long caloric restriction in dogs.” It’s been known for some time that restricting caloric intake lengthens the life span of most mammals. This talk was concerned with finding the metabolomic signature of this effect. Besides the novel use of MCR, I enjoyed the talk because the subjects were Labrador Retrievers. We’ve been trying to keep our yellow lab, Jenny, thin as well, because she has some joint problems that would be exacerbated if she were overweight. But man, labs will eat anything, so keeping them out of the food can be a challenge! I’m not sure how calorie restriction works in humans, but I’m sure life seems longer!

Steven Short talked on “Determination of Figures of Merit for Near-Infrared and Raman Spectrophotometers by Net Analyte Signal Analysis for a Four Compound Solid Dosage System.” This work discussed how NAS can be used to compare analytical instruments. I took a look at NAS some years ago after Avi Lorber published “Net analyte signal calculation in multivariate calibration.” My main disappointment with NAS, when calculated based on a regression model, is that it’s a function of the number of factors in the model, and it isn’t particularly useful for picking the number of factors. Short gave a nice application of where NAS can be truly useful.
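To make the dependence on the number of factors concrete, here is a minimal sketch, not taken from Short's talk or Lorber's paper. It assumes the regression-vector form of NAS, in which the NAS vector of a sample x is (xᵀb) b / (bᵀb) and the sensitivity is 1/‖b‖. To keep it short, principal components regression on simulated data stands in for PLS; the function name `pcr_vector` and all of the data are hypothetical.

```python
import numpy as np

def pcr_vector(X, y, k):
    """Regression vector from a k-factor PCR model (X and y mean-centered)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])

# Simulated calibration data (purely illustrative).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))
X -= X.mean(axis=0)
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=30)
y -= y.mean()

sensitivities = []
for k in range(1, 7):
    b = pcr_vector(X, y, k)
    # NAS vector for every sample: the part of x along b, (x'b) b / (b'b).
    nas = np.outer(X @ b, b) / (b @ b)
    # NAS-based sensitivity, assumed here to be 1 / ||b||.
    sensitivities.append(1.0 / np.linalg.norm(b))

for k, sen in enumerate(sensitivities, start=1):
    print(f"{k} factors: sensitivity = {sen:.4f}")
```

In PCR each added factor contributes a component orthogonal to the previous ones, so ‖b‖ can only grow and the sensitivity can only fall as factors are added. The figure of merit therefore tracks the model complexity you chose rather than telling you what that complexity should be.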

“Resolution of hyperspectral images. Pre-, in- and post-processing” was presented by Anna de Juan. The talk was something of an overview of past work, but it summarized very well many of the possibilities of using MCR in images. Much of this talk is included in her article (with Maeder, Hancewicz and Tauler) “Use of local rank-based spatial information for resolution of spectroscopic images,” J. Chemometrics, 22, pps 291-298, 2008. I think the work is a good guide for users of PLS/MIA_Toolbox in that it shows a lot of what you can do with the tools.

All in all it was a very good conference. The only down side was that it was sometimes a victim of its own success–there were simply too many talks, posters and people I wanted to talk with to get to them all!

BMW

3rd Jul, 2008

Update from CAC-2008

Greetings from Montpellier, where Jeremy and I are attending CAC-2008. We’re now into our third day of the conference, and it has gotten off to a good start. I thought I’d just take a minute and highlight several talks that I really enjoyed.

Brynn Hibbert presented “Analysis of variance of complex data sets using GEMANOVA: An example using kill kinetics data.” GEMANOVA is essentially a variant of PARAFAC, used like ANOVA to determine what effects are significant, but in multi-way data. The talk made me want to make sure that we can get PARAFAC working in this way for our users. The trick is in setting the constraint options, and in automating the building of sequences of models with different constraints. In any case, this talk demonstrates that PARAFAC, in the right hands, is a very powerful and versatile technique.

“New proposals for PCA model building with missing data” was delivered by Alberto Ferrer. As usual, Alberto gave a very clear presentation–a nice talk to listen to. Alberto showed how methods for imputing missing data in PCA models when a model exists can also be used to develop new PCA models in the face of missing data. PLS_Toolbox, incidentally, uses one of these methods. It was also shown that the NIPALS method for building models with missing data does not work well in comparison to the other methods.
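The general idea of building a PCA model by repeated imputation can be sketched as follows. This is a generic iterative scheme written for illustration; it is not claimed to be any of the specific methods Alberto presented, nor the one used in PLS_Toolbox. The function name `pca_impute` and the simulated data are hypothetical.

```python
import numpy as np

def pca_impute(X, n_comp, n_iter=500, tol=1e-10):
    """Fit a PCA model to data containing NaNs by iterative imputation."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start by filling each missing value with its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        # Reconstruct the data from the first n_comp components.
        recon = (U[:, :n_comp] * s[:n_comp]) @ Vt[:n_comp] + mean
        change = np.max(np.abs(recon[missing] - filled[missing])) if missing.any() else 0.0
        # Only the missing entries are replaced; measured values are kept.
        filled[missing] = recon[missing]
        if change < tol:
            break
    return filled, Vt[:n_comp]

# Simulated rank-2 data with roughly 10% of the entries removed.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 6))
mask = rng.random(X_true.shape) < 0.1
X_obs = X_true.copy()
X_obs[mask] = np.nan

X_filled, loadings = pca_impute(X_obs, n_comp=2)
X_mean = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)

err_iter = np.sqrt(np.mean((X_filled[mask] - X_true[mask]) ** 2))
err_mean = np.sqrt(np.mean((X_mean[mask] - X_true[mask]) ** 2))
print(f"RMSE, iterative PCA imputation: {err_iter:.4f}")
print(f"RMSE, column-mean imputation:   {err_mean:.4f}")
```

The comparison against plain column-mean filling is only a sanity check on simulated low-rank data. How such a scheme behaves on real data depends on the amount and pattern of missing values and on the number of components chosen.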

I also really enjoyed Henri Tapp’s talk, “OPLS: an ideal tool for interpreting PLS regression models?” Henri discussed why, in his opinion, there really isn’t much advantage to OPLS, even in interpretability. (It is admitted by its creator, Johan Trygg, that it does not improve predictive ability over conventional PLS.) Another interesting point in Tapp’s talk was the bibliographic survey of papers citing the original OPLS paper, which showed that OPLS is mostly referenced by authors from Umeå/Umetrics and Imperial College. I wonder, how much do you suppose the patent on OPLS has to do with this rather in-bred distribution?

My own talk, “Tools for Multivariate Calibration Robustness Testing with Observations on Effects of Data Preprocessing,” was reasonably well-received (at least I wasn’t booed off the stage) and sparked some discussion. I’ve learned over the years that a relatively simple talk with some nice graphics is a good thing to present in the right-after-lunch spot, when conferees are suffering from PLS (post-lunch syndrome). And of course the always energetic and enthusiastic Jeremy did a great job with “Automatic Sample Weighting for Inferential Modeling of Historical In-Control Process Data.”

So far, so good. More later!

BMW

In 2007, Randy Pell, Scott Ramos and Rolf Manne (PRM) ignited a controversy when they published “The model space in PLS regression.” Their paper pointed out that the X-block residuals in different PLS packages were not the same. Specifically, packages which use the NIPALS or SIMPLS method for PLS (including PLS_Toolbox/Solo, Unscrambler and SIMCA-P) produce different residuals than those that use Lanczos Bidiagonalization (primarily Pirouette). PRM claimed that the residuals in NIPALS were “inconsistent” and made the rather inflammatory statement that NIPALS “amounted to giving up mathematics.”

As you might imagine, this has resulted in a considerable amount of activity in the chemometrics community. And it really has been useful because many of us, including myself, have learned quite a bit about PLS, a subject we thought we already understood pretty well.

There will be a crop of articles in the upcoming issue of Journal of Chemometrics on this subject. This will include a letter to the editor by Svante Wold et al., “The PLS model space revisited,” which takes a theoretical/philosophical look at how PLS via NIPALS is derived and shows that, in this light, it is not inconsistent. Rasmus Bro and Lars Eldén’s contribution, “PLS Works,” shows that while the PLS NIPALS residual space is orthogonal to the model scores, and thus the fitted y-values, this is not true of Bidiag. I understand that there will also be a paper in the upcoming issue from Rolf Ergon, though I don’t know the title yet.

The work of Bro and Eldén served as a launching point for an investigation of my own regarding how and why Bidiag residuals are correlated with scores. The result is a poster which I will show at CAC-2008 next week, “Properties of PLS, and Differences between NIPALS and Lanczos Bidiagonalization.” The poster shows why and when NIPALS and Bidiag residuals are different, and shows some examples of when Bidiag residuals are strongly correlated with the scores. This includes the main example given in PRM, where, as it turns out, the main difference in the residuals is due to the 3rd factor in the Bidiag model being quite correlated with the residuals.
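The orthogonality point can be checked numerically with a small sketch. This is an illustration on simulated data, not the computation from the poster or from any of the papers. It assumes a textbook NIPALS PLS1, and it takes the Bidiag-style X residual to be what is left after projecting X onto the space spanned by the weights, X(I − WWᵀ). The function name `nipals_pls1` and the data are hypothetical.

```python
import numpy as np

def nipals_pls1(X, y, n_comp):
    """Textbook NIPALS PLS1 on mean-centered X (n x m) and y (n,)."""
    E, f = X.copy(), y.copy()
    W, T, P = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)           # weight vector
        t = E @ w                        # score vector
        p = E.T @ t / (t @ t)            # X loading
        E = E - np.outer(t, p)           # deflate X
        f = f - t * (f @ t) / (t @ t)    # deflate y
        W.append(w)
        T.append(t)
        P.append(p)
    return np.column_stack(W), np.column_stack(T), np.column_stack(P)

# Simulated, mean-centered data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
X -= X.mean(axis=0)
y = rng.normal(size=30)
y -= y.mean()

W, T, P = nipals_pls1(X, y, n_comp=3)

E_nipals = X - T @ P.T        # NIPALS X residuals
E_bidiag = X - X @ W @ W.T    # residuals after projecting X onto the weights

corr_nipals = T.T @ E_nipals  # scores against NIPALS residuals
corr_bidiag = T.T @ E_bidiag  # scores against projection residuals

print("max |T'E|, NIPALS residuals:    ", np.abs(corr_nipals).max())
print("max |T'E|, projection residuals:", np.abs(corr_bidiag).max())
print("per-score norms, projection:    ", np.linalg.norm(corr_bidiag, axis=1))
```

Under these assumptions the NIPALS residuals are orthogonal to every score vector to within rounding error, while the projection residuals are not. The per-score norms show the correlation sitting with the last factor in the model.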

If you are attending CAC, please drop by and talk to me during the poster presentation. I’m sure we’ll have a lively discussion!

BMW

References:
R. J. Pell, L. S. Ramos and R. Manne, “The model space in PLS regression,” J. Chemometrics, Vol. 21, pps 165-172, 2007.
R. Bro and L. Eldén, “PLS Works,” J. Chemometrics, in press, 2008.
S. Wold, M. Høy, H. Martens, J. Trygg, F. Westad, J. MacGregor and B.M. Wise, “The PLS model space revisited,” J. Chemometrics, in press, 2008.
B.M. Wise, “Properties of PLS, and Differences between NIPALS and Lanczos Bidiagonalization,” CAC-2008, Montpellier, France, 2008.

The Eleventh Conference on Chemometrics in Analytical Chemistry, CAC-2008, begins next week in Montpellier, France. The conference runs from June 30 through July 4.

All indications are that it will be a great conference. The organizers say that attendance will be close to 350, which must be a record for CAC.

Eigenvector will be there, of course. Our Jeremy Shaver will present Automatic Sample Weighting for Inferential Modeling of Historical In-Control Process Data, which is concerned with the problem of developing calibration models from data where the bulk of the samples are tightly clustered, with only a few samples exhibiting significant variation.

I’ll be there as well, presenting Tools for Multivariate Calibration Robustness Testing with Observations on Effects of Data Preprocessing. We all want calibration models that are robust, and thus, have good longevity. But how do you tell how brittle a model is? This talk demonstrates some tools for assessing model performance in the face of changes in the samples and instruments.

I’m also presenting a poster, Properties of PLS, and Differences between NIPALS and Lanczos Bidiagonalization. I’ll write about this a little more in my next post, but suffice it to say that there is a bit of controversy of late about various algorithms for Partial Least Squares Regression and the residuals they generate.

Eigenvector is of course proud to be a sponsor of CAC. We are sponsoring the Best Poster Contest, and will present the winner with $500USD (about 322€ today). I personally really like poster sessions. It’s a great time to really talk with people about their research, and it’s generally much more of an exchange of scientific ideas than talks, which are primarily one-way communications.

So, if you are going to CAC, look us up. Jeremy and I are always happy to answer questions about our products and services, and are always looking for user input on features for PLS_Toolbox, Solo, etc.

See you at CAC!

BMW
