11th Jun, 2009

Referencing Software

Yesterday I picked up a newly arrived journal and noted that the article highlighted on the front page looked quite interesting, as we have been doing some related work. I eagerly turned to the article and found that the author had been using PLS-DA (Partial Least Squares Discriminant Analysis) and was, in fact, one of our PLS_Toolbox users. Imagine my disappointment when I could find no reference to our software in the article! The article had many references, with more than 50 journal articles cited and at least a half dozen equipment manufacturers named. But there was no mention of the software used to turn the measurements into the results that were presented.

I checked with the author of the article, and yes, our software was used to produce the results that were re-plotted in another application prior to publication. But exactly whose software was used is beside the point. The point is that software is a critical part of the experiment and, in order to ensure reproducibility, should be referenced.

Some might say that referencing the original work on the development of the particular analysis method implemented in the software should suffice (though in this instance that wasn’t done either; the author referenced previous work of their own where they used PLS-DA). I’d argue that isn’t enough. The problem is that it typically takes a LOT of additional work to turn a method from an academic paper into a working program. There are often many special (sometimes pathological) cases that must be handled properly to ensure consistent results. Sometimes various meta-parameters must be optimized. Preprocessing can be critical. And then there is the whole interface that allows the user to interact with the data so that it can be effectively viewed, sorted, and selected.

So why do I care? Obviously, there is the commercial aspect: having our software referenced is good advertising for us and leads to more sales. But beyond that (like many other publishers of scientific software, I’m sure), our software is probably our most significant work of scholarship. To not reference it is to not acknowledge the contributions we’ve made to the field.

So I’m calling on all authors to reference the software they use, and on editors and reviewers to check that they do. Listing it in the references is preferred. Software is, after all, a publication, with a title, version number (edition), publisher, and year of publication. Often the authors are known as well, and can be listed. But one way or another, software should be cited: it is critical to reproducibility and represents scholarly work upon which the results depend. Referencing software upholds scientific and scholarly tradition, and it is academically dishonest not to do so.
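To make the form concrete, here is one way such a reference-list entry might look in BibTeX. This is only a sketch; the entry type, field choices, package name, version, and publisher below are placeholders, not a prescribed standard:

    @misc{example_software,
      author       = {A. N. Author and B. Developer},
      title        = {Example Chemometrics Package},
      howpublished = {Example Software Co., Seattle, WA},
      year         = {2009},
      note         = {Version 4.2}
    }

Any consistent format that captures the title, version, publisher, and year serves the same purpose: letting a reader identify exactly which software produced the results.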

BMW

Responses

Agreed, within some limits. My bit of the US gov’t strongly discourages explicit reference to “generic” wares, hard and soft. And indeed, commercial implementations of some tools – like one-way ANOVA or PCA, and now even PLS – should at least be fairly generic. (The environment used for “home brew” implementations should be acknowledged, if only as a warning.)

That said, what constitutes “generic enough” will often be a value judgment… perhaps best arbitrated by our Journals?

Dave

For better or worse, even PLS probably isn’t generic enough to forgo a reference. There are quite a few variants running around out there. Example: the current debate over NIPALS/SIMPLS vs. bidiagonalization and its effect on residuals.

