Machine-learning techniques used by thousands of scientists to analyse data are producing results that are misleading and often completely wrong.
Dr Genevera Allen from Rice University in Houston said that the increased use of such systems was contributing to a “crisis in science”.She warned scientists that if they didn’t improve their techniques they would be wasting both time and money. Her research was presented at the American Association for the Advancement of Science in Washington.
A growing amount of scientific research involves using machine learning software to analyse data that has already been collected. This happens across many subject areas ranging from biomedical research to astronomy. The data sets are very large and expensive.
But, according to Dr Allen, the answers they come up with are likely to be inaccurate or wrong because the software is identifying patterns that exist only in that data set and not the real world.
“Often these studies are not found out to be inaccurate until there’s another real big dataset that someone applies these techniques to and says ‘oh my goodness, the results of these two studies don’t overlap‘,” she said.
“There is general recognition of a reproducibility crisis in science right now. I would venture to argue that a huge part of that does come from the use of machine learning techniques in science.”