Podcast

Raising Quality Standards for the Scientific Record

Gustav Nilsonne
Karolinska Institutet

Gustav is a neuroscientist at Karolinska Institutet, one of the leading medical research centers in the world. His research looks at sleep and communication between the brain and the immune system, and he also does metascience research on transparency, reproducibility, and openness in science. Gustav was the inaugural speaker in the Future of Science seminar series, where he talked about research objects and pathways towards community-based, transparent, and systematic quality control of science.

What is metascience and how did you become a metascientist?

I've recently started calling myself a metascientist. Metascience is essentially the study of science itself. Our focus is on research outputs, such as the research articles published by scientists. The activities we engage in include examining the transparency of research, assessing data accessibility, reproducing results, verifying consistency, and exploring the impact of different assumptions, among other things. Metascience has kind of come together as a movement in maybe the last ten years or so. Now there are societies, conferences, and a few journals.

Of course, people have been looking at science much longer than that. There are individuals throughout the centuries that we could today call meta-scientists, even if they didn't think of themselves that way. There are some people in the field of science and technology studies who seem to be a bit upset that a bunch of other people have now arrived with quantitative methods to study science, thinking they're reinventing the whole thing from the ground up. For example, the evidence-based movement in medicine, starting in the 90s, had many characteristics of meta-science. There are some individuals who we now mainly think of as statisticians, but they had quite a broad view of the practice of science, like Cohen and Fisher.

For me, I have always been interested in how and why we can trust the findings we make in science. And that interest has been with me ever since I started. I had a crisis of confidence myself around the time I finished my PhD, which caused me to switch fields. It was only after that that I started doing neuroscience. However, originally I was conducting research in experimental pathology, specifically malignant mesothelioma. I was studying cell cultures and tissue samples, among other things. I had a kind of reckoning with my own perception of the field and my own work as well, which seemed to be a bit of a house of cards by the end of my PhD.

Your research led you to identify the Triple Crisis in academic publishing. What is the Triple Crisis?

The triple crisis is something I've written about with colleagues, Björn Brembs and others, in a paper called Replacing Academic Journals. We talk about three crises of academic publishing: a crisis of functionality, a crisis of affordability, and a crisis of reproducibility.

The crisis of functionality refers to the fact that academic publishing is still stuck in the same format it had back when journals were printed on paper. We have largely failed to make use of the possibilities that the internet and digital technologies now offer.

Secondly, much of what we publish gets locked behind paywalls, and traditional publishers increase their prices at alarming rates. So that's the crisis of affordability.

And then there's the crisis of reproducibility, which is related to what we were talking about a moment ago with the development of meta-science as a field. There has been talk of the reproducibility crisis for about a decade or so. It's a growing realization that when we actually try to reproduce scientific findings in many fields, it turns out that many of them can't be replicated or reproduced, even if we try as faithfully as possible. These large-scale systematic attempts to reproduce findings are actually quite recent. They've only been going on in the last decade.

None of these large-scale replication attempts has been highly representative. Some have chosen only impactful findings, and others have chosen, partly out of convenience, findings that were feasible to replicate from a certain sample. So I don't think we know the answer. I also want to acknowledge that it's entirely possible that metascience has its own biases, and sometimes people accuse metascientists of trying to find low reproducibility because it attracts more attention.

To set a baseline of understanding - what would a ‘healthy’ reproducibility rate look like?

You can think about this in terms of hypothesis testing. If you have a framework for hypothesis testing according to the traditional null hypothesis method, then the proportion of findings that come out positive depends on the statistical power that you have, which is to say how big the study is in relation to its design. Another factor is how probable the hypothesis is beforehand, the prior probability.

So if we have studies with 90% power, which is way above where we're operating right now, and 100% prior probability, then we can reach 90% reproducibility. That's basically the only way, unless we have 100% power, but then we would publish maybe one study a year in the whole field. So I don't think we're going to achieve such high reproducibility rates.
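As a rough illustration of that arithmetic, here is a minimal sketch in Python (not from the interview) of the standard positive-predictive-value reasoning behind these numbers; the function name and the second set of parameter values are illustrative assumptions.

```python
# Illustrative sketch: expected replication rate under simple
# null-hypothesis-testing assumptions. Not from the interview.

def expected_replication_rate(prior, power, alpha=0.05):
    """Fraction of published positive findings expected to replicate,
    assuming original and replication studies share the same power and alpha."""
    # Positive predictive value: share of significant findings that are true.
    ppv = (prior * power) / (prior * power + (1 - prior) * alpha)
    # True findings replicate at the rate of statistical power;
    # false findings "replicate" only at the false-positive rate.
    return ppv * power + (1 - ppv) * alpha

# Gustav's example: near-certain hypotheses tested at 90% power.
print(expected_replication_rate(prior=1.0, power=0.9))   # ~0.90
# Hypothetical, more typical conditions: long-shot hypotheses, modest power.
print(expected_replication_rate(prior=0.25, power=0.5))  # roughly 0.40
```

Under this toy model, high replication rates require both high power and high prior probability, which is the point Gustav is making.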

In addition, we have publication bias. The proportion of findings in the literature that support the authors' hypothesis is about 90%. That immediately tells us that something is missing: there is some kind of bias where negative findings either go into the drawer, or the results are analyzed until a positive finding can be found. So I'm not at all surprised that we have found reproducibility rates much lower than 80 or 90%. For example, in the Reproducibility Project: Psychology that I think we've both been alluding to, which came out in 2015, we found that about one third of the 100 papers in experimental psychology successfully replicated.

When scientists publish only positive results, we have no way of knowing which of those positive results we can actually trust. This is perhaps not entirely intuitive, but it's a major problem for humanity. How can we make rational decisions in society if we can't trust the scientific record?

How well known is this problem?

Sometimes I feel like I live in a different world from some colleagues. I meet many colleagues who are highly attuned to these problems and are trying to solve them, partly because that's the kind of circles I'm moving in now. But I also meet many colleagues who don't think it's a problem, and also quite a lot of colleagues who think there are significant quality problems but conceptualize them in an entirely different way. For instance, some colleagues believe that much of science is superficial, and that's why we need to rely on high-prestige journals. However, I don't think that's the case.

The high-prestige journals, such as Nature and Science, the New England Journal of Medicine in medicine, and their counterparts in economics, are highly selective. It's very difficult to get published in them, but if you succeed, it can be extremely beneficial for your career; for young scientists, it can make or break a career. Many of my colleagues seem to believe that the high selectivity of these journals guarantees quality: you need to persuade the editor and a set of reviewers that your work is not only good but of very high quality and relevance to the field. However, there are some issues with this approach.

What you often find in these journals are flashy results that could make the news or somehow overturn conventional wisdom. This means they are inherently riskier, with a lower prior probability. Additionally, the quality control inherent in the traditional peer review system is quite unsystematic. Usually, the process consists of sending the paper to peers with minimal instructions for review, and they eventually return a review. It surprises many people outside of science that the whole process runs on a voluntary basis, with reviewers asked to review for free. This often means it takes a lot of time, and reviewers do it in their spare time, maybe in the evening or on the weekend. The point I'm trying to make is that reviewers perform the review in a narrative manner, using their expert opinion to judge whether the findings are supported by the data. However, they rarely look at the primary data or the analysis code, or try to reproduce the findings by running the code on the data or performing the same operations.

What is the alternative to the ‘novelty’ criterion of these journals?

We need to establish a systematic and transparent quality control process, so that when readers examine a scientific output they can see what procedures were followed in reviewing this interoperable research object. By research objects, I mean things like scientific datasets, or any digital resource resulting from research that can be published online. Therefore, I propose that we develop community-based procedures and criteria for ensuring quality. We can start with the basics, such as verifying that a research dataset exists and checking that it has the correct number of observations. Additionally, we should ensure that the observations fall within the range that nature dictates: for example, it is highly unlikely that a participant's weight would reach 800, so something is likely incorrect in that case.
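As a concrete example of the basic checks described above (dataset present, correct number of observations, values within plausible ranges), here is a minimal Python sketch; the file name, expected row count, column names, and ranges are hypothetical and would be taken from the study's protocol in practice.

```python
# Minimal sketch of basic, automatable dataset checks.
# File name, expected row count, column names, and ranges are hypothetical.
import pandas as pd

EXPECTED_ROWS = 120                      # observations promised in the protocol
PLAUSIBLE_RANGES = {                     # bounds dictated by nature and the design
    "age_years": (18, 100),
    "weight_kg": (30, 300),
}

def basic_quality_checks(path="study_data.csv"):
    data = pd.read_csv(path)
    problems = []
    if len(data) != EXPECTED_ROWS:
        problems.append(f"expected {EXPECTED_ROWS} observations, found {len(data)}")
    for column, (low, high) in PLAUSIBLE_RANGES.items():
        out_of_range = data[(data[column] < low) | (data[column] > high)]
        if not out_of_range.empty:
            problems.append(f"{len(out_of_range)} values of {column} outside [{low}, {high}]")
    return problems or ["all basic checks passed"]

print(basic_quality_checks())
```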

Some criteria of quality are related to study design. There are well-known tools for assessing the risk of bias that are used, for example, in systematic reviews. In these reviews, you can examine factors such as blinding and randomization in experiments conducted in medicine. You can ask questions like: Was the blinding successful for both the participants and the researcher? Was the randomization process described in detail? These are typical aspects of study design that contribute to quality.

Additionally, you can analyze how the study was conducted and how the data were analyzed. Were the data analyzed according to a pre-specified plan? Was the main outcome pre-registered? This is important because sometimes researchers measure multiple outcomes and then choose the one that suits their preferences, which can compromise the integrity of the study. Pre-registration helps to prevent this bias.

Another measure of quality is whether the reporting aligns with the pre-registration. It is also important to assess if the results can be reproduced and if they withstand challenges to the assumptions made during the analysis. There are more aspects to consider, but these are some key points.
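One way to make such criteria systematic and transparent is to record them as a structured report rather than a narrative review. The sketch below is a hypothetical illustration; the field names and example values are invented and do not reflect any existing standard.

```python
# Hypothetical structured quality report covering the criteria discussed above.
from dataclasses import dataclass, asdict

@dataclass
class QualityReport:
    blinding_described: bool
    randomization_described: bool
    analysis_plan_prespecified: bool
    primary_outcome_preregistered: bool
    report_matches_preregistration: bool
    results_computationally_reproduced: bool
    robust_to_alternative_assumptions: bool

example = QualityReport(
    blinding_described=True,
    randomization_described=True,
    analysis_plan_prespecified=True,
    primary_outcome_preregistered=True,
    report_matches_preregistration=False,   # flagged for follow-up
    results_computationally_reproduced=True,
    robust_to_alternative_assumptions=True,
)
print(asdict(example))
```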

To get there, innovation is needed, and it has been sluggish. It has been incredibly slow in the realm of traditional publishing. There are some journals that perform computational reproducibility checks, but to a large extent the things I'm talking about have not been implemented. I think it's because of misaligned incentives: journal editors have the opportunity to allocate prestige, and any systematic procedure interferes with their latitude in doing so. I speculate that this misaligned incentive is holding back innovation. Of course, I'm very excited about new disruptive publishing platforms that could free us from the dependency on traditional journal publishing, and DeSci is one such platform that I'm very excited about.

How would ‘Interoperable Research Objects’ serve as a solution?

The scientific paper as we know it is static; it typically takes the form of a PDF. Contributions are expressed as authorship, which means that a scientist is an author and an author is a scientist, but it's usually not possible to tell who contributed what in a paper. And the paper itself, in most cases, doesn't contain all the information we would need to know how the work was done and how we can build on it or reproduce it. So what I was talking about in the talk was to imagine that, instead of the paper, we have digital research objects that can contain some of the things that are in the paper, but also other things.

So you could imagine that when I run a study, I start by writing a protocol and post it publicly. Then I have information for my participants; I post that publicly. I have a data management plan; I post that publicly, and it says where the data are going to go and where the analysis code is going to go. Then the data go into that repository. These objects are connected through digital identifiers. Then I can have code in another repository that operates on the data and generates the results, so that anyone can execute it and demonstrate that they get the same thing.
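To make that concrete, here is a hypothetical sketch of a machine-readable manifest linking those research objects through persistent identifiers; all repositories, DOIs, and file names are placeholders, not real records.

```python
# Hypothetical "research object" manifest; all identifiers are placeholders.
import json

research_object = {
    "study": "Example sleep and immune-signalling study",
    "protocol": {"repository": "OSF", "doi": "10.xxxx/protocol-placeholder"},
    "participant_information": {"repository": "OSF", "doi": "10.xxxx/info-placeholder"},
    "data_management_plan": {
        "repository": "Zenodo",
        "doi": "10.xxxx/dmp-placeholder",
        "declares": {"data_repository": "Zenodo", "code_repository": "GitHub, archived on Zenodo"},
    },
    "dataset": {"repository": "Zenodo", "doi": "10.xxxx/data-placeholder"},
    "analysis_code": {
        "repository": "GitHub, archived on Zenodo",
        "doi": "10.xxxx/code-placeholder",
        "entry_point": "run_analysis.py",  # anyone can rerun this on the dataset
    },
}

print(json.dumps(research_object, indent=2))
```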

What are the barriers to having a system where we have multiple different research objects?

Much of the technology exists. I can post things online in repositories that cover most of these scenarios; maybe not 100%, but we're well on the way there. The major problem, in my opinion, is that it's hard to get credit for publishing research objects outside of traditional journals. And that's the factor that locks us into the traditional publishing system more than any other. I need journal publications in order to further my career; at least, that's the general perception. If I publish research objects in some other way, it may be useful or beneficial, but it's hard for me to claim credit in the recognized system of merits.

And this is highly pernicious: it forces scientists to write papers for the sake of writing papers, instead of for the sake of communicating important scientific outputs. So how do we get out of that? These incentives seem deeply ingrained in the academic system, and I wouldn't even just blame this on the journal editors or the publishing industry.

To a similar extent, it's the academic community itself that has created all these quantitative benchmarks to evaluate research and productivity. And that has trickled down very deeply, influencing what people do on a day-to-day basis. I observed this at Erasmus University when I started there. They introduced a point system to evaluate researchers, allocate research budgets, and determine eligibility for promotion. Ultimately, this meant that a committee discussed what the formula should look like. However, once the formula was finalized, all promotion and evaluation decisions could have been made by a computer or a cleaning lady.

There was little influence of human judgment or actual expertise, or any actual engagement with the research. There was a strong belief that we have quantitative indicators of research and productivity so robust that we can rely on them entirely to steer the whole system. And while that has certain convenient advantages, I could also see that once that system was introduced, it clearly influenced what people were doing: what types of studies they engaged in, how they shifted their priorities, and how they started allocating more and more time to the few projects where they thought they actually had a shot at getting into these higher-ranked journals, especially the ones that were flashy and potentially attention-grabbing. So you could really see that the entire thing trickled down. So how do we change that? How do we get out of that? How do we improve that?

What are the least quality controlled aspects of science?

When you look at empirical or computational research, it's probably the underlying research artifacts themselves, specifically the code. With very few exceptions, referees and evaluators never go to the actual origin of the empirical results, which is the code and the data on which the code runs.

I want to nominate another thing, which is citations. Many citations are simply miscitations. And even when they're not, they can be made for a variety of reasons: for example, it might be something the writer was thinking about and used as an argument to support their point, or they may want to show their allegiance to a specific group of scientists, or something similar. The argument I'm trying to make here is that citations are a weak measure of impact.

They measure, to some extent, whether you gain traction in the academic community you're addressing. Some papers at some point become staples in a particular field, and if you don't cite them in your own work within that field, you're missing an opportunity to signal that you are actually a member of that field. As an expert, you're supposed to know all the key works that have been done, and so you cite them.

Going back to what you were asking: how do we move away from citation-based metrics to a better form of appraising quality? Well, one way is to rely more on expert judgment, that is, having an expert actually read the work. But that's expensive and time-consuming. If, let's say, I'm trying to hire you or you're trying to hire me, it would take far too long to read through our entire body of work. I wouldn't want to read all my own stuff; it's way too much. Who is supposed to read 200 pages of supplementary information? So we need some kind of quality control that is recognized and portable. I would love to be able to send my work to some kind of independent body that could examine it according to certain standards and say: here are the criteria that you meet; your study is well designed and well executed, the data are present, the results are reproducible, or whichever standards we choose for the particular discipline or subfield. I want to try to make this happen somehow. I think that's the way to move away from the reliance on journals: instead of trying to break down the current model, build an alternative.

How much hope do you have that we can improve this system?

I think the future is very bright. There are many people working on solutions, and there is increased uptake of practices that I like to see, such as pre-registration and data sharing. We need to align the incentives. And I see lots of work happening, both top-down from large funders like the European Commission and bottom-up from individual scientists, entrepreneurs, and communities. So I'm quite hopeful, actually.

Philipp: I'm an empirical realist, so I tend to be neither particularly optimistic nor pessimistic; I'm very much driven in my judgments by data and evidence. And I completely agree with you that we've seen a lot of very encouraging developments in the last couple of years, including the uptake of the open science movement, a growing recognition of the importance of replicability, and a growing interest in understanding the incentives we're operating under and their consequences. All of those are good signs. Whether that's enough for us to actually escape the traps of the prestige economy we're currently in, I don't know, but I really hope that you're right. We're definitely working in that direction, putting our weight into this and trying to make our small contribution to improving it. So I'm very glad that you gave us an insight into what's actually going on behind the curtain in science, what some of the problems are, and some potential ways we can improve.