Building Scite

Josh Nicholson started as a PhD student thinking about the problems in science. He began by reading about the replication crisis and explored almost every route to resolving it: peer review, publishing grey literature, suggesting to his advisor that they start a pre-print server, micropublications, and, finally, creating whole new metrics.

After all this exploration, building The Winnower, and working with researchers across meta-science, he built Scite. This seminar is the story of Scite.

The idea behind Scite started with a business competition. Josh proposed a pre-print server that combined pre-prints with what he called the R-factor, or reproducibility factor. The judges turned him down: they did not think there was a viable market for this pre-print server.

Nevertheless, the concept of the R-factor stayed with him. It was simple: it took the most widely used metric, citations, and made it mean something. It quantified the reliability of a result based on the replications of the study. For example, if there were ten replication attempts with eight in agreement, the study would have an R-factor of 0.8.
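The arithmetic behind that example can be written out in a few lines. This is only a minimal sketch that assumes the R-factor is the share of replication attempts that agree with the original result; the function name `r_factor` and the choice to return `None` when there are no replication attempts are illustrative assumptions, not something described in the seminar.

```python
from typing import Optional


def r_factor(supporting: int, contradicting: int) -> Optional[float]:
    """Fraction of replication attempts that agree with the original result.

    Assumption: plain mentions are excluded before counting, so only
    citations that actually test the result are passed in.
    """
    if supporting < 0 or contradicting < 0:
        raise ValueError("counts must be non-negative")
    total = supporting + contradicting
    if total == 0:
        # No replication attempts yet, so there is nothing to score.
        return None
    return supporting / total


# The example from the text: ten replication attempts, eight in agreement.
print(r_factor(supporting=8, contradicting=2))  # 0.8
```

Returning `None` rather than 0 for an untested study keeps "never checked" distinct from "checked and contradicted every time".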

While the concept was simple, implementing it was not. How could you tell which of the tens, hundreds, or sometimes thousands of citations were actual replications rather than simple mentions? On top of that came the work of determining which replications supported the research and which contradicted it.

Imagine doing that for the entire research corpus.

Out of this core struggle, Scite was born. Josh asked, “If we can manually determine which citations are relevant, could we automate it?” After talking to machine learning specialists, finding co-founders, and partnering with GROBID, they created a tool that separates citations into three buckets: mentions, supporting, and challenging.

The tool takes you from seeing that there are citations to knowing what those citations mean. You can click on each citation type and see the citation context. This allows researchers to trace whether a study's results have been successfully replicated, have failed to replicate, or have not yet been checked.
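To make the three buckets concrete, here is a small sketch of how labelled citations might be tallied and their contexts grouped by type. It assumes the labels have already been assigned, whether by a person or by a classifier; it does not show how Scite classifies citations. The `Citation` class, its field names, and the sample records are all hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List

# The three buckets named in the text.
LABELS = ("mentions", "supporting", "challenging")


@dataclass
class Citation:
    citing_paper: str  # hypothetical identifier of the citing paper
    label: str         # one of LABELS, assumed to be assigned upstream
    context: str       # the text surrounding the citation


def tally(citations: List[Citation]) -> Dict[str, int]:
    """Count citations per bucket, reporting zero for empty buckets."""
    counts = Counter(c.label for c in citations)
    return {label: counts.get(label, 0) for label in LABELS}


def contexts_by_label(citations: List[Citation]) -> Dict[str, List[str]]:
    """Group citation contexts by bucket, so each type can be browsed."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for c in citations:
        grouped[c.label].append(c.context)
    return dict(grouped)


# Made-up sample records, for illustration only.
sample = [
    Citation("paper-A", "supporting", "We replicate the effect reported in [1]."),
    Citation("paper-B", "mentions", "Prior work has examined this question [1]."),
    Citation("paper-C", "challenging", "We could not reproduce the result of [1]."),
    Citation("paper-D", "supporting", "Our findings agree with [1]."),
]

print(tally(sample))
print(contexts_by_label(sample)["challenging"])
```

A study whose tally shows only mentions has been cited but, on this view, not yet checked.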

This gives us the tools to build a new validation system. Scite is working on expanding this validation beyond academia into Wikipedia and the public sphere, to give readers a clearer view of the reliability of the ‘science’ in the media. They have a retraction bot that checks references and flags any that have been retracted, helping researchers avoid building on retracted work. Finally, they are working on ‘lighting up’ research. Instead of just seeing the citations for a paper, what might it be like to see the supporting and challenging arguments for an idea or hypothesis?

Scite aims to take us from knowing that there is a conversation around a paper to knowing the valence of that conversation, the different possibilities for interpretation, and the reliability of the research. Beyond that, it is a tool to make research more reliable and informative.

Next-Generation Citations

Josh Nicholson

Scite
