The Google Scholar preprint bug redux

I don’t expect Google Scholar to ever fix the preprint bug.

Regular readers of my blog will know that I often complain about Google Scholar’s handling of preprints; see e.g. here or here. This week, I had the opportunity to raise my concerns with Anurag Acharya, the co-founder of Google Scholar. His initial response and the subsequent discussion have clarified several things. We now know:

  1. The bug exists
  2. The Scholar team is aware of it
  3. They don’t know how to fix it
  4. They don’t think it’s a particularly pressing problem
  5. For any given paper, the problem will go away eventually, after several months or more

So what is this bug you’re talking about?

In a nutshell, for papers with a preprint, the bug will prevent the final, official journal publication from appearing in the Scholar database, often for many months. If you search for the article by title, only the preprint version will show. If you search by DOI, nothing will show. Importantly, other articles from the same issue of the journal will all be properly indexed in Scholar, but the one article that happened to have a preprint will be missing.

Why should I care if the problem will fix itself eventually?

Anybody who would like to encourage more scientists to post preprints should care. And any junior scientist should care twice. This bug can have a very real effect on the career of junior scientists, by limiting their visibility or making them appear much less successful or competent than they actually are. Here are a few very real scenarios the bug can cause:

  • You know that John Smith posted an interesting preprint 2 years ago, and you wonder if that work was ever published. You search Google Scholar and only find the preprint. So you conclude the paper never saw the light of day or maybe is stuck in review. In truth, the paper came out 8 months ago in PNAS, but Google Scholar will hide that version from you.

  • You consider hiring a promising young scientist as a postdoc or maybe even a faculty member. However, as you pull up their Google Scholar profile, you notice that over the last two years they seem to have published only preprints. And several of the articles they list on their CV don’t show up in the Scholar database at all. You conclude the scientist is dishonest and you decline the application.

  • You post a preprint that contains an error. Thankfully, the error gets noticed in review and you fix it for the final publication (and/or post a new version of the preprint). However, Google Scholar keeps showing the old, erroneous version of the preprint, many months after the fix has been made. People keep reading the erroneous version and keep giving you grief over it.

  • An important paper in your field is published, and you would like to know about it. However, since the paper had a preprint, the official article is hidden from Scholar, and Scholar won’t notify you that it came out.

But it happens only very rarely, right?

That’s the stance of the Scholar team. It doesn’t mesh with my experience, though. Everybody I know who regularly posts preprints has been bitten by the bug. I cross my fingers every time I post one. And whenever I bring up this issue, some random person mentions that they have experienced the same. Also, my colleague Chris Adami just posted the following:

While this bug may be rare in some sense of the word “rare,” it happens frequently enough to be a real issue for real scientists and everyday users of Google Scholar.

Is there a workaround?

Not really. You can add papers manually to your Google Scholar profile, but that won’t make them show up in the search results. And they will also not be linked to the actual journal publications, a major drawback in my opinion. If you know of a preprint and are wondering whether it has been published or not, don’t check with Scholar. Check with some other database, such as PubMed. Or just do a regular Google search. The preprint bug does not affect regular Google, which will find the papers that Google Scholar doesn’t know about.
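
If you would rather script the PubMed check than do it by hand, here is a minimal sketch in Python. It assumes NCBI's public E-utilities esearch endpoint and searches by title; the function names are made up for illustration, and a title match can miss papers whose title changed between preprint and publication, so treat a negative result with caution.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint (assumed here; not part of the original post).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_title_query_url(title):
    """Build an esearch URL that looks for `title` in PubMed's title field."""
    params = {"db": "pubmed", "term": f"{title}[Title]", "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def pubmed_ids_from_response(payload):
    """Extract the list of PubMed IDs from an esearch JSON response."""
    return json.loads(payload)["esearchresult"]["idlist"]


def find_published_versions(title):
    """Hypothetical helper: query PubMed over the network and return matching IDs."""
    with urlopen(pubmed_title_query_url(title), timeout=10) as resp:
        return pubmed_ids_from_response(resp.read().decode("utf-8"))
```

Calling find_published_versions with the preprint's title returns a list of PubMed IDs; an empty list means PubMed found no record under that exact title, not necessarily that the paper is unpublished.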

I hope that the Google Scholar team will eventually realize that this is an important issue to get right. In the meantime, if you have been bitten by the bug, please let me know, so we can build a record of cases and demonstrate this is an important issue. And, if you’re looking for the official publications of long-standing preprints, look for them using regular Google, not Google Scholar.

Claus O. Wilke
Professor of Integrative Biology

