Gaming Google Scholar: Academic Search Engine Spam

There has been no shortage of posts in this space devoted to discussions of open access educational resources and the possible future(s) of web-based academic publishing and research.  A recent study detailed in a paper appearing in this month's Journal of Electronic Publishing peers into that digital future and finds..."academic search engine spam"--a neologism that is strange, funny, and just may be portentous.

Authors Joeran Beel and Bela Gipp have been studying Google Scholar's ranking algorithm for some time, and the duo published an article earlier this year investigating the possibilities of what they called "academic search engine optimization" (ASEO).  In that paper (which is also well worth reading), they advised scholars on “[...] the creation, publication, and modification of scholarly literature in a way that makes it easier for academic search engines to both crawl it and index it.”  Not surprisingly, the paper and the very idea of ASEO sparked controversy in the academic community.

In the opening of their new article, Beel and Gipp note that the ASEO study elicited a varied response, and the new study promises to do likewise.  The authors followed up on the guidelines they laid out in the ASEO paper to test the degree to which the ranked results of academic search engines (primarily focusing on Google Scholar) can be manipulated through altered citation counts, keyword padding, and the inclusion of invisible text within academic papers.  And in what has to be a first in academic research, they even manage to get Google Scholar to link to a doctored version of a research paper containing an ad for Viagra.

Beel and Gipp find that academic search engines can be gamed and that it isn't even terribly difficult to do so.  The threat, in other words, is real, and in their concluding discussion, the authors recommend that Google Scholar and other engines "should apply at least the common spam detection techniques known from Web spam detection, analyze text for sense-making, and not count all citations."  More provocatively, they aver that "the potential benefits of academic search engine spam might be too tempting for some researchers."  In a paper likely to spark considerable discussion, that's the sentence that will provide the most tinder.

Interestingly, Beel and Gipp are at work on their own (one hopes robust) academic search engine, Sciplore. And in case you were wondering: this author can confirm that all of the keywords for the pair's recent article are indeed legit.