Google Scholar Metrics for Publications
googlescholar.blogspot.com
It's interesting to note that the 5th highest ranked publication is arXiv. For those who aren't familiar with it, arxiv.org is an open-access repository of academic papers, mostly in quantitative science. In my field (computer science) it is standard practice to deposit a copy of one's papers in arxiv before submitting them for publication, and arxiv is the place to find the latest research.
There is currently a lot of hand-wringing in academia about open access publication. Everyone wants it, and it is trivial to switch a field to it (machine learning has done so, for the most part), but it requires the leaders in the field to lead the change and they are normally too invested in the status quo. What the high ranking of arxiv suggests to me is that while people pay lip service to the idea that the (mostly closed) publications are important and hold the definitive version of a paper, the reality is that no one gives a damn and just goes to arxiv when they want to read something.
It's still surprising that arxiv is that high up. You'd think most people would cite the definitive published versions, not the arxiv preprints?
Anyway, given that in some fields _everything_ that's written goes to arxiv, arxiv will [by definition][1] have a very high h-index. The thing is, the h-index was conceived to compare individuals, not journals.
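For anyone who hasn't seen the definition: a venue's h5-index is its h-index computed over the articles it published in the last five complete years, i.e. the largest number h such that h of those articles have at least h citations each. A minimal sketch in Python, with made-up citation counts:

```python
def h_index(citations):
    """Largest h such that h of the articles have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one venue's articles over five years.
venue = [250, 120, 90, 40, 12, 9, 5, 3, 1, 0]
print(h_index(venue))  # 6 -- six articles have at least 6 citations each
```

The point about arxiv follows directly: a venue that hosts essentially everything written in a field has an enormous pool of articles to draw its top h from.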
Google Scholar also indexes papers that are only published on arxiv, and those papers often include citations to other things on arxiv.
The question is how seriously Google Scholar rankings are going to be taken. In the past, quite a few demonstrations have shown that the Google Scholar ranking/citation system can be gamed with relatively little effort. Let's hope it will have more impact than those university rankings that calculate their scores from aspects such as the inbound/outbound link ratio (as some "webometric" rankings indeed do).
So it looks like with this method, if a journal publishes more papers, it has more of a chance to boost its h5-index? That probably accounts for the high ranking of arXiv, and for PLoS One beating out PLoS Biol.
One problem with impact factors is the way that a few articles can account for the majority of citations. For instance, a bioinformatics method that is widely used can attract thousands of citations, boosting the journal's impact factor by a few points. This method doesn't solve that, as it expressly focuses on the top n articles and ignores the impact of the remainder. For instance, PLoS One's score of 100 means that its top 100 articles each received at least 100 citations - it says nothing about the distribution of the rest.
That does seem to be the case. I'd be interested in some kind of median-citedness measure as well, to distinguish a venue that publishes 100 high-impact articles a year, from a venue that publishes 10,000 articles a year, of which 100 are high-impact.
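To make that concrete, here is a toy comparison, with invented citation counts, of two hypothetical venues that end up with the same h-index but very different medians:

```python
from statistics import median

def h_index(citations):
    # Largest h such that h articles have at least h citations each.
    counts = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)

# Hypothetical: a small selective venue vs. a mega-venue with a long tail of
# rarely cited papers. All citation counts are invented.
selective = [40, 35, 30, 25, 22, 18, 15, 13, 12, 10]
mega = selective + [1] * 990

print(h_index(selective), median(selective))  # 10 20.0
print(h_index(mega), median(mega))            # 10 1.0
```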
In particular, it's not robust to one factor often mentioned in the bibliometrics literature, trivial changes in agglomeration size. Say a set of 200 articles are published by either: 1) a single journal; or 2) two journals, which publish 100 of them each. In each of the hypotheticals, individual articles have the same citation counts. Under this metric, #1 gets a higher ranking, meaning that you can raise rankings without increasing paper quality by just agglomerating journals. (You can even run the two former journals separately inside the new journal if you want, with a two-track review structure, as long as there's only one title on the front page.)
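A toy check of the agglomeration point, again with invented citation counts: merging the two journals raises the h-index even though no individual article gains a single citation.

```python
def h_index(citations):
    # Largest h such that h articles have at least h citations each.
    counts = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)

# Two hypothetical journals, 100 articles each, with made-up citation counts.
journal_a = list(range(100, 0, -1))    # 100, 99, ..., 1 citations
journal_b = list(range(100, 0, -1))

print(h_index(journal_a))              # 50
print(h_index(journal_b))              # 50
print(h_index(journal_a + journal_b))  # 67 -- same articles, higher index
```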
It's nice to see that Google is adding features to Scholar. There's concern in the library community that it will go away, since it's not a revenue-producing service.
Incidentally, Microsoft Academic Search is pretty impressive so far. They've added many features, and they have an API that is pretty easy to use, which Scholar doesn't offer.
I just looked at Microsoft's offering, and while it has a lot of nice features, the citation counts seem wildly wrong when compared to the numbers produced by Web of Science and Google Scholar. Perhaps they have not completed their indexing yet. A few other concerns I have besides inaccuracy: (1) the interface is significantly more cluttered and confusing, and (2) because they auto-generate profiles rather than wait for authors to create their own, there is a glut of profiles that are, again, incorrect in the papers attributed to them. The chances that a significant number of people will go in and curate their profiles seem small; instead of having a limited but accurate collection of profiles, you end up with a majority of incorrect ones.
The "citation context" for a paper in Microsoft's search is particularly nice. Instead of just a list of papers that cite a given paper, you can get little blurb excerpts of the sentences that cite it.
There are definitely things that skew the index that might not necessarily reflect the quality of the journal. For example, the 20th ranked journal by h5-index is Nucleic Acids Research (NAR). However, when you look at the h-index articles for NAR, you see that they are dominated by articles announcing or simply cataloging an important database. These get cited very extensively, because anytime you use a database you need to cite it, but they aren't what I would call high-impact research articles. NAR just happens to be a journal that has a special annual Database issue where bioinformaticists can drop an article describing their useful database.
EDIT: It would be fair to say that since a database is so widely cited it is important. So maybe the index is more robust than I originally considered. But something still seems skewed here.
Rob J Hyndman has a very nice review of Google Scholar Metrics [1]. Here is his concluding quote:
> In summary, the h5-index is simple to understand, hard to manipulate, and provides a reasonable if crude measure of the respect accorded to a journal by scholars within its field.
>
> While journal metrics are no guarantee of the quality of a journal, if they are going to be used we should use the best available, and Google’s h5-index is a big improvement on the ISI impact factor.
[1] http://robjhyndman.com/researchtips/google-scholar-metrics/
The only CS conference/journal I saw on the list was "IEEE Conference on Computer Vision and Pattern Recognition, CVPR". That's not the top CS venue I know of.
That conference probably has a lot of biology stuff going on, and if you look at the list of journals, they're almost all biology-related. CS is a relatively small field in comparison.
Nope. CVPR doesn't have much biology in it (it does have a little). It tries to do a bit of cogsci, but it's mostly ML applied to vision. It generally doesn't cover medical applications either (there are bigger conferences for medical imaging).
It is the top venue in the field of computer vision.
Is there any reason to believe this h-index method of ranking is a good idea? Why not use PageRank?
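Journal-level PageRank metrics do exist (Eigenfactor and SJR work roughly that way). For illustration only, here is a rough power-iteration sketch over a made-up journal citation graph:

```python
# Simplified PageRank by power iteration over a journal-level citation graph.
# The journals and links below are entirely invented; dangling nodes
# (journals whose articles cite nothing) are not handled.
def pagerank(links, damping=0.85, iterations=50):
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, targets in links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new[dst] += share
        rank = new
    return rank

# links[j] lists the journals that articles in j tend to cite (hypothetical).
links = {
    "Journal A": ["Journal B", "Journal C"],
    "Journal B": ["Journal C"],
    "Journal C": ["Journal A"],
}
for journal, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(journal, round(score, 3))
```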
Interesting... how is CVPR the only computer science-related publication in the top 100?