Saturday, June 7, 2008

Citations and other animals

There was a time when a scientist was evaluated on the basis of his or her (mostly his, in those patriarchal days) published papers. Then, as science, and in particular my field, physics, became specialised, it became necessary to check a person's citation index to gauge the depth and importance of the work. The other advantage of citations was that they absolved the evaluating authority of the necessity to read or understand any of the papers of the person being evaluated (for appointments, promotions, awards etc.). After all, in this dog-eat-dog world, if you could reduce a person's worth to an integer (or a fraction, if you are looking at averages), what better way to make immediate comparisons? (Incidentally, a quirk of average citations is that if a person publishes a couple of highly cited papers and then goes more or less into hibernation, the average can become very large due to a small denominator!)

Unfortunately, it soon transpired that in certain fields (such as String Theory) everyone quoted everyone else ("There has recently been a lot of activity [1-57] in ..."), resulting in overall high citation indices, and it became necessary to fine-tune the idea of citations. Thus was born the h-index, developed by Jorge Hirsch (the original paper is here). An h-index of, say, 9 indicates that a researcher has published at least 9 papers, each of which has been cited 9 or more times. There are of course no prizes for guessing that the physicist with the highest h-index is the string theorist Edward Witten of the Institute for Advanced Study in Princeton, who has an h-index of 110, which implies Witten has published 110 papers with at least 110 citations each.
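For the curious, here is a minimal sketch of how the h-index described above could be computed from a list of per-paper citation counts, alongside the plain average. The function names and both citation records are invented for illustration; they are not anyone's real data, and this is not necessarily how ISI or Google Scholar do it.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only decrease from here, so stop
    return h


def mean_citations(citations):
    """Average citations per paper (0.0 for an empty record)."""
    return sum(citations) / len(citations) if citations else 0.0


# Hypothetical records, made up purely for illustration.
hibernator = [900, 700]  # two big hits, then silence
steady = [40, 35, 30, 22, 18, 15, 12, 9, 9, 4, 2, 0]

print(h_index(hibernator), mean_citations(hibernator))
print(h_index(steady), mean_citations(steady))
```

The made-up "hibernator" has a huge average but an h-index of only 2, which is the small-denominator quirk in action, while the steady publisher has a modest average and an h-index of 9.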
Other highly ranked physicists include: Marvin Cohen (94), a condensed matter theorist at the University of California at Berkeley; Philip Anderson (91), a condensed matter theorist at Princeton University; Steven Weinberg (88), a particle theorist and Nobel Laureate at the University of Texas at Austin (more on him later); and Michael Fisher (88), a mathematical physicist at the University of Maryland.

According to Hirsch a "successful scientist" will have an index of 20 after 20 years; an "outstanding scientist" will have an index of 40 after 20 years; and a "truly unique individual" will have an index of 60 after 20 years. Moreover, he goes on to propose that a researcher should be promoted to associate professor on achieving an h-index of around 12, and to full professor on reaching an h of about 18. Of course the usual qualifications apply: it's different for different fields and there are always exceptions (Feynman, Einstein?), but I am sure there are places which use the h-index in many (un)healthy ways.

It would, of course, be interesting to see how these numbers stack up in the Indian context. One could use the ISI Web of Knowledge, and perhaps Google Scholar, to get the h-index of a person, though I haven't done this exercise yet.

Now comes the most recent development in this field (if it can be called a 'field'). This is the w-index or Wu index, developed by Qiang Wu from the University of Science and Technology of China in Hefei. A w-index (or 10h-index) of w indicates that a researcher has published w papers with at least 10w citations each. A w-index of 24, for example, means that he or she has 24 papers with at least 240 citations each. According to Wu, the index is a significant improvement on the h-index, as it "more accurately reflects the influence of a scientist's top papers".
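The w-index as defined above is the same counting rule as the h-index with a threshold ten times steeper, which a short sketch makes plain. The function name, the `factor` parameter and the citation record below are my own assumptions for illustration, not Wu's notation or anyone's real numbers.

```python
def w_index(citations, factor=10):
    """Largest w such that at least w papers have factor*w or more citations.

    factor=10 gives the w-index as described above; factor=1 reduces
    to the ordinary h-index.
    """
    ranked = sorted(citations, reverse=True)
    w = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= factor * rank:
            w = rank
        else:
            break
    return w


# Hypothetical record, made up purely for illustration.
record = [520, 300, 45, 31, 12, 8]

print(w_index(record))            # w-index
print(w_index(record, factor=1))  # h-index of the same record
```

For this invented record the w-index is 3 and the h-index is 6: only the three most cited papers clear the tenfold bar, which is the sense in which the w-index leans on the top of the list.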
Again, no prizes for guessing that Ed Witten from the Institute for Advanced Study in Princeton, who has the highest h-index, also comes top in the w-index ranking with a score of 41. Witten is followed by condensed-matter theorist Philip Anderson at Princeton University, with a w-index of 26, and cosmologist Stephen Hawking at Cambridge University comes third with a w-index of 24. Particle theorist Frank Wilczek (Massachusetts Institute of Technology) and Marvin Cohen (University of California, Berkeley) are joint fourth with a score of 23. While Witten, Anderson and Wilczek also took three of the top five slots in the h-index ranking, the big winner under the new criterion is Hawking, who has a relatively modest h-index of just 62, compared to Witten's score of 110.

Again, according to Wu, a researcher with a w-index of 1 or 2 is someone who "has learned the rudiments of a subject". A w-index of 3 or 4 characterizes a researcher who has mastered "the art of scientific activity", while "outstanding individuals" are those with a w-index of 10. Wu reserves the accolade of "top scientists" for those with a w-index of 15 after 20 years or 20 after 30 years.

It's not clear to me that the w-index adds anything more to a person's reputation, since, as I pointed out earlier, it is just a 10h index. Presumably top-cited papers get slightly better billing in this counting. One presumes that as long as these indices are not taken too seriously, or used exclusively, it is a pleasant Sunday afternoon exercise to browse and calculate the various indices for one's friends and enemies. It would be worrisome and a travesty, though, if a person's contribution to the world of science were to be reduced to a bunch of (in this case) integers.

One aspect that I cannot help commenting upon is the relative h-indices of Weinberg and Witten, both from the field of High Energy Physics (HEP).
Weinberg is one of the authors of the Standard Model of Particle Physics, for all intents and purposes the theory of nature. The Standard Model is textbook material and is the cornerstone of almost all mainstream Particle Physics activity today. As a result his original paper is rarely referred to, just as nobody quotes Einstein's 1905 paper when discussing relativistic transformations, or Feynman's paper when using Feynman diagrams, even though the Weinberg paper itself holds the record for the highest number of citations in HEP (> 6500).

Ed Witten, a brilliant theorist from Princeton (and a Fields Medallist), works in the more esoteric field of string theory, which, while contributing much beautiful mathematics and mathematical techniques useful in other branches of physics, has yet to prove itself relevant to the real world of elementary particle interactions. However, Witten's papers have had enormous influence on the development of string theory, which is the reason for his high index value. (Amusingly (for non-physicists), there is also a Witten index, though this is a pure physics quantity and has nothing to do with the indices we are discussing.) I should mention here that Witten has several highly cited papers in other areas of HEP (Skyrmions, anomalies, 1/N, chiral symmetry breaking, supersymmetry etc.), but he is of course best known for his work in string theory.

He is also the top-cited author (63958 citations as of today), compared to Weinberg (a mere 33712). But these numbers are more misleading than the other indices. For example, D. V. Nanopoulos has more total citations (27545) than David Gross, Frank Wilczek, Gerard 't Hooft and many others.

10 comments:

Anant said...

Think of poor Euclid. His h index can never exceed 13.

Rahul Basu said...

Of such examples there are plenty. I am told Feynman had just 24 papers (check SPIRES and leave out his conference talks). Which, therefore, is the upper bound on his h index.

AMOK said...

Is there an error bar, or resolving power, for these indices? Known limitations and boundaries of applicability? As the eminent bloggers note, it seems flawed for small N. Perhaps it is only valid when the number of papers approaches Avogadro's number.

mekhala said...

I am curious which Indian scientist has the highest h- or Wu index (and what it is)?

Rahul Siddharthan said...

Does the h-index or w-index account for the number of authors on the paper being cited? That is, if you're author number 375 on a paper with 573 authors, does that count as much as a single-author paper with the same number of citations? (Being lazy, and also rushed at the moment, I haven't read your links.)

kapil said...

Note that "impact factor" and "citation indices" were originally invented by journals as a way to advertise their quality.

These and other gems can be found in the Wikipedia article on the topic.

Now comes the most recent development in this field (if it can be called a 'field').


There _is_ an entire field of "scientific research" devoted to measuring scientific activity!

as science, and in particular my field, physics, became specialised, it became necessary to check a person's citation index to gauge the depth and importance of the work.


One point repeatedly raised in favour of such numbers is the perceived "objectivity" of using them as opposed to the supposed subjectivity of those who read and evaluate the work.

The relentless pursuit of awards and promotions and lists of excellence by academics is pretty ridiculous to someone on the outside. So it is probably only appropriate that "funny science" like "citation index" is used in the process.

AMOK said...

There is convincing evidence now (this blog) that the h-index can be a replacement for the Google PageRank patented algorithm. It would allow us to find the top scientists quickly and accurately and not subject to any debate.

Conversely, one could apply the Google PageRank algorithm instead of the h-index. Each citation is a link and the set of the author's papers is a Page. This would be more interesting, provided the universe of pages was restricted to a defined set of authors. It would bring a real-world effect into science as Google search is found to be widely useful and its application to ranking scientists would be apt. What is YOUR Google ScientistRank?

Rahul Basu said...

Regarding Rahul Siddharthan's question, I am not sure of the answer. It is very pertinent, though, in HEP experiments. A discovery paper which includes, say, a 1000 names (a Higgs discovery paper will probably have many more) will reach the top of all citation lists very fast. But does that number apply to all the authors, or is it divided by the number of authors? Both, according to me, are unfair.

Rahul Basu said...

To continue on the above theme, Hirsch himself recognises the pitfalls of computing h-indices for people involved in large collaborations like HEP experiments. He suggests (rather weakly, in my view) that to compare different individuals one should normalise h by a factor that reflects the average number of coauthors. But without a clear algorithm to do this, it is not very useful.

Rahul Siddharthan said...

I suppose the point is that no such system can be used to compare across fields. If you are comparing two people in the same field, who both tend to publish with many co-authors, that is fine. To compare a mathematician with an experimental HEP physicist, using such numbers, is not such a great idea.

By the way, Google Scholar says Hirsch's paper has been cited by 233 people and many of those who cite him have dozens of citations themselves. So this meta-scientific field may be a good way to boost one's citation count.