Inspired by Michael Mitzenmacher's
There is a significant difference in citation patterns between STOC/FOCS and SODAThe plan:
First, we obtained the list of titles of conference papers appearing in STOC, FOCS and SODA in the last 10 years (1997-2006). We deliberately excluded 2007 because FOCS hasn't happened yet. We got this list from DBLP (Note: DBLP does not make any distinction between actual papers and tutorials/invited articles; we decided to keep all titles because there weren't that many tutorials/invited papers in any case).
For each title, we extracted the citation count from Google Scholar, using a process that we henceforth refer to as "The Extractor". Life is too short to describe what "The Extractor" is. Suffices to say that its output, although not perfect, has been verified to be somewhat close to the true distribution (see below).
The results, and methodology, can be found at this link. The tables and graphs are quite self-explanatory. All source data used to generate the statistics are also available; you are welcome to download the data and make your own inferences. We'll be happy to post new results here and at the webpage.
The main conclusion is that the hypothesis is valid: a systematic discrepancy between citation counts of SODA vs. STOC/FOCS does appear to exist. However, the discrepancy varies significantly over time, with years 1999-2001 experiencing the highest variations. It is interesting to note that 1999 was the the year when SODA introduced four parallel sessions as well as the short paper option.
Although most of the stats for STOC and FOCS are quite similar, there appears to be a discrepancy at the end of the tail. Specifically, the 5 highest citation counts per year for STOC (years 1997-2001) are all higher than the highest citation count for FOCS (year 2001). (Note: the highest cited STOC article in 2001 was Christos Papadimitriou's tutorial paper on algorithms and game theory). The variation between SODA and STOC/FOCS in the 1999-2001 range shows up here too, between STOC and FOCS themselves. So maybe it's just something weird going on these years. Who knows :)
Another interesting observation comes from separating the SODA cites into long and short paper groups (for the period 1999-2005). Plotting citations for short vs long papers separately indicates that the presence of short papers caused a net downward influence on SODA citation counts, but as fewer and fewer shorts were accepted, this influence decreased.
There are other observations we might make, especially in regard to what happens outside the peak citations, but for that we need more reliable data. Which brings us to the next point.
VALIDATION OF THE DATA:
To make sure that the output makes sense, we performed a few "checks and balances". In particular:
- we sampled 10 random titles from each of FOCS, STOC and SODA for each of the 10 years, and for each title we checked the citation count by hand. Results: there were 7 mistakes in FOCS, 9 in STOC, and 11 in SODA, indicating a current error rate in the 9-10% range.
- for each of FOCS, STOC, SODA, we verified (by hand) the values of the top 10 citation numbers, as reported by The Extractor
- we compared our stats for the year 2000 with the stats obtained by Michael. The results are pretty close:
|Median (Mike's/Ours)||Total |
A CALL FOR HELP:
Warning: the data displayed here is KNOWN to contain errors (our estimate is that around 10% of citation counts are incorrect). We would very much appreciate any efforts to reduce the error rate. If you would like to help:
- choose a "random" conference/year pair (e.g., STOC 1997)
- check if this pair has been already claimed in this blog; if yes, go to (1)
- post a short message claiming your pair (e.g., "CLAIMING STOC 1997") on the blog.
- follow the links to check the citations. For each incorrect citation, provide two lines: (1) paper title (2) a Google Scholar link to the correct paper
- Email the file to Suresh Venkatasubramanian.