Comments on The Geomblog: STOC/FOCS/SODA: The Cage Match (with data!)

2007-08-31T11:54:00.000-06:00

A fully automatic extractor would be nice for the community (very much like EasyChair).

I guess this whole idea comes from the database area, i.e., they give a best paper award to the most cited paper at the top conference (SIGMOD?) from exactly 10 years earlier.

With these citation statistics, maybe we can change the atmosphere a bit in the theory community. (Personally, I have heard too many complaints of the form "that guy got another so-so paper into STOC".) This kind of data will certainly be useful for tenure and promotion purposes, especially for promotion, since citations take time to accumulate. For instance, if one has only 2 FOCS/STOC/SODA papers at the time of promotion, under the current situation that might not be enough (people can easily say "he hasn't made enough contributions, as he doesn't have many FOCS/STOC/SODA papers"). But if both papers are highly cited, the candidate can simply state this as a fact, e.g., "both of my 2 FOCS/STOC/SODA papers were cited over 70 times and are among the top-10 most cited papers at that year's FOCS/STOC/SODA".

At the end of one's career, it is the 2-3 best papers that decide one's stature. Period.

2007-08-31T08:19:00.000-06:00

Hi,

The "grand plan" would be to exploit the knowledge that we gained in the process, and construct a *fully automatic* extractor - then no human volunteers would be needed. Right now the Extractor utilizes a combination of automatic crawling and human help (and perhaps even some goblin magic, who knows), so it does not scale that well.

A fully automatic extractor (with a low error rate) seems doable. The only problem is that it would require a fair amount of Perl hacking, so it has been relegated to the "to do" list for now. But if anyone is interested in doing this, we would be happy to provide guidelines.

Cheers,

Piotr

2007-08-31T01:13:00.000-06:00

I would need volunteers if I were to do this for SoCG. It was hard enough doing it for these conferences :).

Also, I think SoCG is too small a community for the results to be meaningful in comparison to the "big three", so I'm not sure what the exercise would achieve.

2007-08-30T22:06:00.000-06:00

Just saw this today. Very interesting.

I think all top theory conferences should have statistics like these.

IMHO, with these data, we should really look at "# of highly ranked papers in STOC/FOCS/SODA" instead of "# of papers in STOC/FOCS/SODA".

Suresh: try to do this for SoCG, I am sure you will get lots of help.

2007-08-21T14:21:00.000-06:00

Top ten papers by citation count in STOC 2003:

145 A tight bound on approximating arbitrary metrics by tree metrics.
95 Exponential algorithmic speedup by a quantum walk.
65 Better streaming algorithms for clustering problems.
63 Simpler and better approximation algorithms for network design.
61 Cell-probe lower bounds for the partial match problem.
54 Near-optimal network design with selfish agents.
53 Pricing network edges for heterogeneous selfish users.
52 A stochastic process on the hypercube with applications to peer-to-peer networks.
48 Optimal oblivious routing in polynomial time.
48 Hidden translation and orbit coset in quantum computing.

Top ten papers in SODA 2003:

189 Skip graphs.
162 Data streams: algorithms and applications.
101 The similarity metric.
87 Comparing top k lists.
73 Simultaneous optimization for concave costs: single sink aggregation or single source buy-at-bulk.
64 Packing Steiner trees.
63 An approximate truthful mechanism for combinatorial auctions with single parameter agents.
61 Computing homotopic shortest paths in the plane.
58 Dominating sets in planar graphs: branch-width and exponential speed-up.
54 High-order entropy-compressed text indexes.

From which we can clearly deduce that ... erh... SODA papers have shorter titles on the average. Any other conclusions that jump out?
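The two top-ten lists above are small enough to check the title-length joke directly. A minimal Python sketch, using only the counts and titles exactly as posted in the comment (a top ten says nothing about the rest of each program):

```python
from statistics import mean

# (citations, title) pairs copied from the STOC 2003 and SODA 2003 top-ten lists above.
STOC_2003 = [
    (145, "A tight bound on approximating arbitrary metrics by tree metrics"),
    (95, "Exponential algorithmic speedup by a quantum walk"),
    (65, "Better streaming algorithms for clustering problems"),
    (63, "Simpler and better approximation algorithms for network design"),
    (61, "Cell-probe lower bounds for the partial match problem"),
    (54, "Near-optimal network design with selfish agents"),
    (53, "Pricing network edges for heterogeneous selfish users"),
    (52, "A stochastic process on the hypercube with applications to peer-to-peer networks"),
    (48, "Optimal oblivious routing in polynomial time"),
    (48, "Hidden translation and orbit coset in quantum computing"),
]
SODA_2003 = [
    (189, "Skip graphs"),
    (162, "Data streams: algorithms and applications"),
    (101, "The similarity metric"),
    (87, "Comparing top k lists"),
    (73, "Simultaneous optimization for concave costs: single sink aggregation or single source buy-at-bulk"),
    (64, "Packing Steiner trees"),
    (63, "An approximate truthful mechanism for combinatorial auctions with single parameter agents"),
    (61, "Computing homotopic shortest paths in the plane"),
    (58, "Dominating sets in planar graphs: branch-width and exponential speed-up"),
    (54, "High-order entropy-compressed text indexes"),
]

def summarize(papers):
    """Return (total citations, mean citations, mean title length in words)."""
    counts = [c for c, _ in papers]
    words = [len(title.split()) for _, title in papers]  # split on whitespace
    return sum(counts), mean(counts), mean(words)

for name, papers in [("STOC 2003", STOC_2003), ("SODA 2003", SODA_2003)]:
    total, avg_cites, avg_words = summarize(papers)
    print(f"{name}: {total} citations in the top ten, "
          f"{avg_cites:.1f} per paper, {avg_words:.1f} words per title")
```

Hyphenated terms such as "peer-to-peer" count as one word here. On these lists the titles average 7.7 words for STOC and 6.0 for SODA, so the joke holds up; the SODA top ten also totals more citations (912 vs 684), though two lists of ten are far too small to read much into that.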

2007-08-02T09:24:00.000-06:00

It might also be interesting to look at how many citations come from papers that weren't themselves in STOC/FOCS/SODA ... how much bathwater drinking is going on? I know that, at least for my own field(s), the answer is probably: a lot. You probably couldn't do this for anything more recent than, say, the early 2000s, since percolation takes a while.
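The proposed check is simple once each citing paper's venue is known. A minimal sketch, assuming a hypothetical input list with one venue name per citing paper; Google Scholar does not hand out such a list directly, so collecting it is the hard part:

```python
BIG_THREE = {"STOC", "FOCS", "SODA"}

def outside_share(citing_venues):
    """Fraction of citing papers that did not themselves appear in STOC, FOCS or SODA.

    citing_venues: one venue name per citing paper (hypothetical input format).
    """
    if not citing_venues:
        return 0.0
    outside = sum(1 for venue in citing_venues if venue.strip().upper() not in BIG_THREE)
    return outside / len(citing_venues)

# Hypothetical citing venues for a single paper.
print(outside_share(["FOCS", "SODA", "JACM", "ICALP", "STOC"]))
```

A value near 0 would suggest that citations mostly stay inside the big three; restricting to older papers, as suggested above, just means filtering the input by publication year first.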

2007-07-31T09:22:00.000-06:00

Claiming SODA 2001

Piotr

2007-07-31T04:25:00.000-06:00

Claiming SODA 1999

Piotr

2007-07-28T06:45:00.000-06:00

There are a number of factors in these numbers.

One factor may be that summer is when most people are not involved in courses, giving them more time for research, which then gets written up in time for STOC. (This is different from the "nobody is around to write things up in the summer" comment about SODA.)

Some things are just 'random'. For example, Irit Dinur's paper on the new proof of the PCP theorem, which won the best paper award at STOC 2006 and will likely be heavily cited in the future, would have appeared at FOCS 2005 were it not for the fact that she was on the FOCS PC that year.

2007-07-27T15:50:00.000-06:00

About the journal vs conference versions:

1) Google Scholar typically does collapse such papers into one entry. Thus, the Extractor did not take this into account.

2) However, the results are being currently verified and corrected by humans (see CALL FOR HELP), who take this issue into account, to the best of their abilities.

Piotr
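Where conference and journal versions do show up as separate entries, one way a checker could merge them is to group entries by a normalized title. A minimal sketch with hypothetical function names; because the same citing paper may cite both versions, the merged count is only bounded (at least the larger count, at most the sum), and it assumes the journal version kept its title, which is not always true:

```python
import re
from collections import defaultdict

def normalize(title):
    """Lowercase and keep only runs of letters and digits, so "Skip Graphs." matches "Skip graphs"."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))

def merged_counts(entries):
    """Group (title, citations) entries by normalized title.

    Returns {normalized title: (low, high)}: the number of distinct citing papers
    is at least the largest single count and at most the sum of the counts.
    """
    groups = defaultdict(list)
    for title, cites in entries:
        groups[normalize(title)].append(cites)
    return {title: (max(counts), sum(counts)) for title, counts in groups.items()}

# The journal-version count (40) is made up for illustration.
print(merged_counts([("Skip graphs", 189), ("Skip Graphs.", 40)]))
```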

2007-07-27T15:45:00.000-06:00

Claiming STOC 1999

Piotr

2007-07-27T14:56:00.000-06:00

When the journal version of a paper appears, people usually cite the journal version. So I am wondering whether this has been taken into consideration ...

Of course, many papers never get a journal version, especially those that appeared in FOCS/STOC.

2007-07-27T11:18:00.000-06:00

As someone said above, Jeff's counts of the H-index support the theory I mentioned earlier.

I think the 'larger conference' argument breaks down a little when you look at the *total* number of citations to all papers per conference (over all years): SODA loses out to STOC on this measure.

It depends how you define "break down". I think the total numbers of citations for all three are pretty close: SODA is about 12% behind STOC and 15% ahead of FOCS. These differences aren't very significant IMHO. You could argue, though, that a more accurate comparison would count citations for only the top 70 or so papers of each of the three. My guess is they won't be too far off either.
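The percentages can be checked against the total citation counts Piotr posted in this thread (FOCS 20427, STOC 26481, SODA 23517). A short Python check; note that the answer depends on which conference is taken as the baseline:

```python
# Total citations per conference, as posted in this thread.
TOTALS = {"FOCS": 20427, "STOC": 26481, "SODA": 23517}

def pct_diff(a, b):
    """Percentage by which a differs from b, measured relative to b."""
    return 100 * (a - b) / b

print(f"SODA vs STOC: {pct_diff(TOTALS['SODA'], TOTALS['STOC']):+.1f}%")  # -11.2%
print(f"STOC vs SODA: {pct_diff(TOTALS['STOC'], TOTALS['SODA']):+.1f}%")  # +12.6%
print(f"SODA vs FOCS: {pct_diff(TOTALS['SODA'], TOTALS['FOCS']):+.1f}%")  # +15.1%
```

So SODA trails STOC by 11.2% measured against STOC (or STOC leads by 12.6% measured against SODA), and SODA leads FOCS by 15.1%, consistent with the rough figures above.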

2007-07-27T10:20:00.000-06:00

What I don't know is this: what is a statistically sound way of comparing medians that corrects for sample size?

As far as I know, this is typically done using bootstrapping.
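A minimal sketch of that idea, using the percentile bootstrap (one common variant): each conference's citation counts are resampled with replacement at their own size, so the interval for the difference in medians reflects both sample sizes. The example input reuses the STOC 2003 and SODA 2003 top-ten lists posted in this thread only to show the mechanics; a real comparison would need every paper in each program.

```python
import random
from statistics import median

def bootstrap_median_diff(a, b, reps=10_000, seed=0):
    """Point estimate and 95% percentile-bootstrap interval for median(a) - median(b)."""
    rng = random.Random(seed)
    # Resample each sample at its own size, so unequal sample sizes are respected.
    diffs = sorted(
        median(rng.choices(a, k=len(a))) - median(rng.choices(b, k=len(b)))
        for _ in range(reps)
    )
    low = diffs[reps * 25 // 1000]        # 2.5th percentile
    high = diffs[reps * 975 // 1000 - 1]  # 97.5th percentile
    return median(a) - median(b), (low, high)

stoc_2003_top10 = [145, 95, 65, 63, 61, 54, 53, 52, 48, 48]
soda_2003_top10 = [189, 162, 101, 87, 73, 64, 63, 61, 58, 54]
estimate, (low, high) = bootstrap_median_diff(stoc_2003_top10, soda_2003_top10)
print(f"median difference {estimate}, 95% interval [{low}, {high}]")
```

An interval that straddles 0 means the observed gap in medians could plausibly be resampling noise.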

2007-07-27T07:33:00.000-06:00

Anyway, it seems like it created some confusion. Sincere apologies.

That explains it, good. It really made no sense otherwise. This goes to show the value of sanity checking experimental data.

2007-07-27T07:31:00.000-06:00

How about the following algorithm:

Look, papers 1-70 in FOCS surely will beat papers 70-140 in SODA. SODA would have to be a conference with much higher prestige than FOCS for that not to be the case. This is something that not even the most ardent proponent of SODA has ever claimed. The claim, at least as relayed by Lance a while back, is that SODA has joined (not surpassed) STOC and FOCS as one of the big three.
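The point about ranks can be made concrete: when one program is twice the size of another, the overall medians compare papers at very different ranks. A minimal sketch with made-up citation counts, comparing the full-program medians against the medians of only the top k papers of each, where k is the size of the smaller program (the "top 70 or so" comparison suggested elsewhere in this thread):

```python
from statistics import median

def top_k_median(citations, k):
    """Median citation count among the k most cited papers."""
    return median(sorted(citations, reverse=True)[:k])

def matched_comparison(smaller, larger):
    """Compare both programs on their top-k papers, k = size of the smaller program."""
    k = min(len(smaller), len(larger))
    return k, top_k_median(smaller, k), top_k_median(larger, k)

# Hypothetical programs: the larger one has a stronger top but a long weak tail.
small_conf = [50, 40, 30, 20, 10]
large_conf = [55, 45, 35, 25, 15, 8, 6, 4, 2, 1]

print(median(small_conf), median(large_conf))      # 30 11.5
print(matched_comparison(small_conf, large_conf))  # (5, 30, 35)
```

The larger program loses badly on the overall median yet wins once both are compared rank for rank.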

2007-07-27T00:39:00.000-06:00

What I don't know is this: what is a statistically sound way of comparing medians that corrects for sample size?

How about the following algorithm:

bestchoice = median;
for (every other justifiable measure m)
{
    Compare citation counts for FOCS, STOC, SODA using measure m.
    if (sodaperformance[m] better than sodaperformance[bestchoice]) { bestchoice = m; }
}

Next forget everything done so far and proclaim that bestchoice is a good measure of typical quality. Then compare the three conferences with respect to bestchoice, and see what results you get.
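For the record, the tongue-in-cheek procedure above does run. A minimal Python rendering; "sodaperformance" is not defined in the comment, so here performance is assumed to be the target conference's score divided by its best rival's score under the same measure, and the citation lists are made up:

```python
from statistics import mean, median

def shop_for_measure(data, measures, target="SODA", start="median"):
    """Return the measure under which `target` looks best relative to its rivals.

    Choosing the measure after looking at the data is exactly the bias being mocked.
    """
    def performance(m):
        scores = {conf: measures[m](cites) for conf, cites in data.items()}
        best_rival = max(s for conf, s in scores.items() if conf != target)
        return scores[target] / best_rival  # assumed definition of "performance"

    best = start
    for m in measures:
        if m != start and performance(m) > performance(best):
            best = m
    return best

# Hypothetical per-paper citation counts.
data = {
    "FOCS": [40, 30, 20, 10],
    "STOC": [35, 30, 25, 10, 5],
    "SODA": [45, 25, 15, 10, 5, 5, 5, 5],
}
measures = {"median": median, "mean": mean, "total": sum}
print(shop_for_measure(data, measures))  # total
```

SODA's median here is less than a third of its best rival's, yet the procedure happily reports "total", since the larger program wins on raw sums.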

2007-07-26T20:50:00.000-06:00

Claiming FOCS 99.

Sudipto

2007-07-26T19:55:00.000-06:00

Claiming STOC 1998

Piotr

2007-07-26T17:01:00.000-06:00

Hi,

Responding to the last 3 posts about the total number of citations: the actual numbers are:

FOCS: 20427
STOC: 26481
SODA: 23517

These numbers were not posted anywhere earlier. I just computed them for the data available on this blog.

HOWEVER, it looks like we've got an unfortunate typo: in the table comparing our findings with Mike's, for the year 2000, there is a stray phrase "over 10 years". This phrase SHOULD NOT HAVE BEEN THERE; it is a leftover from an old version of the post, and it does not make sense in context. It will be deleted whenever Suresh comes online, or when I learn how to edit blog posts.

Anyway, it seems like it created some confusion. Sincere apologies.

Piotr

2007-07-26T16:33:00.000-06:00

STOC might be competing with more conferences than FOCS. For example, the STOC and SoCG deadlines are identical, and the better geometry papers are usually submitted to SoCG. Indeed, FCRC includes STOC, implying that a large number of conferences have similar deadlines...

2007-07-26T16:18:00.000-06:00

SODA loses out to STOC on this measure.

True, but so does STOC to FOCS, even though STOC is larger.

I really don't understand how STOC can have fewer total citations than FOCS, even though they seem similar in quality and STOC has the advantage in size... the mystery deepens.

2007-07-26T16:02:00.000-06:00

I think the 'larger conference' argument breaks down a little when you look at the *total* number of citations to all papers per conference (over all years): SODA loses out to STOC on this measure.

2007-07-26T15:40:00.000-06:00

Thanks for the stats, Jeff!

One factor that came up in the context of STOC vs FOCS comparison is that, in any given year, STOC occurs a few months before FOCS (and therefore, STOC papers have more time to accumulate citations). This influences the results, at least in the short term.

Piotr
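One way to correct for that head start is to compare citation rates rather than raw counts. A minimal sketch; the function name, the example counts, and the 6-month gap are all hypothetical, and it assumes citations accrue roughly linearly, which is only a first approximation since citation rates usually change over a paper's lifetime:

```python
def citations_per_year(citations, months_since_conference):
    """Average citations per year since the paper was presented (assumes linear accrual)."""
    return citations * 12 / months_since_conference

# Hypothetical: counted on the same day, the STOC paper has been out 6 months longer.
stoc_rate = citations_per_year(60, months_since_conference=30)
focs_rate = citations_per_year(48, months_since_conference=24)
print(stoc_rate, focs_rate)  # 24.0 24.0
```

The raw counts differ by 25%, but the per-year rates are identical.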

2007-07-26T15:15:00.000-06:00

To me the biggest source of puzzlement is the gap between STOC and FOCS citation counts, as most of us consider those conferences functionally equivalent.

Could it be due to the slightly larger number of papers accepted by STOC (about 20% more), echoing Mohammad R.'s comment on the SODA vs STOC/FOCS gap?

Or could it be due to the summer effect, as suggested by anonymous 7/26/2007 12:19:00 AM?

Personally, I'm inclined towards Mohammad's explanation: a larger conference will almost by necessity have somewhat weaker papers at the bottom, dragging the citation count down a bit. Jeff's counts of the H-index, top 10, and top 25 certainly confirm this.
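The size argument can be illustrated with made-up numbers: appending a tail of weakly cited papers to a program pulls its median down but leaves its h-index and its top-k total unchanged, which is why those measures are less sensitive to program size. A minimal sketch:

```python
from statistics import median

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In descending order, c >= rank holds for a prefix of the list; its length is h.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def top_k_total(citations, k):
    """Total citations of the k most cited papers."""
    return sum(sorted(citations, reverse=True)[:k])

# Hypothetical programs: the larger one is the smaller one plus six weak papers.
smaller = [30, 20, 12, 8, 6, 5, 4]
larger = smaller + [3, 2, 2, 1, 1, 0]

for name, conf in [("smaller", smaller), ("larger", larger)]:
    print(name, median(conf), h_index(conf), top_k_total(conf, 3))
# smaller 8 5 62
# larger 4 5 62
```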