Some academic papers don’t shout. They arrive on arXiv on a Thursday night, pick up a quiet revision eleven days later, and over the years turn into something other researchers cite without quite recalling when they first saw it. The 2011 paper on submodular functions by Mahdi Cheraghchi, Adam Klivans, Pravesh Kothari, and Homin K. Lee is one of them. It runs only a few pages, but the claim it makes is the kind that quietly changes how a field approaches a problem.
Strip away the formalism and the claim is almost disarmingly simple. The authors show that submodular functions, the mathematical objects that capture diminishing returns (the same intuition behind why the second slice of pizza never quite matches the first), are noise stable: flip a few input bits, perturb the conditions, and the function’s value barely moves. There is something almost comforting about that. It reads more like a personality trait than a theorem.
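To make that intuition concrete before getting to the paper itself, here is a small, hypothetical Python sketch. The coverage function, the sensor sets, and the noise level `eps` are illustrative choices of mine, not anything from the paper; coverage functions are simply a standard example of non-negative submodular functions. The sketch checks the diminishing-returns property on random nested selections and then measures, empirically, how little the function moves when each input bit is flipped with small probability, which is the informal content of noise stability.

```python
import random

# Toy coverage function: each of n "sensors" covers a small subset of a universe.
# Coverage functions are a standard example of non-negative submodular functions.
random.seed(0)
UNIVERSE = range(20)
SENSORS = [set(random.sample(range(20), 6)) for _ in range(8)]

def coverage(x):
    """f(x) = number of universe elements covered by the selected sensors.
    x is a tuple of 0/1 indicators, one per sensor."""
    covered = set()
    for bit, sensor in zip(x, SENSORS):
        if bit:
            covered |= sensor
    return len(covered)

def marginal_gain(x, i):
    """Value gained by adding sensor i, relative to the selection x without i."""
    with_i = tuple(1 if j == i else b for j, b in enumerate(x))
    without_i = tuple(0 if j == i else b for j, b in enumerate(x))
    return coverage(with_i) - coverage(without_i)

# Diminishing returns: adding sensor i helps a small selection at least as much
# as it helps any larger selection containing it (checked on random nested pairs).
for _ in range(1000):
    small = tuple(random.randint(0, 1) for _ in SENSORS)
    big = tuple(max(b, random.randint(0, 1)) for b in small)  # contains `small`
    i = random.randrange(len(SENSORS))
    assert marginal_gain(small, i) >= marginal_gain(big, i)

# Noise stability, informally: flip each input bit with small probability eps
# and watch how little the function value moves on average.
def flip(x, eps):
    return tuple(1 - b if random.random() < eps else b for b in x)

eps, trials, drift = 0.05, 20000, 0.0
for _ in range(trials):
    x = tuple(random.randint(0, 1) for _ in SENSORS)
    drift += abs(coverage(x) - coverage(flip(x, eps)))
print(f"average |f(x) - f(noisy x)| at eps={eps}: {drift / trials:.3f}")
```

The drift printed at the end is small relative to the range of the function, which is the everyday reading of “behaves itself under noise.”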
| Information | Details |
|---|---|
| Paper Title | Submodular Functions Are Noise Stable |
| Authors | Mahdi Cheraghchi, Adam Klivans, Pravesh Kothari, Homin K. Lee |
| First Submitted | 2 June 2011 |
| Latest Revision | 13 June 2011 (v2) |
| Published In | SODA ’12 — Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms |
| Conference Date | 17 January 2012 |
| Pages | 1586 – 1592 |
| Subjects | Machine Learning (cs.LG), Computational Complexity (cs.CC), Game Theory (cs.GT) |
| Citations | 57+ (and counting) |
| Core Result | Non-negative submodular functions exhibit high noise stability |
| Secondary Contribution | Differentially private algorithms for Boolean conjunctions and halfspaces |
| Preceding Work Improved | Gupta, Hardt, Roth, Ullman (STOC 2011) |
| arXiv ID | 1106.0518 |
Practically speaking, this matters because noise stability makes learning possible. A function that barely moves under small perturbations of its input has most of its Fourier weight on low-degree terms, and such a function can be approximated by a low-degree polynomial. Building on this observation, the authors give a polynomial-time algorithm that learns any non-negative submodular function over product distributions on the {-1,1}^n cube, even in the agnostic setting. Previous work, including the well-known Balcan-Harvey results from STOC 2011, typically required query access or restrictive assumptions about the function class. This paper needed neither.
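Here is a minimal sketch of that low-degree regression idea, assuming numpy and reusing a toy coverage function as the target. It fits a degree-2 polynomial in the {-1,1} variables by ordinary least squares over uniformly random samples. This is only an illustration of the principle that noise stability licenses; the paper’s actual algorithm, its degree bound, and its agnostic guarantee are more careful than this, and the sample size, degree cutoff, and helper names below are my own choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Hypothetical target: a random non-negative submodular (coverage) function.
sensors = [set(rng.choice(20, size=6, replace=False).tolist()) for _ in range(n)]

def coverage(bits01):
    covered = set()
    for bit, s in zip(bits01, sensors):
        if bit == 1:
            covered |= s
    return len(covered)

def low_degree_features(X, degree=2):
    """Monomial features chi_S(x) = prod_{i in S} x_i over {-1,+1}^n, |S| <= degree."""
    cols = [np.ones(len(X))]
    for d in range(1, degree + 1):
        for S in itertools.combinations(range(n), d):
            cols.append(np.prod(X[:, list(S)], axis=1))
    return np.column_stack(cols)

def sample(m):
    """Uniform samples from {-1,+1}^n, labeled by the coverage target."""
    X = rng.choice([-1, 1], size=(m, n))
    y = np.array([coverage((row + 1) // 2) for row in X])
    return X, y

X_train, y_train = sample(4000)
X_test, y_test = sample(1000)

# Least-squares fit of a degree-2 polynomial; noise stability is what makes
# a low-degree fit a good approximation of the whole function.
w, *_ = np.linalg.lstsq(low_degree_features(X_train), y_train, rcond=None)
pred = low_degree_features(X_test) @ w
print("mean absolute error of the degree-2 fit:", float(np.mean(np.abs(pred - y_test))))
```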
The broader machine learning community may not have registered what had happened right away. Submodularity sits in an awkward spot: too applied for pure complexity theorists, too discrete for deep learning enthusiasts. But people who care about viral marketing models, sensor placement, recommendation systems, and combinatorial optimization noticed. With around sixty citations, the paper has quietly fed subsequent work on noise-tolerant optimization under structured uncertainty, hypergraph sparsification, and differentially private submodular maximization.
There is also a second result tucked into the paper, almost as an aside, of the kind reviewers adore. Using related techniques, the authors give simple, efficient algorithms for releasing differentially private answers to all Boolean conjunctions and to all halfspaces with constant average error. This subsumed and improved on a STOC 2011 result of Gupta, Hardt, Roth, and Ullman. Privacy researchers found it quickly.
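For readers who have not touched differential privacy, it may help to see what “releasing a private answer to a conjunction query” means at all. The sketch below uses the basic Laplace mechanism to answer a single conjunction counting query over a made-up 0/1 dataset; the dataset, attribute indices, and epsilon are all hypothetical. The paper’s contribution is far stronger than this: it releases answers to all conjunctions (or all halfspaces) at once with constant average error, which naive per-query noising like this cannot achieve.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_conjunction_count(records, attrs, epsilon):
    """Epsilon-differentially private fraction of records satisfying the
    conjunction of the given attributes, via the Laplace mechanism.

    `records` is an (m, n) 0/1 array; each row is one individual's record.
    Changing one record shifts the true fraction by at most 1/m, so Laplace
    noise with scale 1/(m * epsilon) suffices for this single query."""
    m = len(records)
    true_fraction = np.mean(np.all(records[:, attrs] == 1, axis=1))
    noise = rng.laplace(scale=1.0 / (m * epsilon))
    return true_fraction + noise

# Hypothetical private dataset: 5000 individuals, 10 boolean attributes.
records = rng.integers(0, 2, size=(5000, 10))
print(private_conjunction_count(records, attrs=[1, 3, 7], epsilon=0.5))
```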

Looking back from 2026, what is striking is how well the paper has aged. Machine learning theory has changed enormously since 2011, with most public attention shifting to transformers, scaling laws, and alignment debates. Submodular optimization kept doing useful work in the background: data summarization, active learning, feature selection pipelines. The Cheraghchi-Klivans-Kothari-Lee result remains one of the cleaner foundational pieces underneath that work.
Reading it now, I get the sense the authors knew they had something solid. The framing is modest, the consequences are stated succinctly, and the proofs hold up. Not a word about revolutionizing anything. Just a careful demonstration that some classes of functions are inherently hard to knock off course. That kind of quiet stability is worth sitting with at a time when so much of computer science is about fragile systems trying to look robust.
