Home Page
This page contains
links to the latest (and most complete) revision of each work.
- Mahdi Cheraghchi, Adam R. Klivans, Pravesh Kothari, Homin K.
Lee. Submodular
Functions are Noise Stable. To appear in
Proceedings of the ACM-SIAM
Symposium on Discrete Algorithms (SODA), 2012.
arXiv:1106.0518.
We show that all
non-negative submodular functions have high noise-stability. As a
consequence, we obtain a polynomial-time learning algorithm for
this class with respect to any product distribution on {-1,1}n
(for any constant accuracy parameter ε).
Our algorithm also succeeds in the agnostic setting.
Previous work on learning submodular functions required either
query access or strong assumptions about the types of submodular
functions to be learned (and did not hold in the agnostic
setting). Additionally we give simple algorithms
that efficiently release differentially private answers to all
Boolean conjunctions and to all halfspaces with constant average
error, subsuming and improving recent work due to Gupta, Hardt,
Roth and Ullman (STOC 2011).
We review connections
between coding-theoretic objects and sparse learning problems.
In particular, we show how seemingly different combinatorial
objects such as error-correcting codes, combinatorial designs,
spherical codes, compressed sensing matrices and group testing
designs can be obtained from one another. The reductions enable
one to translate upper and lower bounds on the parameters
attainable by one object to another. We survey some of the
well-known reductions in a unified presentation, and bring some
existing gaps to attention. New reductions are also introduced;
in particular, we bring up the notion of minimum "L-wise
distance" of codes and show that this notion closely captures
the combinatorial structure of RIP-2 matrices. Moreover, we show
how this weaker variation of the minimum distance is related to
combinatorial list-decoding properties of codes.
The rapid development of
derandomization theory, which is a fundamental area in
theoretical computer science, has recently led to many
surprising applications outside its initial intention. We will
review some recent such developments related to combinatorial
group testing. In its most basic setting, the aim of group
testing is to identify a set of "positive" individuals in a
population of items by taking groups of items and asking whether
there is a positive in each group.
In particular, we will discuss explicit constructions of optimal
or nearly-optimal group testing schemes using
"randomness-conducting" functions. Among such developments are
constructions of error-correcting group testing schemes using
randomness extractors and condensers, as well as threshold group
testing schemes from lossless condensers.
- Mahdi Cheraghchi, Johan Håstad, Marcus Isaksson, Ola
Svensson. Approximating
Linear
Threshold Predicates. In Proceedings of the 13th International Workshop on
Approximation Algorithms for Combinatorial Optimization
Problems (APPROX), 2010.
We study constraint
satisfaction problems with constraints defined by homogeneous
linear threshold predicates. Specifically, we consider the
standard optimization problem Max-CSP(P) where the objective is to satisfy as many
constraints of the same type P
on a number of Boolean variables as possible. We assume that the
constraints are defined by a fixed linear threshold predicate P: {-1,+1}n →
{-1,+1} of the form P(x1, ..., xn) = sgn(w1 x1 + ... + wn xn),
for positive integer weights wi (i=1, ..., n).
Our focus is on a range of threshold predicates for which the
problem does not become approximation resistant, and we study
the approximation curve of this class of problems. For the
special case of the majority function with w1 = ... = wn
= 1 and n odd, we
obtain almost-matching approximability and hardness results that
can be summarized as follows:
- Approximation: Using
linear
programming,
we
design
a
polynomial-time algorithm that, given a 1-δ/(n+1)
satisfiable instance for any δ
< 1, satisfies a 1/2 + (1-δ)7
Ω(1/sqrt(n)) fraction of the constraints.
- Hardness: Assuming
the
Unique
Games
Conjecture
(Khot,
STOC 2002), for any ε>0 and δ in [0,1],
given a (1-δ/(n+1) - ε)-satisfiable
instance it is NP-hard to satisfy a 1/2 + (1-δ)
Ω(1/sqrt(n) + ε
fraction of the constraints.
We extend the above results to a more general class of
"majority-like" predicates and obtain parallel results for them.
Loosely speaking, this class of predicates can be defined using
weights wi
that are generally small. Towards this, we introduce the notion
of Chow-robustness
that might be of independent interest.
The basic goal in
combinatorial group testing is to identify a set of up to d defective items within a
large population of size n
>> d using a
pooling strategy. Namely, the items can be grouped together in
pools, and a single measurement would reveal whether there are
one or more defectives in the pool. The threshold model is a
generalization of this idea where a measurement returns positive
if the number of defectives in the pool passes a fixed threshold
u, negative if this
number is below a fixed lower threshold L ≤ u , and may behave
arbitrarily otherwise. We study non-adaptive threshold group
testing (in a possibly noisy setting) and show that, for this
problem, O(dg+2 (log d) log(n/d)) measurements (where g := u - L) suffice to identify the
defectives, and also present almost matching lower bounds. This
significantly improves the previously known (non-constructive)
upper bound O(du+1 log(n/d)). Moreover, we obtain a framework for
explicit construction of measurement schemes using lossless
condensers. The number of measurements resulting from this
scheme is ideally bounded by O(dg+3 (log d) log n). Using state-of-the-art
constructions of lossless condensers, however, we come up with
explicit testing schemes with O(dg+3
(log d) quasipoly(log
n)) and O(dg+3+β poly(log n))
measurements, for arbitrary constant β > 0.
- Mahdi Cheraghchi, Amin Karbasi, Soheil Mohajer, Vankatesh
Saligrama. Graph-Constrained
Group
Testing. In Proceedings
of IEEE International Symposium on Information Theory (ISIT),
2010.
- Journal version to appear in IEEE Transactions on
Information Theory, 2011.
Non-adaptive group testing
involves grouping arbitrary subsets of n items into different pools. Each pool is
then tested and defective items are identified. A fundamental
question involves minimizing the number of pools required to
identify at most d
defective items. Motivated by applications in network
tomography, sensor networks and infection propagation we
formulate group testing problems on graphs. Unlike conventional
group testing problems each group here must conform to the
constraints imposed by a graph. For instance, items can be
associated with vertices and each pool is any set of nodes that
must be path connected. In this paper we associate a test with a
random walk. In this context conventional group testing
corresponds to the special case of a complete graph on n vertices.
For interesting classes of graphs we arrive at a rather
surprising result, namely, that the number of tests required to
identify d defective
items is substantially similar to that required in conventional
group testing problems, where no such constraints on pooling is
imposed. Specifically, if T(n) corresponds to the
mixing time of the graph G,
we show that with m=O(d2 T2(n) log(n/d)) non-adaptive tests, one can identify the
defective items. Consequently, for the Erdos-Renyi random graph
G(n,p), as well as expander graphs with constant
spectral gap, it follows that m=O(d2 log3(n)) non-adaptive
tests
are sufficient to identify d
defective items. We next consider a specific scenario that
arises in network tomography and show that m=O(d3 log3(n)) non-adaptive
tests
are sufficient to identify d
defective items. We also consider noisy counterparts of the
graph constrained group testing problem and develop parallel
results for these cases.
Detection of defective
members of large populations has been widely studied in the
statistics community under the name "group testing", a problem
which dates back to World War II when it was suggested for
syphilis screening. There the main interest is to identify a
small number of infected people among a large population using
collective samples. In viral epidemics, one way to acquire
collective samples is by sending agents inside the population.
While in classical group testing, it is assumed that the
sampling procedure is fully known to the reconstruction
algorithm, in this work we assume that the decoder possesses
only partial knowledge about the sampling process. This
assumption is justified by observing the fact that in a
viral sickness, there is a chance that an agent remains healthy
despite having contact with an infected person. Therefore, the
reconstruction method has to cope with two different types of
uncertainty; namely, identification of the infected population
and the partially unknown sampling procedure.
In this work, by using a
natural probabilistic model for "viral infections", we design
non-adaptive sampling procedures that allow successful
identification of the infected population with overwhelming
probability 1-o(1). We propose both probabilistic and explicit
design procedures that require a "small" number of agents to
single out the infected individuals. More precisely, for a
contamination probability p,
the number of agents required by the probabilistic and explicit
designs for identification of up to k infected members is bounded by m = O(k2 (log n) / p3) and m = O(k2 (log2n) / p3),
respectively. In both cases, a simple decoder is able to
successfully identify the infected population in time O(mn).
We study combinatorial group
testing schemes for learning d-sparse
boolean
vectors
using
highly
unreliable
disjunctive
measurements.
We consider an adversarial noise model that only limits the
number of false observations, and show that any noise-resilient
scheme in this model can only approximately reconstruct the
sparse vector. On the positive side, we give a general
framework for construction of highly noise-resilient group
testing schemes using randomness condensers. Simple randomized
instantiations of this construction give non-adaptive
measurement schemes, with m=O(d log n) measurements, that
allow efficient reconstruction of d-sparse vectors up to O(d) false positives even in
the presence of delta*m
false positives and Omega(m/d) false negatives within
the measurement outcomes, for any
constant delta < 1. None of these parameters can be
substantially improved without dramatically affecting the
others. Furthermore, we obtain several explicit (and
incomparable) constructions, in particular one matching the
randomized trade-off but using m = O(d1+o(1)
log n) measurements.
We also obtain explicit constructions that allow fast
reconstruction in time poly(m),
which would be sublinear in n
for sufficiently sparse vectors.
We give a general framework
for construction of small ensembles of capacity achieving linear
codes for a wide range of (not necessarily memoryless) discrete
symmetric channels, and in particular, the binary erasure and
symmetric channels. The main tool used in our
constructions is the notion of randomness extractors and
lossless condensers that are regarded as central tools in
theoretical computer science. Same as random codes, the
resulting ensembles preserve their capacity achieving properties
under any change of basis. Our methods can potentially
lead to polynomial-sized ensembles; however, using known
explicit constructions of randomness conductors we obtain
specific ensembles whose size is as small as quasipolynomial in
the block length. By applying our construction to
Justesen's concatenation scheme (Justesen, 1972) we obtain
explicit capacity achieving codes for BEC (resp., BSC) with
almost linear time encoding and almost linear time (resp.,
quadratic time) decoding and exponentially small error
probability. The explicit code for BEC is defined and capacity
achieving for every block length.
A wiretap protocol is a pair
of randomized encoding and decoding functions such that
knowledge of a bounded fraction of the encoding of a message
reveals essentially no information about the message, while
knowledge of the entire encoding reveals the message using the
decoder. In this paper we study the notion of efficiently
invertible extractors and show that a wiretap protocol can be
constructed from such an extractor. We will then construct
invertible extractors for symbol-fixing, affine, and general
sources and apply them to create wiretap protocols with
asymptotically optimal trade-offs between their rate (ratio of
the length of the message versus its encoding) and resilience
(ratio of the observed positions of the encoding and the length
of the encoding). We will then apply our results to create
wiretap protocols for challenging communication problems, such
as active intruders who change portions of the encoding, network
coding, and intruders observing arbitrary boolean functions of
the encoding.
This paper studies the
stability of some reconstruction algorithms for compressed
sensing in terms of the bit
precision. Considering the fact that practical digital
systems deal with discretized signals, we motivate the
importance of the total number of accurate bits needed from the
measurement outcomes in addition to the number of
measurements. It is shown that if one uses a 2k*n Vandermonde matrix with roots on the
unit circle as the measurement matrix, O(L + k log(n/k)) bits of precision per measurement are
sufficient to reconstruct a k-sparse
signal x \in Rn with dynamic range (i.e., the
absolute ratio between the largest and the smallest nonzero
coefficients) at most 2L within L bits of precision, hence
identifying its correct support. Finally, we obtain an upper
bound on the total number of required bits when the measurement
matrix satisfies a restricted isometry property, which is in
particular the case for random Fourier and Gaussian
matrices. For very sparse signals, the upper bound on the
number of required bits for Vandermonde matrices is shown to be
better than this general upper bound.
We consider the problem of
uniform sampling of points on an algebraic variety.
Specifically, we develop a randomized algorithm that, given a
small set of multivariate polynomials over a sufficiently
large finite field, produces a common zero of the polynomials
almost uniformly at random. The statistical distance between
the output distribution of the algorithm and the uniform
distribution on the set of common zeros is polynomially small
in the field size, and the running time of the algorithm is
polynomial in the description of the polynomials and their
degrees provided that the number of the polynomials is a
constant.
- Mahdi Cheraghchi, Amin Shokrollahi, Avi Wigderson. Computational
Hardness and Explicit Constructions of Error Correcting
Codes. In Proceedings
of 44th Allerton Conference on Communication, Control and
Computing (invited paper), 2006.
We outline a procedure
for using pseudorandom generators to construct binary
codes with good properties, assuming the existence of
sufficiently hard functions. Specifically, we give a
polynomial time algorithm, which for every integers n and k, constructs
polynomially many linear codes of block length n and dimension k, most of which
achieving the Gilbert-Varshamov bound. The success of the
procedure relies on the assumption that the exponential
time class of E
:= DTIME[2O(n)]
is not contained in the sub-exponential space class
DSPACE[2o(n)]. The methods used in this paper
are by now standard within computational complexity
theory, and the main contribution of this note is
observing that they are relevant to the construction of
optimal codes. We attempt to make this note self
contained, and describe the relevant results and proofs
from the theory of pseudorandomness in some detail.
- Mahdi Cheraghchi. On
Matrix
Rigidity
and
the
Complexity
of
Linear
Forms. ECCC Technical
Report TR05-070. (talk)
The
rigidity function of a matrix is defined as the minimum number
of its entries that need to be changed in order to reduce the
rank of the matrix to below a given parameter. Proving a strong
enough lower bound on the rigidity of a matrix implies a
nontrivial lower bound on the complexity of any linear circuit
computing the set of linear forms associated with it. However,
although it is shown that most matrices are rigid enough, no
explicit construction of a rigid family of matrices is known. In
this survey report we review the concept of rigidity and some of
its interesting variations as well as several notable results
related to that. We also show the existence of highly rigid
matrices constructed by evaluation of bivariate polynomials over
finite fields.
Randomized techniques play a
fundamental role in theoretical computer science and discrete
mathematics, in particular for the design of efficient
algorithms and construction of combinatorial objects. The basic
goal in derandomization theory is to eliminate or reduce the
need for randomness in such randomized constructions.
Towards this goal, numerous fundamental notions have been
developed to provide a unified framework for approaching various
derandomization problems and to improve our general
understanding of the power of randomness in computation.
Two important classes of such tools are pseudorandom generators
and randomness extractors. Pseudorandom generators
transform a short, purely random, sequence into a much longer
sequence that looks random, while extractors transform a weak
source of randomness into a perfectly random one (or one with
much better qualities, in which case the transformation is
called a randomness condenser).
In this thesis, we explore some applications of the fundamental
notions in derandomization theory to problems outside the core
of theoretical computer science, and in particular, certain
problems related to coding theory. First, we consider the
wiretap channel problem which involves a communication system in
which an intruder can eavesdrop a limited portion of the
transmissions. We utilize randomness extractors to construct
efficient and information-theoretically optimal communication
protocols for this model.
Then we consider the combinatorial group testing problem. In
this classical problem, one aims to determine a set of defective
items within a large population by asking a number of queries,
where each query reveals whether a defective item is present
within a specified group of items. We use randomness condensers
to explicitly construct optimal, or nearly optimal, group
testing schemes for a setting where the query outcomes can be
highly unreliable, as well as the threshold model where a query
returns positive if the number of defectives pass a certain
threshold.
Next, we use randomness condensers and extractors to design
ensembles of error-correcting codes that achieve the
information-theoretic capacity of a large class of communication
channels, and then use the obtained ensembles for construction
of explicit capacity achieving codes. Finally, we consider the
problem of explicit construction of error-correcting codes on
the Gilbert-Varshamov bound and extend the original idea of
Nisan and Wigderson to obtain a small ensemble of codes, mostly
achieving the bound, under suitable computational hardness
assumptions.
Error
correcting codes are combinatorial objects that allow reliable
recovery of information in presence of errors by cleverly
augmenting the original information with a certain amount of
redundancy.
The availability of efficient means of error detection is
considered as a fundamental criterion for error correcting
codes. Locally testable codes are families of error correcting
codes that are highly robust against transmission errors and in
addition provide super-efficient (sublinear time) probabilistic
algorithms for error detection. In particular, the error
detection algorithm probes the received sequence only at a small
(or even constant) number of locations.
There seems to be a trade-off between the amount of
redundancy and the number of probes for the error detection
procedure in locally testable codes. Even though currently best
constructions allow reduction of redundancy to a nearly linear
amount, it is not clear whether this can be further reduced to
linear while preserving a constant number of probes.
We study the formal notion of locally testable codes and
survey several major results in this area. We also investigate
closely related concepts, and in particular, polynomial
low-degree tests and probabilistically checkable proofs.
Last Updated on October 2011
Home Page