small-bias sample space

{{More footnotes needed|date=June 2020}}

In theoretical computer science, a small-bias sample space (also known as \epsilon-biased sample space, \epsilon-biased generator, or small-bias probability space) is a probability distribution that fools parity functions.

In other words, no parity function can distinguish between a small-bias sample space and the uniform distribution with high probability, and hence, small-bias sample spaces naturally give rise to pseudorandom generators for parity functions.

The main useful property of small-bias sample spaces is that they need far fewer truly random bits than the uniform distribution to fool parities. Efficient constructions of small-bias sample spaces have found many applications in computer science, some of which are derandomization, error-correcting codes, and probabilistically checkable proofs.

The connection with error-correcting codes is in fact very strong since \epsilon-biased sample spaces are equivalent to \epsilon-balanced error-correcting codes.

Definition

=Bias=

Let X be a probability distribution over \{0,1\}^n.

The bias of X with respect to a set of indices I \subseteq \{1,\dots,n\} is defined ascf., e.g., {{harvtxt|Goldreich|2001}}

:

\text{bias}_I(X)

=

\left|

\Pr_{x\sim X} \left(\sum_{i\in I} x_i = 0\right)

-

\Pr_{x\sim X} \left(\sum_{i\in I} x_i = 1\right)

\right|

=

\left|

2 \cdot \Pr_{x\sim X} \left(\sum_{i\in I} x_i = 0\right)

-1

\right|

\,,

where the sum is taken over \mathbb F_2, the finite field with two elements. In other words, the sum \sum_{i\in I} x_i equals 0 if the number of ones in the sample x\in\{0,1\}^n at the positions defined by I is even, and otherwise, the sum equals 1.

For I=\emptyset, the empty sum is defined to be zero, and hence \text{bias}_{\emptyset} (X) = 1.

= ϵ-biased sample space =

A probability distribution X over \{0,1\}^n is called an \epsilon-biased sample space if

\text{bias}_I(X) \leq \epsilon

holds for all non-empty subsets I \subseteq \{1,2,\ldots,n\}.

= ϵ-biased set =

An \epsilon-biased sample space X that is generated by picking a uniform element from a multiset X\subseteq \{0,1\}^n is called \epsilon-biased set.

The size s of an \epsilon-biased set X is the size of the multiset that generates the sample space.

= ϵ-biased generator =

An \epsilon-biased generator G:\{0,1\}^\ell \to \{0,1\}^n is a function that maps strings of length \ell to strings of length n such that the multiset X_G=\{G(y) \;\vert\; y\in\{0,1\}^\ell \} is an \epsilon-biased set. The seed length of the generator is the number \ell and is related to the size of the \epsilon-biased set X_G via the equation s=2^\ell.

Connection with epsilon-balanced error-correcting codes

There is a close connection between \epsilon-biased sets and \epsilon-balanced linear error-correcting codes.

A linear code C:\{0,1\}^n\to\{0,1\}^s of message length n and block length s is

\epsilon-balanced if the Hamming weight of every nonzero codeword C(x) is between (\frac{1}{2}-\epsilon)s and (\frac{1}{2}+\epsilon)s.

Since C is a linear code, its generator matrix is an (n\times s)-matrix A over \mathbb F_2 with C(x)=x \cdot A.

Then it holds that a multiset X\subset\{0,1\}^{n} is \epsilon-biased if and only if the linear code C_X, whose columns are exactly elements of X, is \epsilon-balanced.cf., e.g., p. 2 of {{harvtxt|Ben-Aroya|Ta-Shma|2009}}

Constructions of small epsilon-biased sets

Usually the goal is to find \epsilon-biased sets that have a small size s relative to the parameters n and \epsilon.

This is because a smaller size s means that the amount of randomness needed to pick a random element from the set is smaller, and so the set can be used to fool parities using few random bits.

= Theoretical bounds =

The probabilistic method gives a non-explicit construction that achieves size s=O(n/\epsilon^2).

The construction is non-explicit in the sense that finding the \epsilon-biased set requires a lot of true randomness, which does not help towards the goal of reducing the overall randomness.

However, this non-explicit construction is useful because it shows that these efficient codes exist.

On the other hand, the best known lower bound for the size of \epsilon-biased sets is s=\Omega(n/ (\epsilon^2 \log (1/\epsilon)), that is, in order for a set to be \epsilon-biased, it must be at least that big.

= Explicit constructions =

There are many explicit, i.e., deterministic constructions of \epsilon-biased sets with various parameter settings:

  • {{harvtxt|Naor|Naor|1990}} achieve \displaystyle s= \frac{n}{\text{poly}(\epsilon)}. The construction makes use of Justesen codes (which is a concatenation of Reed–Solomon codes with the Wozencraft ensemble) as well as expander walk sampling.
  • {{harvtxt|Alon|Goldreich|Håstad|Peralta|1992}} achieve \displaystyle s= O\left(\frac{n}{\epsilon \log (n/\epsilon)}\right)^2. One of their constructions is the concatenation of Reed–Solomon codes with the Hadamard code; this concatenation turns out to be an \epsilon-balanced code, which gives rise to an \epsilon-biased sample space via the connection mentioned above.
  • Concatenating Algebraic geometric codes with the Hadamard code gives an \epsilon-balanced code with \displaystyle s= O\left(\frac{n}{\epsilon^3 \log (1/\epsilon)}\right).
  • {{harvtxt|Ben-Aroya|Ta-Shma|2009}} achieves \displaystyle s= O\left(\frac{n}{\epsilon^2 \log (1/\epsilon)}\right)^{5/4}.
  • {{harvtxt|Ta-Shma|2017}} achieves \displaystyle s= O\left(\frac{n}{\epsilon^{2+o(1)}}\right) which is almost optimal because of the lower bound.

These bounds are mutually incomparable. In particular, none of these constructions yields the smallest \epsilon-biased sets for all settings of \epsilon and n.

Application: almost k-wise independence

An important application of small-bias sets lies in the construction of almost k-wise independent sample spaces.

= k-wise independent spaces =

A random variable Y over \{0,1\}^n is a k-wise independent space if, for all index sets I\subseteq\{1,\dots,n\} of size k, the marginal distribution Y|_I is exactly equal to the uniform distribution over \{0,1\}^k.

That is, for all such I and all strings z\in\{0,1\}^k, the distribution Y satisfies \Pr_Y (Y|_I = z) = 2^{-k}.

== Constructions and bounds ==

k-wise independent spaces are fairly well understood.

  • A simple construction by {{harvtxt|Joffe|1974}} achieves size n^k.
  • {{harvtxt|Alon|Babai|Itai|1986}} construct a k-wise independent space whose size is n^{k/2}.
  • {{harvtxt|Chor|Goldreich|Håstad|Freidmann|1985}} prove that no k-wise independent space can be significantly smaller than n^{k/2}.

== Joffe's construction ==

{{harvtxt|Joffe|1974}} constructs a k-wise independent space Y over the finite field with some prime number n>k of elements, i.e., Y is a distribution over \mathbb F_n^n. The initial k marginals of the distribution are drawn independently and uniformly at random:

:(Y_0,\dots,Y_{k-1}) \sim\mathbb F_n^k.

For each i with k \leq i < n, the marginal distribution of Y_i is then defined as

:Y_i=Y_0 + Y_1 \cdot i + Y_2 \cdot i^2 + \dots + Y_{k-1} \cdot i^{k-1}\,,

where the calculation is done in \mathbb F_n.

{{harvtxt|Joffe|1974}} proves that the distribution Y constructed in this way is k-wise independent as a distribution over \mathbb F_n^n.

The distribution Y is uniform on its support, and hence, the support of Y forms a k-wise independent set.

It contains all n^k strings in \mathbb F_n^k that have been extended to strings of length n using the deterministic rule above.

= Almost k-wise independent spaces =

A random variable Y over \{0,1\}^n is a \delta-almost k-wise independent space if, for all index sets I\subseteq\{1,\dots,n\} of size k, the restricted distribution Y|_I and the uniform distribution U_k on \{0,1\}^k are \delta-close in 1-norm, i.e., \Big\|Y|_I - U_k\Big\|_1 \leq \delta.

== Constructions ==

{{harvtxt|Naor|Naor|1990}} give a general framework for combining small k-wise independent spaces with small \epsilon-biased spaces to obtain \delta-almost k-wise independent spaces of even smaller size.

In particular, let G_1:\{0,1\}^h\to\{0,1\}^n be a linear mapping that generates a k-wise independent space and let G_2:\{0,1\}^\ell \to \{0,1\}^h be a generator of an \epsilon-biased set over \{0,1\}^h.

That is, when given a uniformly random input, the output of G_1 is a k-wise independent space, and the output of G_2 is \epsilon-biased.

Then G : \{0,1\}^\ell \to \{0,1\}^n with G(x) = G_1(G_2(x)) is a generator of an \delta-almost k-wise independent space, where \delta=2^{k/2} \epsilon.Section 4 in {{harvtxt|Naor|Naor|1990}}

As mentioned above, {{harvtxt|Alon|Babai|Itai|1986}} construct a generator G_1 with h=\tfrac{k}{2} \log n, and {{harvtxt|Naor|Naor|1990}} construct a generator G_2 with \ell=\log s=\log h + O(\log(\epsilon^{-1})).

Hence, the concatenation G of G_1 and G_2 has seed length \ell = \log k + \log \log n + O(\log(\epsilon^{-1})).

In order for G to yield a \delta-almost k-wise independent space, we need to set \epsilon = \delta 2^{-k/2}, which leads to a seed length of \ell = \log \log n + O(k+\log(\delta^{-1})) and a sample space of total size 2^\ell \leq \log n \cdot \text{poly}(2^k \cdot\delta^{-1}).

Notes

{{Reflist}}

References

  • {{Citation

| last1 = Alon

| first1 = Noga

| last2 = Babai

| first2 = László

| last3 = Itai

| first3 = Alon

| title = A fast and simple randomized parallel algorithm for the maximal independent set problem

| year = 1986

| volume = 7

| issue = 4

| pages = 567–583

| journal = Journal of Algorithms

| doi=10.1016/0196-6774(86)90019-2

| url=http://www.tau.ac.il/~nogaa/PDFS/Publications2/A%20fast%20and%20simple%20randomized%20parallel%20algorithm%20for%20the%20maximal%20independent%20set%20problem.pdf

}}

  • {{Citation

| last1 = Alon

| first1 = Noga

| last2 = Goldreich

| first2 = Oded

| last3 = Håstad

| first3 = Johan

| last4 = Peralta

| first4 = René

| title = Simple Constructions of Almost k-wise Independent Random Variables

| year = 1992

| volume = 3

| issue = 3

| pages = 289–304

| journal = Random Structures & Algorithms

| doi=10.1002/rsa.3240030308

| url=http://tau.ac.il/~nogaa/PDFS/aghp4.pdf

| citeseerx = 10.1.1.106.6442

}}

  • {{Cite book|first1 = Avraham | last1= Ben-Aroya

|first2 = Amnon | last2= Ta-Shma

| title= 2009 50th Annual IEEE Symposium on Foundations of Computer Science

| chapter= Constructing Small-Bias Sets from Algebraic-Geometric Codes

|year=2009

|pages=191–197

|doi=10.1109/FOCS.2009.44

|url=http://www.wisdom.weizmann.ac.il/~benaroya/SmallBiasNew.pdf

|isbn = 978-1-4244-5116-6

| citeseerx= 10.1.1.149.9273

}}

  • {{Cite book|first1 = Benny | last1=Chor | author1-link = Benny Chor

|first2 =Oded | last2=Goldreich

|first3 =Johan | last3=Håstad

|first4 =Joel | last4=Freidmann

|first5 =Steven | last5=Rudich

|first6 =Roman | last6=Smolensky

| title=26th Annual Symposium on Foundations of Computer Science (SFCS 1985) | chapter=The bit extraction problem or t-resilient functions |year=1985

|pages=396–407

|doi=10.1109/SFCS.1985.55

|url=

|isbn = 978-0-8186-0644-1

| citeseerx=10.1.1.39.6768

| s2cid=6968065

}}

  • {{Citation

| last = Goldreich

| first = Oded

| author-link =

| title = Lecture 7: Small bias sample spaces

| year = 2001

| url = http://www.wisdom.weizmann.ac.il/~oded/PS/RND/l07.ps

}}

  • {{Citation

| last=Joffe | first=Anatole

| title=On a Set of Almost Deterministic k-Independent Random Variables

| year=1974

| journal=Annals of Probability

| pages=161–162

| volume=2

| issue=1

| doi=10.1214/aop/1176996762

| doi-access=free

}}

  • {{Citation

| last1 = Naor

| first1 = Joseph

| last2 = Naor

| first2 = Moni

| title = Proceedings of the twenty-second annual ACM symposium on Theory of computing - STOC '90

| chapter = Small-bias probability spaces: Efficient constructions and applications

| year = 1990

| url = http://www.wisdom.weizmann.ac.il/~naor/PAPERS/bias_abs.html

| pages=213–223

| doi=10.1145/100216.100244

| isbn = 978-0897913614| citeseerx = 10.1.1.421.2784

| s2cid = 14031194

}}

  • {{Citation

| last1 = Ta-Shma

| first1 = Amnon

| title = Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing

| chapter = Explicit, almost optimal, epsilon-balanced codes

| year = 2017

| pages=238–251

| doi=10.1145/3055399.3055408

| isbn = 9781450345286| s2cid = 5648543

}}

Category:Pseudorandomness

Category:Theoretical computer science