Predecessor problem

{{Short description|Query of largest element in a set less than an element}}

In computer science, the predecessor problem involves maintaining a set of items to, given an element, efficiently query which element precedes or succeeds that element in an order. Data structures used to solve the problem include balanced binary search trees, van Emde Boas trees, and fusion trees. In the static predecessor problem, the set of elements does not change, but in the dynamic predecessor problem, insertions into and deletions from the set are allowed.{{cite journal|last1=Beame|first1=Paul|last2=Fich|first2=Faith|author2-link=Faith Ellen|title = Optimal Bounds for the Predecessor Problem and Related Problems|journal=Journal of Computer and System Sciences|volume=65|issue=1|date=August 2002|pages=38–72|doi=10.1006/jcss.2002.1822|s2cid=1991980 |doi-access=free}}

The predecessor problem is a simple case of the nearest neighbor problem, and data structures that solve it have applications in problems like integer sorting.

Definition

The problem consists of maintaining a set {{mvar|S}}, which contains a subset of {{mvar|U}} integers. Each of these integers can be stored with a word size of {{mvar|w}}, implying that U \le 2^w. Data structures that solve the problem support these operations:{{cite conference|last1=Rahman|first1=Naila|last2=Cole|first2=Richard|last3=Raman|first3=Rajeev|title=Optimized Predecessor Data Structures for Internal Memory|conference=International Workshop on Algorithm Engineering|date=17 August 2001|pages=67–78|url=https://www.cc.gatech.edu/~bader/COURSES/UNM/ece637-Fall2003/papers/RCR01.pdf}}

  • predecessor(x), which returns the largest element in {{mvar|S}} strictly smaller than {{mvar|x}}
  • successor(x), which returns the smallest element in {{mvar|S}} strictly greater than {{mvar|x}}

In addition, data structures which solve the dynamic version of the problem also support these operations:

  • insert(x), which adds {{mvar|x}} to the set {{mvar|S}}
  • delete(x), which removes {{mvar|x}} from the set {{mvar|S}}

The problem is typically analyzed in a transdichotomous model of computation such as word RAM.

Data structures

File:Xfast trie example.svg

One simple solution to this problem is to use a balanced binary search tree, which achieves (in Big O notation) a running time of O(\log n) for predecessor queries. The Van Emde Boas tree achieves a query time of O(\log \log U), but requires O(U) space. Dan Willard proposed an improvement on this space usage with the x-fast trie, which requires O(n \log U) space and the same query time, and the more complicated y-fast trie, which only requires O(n) space.{{cite journal|last1=Willard|first1=Dan|author1-link=Dan Willard|title=Log-logarithmic worst-case range queries are possible in space Θ(n)|journal=Information Processing Letters|date=24 August 1983|volume=17|issue=2|pages=81–84|doi=10.1016/0020-0190(83)90075-3}} Fusion trees, introduced by Michael Fredman and Willard, achieve O(\log_w n) query time and O(n) for predecessor queries for the static problem.{{cite journal|last1=Fredman|first1=Michael|author1-link=Michael Fredman|last2=Willard|first2=Dan|author2-link=Dan Willard|title=Blasting through the information theoretic barrier with fusion trees|journal=Symposium on Theory of Computing|pages=1–7|date=1990}} The dynamic problem has been solved using exponential trees with O(\log_w n + \log \log n) query time,{{citation

| last1 = Andersson | first1 = Arne

| last2 = Thorup | first2 = Mikkel | author2-link = Mikkel Thorup

| doi = 10.1145/1236457.1236460

| issue = 3

| journal = Journal of the ACM

| mr = 2314255

| page = A13

| title = Dynamic ordered sets with exponential search trees

| volume = 54

| year = 2007| arxiv = cs/0210006| s2cid = 8175703

}}. and with expected time O(\log_w n) using hashing.{{citation

| last = Raman | first = Rajeev

| contribution = Priority queues: small, monotone and trans-dichotomous

| doi = 10.1007/3-540-61680-2_51

| mr = 1469229

| pages = 121–137

| publisher = Springer-Verlag | location = Berlin

| series = Lecture Notes in Computer Science

| title = Fourth Annual European Symposium on Algorithms (ESA '96), Barcelona, Spain, September 25–27, 1996

| volume = 1136

| year = 1996| isbn = 978-3-540-61680-1

}}.

Mathematical properties

There have been a number of papers proving lower bounds on the predecessor problem, or identifying what the running time of asymptotically optimal solutions would be. For example, Michael Beame and Faith Ellen proved that for all values of {{mvar|w}}, there exists a value of {{mvar|n}} with query time (in Big Theta notation) \Omega\left(\tfrac{\log w}{\log \log w}\right), and similarly, for all values of {{mvar|n}}, there exists a value of {{mvar|n}} such that the query time is \Omega\left(\sqrt{\tfrac{\log n}{\log \log n}}\right). Other proofs of lower bounds include the notion of communication complexity.

For the static predecessor problem, Mihai Pătrașcu and Mikkel Thorup showed the following lower bound for the optimal search time, in the cell-probe model:{{cite book |last1=Pătraşcu |first1=Mihai |last2=Thorup |first2=Mikkel |title=Proceedings of the thirty-eighth annual ACM symposium on Theory of Computing |chapter=Time-space trade-offs for predecessor search |date=21 May 2006 |pages=232–240 |doi=10.1145/1132516.1132551|arxiv=cs/0603043 |isbn=1595931341 |s2cid=1232 }}

O(1) \min \left\{ \begin{array}{l}

\log_{w} n \\

\lg \frac{\ell - \lg n}{a} \\

\frac{\lg \frac{\ell}{a}}{\lg \left( \frac{a}{\lg n}

\,\cdot\, \lg \frac{\ell}{a} \right)} \\

\frac{\lg \frac{\ell}{a}}{\lg \left( \lg \frac{\ell}{a} \right/\left. \lg\frac{\lg n}{a} \right)}

\end{array} \right.

where the RAM has word length w, the set contains n integers of \ell bits each and is represented in the RAM using S words of space, and defining a = \lg \frac{S}{n} + \lg w.

In the case where w = \ell = \gamma \lg n for \gamma > 1 and S = n \cdot \lg^{O(1)} n, the optimal search time is

\Theta(\lg \ell) and the van Emde Boas tree achieves this bound.

See also

References