persistent array

{{Short description|Computer science data structure}}

In computer science, and more precisely regarding data structures, a persistent array is a persistent data structure with properties similar to a (non-persistent) array. That is, after a value's update in a persistent array, there exist two persistent arrays: one persistent array in which the update is taken into account, and one which is equal to the array before the update.

Difference between persistent arrays and arrays

An array

\mathrm{ar}=[e_0,\dots,e_{n-1}] is a data structure,

with a fixed number n of elements e_0, \dots,

e_{n-1}. It is expected that, given the array ar and an

index 0\le i, the value e_i can be

retrieved quickly. This operation is called a

lookup. Furthermore, given the array ar, an index

0\le i and a new value v, a new array ar2 with

content [e_0,\dots,e_{i-1},v,e_{i+1},\dots,e_{n-1}] can

be created quickly. This operation is called an update. The

main difference between persistent and non-persistent arrays being

that, in non-persistent arrays, the array ar is destroyed during

the creation of ar2.

For example, consider the following pseudocode.

array = [0, 0, 0]

updated_array = array.update(0, 8)

other_array = array.update(1, 3)

last_array = updated_array.update(2, 5)

At the end of execution, the value of array is still [0, 0, 0], the

value of updated_array is [8, 0, 0], the value of other_array

is [0, 3, 0], and the value of last_array is [8, 0, 5].

There exist two kinds of persistent arrays. A persistent array may be

either partially or fully persistent. A fully persistent

array may be updated an arbitrary number of times while a partially

persistent array may be updated at most once. In our previous example,

if array were only partially persistent, the creation of

other_array would be forbidden; however, the creation of

last_array would still be valid. Indeed, updated_array is an array

distinct from array and has never been updated before the creation

of last_array.

Lower Bound on Persistent Array Lookup Time

Given that non-persistent arrays support both updates and lookups in constant time, it is natural to ask whether the same is possible with persistent arrays. The following theorem shows that under mild assumptions about the space complexity of the array, lookups must take \Omega(\log \log n) time in the worst case, regardless of update time, in the cell-probe model.

{{Math theorem |name=Theorem{{cite book |last1=Straka e|first1=Milan |title=Functional Data Structures and Algorithms |date=2013 |location=Prague}}{{rp|67-69}} |math_statement=Consider a partially persistent array with n elements and m = n^{\gamma} modifications, where \gamma is a constant fulfilling 1 < \gamma \le 2.

Assuming the space complexity of the array is O(m\log^k m) for a constant k,

the lower bound on the lookup complexity in this partially persistent

array is \Omega(\log \log n). }}

Implementations

In this section, n is the number of elements of the array, and m is the number of updates.

=Worst case log-time=

The most straightforward implementation of a fully persistent array uses an arbitrary persistent map, whose keys are the numbers from 0 to n − 1. A persistent map may be implemented using a persistent balanced tree, in which case both updates and lookups would take O(\log n) time. This implementation is optimal for the pointer machine model.{{rp|88-89}}

=Shallow binding=

A fully persistent array may be implemented using an array and the

so-called Baker's trick.{{cite book |last1=Fillâtre|first1=Jean-Christophe |last2=Conchon |first2=Sylvain |title=A Persistent Union-find Data Structure |date=2007 |publisher=ACM |location=New York, NY, USA |isbn=978-1-59593-676-9 |pages=37–46 |url=https://www.lri.fr/~filliatr/ftp/publis/puf-wml07.pdf}} This implementation is used in the OCaml module parray.ml{{cite web |last1=Filliâtre |first1=Jean-Christophe |title=Persistent-array implementation |website=GitHub |url=https://github.com/backtracking/ocaml-bazaar/blob/main/parray.ml}} by Jean-Christophe Filliâtre.

In order to define this implementation, a few other definitions must

be given. An initial array is an array that is not generated by

an update on another array. A child of an array ar is an

array of the form ar.update(i,v), and ar is the parent

of ar.update(i,v). A descendant of an array ar is either

ar or the descendant of a child of ar. The initial array

of an array ar is either ar if ar is initial, or it is the

initial array of the parent of ar. That is, the initial array of

ar is the unique array init such that \mathrm{ar} =

init.update(i_0,v_0).\dots.update(i_m,v_m), with ar initial

and i_0,\dots,i_m an arbitrary sequence of indexes and

v_0,\dots,v_m an arbitrary sequence of value. A

family of arrays is thus a set of arrays containing an initial

array and all of its descendants. Finally, the tree of a family of

arrays is the tree whose nodes are the

arrays, and with an edge e from ar to each of its children

ar.update(i,v).

A persistent array using Baker's trick consists of a pair with

an actual array called array and the tree of arrays. This tree

admits an arbitrary root - not necessarily the initial array. The

root may be moved to an arbitrary node of the tree. Changing the root

from root to an arbitrary node ar takes time proportional to

the depth of ar. That is, in the distance between root and

ar. Similarly, looking up a value takes time proportional to the

distance between the array and the root of its family. Thus, if the

same array ar may be lookup multiple times, it is more efficient

to move the root to ar before doing the lookup. Finally updating

an array only takes constant time.

Technically, given two adjacent arrays ar1 and ar2, with

ar1 closer to the root than ar2, the edge from ar1 to

ar2 is labelled by (i,ar2[i]), where i the only position

whose value differ between ar1 and ar2.

Accessing an element i of an array ar is done as follows. If

ar is the root, then ar[i] equals root[i]. Otherwise, let

e the edge leaving ar toward the root. If the label of e

is (i,v) then ar[i] equals v. Otherwise, let ar2 be

the other node of the edge e. Then ar[i] equals

ar2[i]. The computation of ar2[i] is done recursively using

the same definition.

The creation of ar.update(i,v) consists in adding a new node

ar2 to the tree, and an edge e from ar to ar2 labelled

by (i,v).

Finally, moving the root to a node ar is done as follows. If

ar is already the root, there is nothing to do. Otherwise, let

e the edge leaving ar toward the current root, (i,v) its

label and ar2 the other end of e. Moving the root to ar is

done by first moving the root to ar2, changing the label of e

to (i, ar2[i]), and changing array[i] to v.

Updates take O(1) time. Lookups take O(1) time if the root is the array being looked up, but \Theta(m) time in the worst case.

=Expected amortized log-log-time=

In 1989, Dietz{{Cite conference |last1=Dietz |first1=Paul F. |date=1989 |title=Fully persistent arrays |book-title=Proceedings of the Algorithms and Data Structures |pages=67–74 |doi=10.1007/3-540-51542-9_8|citeseerx=10.1.1.621.1599 }}

gave an implementation of fully persistent arrays using O(m+n) space such that lookups can be done in O(\log \log m) worst-case time, and updates can be done in

O(\log\log m) expected amortized time. By the lower bound from the previous section, this time complexity for lookup is optimal when m=n^{\gamma} for \gamma\in (1,2]. This implementation is related to the order-maintenance problem and involves vEB trees, one for the entire array and one for each index.

Straka showed that the times for both operations can be (slightly) improved to O(\log\log \min(m,n)).{{rp|88-89}}

=Worst case log-log-time=

Straka showed how to achieve O((\log \log m)^2/\log\log\log m) worst-case time and linear (O(m+n)) space, or O(\log\log m) worst-case time and super-linear space. It remains open whether it is possible to achieve worst-case time O(\log\log m) subject to linear space.{{rp|88}}

References