Partial word

{{short description|Computer science string term}}

In computer science and the study of combinatorics on words, a partial word is a string that may contain a number of "do not know" or "do not care" symbols, i.e., placeholders in the string where the symbol value is not known or not specified. More formally, a partial word of length n is a partial function u: \{ 0, \ldots, n-1 \} \rightarrow A, where A is some finite alphabet. If u(k) is not defined for some k \in \{ 0, \ldots, n-1 \}, then the unknown element at position k in the string is called a "hole". In regular expressions (following the POSIX standard), a hole is represented by the metacharacter ".". For example, aab.ab.b is a partial word of length 8 over the alphabet A = \{a, b\} in which the fourth and seventh characters are holes.{{citation

| last = Blanchet-Sadri | first = Francine

| isbn = 978-1-4200-6092-8

| mr = 2384993

| publisher = Chapman & Hall/CRC | location = Boca Raton, Florida

| series = Discrete Mathematics and its Applications

| title = Algorithmic Combinatorics on Partial Words

| title-link = Algorithmic Combinatorics on Partial Words

| year = 2008}}
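As an illustrative sketch (not taken from the cited source), a partial word can be modeled in Python as a string in which "." marks a hole, mirroring the regular-expression convention above. The names `HOLE`, `holes`, and `as_partial_function` are assumptions made for this example.

```python
HOLE = "."  # assumed placeholder for an undefined position


def holes(word):
    """Return the 0-based positions at which the partial word is undefined."""
    return [k for k, symbol in enumerate(word) if symbol == HOLE]


def as_partial_function(word):
    """Return the partial function u as a dict defined only on non-hole positions."""
    return {k: symbol for k, symbol in enumerate(word) if symbol != HOLE}


u = "aab.ab.b"
print(len(u))    # 8
print(holes(u))  # [3, 6], i.e. the fourth and seventh characters
```

Positions are 0-based here, matching the domain \{ 0, \ldots, n-1 \} of the formal definition.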

Algorithms

Several algorithms have been developed for the problem of "string matching with don't cares", in which the input is a long text and a shorter partial word, and the goal is to find all substrings of the text that match the given partial word.{{citation

| last = Pinter | first = Ron Y. | author-link = Ron Pinter

| contribution = Efficient string matching with don't-care patterns

| mr = 815329

| pages = 11–29

| publisher = Springer, Berlin

| series = NATO Adv. Sci. Inst. Ser. F Comput. Systems Sci.

| title = Combinatorial algorithms on words (Maratea, 1984)

| volume = 12

| year = 1985}}{{citation

| last1 = Manber | first1 = Udi | author1-link = Udi Manber

| last2 = Baeza-Yates | first2 = Ricardo | author2-link = Ricardo Baeza-Yates

| doi = 10.1016/0020-0190(91)90032-D

| issue = 3

| journal = Information Processing Letters

| mr = 1095695

| pages = 133–136

| title = An algorithm for string matching with a sequence of don't cares

| volume = 37

| year = 1991}}{{citation

| last = Kalai | first = Adam

| editor-last = Eppstein | editor-first = David | editor-link = David Eppstein

| contribution = Efficient pattern-matching with don't cares

| contribution-url = https://dl.acm.org/citation.cfm?id=545381.545468

| pages = 655–656

| publisher = ACM and SIAM

| title = Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6-8, 2002, San Francisco, CA, USA

| year = 2002}}
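The problem statement can be made concrete with a brute-force baseline. The following is a minimal sketch, assuming the pattern is a string in which "." marks a hole; it is not one of the cited algorithms, and the function name `match_with_dont_cares` is hypothetical. It checks every alignment directly, so it takes time proportional to the product of the text and pattern lengths.

```python
HOLE = "."  # assumed placeholder for a don't-care position in the pattern


def match_with_dont_cares(text, pattern):
    """Return every start index at which the pattern matches the text.

    A hole in the pattern matches any single text symbol.
    """
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(p == HOLE or p == t for p, t in zip(pattern, text[i:i + m]))
    ]


print(match_with_dont_cares("abaabbab", "a.b"))  # [2, 3]
```

The returned indices are the 0-based starting positions of the matching substrings.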

Applications

File:Cube-face-intersection-graph.svg

Two partial words are said to be compatible when they have the same length and when every position that is defined (not a hole) in both of them holds the same character in both. If one forms an undirected graph with a vertex for each partial word in a collection of partial words, and an edge for each compatible pair, then the cliques of this graph come from sets of partial words that all match at least one common string. This graph-theoretical interpretation of compatibility of partial words plays a key role in the proof of hardness of approximation of the clique problem, in which the graph formed from a collection of partial words representing successful runs of a probabilistically checkable proof verifier has a large clique if and only if there exists a valid proof of an underlying NP-complete problem.{{citation |first1=U. |last1=Feige |author1-link=Uriel Feige |first2=S. |last2=Goldwasser |author2-link=Shafi Goldwasser |first3=L. |last3=Lovász |author3-link=László Lovász |first4=S. |last4=Safra |author4-link=Shmuel Safra |first5=M. |last5=Szegedy |author5-link=Mario Szegedy |title=Proc. 32nd IEEE Symp. on Foundations of Computer Science |pages=2–12 |year=1991 |doi=10.1109/SFCS.1991.185341 |chapter=Approximating clique is almost NP-complete |isbn=0-8186-2445-0 |title-link=Symposium on Foundations of Computer Science |s2cid=46605913 }}
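The compatibility relation and the resulting graph can be sketched as follows. This is an illustration only, assuming partial words are strings with "." as the hole symbol; the function names are hypothetical. The last function shows why a clique yields a common string: at each position, all defined symbols in a pairwise compatible set agree, so any one of them can be used, and positions where every word has a hole can be filled arbitrarily.

```python
from itertools import combinations

HOLE = "."  # assumed placeholder for a hole


def compatible(u, v):
    """True if u and v have equal length and agree wherever both are defined."""
    return len(u) == len(v) and all(a == b or HOLE in (a, b) for a, b in zip(u, v))


def compatibility_graph(words):
    """Return the edge set {(i, j)} with i < j over indices of compatible pairs."""
    return {
        (i, j)
        for i, j in combinations(range(len(words)), 2)
        if compatible(words[i], words[j])
    }


def common_completion(words, fill="a"):
    """Build a string matched by all words, assuming they are pairwise compatible."""
    result = []
    for column in zip(*words):
        defined = [s for s in column if s != HOLE]
        result.append(defined[0] if defined else fill)
    return "".join(result)


words = ["a.b.", ".ab.", "a..b", "b..."]
print(sorted(compatibility_graph(words)))  # [(0, 1), (0, 2), (1, 2), (1, 3)]
print(common_completion(words[:3]))        # aabb
```

In this example the words with indices 0, 1, and 2 form a clique, and all three match the string aabb.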

The faces (subcubes) of an n-dimensional hypercube can be described by partial words of length n over a binary alphabet, whose symbols are the Cartesian coordinates of the hypercube vertices (e.g., 0 or 1 for a unit cube). The dimension of a subcube, in this representation, equals the number of don't-care symbols it contains. The same representation may also be used to describe the implicants of Boolean functions.{{citation

| last = Karnaugh | first = Maurice | author-link = Maurice Karnaugh

| doi = 10.1109/TCE.1953.6371932

| journal = Transactions of the American Institute of Electrical Engineers, Part I: Communication and Electronics

| mr = 0069032

| pages = 593–599

| title = The map method for synthesis of combinational logic circuits

| volume = 1953

| year = 1953| issue = 5 | s2cid = 51636736 }}
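A short sketch of this correspondence, assuming faces of the unit hypercube are written as strings over 0 and 1 with "." as the don't-care symbol (the function names are hypothetical): the dimension is the number of holes, and the vertices of the face are obtained by filling the holes in every possible way.

```python
from itertools import product

HOLE = "."  # assumed placeholder for a don't-care coordinate


def subcube_dimension(face):
    """Dimension of the face: the number of don't-care symbols."""
    return face.count(HOLE)


def subcube_vertices(face):
    """List the hypercube vertices (binary strings) that lie on the face."""
    choices = [("0", "1") if s == HOLE else (s,) for s in face]
    return ["".join(v) for v in product(*choices)]


face = "1.0."
print(subcube_dimension(face))  # 2
print(subcube_vertices(face))   # ['1000', '1001', '1100', '1101']
```

A face of dimension d contains 2^d vertices. Read as an implicant, "1.0." would correspond to a product term that fixes the first variable to 1 and the third to 0 and leaves the others free.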

Related concepts

Partial words may be generalized to parameter words, in which some of the "do not know" symbols are marked as being equal to each other. A partial word is a special case of a parameter word in which each "do not know" symbol may be substituted by a character independently of all the others.{{citation

| last = Prömel | first = Hans Jürgen

| doi = 10.1023/A:1020879709125

| issue = 1–2

| journal = Synthese

| jstor = 20117296

| mr = 1950045

| pages = 87–105

| title = Large numbers, Knuth's arrow notation, and Ramsey theory

| volume = 133

| year = 2002| s2cid = 18330949

}}
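The difference between the two notions can be illustrated with a small sketch. It assumes a representation, chosen here only for illustration, in which a parameter word is a list whose entries are either alphabet characters or integers naming parameters; equal integers mark positions that must receive the same character. Formal definitions in the literature may impose further conventions that this sketch ignores, and the function names are hypothetical.

```python
def substitute(parameter_word, assignment):
    """Replace each integer parameter by its assigned character.

    Alphabet characters are kept unchanged.
    """
    return "".join(
        assignment[s] if isinstance(s, int) else s for s in parameter_word
    )


def from_partial_word(word, hole="."):
    """View a partial word as a parameter word with a distinct parameter per hole."""
    counter = iter(range(len(word)))
    return [next(counter) if s == hole else s for s in word]


# Positions marked 0 are tied together; position marked 1 is independent.
w = ["a", 0, "b", 0, 1]
print(substitute(w, {0: "b", 1: "a"}))  # abbba

p = from_partial_word("a.b.")
print(p)                                # ['a', 0, 'b', 1]
print(substitute(p, {0: "a", 1: "b"}))  # aabb
```

Because `from_partial_word` gives every hole its own parameter, each hole can be filled independently, which is the special case described above.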

References