private information retrieval

In cryptography, a private information retrieval (PIR) protocol is a protocol that allows a user to retrieve an item from a server in possession of a database without revealing which item is retrieved. PIR is a weaker version of 1-out-of-n oblivious transfer, where it is also required that the user should not get information about other database items.

One trivial, but very inefficient way to achieve PIR is for the server to send an entire copy of the database to the user. In fact, this is the only possible protocol (in the classical or the quantum setting{{cite journal|last=Baumeler|first=Ämin|author2=Broadbent, Anne|author2-link= Anne Broadbent |title=Quantum Private Information Retrieval has Linear Communication Complexity|journal=Journal of Cryptology|volume=28|pages=161–175|date=17 April 2014|doi=10.1007/s00145-014-9180-2|arxiv=1304.5490|s2cid=1450526}}) that gives the user information theoretic privacy for their query in a single-server setting.{{cite journal|last=Chor|first=Benny|author-link= Benny Chor |author2=Kushilevitz, Eyal |author3=Goldreich, Oded |author4= Sudan, Madhu |title=Private information retrieval|journal=Journal of the ACM|date=1 November 1998|volume=45|issue=6|pages=965–981|doi=10.1145/293347.293350|url=http://www.tau.ac.il/~bchor/PIR.pdf|citeseerx=10.1.1.51.3663|s2cid=544823}} There are two ways to address this problem: make the server computationally bounded or assume that there are multiple non-cooperating servers, each having a copy of the database.

The problem was introduced in 1995 by Chor, Goldreich, Kushilevitz and Sudan in the information-theoretic setting and in 1997 by Kushilevitz and Ostrovsky in the computational setting.{{cite book|last=Kushilevitz|first=Eyal|title=Proceedings of the 38th Annual Symposium on Foundations of Computer Science|date=1997|publisher=IEEE Computer Society|location=Miami Beach, Florida, USA|isbn=978-0-8186-8197-4|pages=364–373|author2=Ostrovsky, Rafail|chapter=Replication is not needed: single database, computationally-private information retrieval|doi=10.1109/SFCS.1997.646125|citeseerx=10.1.1.56.2667|s2cid=11000506}} Since then, very efficient solutions have been discovered. Single database (computationally private) PIR can be achieved with constant (amortized) communication and k-database (information theoretic) PIR can be done with $n^{O\left(\frac{\log \log k}{k \log k}\right)}$ communication.

Advances in computational PIR

The first single-database computational PIR scheme to achieve communication complexity less than $n$ was created in 1997 by Kushilevitz and Ostrovsky and achieved communication complexity of $n^\epsilon$ for any $\epsilon$ , where $n$ is the number of bits in the database. The security of their scheme was based on the well-studied quadratic residuosity problem. In 1999, Christian Cachin, Silvio Micali and Markus Stadler{{cite book|last=Cachin|first=Christian|title=Advances in Cryptology – EUROCRYPT '99|date=1999|publisher=Springer-Verlag|location=Prague, Czech Republic|isbn=978-3-540-48910-8|pages=402–414|author2=Micali, Silvio |author3=Stadler, Markus |chapter=Computationally Private Information Retrieval with Polylogarithmic Communication|doi=10.1007/3-540-48910-X_28}} achieved poly-logarithmic communication complexity. The security of their system is based on the phi-hiding assumption. In 2004, Helger Lipmaa{{cite book|last=Lipmaa|first=Helger|title=Proceedings of the 8th International Conference on Information Security (ISC 2005)|volume=3650|date=2005|publisher=Springer-Verlag|location=Singapore|isbn=978-3-540-31930-6|pages=314–328|chapter=An Oblivious Transfer Protocol with Log-Squared Communication|doi=10.1007/11556992_23|series=Lecture Notes in Computer Science|citeseerx=10.1.1.73.8768}} achieved log-squared communication complexity $O(\ell \log n+k \log^2 n)$ , where $\ell$ is the length of the strings and $k$ is the security parameter. The security of his system reduces to the semantic security of a length-flexible additively homomorphic cryptosystem like the Damgård–Jurik cryptosystem. In 2005 Craig Gentry and Zulfikar Ramzan{{cite conference | first1 =Craig | last1 =Gentry | first2 =Zulfikar | last2 =Ramzan | title =Single-Database Private Information Retrieval with Constant Communication Rate | year =2005 | series =LNCS | volume =3580 | book-title =ICALP | publisher =Springer | pages =803–815 | doi =10.1007/11523468_65 | citeseerx =10.1.1.113.6572 }} achieved log-squared communication complexity which retrieves log-square (consecutive) bits of the database. The security of their scheme is also based on a variant of the Phi-hiding assumption. The communication rate was finally brought down to $1$ by Aggelos Kiayias, Nikos Leonardos, Helger Lipmaa, Kateryna Pavlyk, Qiang Tang, in 2015.{{cite conference | first1 =Aggelos | last1 =Kiayias | first2 =Nikos | last2 =Leonardos |first3 =Helger | last3 =Lipmaa | first4 =Kateryna | last4 =Pavlyk |first5 =Qiang | last5 =Tang | title =Optimal Rate Private Information Retrieval from Homomorphic Encryption | year =2015 | volume =2015 | book-title =Proceedings on Privacy Enhancing Technologies 2015 | publisher =DE GRUYTER | pages =222–243 | doi =10.1515/popets-2015-0016| doi-access =free | hdl =20.500.11820/0f84afcb-8ee1-4744-a2ba-4642104c51c5 | hdl-access =free }}

All previous sublinear-communication computational PIR protocol required linear computational complexity of $\Omega (n)$ public-key operations. In 2009, Helger Lipmaa{{cite book|last=Lipmaa|first=Helger|title=Proceedings of the 12th International Conference on Information Security and Cryptology|volume=5984|date=2010|publisher=Springer-Verlag|location=Seoul, Korea|isbn=978-3-642-14423-3|pages=193–210|chapter=First CPIR Protocol with Data-Dependent Computation|doi=10.1007/978-3-642-14423-3_14|series=Lecture Notes in Computer Science|citeseerx=10.1.1.215.7768}} designed a computational PIR protocol with communication complexity $O(\ell \log n+k \log^2 n)$ and worst-case computation of $O (n / \log n)$ public-key operations. Amortization techniques that retrieve non-consecutive bits have been considered by Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky and Amit Sahai.{{cite conference | first1 = Yuval | last1 = Ishai | first2 = Eyal | last2 = Kushilevitz | first3 = Rafail | last3 = Ostrovsky | first4 = Amit | last4 = Sahai | title = Batch codes and their applications | year =2004 | book-title =STOC'04 | publisher =ACM | url =http://web.cs.ucla.edu/~rafail/PUBLIC/62.pdf | pages =262–271 | doi =10.1145/1007352.1007396 | access-date =2015-10-23 }}

As shown by Ostrovsky and Skeith,{{cite book|last=Ostrovsky|first=Rafail|title=Proceedings of the 10th International Conference on Practice and Theory in Public-Key Cryptography|date=2007|publisher=Springer-Verlag|isbn=978-3-540-71677-8|pages=393–411|author2=Skeith III |author3=William E. |chapter=A Survey of Single-Database Private Information Retrieval: Techniques and Applications|doi=10.1007/978-3-540-71677-8_26}} the schemes by Kushilevitz and Ostrovsky and Lipmaa use similar ideas based on homomorphic encryption. The Kushilevitz and Ostrovsky protocol is based on the Goldwasser–Micali cryptosystem while the protocol by Lipmaa is based on the Damgård–Jurik cryptosystem.

Advances in information theoretic PIR

Achieving information theoretic security requires the assumption that there are multiple non-cooperating servers, each having a copy of the database. Without this assumption, any information-theoretically secure PIR protocol requires an amount of communication that is at least the size of the database n. Multi-server PIR protocols tolerant of non-responsive or malicious/colluding servers are called robust or Byzantine robust respectively. These issues were first considered by Beimel and Stahl (2002). An ℓ-server system that can operate where only k of the servers respond, ν of the servers respond incorrectly, and which can withstand up to t colluding servers without revealing the client's query is called "t-private ν-Byzantine robust k-out-of-ℓ PIR" [DGH 2012]. In 2012, C. Devet, I. Goldberg, and N. Heninger (DGH 2012) proposed an optimally robust scheme that is Byzantine-robust to $\nu < k-t-1$ which is the theoretical maximum value. It is based on an earlier protocol of Goldberg that uses Shamir's Secret Sharing to hide the query. Goldberg has released a C++ implementation on SourceForge.[http://percy.sourceforge.net Percy++ / PIR in C++] at SourceForge

Relation to other cryptographic primitives

One-way functions are necessary, but not known to be sufficient, for nontrivial (i.e., with sublinear communication) single database computationally private information retrieval. In fact, such a protocol was proved by Giovanni Di Crescenzo, Tal Malkin and Rafail Ostrovsky to imply oblivious transfer (see below).{{cite conference | first1 =Giovanni | last1 =Di Crescenzo | first2 =Tal | last2 =Malkin | first3 =Rafail | last3 =Ostrovsky | title =Single Database Private Information Retrieval Implies Oblivious Transfer | year =2000 | series =LNCS | volume =1807 | book-title =Eurocrypt 2000 | publisher =Springer | pages =122–138 | doi =10.1007/3-540-45539-6_10 | doi-access =free }}

Oblivious transfer, also called symmetric PIR, is PIR with the additional restriction that the user may not learn any item other than the one she requested. It is termed symmetric because both the user and the database have a privacy requirement.

Collision-resistant cryptographic hash functions are implied by any one-round computational PIR scheme, as shown by Ishai, Kushilevitz and Ostrovsky.{{cite book|last=Ishai|first=Yuval|title=Proceedings of the Second Theory of Cryptography Conference|date=2005|publisher=Springer-Verlag|location=Cambridge, MA, USA|isbn=978-3-540-30576-7|pages=445–456|author2=Kushilevitz, Eyal |author3=Ostrovsky, Rafail |chapter=Sufficient Conditions for Collision-Resistant Hashing|doi=10.1007/978-3-540-30576-7_24}}

PIR variations

The basic motivation for private information retrieval is a family of two-party protocols in which one of the parties (the sender) owns a database, and the other part (the receiver) wants to query it with certain privacy restrictions and warranties. So, as a result of the protocol, if the receiver wants the i-th value in the database he must learn the i-th entry, but the sender must learn nothing about i. In a general PIR protocol, a computationally unbounded sender can learn nothing about i so privacy is theoretically preserved. Since the PIR problem was posed, different approaches to its solution have been pursued and some variations were proposed.

A CPIR (computationally private information retrieval) protocol is similar to a PIR protocol: the receiver retrieves an element chosen by him from the sender's database, so that the sender obtains no knowledge about which element was transferred. The only difference is that privacy is safeguarded against a polynomially bounded sender.{{cite book|last=Saint-Jean|first=Felipe|title=Yale University Technical Report YALEU/DCS/TR-1333|date=2005|chapter-url=http://crypto.stanford.edu/portia/papers/TR1333.pdf|chapter=A Java Implementation of a Single-Database Computationally Symmetric Private Information Retrieval (cSPIR) protocol}}

A CSPIR (computationally symmetric private information retrieval) protocol is used in a similar scenario in which a CPIR protocol is used. If the sender owns a database, and the receiver wants to get the i-th value in this database, at the end of the execution of a SPIR protocol, the receiver should have learned nothing about values in the database other than the i-th one.

PIR implementations

Numerous Computational PIR and Information theoretic PIR schemes in the literature have been implemented. Here is an incomplete list:

MuchPIR{{Cite web|url=https://github.com/ReverseControl/MuchPIR|title = MuchPIR Demo| website=GitHub |date = 14 September 2021}} is a CPIR implementation as a Postgres C/C++ Extension [GitHub, 2021].
SealPIR{{Cite web|url=https://github.com/pung-project/SealPIR|title=SealPIR|website=Github|access-date=2018-06-07}} is a fast CPIR implementation [ACLS 2018].
Popcorn{{Cite web |url=http://www.cs.utexas.edu/~trinabh/papers/popcorn-PIR-nsdi16.pdf |title=Popcorn |access-date=2016-05-26 |archive-url=https://web.archive.org/web/20160821164304/http://www.cs.utexas.edu/~trinabh/papers/popcorn-PIR-nsdi16.pdf |archive-date=2016-08-21 |url-status=dead }} is a PIR implementation tailored for media [GCMSAW 2016].
Percy++ includes implementations of [AG 2007, DGH 2012, CGKS 1998, Goldberg 2007, HOG 2011, LG 2015].
RAID-PIR{{Cite web|url=https://github.com/encryptogroup/RAID-PIR|title=encryptogroup/RAID-PIR|website=GitHub|access-date=2016-05-26}} is an implementation of the ITPIR scheme of [DHS 2014].
XPIR{{Cite web|url=https://github.com/XPIR-team/XPIR|title=XPIR-team/XPIR|website=GitHub|access-date=2016-05-26}} is a fast CPIR implementation [ABFK 2014].
upPIR{{Cite web|url=https://uppir.poly.edu/projects/project|title=upPIR|website=uppir.poly.edu|access-date=2016-05-26|archive-url=https://web.archive.org/web/20160625160724/https://uppir.poly.edu/projects/project|archive-date=2016-06-25|url-status=dead}} is an ITPIR implementation [Cappos 2013].

=Notes=

References

A. Beimel, Y. Ishai, E. Kushilevitz, and J.-F. Raymond. Breaking the $O(n^{1/(2k-1)})$ barrier for information-theoretic private information retrieval. Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, Vancouver, Canada, pages 261–270, 2002.
A. Beimel and Y. Stahl, Robust information-theoretic private information retrieval, in Proceedings of the 3rd International Conference on Security in Communication Networks (SCN'02), pp. 326–341, 2003. Cite is from DGH 2012, op. cit.
[DGH 2012] Casey Devet, Ian Goldberg, and Nadia Heninger, [http://www.cs.uwaterloo.ca/~iang/pubs/orpir-usenix.pdf Optimally Robust Private Information Retrieval], 21st USENIX Security Symposium, August 2012.
[AG 2007] C. Aguilar-Melchor and P. Gaborit. A lattice-based computationally-efficient private information retrieval protocol, Western European Workshop on Research in Cryptology (WEWoRC), 2007.
[CGKS 1998] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, Private information retrieval, Journal of the ACM, 45(6):965–981, 1998.
[Goldberg 2007] I. Goldberg, Improving the robustness of private information retrieval, IEEE Symposium on Security and Privacy (S&P), 2007.
[HOG 2011] R. Henry, F. Olumofin, and I. Goldberg, Practical PIR for electronic commerce, ACM Conference on Computer and Communications Security (CCS), 2011.
[LG 2015] W. Lueks and I. Goldberg, Sublinear scaling for multi-client private information retrieval, International Conference on Financial Cryptography and Data Security (FC), 2015.
[DHS 2014] D. Demmler, A. Herzberg, and T. Schneider, RAID-PIR: Practical multi-server PIR, In Cloud computing security workshop (CCSW), 2014.
[ABFK 2014] C. Aguilar-Melchor, J. Barrier, L. Fousse, and M.-O. Killijian, "XPIR: Private Information Retrieval for Everyone", Cryptology ePrint Archive, Report 2014/1025, 2014.
[GCMSAW 2016] T. Gupta, N. Crooks, W. Mulhern, S. Setty, L. Alvisi, and M. Walfish, [https://web.archive.org/web/20160821164304/http://www.cs.utexas.edu/~trinabh/papers/popcorn-PIR-nsdi16.pdf] Scalable and private media consumption with Popcorn. USENIX NSDI, March 2016.
[Cappos 2013] J. Cappos, Avoiding theoretical optimality to efficiently and privately retrieve security updates, International Conference on Financial Cryptography and Data Security (FC), 2013.
Sergey Yekhanin. New locally decodable codes and private information retrieval schemes, {{ECCC|2006|06|127}}, 2006.
[ACLS 2018] S. Angel, H. Chen, K. Laine, S. Setty, PIR with compressed queries and amortized query processing, IEEE Symposium on Security and Privacy (S&P), 2018.

External links

[http://www.cs.ut.ee/~lipmaa/crypto/link/protocols/oblivious.php Helger Lipmaa's web links on oblivious transfer and PIR]
[http://www.cs.umd.edu/~gasarch/TOPICS/pir/pir.html William Gasarch's website on PIR including survey articles]
[http://www.cs.ucla.edu/~rafail/ Rafail Ostrovsky's website contaiting PIR articles and surveys]

Category:Cryptographic primitives

Category:Theory of cryptography