PhotoDNA
{{Short description|Image identification technology}}
{{Use mdy dates|date=February 2018}}
'''PhotoDNA''' is a proprietary image-identification and content-filtering technology{{cite arXiv|first1=Matthijs|last1=Douze|first2=Giorgos|last2=Tolias|first3=Ed|last3=Pizzi|first4=Zoë|last4=Papakipos|title=The 2021 Image Similarity Dataset and Challenge|date=2022-02-21|first5=Lowik|last5=Chanussot|first6=Filip|last6=Radenovic|first7=Tomas|last7=Jenicek|first8=Maxim|last8=Maximov|first9=Laura|last9=Leal-Taixé|first10=Ismail|last10=Elezi|first11=Ondřej|last11=Chum|first12=Cristian Canton|last12=Ferrer|class=cs.CV |eprint=2106.09672 |quote=Image fingerprints, such as PhotoDNA from Microsoft, are used throughout the industry to identify images that depict child exploitation and abuse}} widely used by online service providers.{{cite web|access-date=2022-08-21|title=The Rise of Content Cartels|url=http://knightcolumbia.org/content/the-rise-of-content-cartels|website=knightcolumbia.org|date=2020-02-11}}{{cite news |last1=Hill |first1=Kashmir |date=2022-08-21 |title=A Dad Took Photos of His Naked Toddler for the Doctor. Google Flagged Him as a Criminal. |url=https://www.nytimes.com/2022/08/21/technology/google-surveillance-toddler-photo.html |access-date=2022-08-21 |newspaper=The New York Times |issn=0362-4331}}
== History ==
PhotoDNA was developed by Microsoft Research and Hany Farid, a professor at Dartmouth College, beginning in 2009. It computes a distinctive hash for each image in a database of known images and video files; those hashes can then be used to identify other instances of the same images.{{cite web|publisher=Microsoft Corporation|title=New Technology Fights Child Porn by Tracking Its "PhotoDNA"|url=https://news.microsoft.com/2009/12/15/new-technology-fights-child-porn-by-tracking-its-photodna/#sm.0001mpmupctevct7pjn11vtwrw6xj |date=December 15, 2009|access-date=September 9, 2016}}
The hashing method initially relied on converting an image to black and white, dividing it into squares, and quantifying the shading of each square;{{cite web |url=http://www.microsoft.com/global/en-us/news/publishingimages/ImageGallery/Images/Infographics/PhotoDNA/flowchart_photodna_Web.jpg |title=Photo DNA: Step by step |publisher=Microsoft |access-date=February 11, 2014 |url-status=dead |archive-url=https://web.archive.org/web/20130921055218/http://www.microsoft.com/global/en-us/news/publishingimages/ImageGallery/Images/Infographics/PhotoDNA/flowchart_photodna_Web.jpg |archive-date=September 21, 2013 |df=mdy-all }} it did not employ facial-recognition technology, nor could it identify a person or object in the image.{{citation needed|date=August 2022}} The method was designed to be resistant to alterations of the image, including resizing and minor color changes.
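The exact algorithm is proprietary, but a minimal sketch in the spirit of the published description (grayscale conversion, a fixed grid of squares, averaged shading per square, and distance-based matching) might look as follows; the grid size, cell size, and match threshold here are illustrative assumptions, not PhotoDNA's actual parameters:
<syntaxhighlight lang="python">
# Illustrative sketch only: PhotoDNA's real algorithm and parameters are
# proprietary; the grid size, cell size, and threshold below are assumptions.
from PIL import Image

GRID = 16   # hypothetical grid of 16x16 squares
CELL = 4    # hypothetical pixels per square side

def shading_hash(path: str) -> list[int]:
    """Grayscale the image, normalize its size, and record the average
    shading of each square of a fixed grid."""
    side = GRID * CELL
    img = Image.open(path).convert("L").resize((side, side))
    px = list(img.getdata())
    values = []
    for gy in range(GRID):
        for gx in range(GRID):
            cell = [px[y * side + x]
                    for y in range(gy * CELL, (gy + 1) * CELL)
                    for x in range(gx * CELL, (gx + 1) * CELL)]
            values.append(sum(cell) // len(cell))
    return values

def is_match(h1: list[int], h2: list[int], threshold: int = 2000) -> bool:
    """Compare hashes by distance rather than exact equality, so resized
    or slightly recolored copies of a known image still match."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) < threshold
</syntaxhighlight>
Because matching is by distance rather than equality, a provider can compare uploads against a database of hashes of known material without storing or inspecting the original images.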
Since 2015, similar methods have been applied to individual frames of video files.{{cite web|url=https://news.microsoft.com/on-the-issues/2018/09/12/how-photodna-for-video-is-being-used-to-fight-online-child-exploitation/|title=How PhotoDNA for Video is being used to fight online child exploitation|date=September 12, 2018|publisher=news.microsoft.com}}
Microsoft donated{{failed verification|date=August 2022}} the PhotoDNA technology to Project VIC, which is managed and supported by the International Centre for Missing & Exploited Children (ICMEC) and is used in digital forensics operations,{{cite web|url=http://gcn.com/articles/2014/08/27/image-analysis-exploited-children.aspx|title=Improved image analysis tools speed exploited children cases|publisher=GCN|date=August 27, 2014|first=William|last=Jackson}}{{cite magazine|url=https://www.wired.co.uk/news/archive/2014-04/30/video-fingerprints-child-abuse|title=Child abuse-tracking tech donated to the world|magazine=Wired UK|date=April 30, 2014|first=Liat|last=Clark}} storing "fingerprints" that uniquely identify individual photos.{{cite web|url=http://ec.europa.eu/justice/news/consulting_public/0009/contributions/registered_organisations/099_microsoft.pdf|title=Microsoft's response to the consultation on the European Commission Communication on the Rights of the Child (2011–2014)|publisher=European Commission|archive-url=https://web.archive.org/web/20171024052111/http://ec.europa.eu/justice/news/consulting_public/0009/contributions/registered_organisations/099_microsoft.pdf|archive-date=October 24, 2017}} The database includes hashes for millions of items.{{cite web|url=https://www.bbc.com/news/technology-26612059|title=Cloud-based archive tool to help catch child abusers|work=BBC News|date=March 23, 2014|first=Mark|last=Ward}}
In December 2014, Microsoft made PhotoDNA available to qualified organizations free of charge, as a software-as-a-service offering, through the Azure Marketplace.{{cite web|publisher=Microsoft Corporation|title=PhotoDNA Cloud Service|url=http://www.microsoft.com/en-us/photodna|website=Microsoft.com|access-date=February 19, 2015}}
In the 2010s and 2020s, PhotoDNA was put forward in connection with policy proposals relating to content moderation and internet censorship. These included US Senate hearings (in 2019 on "digital responsibility" and in 2022 on the EARN IT Act{{cite web|first1=Berin|last1=Szoka|first2=Ari|last2=Cohn|access-date=2022-08-21|title=The Top Ten Mistakes Senators Made During Today's EARN IT Markup|url=https://www.techdirt.com/2022/02/10/top-ten-mistakes-senators-made-during-todays-earn-it-markup/|date=2022-02-10|website=Techdirt}}) and various proposals by the European Commission dubbed "upload filters" by civil society,{{cite web|first1=Christoph|last1=Schmon|access-date=2022-08-21|title=The EU Commission's Refusal to Let Go of Filters|url=https://www.eff.org/deeplinks/2021/06/eu-commissions-guidance-article-17-did-not-let-go-filters|date=2021-06-03|website=Electronic Frontier Foundation}}{{cite web|access-date=2022-08-21|title=Upload filters: a danger to free internet content?|url=https://www.ionos.com/digitalguide/websites/digital-law/upload-filters/|website=IONOS Digitalguide|date=March 28, 2019}} such as so-called voluntary codes (the 2016 code of conduct{{cite web|access-date=2022-08-21|title=Fighting illegal online hate speech: first assessment of the new code of conduct|url=https://ec.europa.eu/newsroom/just/items/50840|website=ec.europa.eu|date=2016-12-06}} on hate speech,{{cite web|url=https://ec.europa.eu/info/policies/justice-and-fundamental-rights/combatting-discrimination/racism-and-xenophobia/eu-code-conduct-countering-illegal-hate-speech-online_en|title=The EU Code of conduct on countering illegal hate speech online|publisher=European Commission|access-date=2022-08-29}} adopted after the events of 2015, and the 2018{{cite web|url=https://digital-strategy.ec.europa.eu/en/news/code-practice-disinformation|title=Code of Practice on Disinformation|website=Shaping Europe's digital future|date=September 26, 2018}} and 2022{{cite web|url=https://digital-strategy.ec.europa.eu/en/policies/code-practice-disinformation|title=The 2022 Code of Practice on Disinformation|website=Shaping Europe's digital future|date=March 24, 2023}} codes of practice on disinformation), copyright legislation (chiefly the 2019 Copyright Directive, debated between 2014{{cite web|url=https://oeil.secure.europarl.europa.eu/oeil/popups/ficheprocedure.do?lang=en&reference=2014/2256(INI)|title=Procedure File: 2014/2256(INI)|website=Legislative Observatory, European Parliament}} and 2021{{CELEX|52021DC0288|text=COMMUNICATION FROM THE COMMISSION TO THE EUROPEAN PARLIAMENT AND THE COUNCIL Guidance on Article 17 of Directive 2019/790 on Copyright in the Digital Single Market}}), terrorism-related regulation (TERREG),{{cite web|url=https://home-affairs.ec.europa.eu/policies/internal-security/counter-terrorism-and-radicalisation/prevention-radicalisation/terrorist-content-online_en|title=Terrorist content online|publisher=European Commission}} and internet wiretapping regulations (the 2021 "chat control" proposal).{{cite web|first1=Markus|last1=Reuter|first2=Tomas|last2=Rudl|first3=Franziska|last3=Rau|first4=Holly|last4=Hildebr|access-date=2022-08-21|title=Why chat control is so dangerous|url=https://edri.org/our-work/why-chat-control-is-so-dangerous/|website=European Digital Rights (EDRi)}}
In 2016, Hany Farid proposed extending the use of the technology to terrorism-related content.{{cite news|last1=Waddell|first1=Kaveh|title=A Tool to Delete Beheading Videos Before They Even Appear Online|url=https://www.theatlantic.com/technology/archive/2016/06/a-tool-to-delete-beheading-videos-before-they-even-appear-online/488105/|access-date=10 September 2016|work=The Atlantic|date=June 22, 2016}} In December 2016, Facebook, Twitter, Google and Microsoft announced plans to use PhotoDNA to remove extremist content such as terrorist recruitment videos or violent terrorist imagery.{{Cite news|url=http://newsroom.fb.com/news/2016/12/partnering-to-help-curb-spread-of-online-terrorist-content/|title=Partnering to Help Curb Spread of Online Terrorist Content {{!}} Facebook Newsroom|access-date=2016-12-06}} In 2018, Facebook stated that PhotoDNA was used to automatically remove al-Qaeda videos.{{cite news|url=http://www.europarl.europa.eu/website/webstreaming.html?event=20180619-0900-COMMITTEE-IMCO|author=Richard Allan|author-link=Richard Allan|date=2018-06-18|title=Hearing at 11:14}}{{cite web|url=http://www.europarl.europa.eu/committees/en/events-hearings.html?id=20180613CHE04321|title=The EU's horizontal regulatory framework for illegal content removal in the DSM}}
By 2019, major technology companies including Microsoft, Facebook and Google had publicly announced that, since 2017, they had been operating the Global Internet Forum to Counter Terrorism (GIFCT), which maintains a shared database of hashes of content to be automatically censored. As of 2021, Apple was thought to be using NeuralHash for similar purposes.{{cite journal|first1=Hal|last1=Abelson|first2=Ross|last2=Anderson|first3=Steven M.|last3=Bellovin|first4=Josh|last4=Benaloh|title=Bugs in our pockets: The risks of client-side scanning|date=2024|first5=Matt|last5=Blaze|first6=Jon|last6=Callas|first7=Whitfield|last7=Diffie|first8=Susan|last8=Landau|first9=Peter G.|last9=Neumann|first10=Ronald L.|last10=Rivest|first11=Jeffrey I.|last11=Schiller|first12=Bruce|last12=Schneier|first13=Vanessa|last13=Teague|first14=Carmela|last14=Troncoso|journal=Journal of Cybersecurity |volume=10 |doi=10.1093/cybsec/tyad020 |arxiv=2110.07450 }}
In 2022, The New York Times covered the cases of two fathers whose Google accounts were closed after photos they had taken of their children for medical purposes were automatically uploaded to Google's servers.{{cite news|first1=Kashmir|last1=Hill|access-date=2022-08-21|title=A Dad Took Photos of His Naked Toddler for the Doctor. Google Flagged Him as a Criminal.|url=https://www.nytimes.com/2022/08/21/technology/google-surveillance-toddler-photo.html|newspaper=The New York Times|date=2022-08-21|issn=0362-4331 |quote=A bigger breakthrough came along almost a decade later, in 2018, when Google developed an artificially intelligent tool that could recognize never-before-seen exploitative images of children. [...] When Mark's and Cassio's photos were automatically uploaded from their phones to Google's servers, this technology flagged them.}} The article contrasts PhotoDNA, which requires a database of known hashes, with Google's AI-based technology, which can recognize previously unseen exploitative images.{{cite web|access-date=2022-08-28|title=Google Flagged Parents' Photos of Sick Children as Sexual Abuse|url=https://gizmodo.com/google-csam-photodna-1849440471|date=2022-08-22|website=Gizmodo|quote=According to Google, those incident reports come from multiple sources, not limited to the automated PhotoDNA tool.}}{{cite web|first1=Emma|last1=Roth|access-date=2022-08-28|title=Google AI flagged parents' accounts for potential abuse over nude photos of their sick kids|url=https://www.theverge.com/2022/8/21/23315513/google-photos-csam-scanning-account-deletion-investigation|date=2022-08-21|website=The Verge|quote=Google has used hash matching with Microsoft's PhotoDNA for scanning uploaded images to detect matches with known CSAM. [...] In 2018, Google announced the launch of its Content Safety API AI toolkit that can “proactively identify never-before-seen CSAM imagery so it can be reviewed and, if confirmed as CSAM, removed and reported as quickly as possible.” It uses the tool for its own services and, along with a video-targeting CSAI Match hash matching solution developed by YouTube engineers, offers it for use by others as well.}}
== Usage ==
Microsoft originally used PhotoDNA on its own services including Bing and OneDrive.{{cite web|url=http://www.makeuseof.com/tag/unfortunate-truths-about-child-pornography-and-the-internet-feature/|title=Unfortunate Truths about Child Pornography and the Internet [Feature]|work=MUO |date=December 7, 2012}} As of 2022, PhotoDNA was widely used by online service providers for their content moderation efforts{{cite book|url=https://books.google.com/books?id=zjXn3zA7HFcC&pg=PA514|title=International Perspectives on the Assessment and Treatment of Sexual Offenders: Theory, Practice and Research|first1=Reinhard |last1=Eher |first2=Leam A. |last2=Craig |first3=Michael H. |last3=Miner |first4=Friedemann |last4=Pfäfflin |publisher=John Wiley & Sons|page=514|isbn=978-1119996200|year=2011}}{{cite book|url=https://books.google.com/books?id=lr_NNbSPfbAC&q=icmec+children&pg=PA317|title=Living with Grief: Coping with Public Tragedy|page=317|isbn=1135941513|publisher=Routledge |year=2004|first1=Marcia |last1=Lattanzi-Licht |first2=Kenneth |last2=Doka }} including Google's Gmail, Twitter,{{cite news|last=Arthur|first=Charles|title=Twitter to introduce PhotoDNA system to block child abuse images|url=https://www.theguardian.com/technology/2013/jul/22/twitter-photodna-child-abuse|access-date=July 22, 2013|newspaper=The Guardian|date=July 22, 2013}} Facebook,{{cite news|last=Smith|first=Catharine|title=Facebook Adopts Microsoft PhotoDNA To Remove Child Pornography|url=http://www.huffingtonpost.com/2011/05/20/facebook-photodna-microsoft-child-pornography_n_864695.html|access-date=July 22, 2013|newspaper=Huffington Post|date=May 2, 2011}} Adobe Systems,{{cite web|title=Adobe & PhotoDNA|url=https://www.adobe.com/in/legal/lawenforcementrequests/photodna.html|access-date=2021-08-27|website=www.adobe.com|language=en}} Reddit,{{cite web|title=Reddit use PhotoDNA to prevent child pornography|url=https://www.redditinc.com/policies/transparency-report-2019|date=March 19, 2020}} and Discord.{{cite web|date=2021-04-02|title=Discord Transparency Report: July — Dec 2020|url= https://discord.com/blog/discord-transparency-report-july-dec-2020 |access-date=2022-05-08|website=Discord Blog|language=en}}
The UK Internet Watch Foundation, which compiles a reference database of PhotoDNA signatures, reportedly holds over 300,000 hashes of known child sexual exploitation materials.{{citation needed|date=August 2022}}
Another source of hashes for the database is the National Center for Missing & Exploited Children (NCMEC).{{cite web|url=https://www.theguardian.com/technology/2014/aug/07/microsoft-tip-police-child-abuse-images-paedophile|title=Microsoft tip led police to arrest man over child abuse images|work=The Guardian|date=August 7, 2014}}{{cite news|last=Salcito|first=Anthony|title=Microsoft donates PhotoDNA technology to make the Internet safer for kids|url=http://blogs.msdn.com/b/microsoftuseducation/archive/2009/12/17/microsoft-donates-photodna-technology-to-make-the-internet-safer-for-kids.aspx|access-date=July 22, 2013|date=December 17, 2009}}
Service providers widely use PhotoDNA matches to remove content, disable accounts, and report users to authorities.
== Inverting ==
In 2021, Anish Athalye was able to partially invert PhotoDNA hashes using a neural network, raising concerns that a PhotoDNA hash can leak visual information about the original image.{{cite web |last=Athalye |first=Anish |title=Inverting PhotoDNA |url=https://www.anishathalye.com/2021/12/20/inverting-photodna/ |date=December 20, 2021}}
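The general approach described in Athalye's write-up is to train a decoder network on (hash, image) pairs so that it learns to reconstruct a thumbnail from a hash. A minimal sketch of that idea, assuming a 144-element hash vector and a small fully connected decoder (the layer sizes, resolution, and training setup here are illustrative assumptions, not the published model), could look like this:
<syntaxhighlight lang="python">
# Sketch of the general approach, not Athalye's published model: train a
# decoder network to map hash vectors back to small grayscale images.
# The layer sizes, resolution, and optimizer settings are assumptions.
import torch
import torch.nn as nn

HASH_LEN = 144   # PhotoDNA hashes are reportedly 144-element vectors
IMG_SIDE = 64    # hypothetical reconstruction resolution

decoder = nn.Sequential(
    nn.Linear(HASH_LEN, 1024), nn.ReLU(),
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, IMG_SIDE * IMG_SIDE), nn.Sigmoid(),  # pixels in [0, 1]
)
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(hashes: torch.Tensor, images: torch.Tensor) -> float:
    """One gradient step on (hash, image) pairs computed from a public
    image corpus; minimizing reconstruction error teaches the decoder to
    recover whatever visual structure the hash preserves."""
    optimizer.zero_grad()
    loss = loss_fn(decoder(hashes), images.view(images.shape[0], -1))
    loss.backward()
    optimizer.step()
    return loss.item()
</syntaxhighlight>
Any reconstruction obtained this way is approximate, but even blurry recoveries suggest that hash databases are not free of information about the underlying images.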
== References ==
{{reflist|colwidth=30em}}
== External links ==
* {{Official website}}
{{Microsoft Research Labs|state=uncollapsed}}
{{Microsoft}}
Category:Digital forensics software