Hilary Parker

{{Short description|American biostatistician and data scientist}}

{{use mdy dates|date=August 2020}}

{{Infobox scientist

| name = Hilary S. Parker

| fields = Biostatistics, data science

| workplaces = Etsy, Stitch Fix

| alma_mater = Pomona College (BA), Johns Hopkins Bloomberg School of Public Health (MHS, PhD)

| thesis_url = https://www.worldcat.org/oclc/875289494

| thesis_title = Practical statistical issues in translational genomics

| thesis_year = 2013

| doctoral_advisor = Jeffrey T. Leek

}}

Hilary S. Parker is an American biostatistician and data scientist. She was formerly a data scientist at the personal styling company Stitch Fix and previously worked as a data analyst at Etsy. Parker co-hosts the data analytics podcast Not So Standard Deviations with Roger Peng. She received her PhD in biostatistics from the Johns Hopkins Bloomberg School of Public Health.

Life and education

Parker graduated from Pomona College in 2008 with a bachelor's degree in molecular biology and mathematics. After earning her MHS, she obtained her PhD in biostatistics from the Johns Hopkins Bloomberg School of Public Health in 2013.{{cite web|date=28 July 2012|title=About Hilary|url=https://hilaryparker.com/about-hilary-parker/|accessdate=August 10, 2020|website=Not So Standard Deviations|language=en|archive-date=June 23, 2021|archive-url=https://web.archive.org/web/20210623214115/https://hilaryparker.com/about-hilary-parker/|url-status=live}} Parker resides in San Francisco.{{cite web |title=QCon San Francisco {{!}} Hilary Parker {{!}} R Enthusiast, Co-Host of the Not So Standard Deviations Podcast, & Data Scientist @stitchfix |url=https://qconsf.com/speakers/hilary-parker |website=QCon San Francisco 2020 |accessdate=August 10, 2020 |language=en }}{{Dead link|date=October 2023 |bot=InternetArchiveBot |fix-attempted=yes }}

Parker's scientific research began during her PhD, in the areas of genomics and personalized medicine. Her research examined batch effects and their impact on prediction.{{cite journal |last1=Leek |first1=Jeffrey T. |last2=Johnson |first2=W. Evan |last3=Parker |first3=Hilary S. |last4=Jaffe |first4=Andrew E. |last5=Storey |first5=John D. |title=The sva Package for Removing Batch Effects and Other Unwanted Variation in High-throughput Experiments |journal=Bioinformatics |date=March 15, 2012 |volume=28 |issue=6 |pages=882–883 |doi=10.1093/bioinformatics/bts034 |issn=1367-4803 |pmc=3307112 |pmid=22257669 }}{{cite journal |last1=Parker |first1=Hilary S. |last2=Leek |first2=Jeffrey T. |title=The practical effect of batch on genomic prediction |journal=Statistical Applications in Genetics and Molecular Biology |date=January 16, 2012 |volume=11 |issue=3 |pages=Article-10 |doi=10.1515/1544-6115.1766 |pmid=22611599 |issn=1544-6115 |pmc=3760371}} Working alongside Jeffrey T. Leek, Parker developed methods for applying genomic technologies in personalized medicine.{{cite journal |last1=Parker |first1=HS |last2=Leek |first2=JT |last3=Favorov |first3=AV |last4=Considine |first4=M |last5=Xia |first5=X |last6=Chavan |first6=S |last7=Chung |first7=CH |last8=Fertig |first8=EJ |title=Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction. |journal=Bioinformatics |date=October 2014 |volume=30 |issue=19 |pages=2757–63 |doi=10.1093/bioinformatics/btu375 |pmid=24907368|pmc=4173013 |doi-access=free }} Batch effects confound data produced by high-throughput genomic technologies such as microarrays. Parker's work aimed to correct predictions distorted by batch effects, which matters because such data are used for diagnosis.{{cite web |title=Hilary Parker: Research |url=http://www.biostat.jhsph.edu/~hiparker/Research/ |publisher=Johns Hopkins Bloomberg School of Public Health |accessdate=30 July 2020 |archive-date=December 27, 2017 |archive-url=https://web.archive.org/web/20171227233105/http://www.biostat.jhsph.edu/~hiparker/Research/ |url-status=live }} In her dissertation, "Practical statistical issues in translational genomics," Parker proposed frozen surrogate variable analysis (fSVA) to improve prediction accuracy in public genomic studies and simulations.{{cite journal |last1=Parker |first1=HS |last2=Corrada Bravo |first2=H |last3=Leek |first3=JT |title=Removing batch effects for prediction problems with frozen surrogate variable analysis. |journal=PeerJ |date=2014 |volume=2 |pages=e561 |doi=10.7717/peerj.561 |pmid=25332844|pmc=4179553 |doi-access=free }}

Career and research

After her PhD, Parker went on to work as a data scientist in industry. Her first job was as a data analyst (later, senior data analyst) at Etsy, where she worked for approximately three years.{{cite news|date=January 11, 2016|title=Hilary Parker Gets Crafty with Statistics in Her Not-So-Standard Job|language=en|work=This is Statistics|url=https://thisisstatistics.org/hilary-parker-gets-crafty-with-statistics-in-her-not-so-standard-job/|access-date=July 30, 2020|archive-date=November 7, 2021|archive-url=https://web.archive.org/web/20211107122105/https://thisisstatistics.org/hilary-parker-gets-crafty-with-statistics-in-her-not-so-standard-job/|url-status=live}} Parker described her position as that of an internal statistical consultant, eventually focused on developing A/B tests and other experiments run by the company and on analyzing the resulting data.{{cite news |last1=Machlis |first1=Sharon |title=RStudio's new enterprise platform moves out of beta |url=https://www.computerworld.com/article/3156750/rstudios-new-enterprise-platform-moves-out-of-beta.html |work=Computerworld |date=January 11, 2017 |language=en |access-date=July 31, 2020 |archive-date=November 11, 2020 |archive-url=https://web.archive.org/web/20201111172740/https://www.computerworld.com/article/3156750/rstudios-new-enterprise-platform-moves-out-of-beta.html |url-status=live }} Opportunity sizing, experimentation, and impact analysis all played a role in her work for the company.

In 2015, Parker began work on the podcast Not So Standard Deviations with co-host Roger Peng.{{cite web|title=Not So Standard Deviations|url=http://nssdeviations.com/|accessdate=August 10, 2020|website=nssdeviations.com|language=en|archive-date=February 28, 2021|archive-url=https://web.archive.org/web/20210228150436/https://nssdeviations.com/|url-status=live}}{{Cite web|last=Lowndes|first=Julia Stewart|date=February 20, 2020|title=rOpenSci's Leadership in #rstats Culture|url=https://www.r-bloggers.com/ropenscis-leadership-in-rstats-culture/|access-date=2020-07-30|website=R-bloggers|language=en-US|archive-date=November 9, 2021|archive-url=https://web.archive.org/web/20211109010757/https://www.r-bloggers.com/2020/02/ropenscis-leadership-in-rstats-culture/|url-status=live}} The pair discuss data analytics, covering statistical computation, data cleaning, and R packages.{{Cite web|title=Not So Standard Deviations: Not Your Average Data Science Podcast|url=https://teachdatascience.com/nssd/|access-date=August 10, 2020|website=Teach Data Science|date=June 4, 2019|archive-date=November 9, 2021|archive-url=https://web.archive.org/web/20211109142818/https://teachdatascience.com/nssd/|url-status=live}} The show is among the more popular data science and statistics podcasts, with over half a million downloads.{{Cite web|date=March 2, 2020|title=25 Super Data Science Podcasts You Must Follow in 2020|url=https://www.techfunnel.com/information-technology/data-science-podcasts/|access-date=2020-07-30|website=Techfunnel|language=en|archive-date=June 17, 2020|archive-url=https://web.archive.org/web/20200617125806/https://www.techfunnel.com/information-technology/data-science-podcasts/|url-status=live}}{{Cite web|last=Choudhury|first=Ambika|date=May 9, 2019|title=Top 15 Data Science Podcasts To Subscribe To In 2019|url=https://analyticsindiamag.com/top-15-data-science-podcasts-to-subscribe-to-in-2019/|access-date=2020-07-30|website=Analytics India Magazine|language=en-US|archive-date=January 19, 2021|archive-url=https://web.archive.org/web/20210119221734/https://analyticsindiamag.com/top-15-data-science-podcasts-to-subscribe-to-in-2019/|url-status=live}} The two also co-authored the book Conversations on Data Science, based on their conversations on the podcast. They recorded their 100th podcast episode live on stage as a keynote presentation at the RStudio-sponsored rstudio::conf 2020.{{Cite web|date=February 6, 2020|title=Not So Standard Deviations Episode 100|url=https://rstudio.com/resources/rstudioconf-2020/not-so-standard-deviations-episode-100/|access-date=2020-07-30|website=rstudio.com|language=en|archive-date=October 8, 2020|archive-url=https://web.archive.org/web/20201008024721/https://rstudio.com/resources/rstudioconf-2020/not-so-standard-deviations-episode-100/|url-status=live}}

After leaving Etsy, Parker became a data scientist at the personal styling site Stitch Fix. The company employs a human-in-the-loop algorithmic process to generate a recommended box of clothing that is shipped to subscribers.{{Cite magazine|first=Arielle |last=Pardes|title=Need Some Fashion Advice? Just Ask the Algorithm|language=en-us|magazine=Wired|url=https://www.wired.com/story/stitch-fix-shop-your-looks/|date=September 12, 2019|issn=1059-1028|archive-url=https://web.archive.org/web/20200407212529/https://www.wired.com/story/stitch-fix-shop-your-looks/|archive-date=April 7, 2020 }}{{Cite web|first=Seth|last=Colaner|date=July 5, 2020|title=How Stitch Fix used AI to personalize its online shopping experience|url=https://venturebeat.com/2020/07/05/how-stitch-fix-used-ai-to-personalize-its-online-shopping-experience/|access-date=2020-07-30|website=VentureBeat|language=en-US|archive-date=February 22, 2021|archive-url=https://web.archive.org/web/20210222230457/https://venturebeat.com/2020/07/05/how-stitch-fix-used-ai-to-personalize-its-online-shopping-experience/|url-status=live}} Parker optimized the algorithms the site used to recommend clothes and helped determine what data was needed from clients to match them with clothing. She worked on new forms of data generation and helped build the datasets powering outfits. Parker left Stitch Fix in August 2020 to join the Joe Biden 2020 presidential campaign.{{cite tweet |last=Parker |first=Hilary |user=hspter |number=1294320664732512256 |date=August 14, 2020 |title=Today is my last day at Stitch Fix. Next week I am joining the Biden campaign full-time. 81 days. Let's do this.}}

Parker speaks at conferences,{{Cite web|date=February 12, 2017|title=Opinionated Analysis Development|url=https://rstudio.com/resources/rstudioconf-2017/opinionated-analysis-development/|access-date=2020-07-30|website=rstudio.com|language=en|archive-date=December 2, 2020|archive-url=https://web.archive.org/web/20201202002136/https://rstudio.com/resources/rstudioconf-2017/opinionated-analysis-development/|url-status=live}} often as a keynote speaker.{{Cite web|title=ICOTS10: Scientific Programme: Keynote Speakers|url=https://icots.info/10/?keynotes|access-date=2020-07-30|publisher=International Conference on Teaching Statistics|archive-date=November 26, 2020|archive-url=https://web.archive.org/web/20201126095007/https://icots.info/10/?keynotes|url-status=live}} She coined the term "opinionated analysis development" to describe a framework for producing robust data analysis that resembles some aspects of software design.{{cite journal |last1=Parker |first1=Hilary |title=Opinionated Analysis Development |url=https://peerj.com/preprints/3210.pdf |journal=PeerJ |date=August 2017 |doi=10.7287/peerj.preprints.3210v1 |access-date=August 10, 2020 |archive-date=November 12, 2020 |archive-url=https://web.archive.org/web/20201112153922/https://peerj.com/preprints/3210.pdf |url-status=live |doi-access=free }}

Awards

In 2012, Parker received the Helen Abbey Award from Johns Hopkins. This award is given to a student who intends to teach biostatistics.{{cite web |title=Honors and Awards: The Helen Abbey Award |url=https://www.jhsph.edu/departments/biostatistics/about-us/honors-and-awards/ |website=Johns Hopkins Bloomberg School of Public Health |accessdate=August 10, 2020 |language=en |archive-date=May 2, 2021 |archive-url=https://web.archive.org/web/20210502052833/https://www.jhsph.edu/departments/biostatistics/about-us/honors-and-awards/ |url-status=live }}

Selected works

Parker's publications and projects include the following:

  • {{cite journal |authorlink1=Jeffrey T. Leek|last1=Leek |first1=Jeffrey T. |last2=Johnson |first2=W. Evan |last3=Parker |first3=Hilary S. |last4=Jaffe |first4=Andrew E. |last5=Storey |first5=John D. |title=The sva Package for Removing Batch Effects and Other Unwanted Variation in High-throughput Experiments |journal=Bioinformatics |date=March 15, 2012 |volume=28 |issue=6 |pages=882–883 |doi=10.1093/bioinformatics/bts034 |issn=1367-4803 |pmc=3307112 |pmid=22257669 }}
  • {{cite journal |last1=Parker |first1=Hilary S. |last2=Leek |first2=Jeffrey T. |title=The practical effect of batch on genomic prediction |journal=Statistical Applications in Genetics and Molecular Biology |date=January 16, 2012 |volume=11 |issue=3 |pages=Article-10 |doi=10.1515/1544-6115.1766 |pmid=22611599 |issn=1544-6115 |pmc=3760371}}
  • {{cite news |last=Parker |first=Hilary |title=Hillary: The Most Poisoned Baby Name in U.S. History |url=https://www.thecut.com/2013/01/hillary-most-poisoned-baby-name-in-us-history.html |work=The Cut |date=January 30, 2013}}
  • {{cite journal |last1=Parker |first1=Hilary S. |last2=Leek |first2=Jeffrey T. |last3=Favorov |first3=Alexander V. |last4=Considine |first4=Michael |last5=Xia |first5=Xiaoxin |last6=Chavan |first6=Sameer |last7=Chung |first7=Christine H. |last8=Fertig |first8=Elana J. |title=Preserving Biological Heterogeneity with a Permuted Surrogate Variable Analysis for Genomics Batch Correction |journal=Bioinformatics |date=October 2014 |volume=30 |issue=19 |pages=2757–2763 |doi=10.1093/bioinformatics/btu375|issn=1460-2059 |pmc=4173013 |pmid=24907368 |url=}}
  • {{cite book |last1=Peng |first1=Roger D. |last2=Parker |first2=Hilary |title=Conversations On Data Science |url=https://leanpub.com/conversationsondatascience |publisher=Leanpub |accessdate=August 10, 2020 |date=2016}}
  • {{cite journal |last1=Parker |first1=Hilary |title=Opinionated Analysis Development |url=https://peerj.com/preprints/3210.pdf |journal=PeerJ |date=August 2017 |doi=10.7287/peerj.preprints.3210v1 |doi-access=free |access-date=August 10, 2020 |archive-date=November 12, 2020 |archive-url=https://web.archive.org/web/20201112153922/https://peerj.com/preprints/3210.pdf |url-status=dead }}

References

{{reflist}}