Rexer's Annual Data Miner Survey
{{Short description|Survey of data science professionals}}
{{multiple issues|
{{COI|date=November 2012}}
{{notability|date=March 2011}}
{{news release|1=article|date=September 2015}}
}}
Rexer Analytics’s Annual Data Miner Survey is the largest survey of data mining, data science, and analytics professionals in the industry. It consists of approximately 50 multiple-choice and open-ended questions that cover seven general areas of data mining science and practice: (1) Field and goals, (2) Algorithms, (3) Models, (4) Tools (software packages used), (5) Technology, (6) Challenges, and (7) Future. It is conducted as a service (without corporate sponsorship) to the data mining community, and the results are usually announced at the PAW (Predictive Analytics World) conferences and shared via freely available summary reports. In the 2013 survey, 1,259 data miners from 75 countries participated. Karl Rexer, Heather Allen, & Paul Gearan (2011); [http://www.rexeranalytics.com/Data-Miner-Survey-Results-2011.html 2011 Data Miner Survey Summary], presented at Predictive Analytics World, Oct. 2011. After 2011, Rexer Analytics moved to a biennial (every two years) schedule.
Surveys
- 2020 Survey: 579 participants from 71 countries.
- 2017 Survey: 1,123 participants from 91 countries.
- 2015 Survey: 1,220 participants from 72 countries.
- 2013 Survey: 68-item survey; 1,259 participants from 75 countries.
- 2011 Survey: 52-item survey; 1,319 participants from over 60 countries. Citations include: Bob Thompson (2012); [http://www.customerthink.com/interview/big_data_analytics_customer_focused_enterprise_inside_scoop_with_karl_rexer Big Data and Analytics in a Customer-Focused Enterprise: Inside Scoop with Karl Rexer], CustomerThink, August 7, 2012. Selena Welz (2012); Meet R: a programming language that makes sense of Big Data, Technology @ Work, Tendo Communications, November 2012.
- 2010 Survey: 50-item survey; 735 participants from 60 countries. Karl Rexer, Heather Allen, & Paul Gearan (2010); [http://www.rexeranalytics.com/Data-Miner-Survey-Results-2010.html 2010 Data Miner Survey Summary], presented at Predictive Analytics World, Oct. 2010. Karl Rexer, Heather Allen, & Paul Gearan (2011); [http://www.analytics-magazine.org/may-june-2011/320-understanding-data-miners Understanding Data Miners], Analytics Magazine, May/June 2011 (INFORMS: Institute for Operations Research and the Management Sciences). Citations include: Emilia Mikołajewska and Dariusz Mikołajewski (2011); System eksploracji danych na potrzeby obronności państwa, Kwartalnik Bellona, 2011, Volume 3, pages 119-129 (Data Mining system for national security purposes, Bellona Quarterly, Scientific Journal of the Polish Ministry of National Defense; Article is in Polish). Tomasz Ząbkowski (2011); [http://isim.wzim.sggw.pl/resources/ISIM_XIII_2011.pdf Data Mining - Current State and Future Trends], Information Systems in Management XIII, Business Intelligence and Knowledge Management, Warsaw University of Life Sciences Press, Warsaw, 2011, pages 122-130; {{ISBN|978-83-7583-370-6}}. Tuba Islam (2011); [http://www.sas.com/offices/europe/turkey/events/pdfs/BASeries/analytics1.pdf How to use Analytics to Improve Your Business: Real Practices] {{dead link|date=April 2018 |bot=InternetArchiveBot |fix-attempted=yes}}, SAS Business Analytics Series, Istanbul, Turkey, April, 2011 (presentation is in Turkish). Shawn Hessinger (2011); [http://www.allanalytics.com/author.asp?section_id=1412&doc_id=235538 CRM & Marketing Top Fields for Data Miners], All Analytics, November 9, 2011. Gustavo Valencia (2012); [http://www.gustavovalencia.com/app/webroot/img/Documents/DM/DataMining2012IIclase0.pdf Minería de Datos: Sesión 0], Universidad Pontificia Bolivariana, Graduate class: [http://www.gustavovalencia.com/academics/data-mining/dm-course-details Data mining and Information visualization] 
{{webarchive|url=https://web.archive.org/web/20140111043300/http://www.gustavovalencia.com/academics/data-mining/dm-course-details |date=2014-01-11}}, 2012 (Presentation is in Spanish). Robert A. Muenchen (2012); [http://r4stats.com/articles/popularity/ The Popularity of Data Analysis Software].
- 2009 Survey: 40-item survey; 710 participants from 58 countries. Karl Rexer, Heather Allen, & Paul Gearan (2009); [http://www.rexeranalytics.com/Data-Miner-Survey-Results-2009.html 2009 Data Miner Survey Summary], presented at SPSS Directions Conference, Oct. 2009. Citations include: M. Arthur Munson (2011); [http://www.kdd.org/sites/default/files/issues/13-2-2011-12/V13-02-09-Munson.pdf A Study on the Importance of and Time Spent on Different Modeling Steps] {{webarchive|url=https://web.archive.org/web/20120913062853/http://www.kdd.org/sites/default/files/issues/13-2-2011-12/V13-02-09-Munson.pdf |date=2012-09-13}}, ACM SIGKDD Explorations, Volume 13, Issue 2, December 2011, pages 65-71. Ervina Çergani (2009); Data Mining Survey, Survey of Businesses in Tirana, Albania; July, 2009 (Originally in Albanian, translated into English). Valerie Valentine (2010); [http://www.information-management.com/news/data_miner_survey_shows_positive_signs-10017530-1.html Data Miner Survey Shows Positive Signs], Information Management, March 25, 2010. Ajay Ohri (2009); [http://decisionstats.com/2009/06/09/interview-karl-rexer-rexer-analytics/ Interview Karl Rexer - Rexer Analytics].
- 2008 Survey: 34-item survey; 348 participants from 44 countries. Karl Rexer, Paul Gearan, & Heather Allen (2008); [http://www.rexeranalytics.com/Data-Miner-Survey-Results-2008.html 2008 Data Miner Survey Summary], presented at SPSS Directions Conference, Oct. 2008, and Oracle BIWA (Business Intelligence, Data Warehousing and Advanced Analytics) Summit, Nov. 2008. Citations include: Mayato (2008); [http://www.mayato.com/downloads/Summary_mayato_Data-Mining-Study_2009.pdf Mayato Study: Data Mining Software 2009] {{webarchive|url=https://web.archive.org/web/20120905171352/http://www.mayato.com/downloads/Summary_mayato_Data-Mining-Study_2009.pdf |date=2012-09-05}}, November 2008 (available in German and English).
- 2007 Survey: 27-item survey; 314 participants from 35 countries. Karl Rexer, Paul Gearan, & Heather Allen (2007); [http://www.rexeranalytics.com/Data-Miner-Survey-Results.html 2007 Data Miner Survey Summary], presented at SPSS Directions Conference, Oct. 2007, and Oracle BIWA Summit, Oct. 2007. Karl Rexer, Paul Gearan, & Heather Allen (2008); [http://www.quirks.com/articles/2008/20080309.aspx?searchID=10852254 Portrait of a data miner], Quirk's Marketing Research Media, March 2008.
Recent survey results
While the Data Miner surveys have covered many data mining topics, the three topics that get the most attention in citations and at conference presentations are:
- Algorithms: Each year the surveys have consistently shown that decision trees, regression, and cluster analysis form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. This is consistent with independent polls of data miners conducted by KDnuggets over the years. Gregory Piatetsky-Shapiro (2011); [http://www.kdnuggets.com/polls/2011/algorithms-analytics-data-mining.html Algorithms for Data Analysis / Data Mining], KDnuggets, 2011. Gregory Piatetsky-Shapiro (2007); [http://www.kdnuggets.com/polls/2007/data_mining_methods.htm Data Mining Methods], KDnuggets, 2007.
- Data Mining Tools: Data miners report using an average of four software tools to conduct their analyses. Over the survey years, R has risen in popularity. In 2010 it overtook SPSS Statistics and SAS to become the tool used by the most data miners, and the 2011 survey showed that R was being used by close to half of all data miners (47%). STATISTICA has also grown in popularity. From 2007 to 2009, more data miners indicated that SPSS Clementine (now IBM SPSS Modeler) was their primary data mining tool than any other tool. However, in 2010 and 2011, STATISTICA was cited most frequently as data miners' primary tool. In terms of satisfaction with their tools, in the past few years, STATISTICA, SPSS Modeler, R, KNIME, RapidMiner and Salford Systems have received the strongest satisfaction ratings from data miners in these surveys. The growing popularity of R is consistent with independent polls of data miners conducted by KDnuggets, but the KDnuggets polls show a different picture regarding the popularity of commercial data mining software. David Smith (2012); [http://java.sys-con.com/node/2288420 R Tops Data Mining Software Poll] {{Webarchive|url=https://web.archive.org/web/20161227164409/http://java.sys-con.com/node/2288420 |date=2016-12-27}}, Java Developers Journal, May 31, 2012. Gregory Piatetsky-Shapiro (2011); [http://www.kdnuggets.com/polls/2011/tools-analytics-data-mining.html Data Mining / Analytic Tools Used], KDnuggets, 2011. Gregory Piatetsky-Shapiro (2010); [http://www.kdnuggets.com/polls/2010/data-mining-analytics-tools.html Data Mining / Analytic Tools Used Poll], KDnuggets, 2010. Robert Muenchen has taken a multi-faceted approach to assessing the popularity of data analysis software, one that includes blog post counts, Google Scholar data, listserv subscribers, use in competitions, book publications, Google PageRank, and more. 
His analyses are consistent with the Rexer Analytics Surveys and KDnuggets in outlining the growth of R, but Muenchen illustrates that the popularity of software is more nuanced and one's conclusions will be different depending on what measure of popularity is used. The Rexer Analytics survey summary reports include analyses of the data miners' satisfaction with 20 dimensions of their software. Haughton et al. and Nisbet have also produced reviews of data mining software. Haughton, Dominique; Deichmann, Joel; Eshghi, Abdolreza; Sayek, Selin; Teebagy, Nicholas; and Topi, Heikki (2003); [https://www.jstor.org/pss/30037299 A Review of Software Packages for Data Mining], The American Statistician, Vol. 57, No. 4, pp. 290–309. Nisbet, Robert A. (2006); [http://www.information-management.com/specialreports/20060124/1046025-1.html Data Mining Tools: Which One is Best for CRM? Part 1], Information Management Special Reports, January 2006.
- Challenges: Consistently across the years, dirty data, explaining data mining to others, and difficult access to data are the top challenges data miners report facing. Participants in the 2010 survey shared best practices for overcoming these challenges. Karl Rexer, Paul Gearan, & Heather Allen (2010); [http://www.rexeranalytics.com/Overcoming_Challenges.html Overcoming Data Mining Challenges], verbatim responses are available online.
References
{{reflist}}
External links
- [http://www.RexerAnalytics.com/ Rexer Analytics home page]
- [http://www.information-management.com/news/data_miner_survey_shows_positive_signs-10017530-1.html Data Miner Survey Shows Positive Signs]
- [http://decisionstats.com/2009/06/09/interview-karl-rexer-rexer-analytics/ 2009 Decisionstats interview of Karl Rexer], President of [http://www.RexerAnalytics.com/ Rexer Analytics]
- [http://r4stats.com/popularity The Popularity of Data Analysis Software]
- [http://www.predictiveanalyticsworld.com Predictive Analytics World]
- [http://www.kdnuggets.com/polls/index.html KDnuggets Polls]: Many single-item polls of data miners conducted from 2000 to the present.