Empirical algorithmics
{{Short description|Use of empirical methods to study algorithms}}
In computer science, empirical algorithmics (or experimental algorithmics) is the practice of using empirical methods to study the behavior of algorithms. The practice combines algorithm development and experimentation: algorithms are not just designed, but also implemented and tested in a variety of situations. In this process, an initial design of an algorithm is analyzed so that the algorithm may be developed in a stepwise manner.{{cite book|editor-last=Fleischer |editor-first=Rudolf |display-editors=etal |title=Experimental Algorithmics, From Algorithm Design to Robust and Efficient Software| publisher=Springer International Publishing AG| year=2002| url=https://www.springer.com/us/book/9783540003465#aboutBook}}
Overview
Methods from empirical algorithmics complement theoretical methods for the analysis of algorithms.{{cite book| last=Moret| first=Bernard M. E.| s2cid=2221596| title=Towards A Discipline Of Experimental Algorithmics| series=DIMACS Series in Discrete Mathematics and Theoretical Computer Science| publisher=DIMACS Series in Discrete Mathematics and Theoretical Computer Science| year=1999| volume=59| pages=197–213| doi=10.1090/dimacs/059/10| isbn=9780821828922}} Through the principled application of empirical methods, particularly from statistics, it is often possible to obtain insights into the behavior of algorithms such as high-performance heuristic algorithms for hard combinatorial problems that are (currently) inaccessible to theoretical analysis.{{cite book| last=Hromkovic| first=Juraj| title=Algorithmics for Hard Problems| publisher=Springer International Publishing AG| year=2004| url=https://www.springer.com/us/book/9783540441342}} Empirical methods can also be used to achieve substantial improvements in algorithmic efficiency.{{cite journal| journal=Journal of Software| volume=12| number=12| last1=Guzman| first1=John Paul| last2=Limoanco| first2=Teresita| title=An Empirical Approach to Algorithm Analysis Resulting in Approximations to Big Theta Time Complexity| year=2017| url=http://www.jsoftware.us/vol12/308-TE023.pdf}}
American computer scientist Catherine McGeoch identifies two main branches of empirical algorithmics: the first (known as empirical analysis) deals with the analysis and characterization of the behavior of algorithms, and the second (known as algorithm design or algorithm engineering) is focused on empirical methods for improving the performance of algorithms.{{cite book |title=A Guide to Experimental Algorithmics |first = Catherine |last=McGeoch |year=2012 |publisher=Cambridge University Press |isbn= 978-1-107-00173-2}} The former often relies on techniques and tools from statistics, while the latter is based on approaches from statistics, machine learning and optimization. Dynamic analysis tools, typically performance profilers, are commonly used when applying empirical methods for the selection and refinement of algorithms of various types for use in various contexts.{{cite journal| journal=IEEE Transactions on Software Engineering| volume=40| number=12| last1=Coppa| first1=Emilio| last2=Demetrescu| first2=Camil| last3=Finocchi| first3=Irene| title=Input-Sensitive Profiling| pages=1185–1205| year=2014| url=https://www.academia.edu/19232999| doi=10.1109/TSE.2014.2339825| citeseerx=10.1.1.707.4347}}{{cite journal| journal=The Journal of Supercomputing| volume=22| issue=1| pages=99–111| last1=Moret| first1=Bernard M. E.| last2=Bader| first2=David A.| last3=Warnow| first3=Tandy| title=High-Performance Algorithm Engineering for Computational Phylogenetics| year=2002| url=https://infoscience.epfl.ch/record/97856/files/mbw_iccs.pdf| doi=10.1023/a:1014362705613| s2cid=614550}}{{cite conference| work=33rd ACM SIGPLAN Conference on Programming Language Design and Implementation| pages=67–76| last1=Zaparanuks| first1=Dmitrijs| last2=Hauswirth| first2=Matthias| title=Algorithmic Profiling| publisher=ACM Digital Library| year=2012| citeseerx=10.1.1.459.4913}}
Research in empirical algorithmics is published in several journals, including the [http://www.jea.acm.org ACM Journal on Experimental Algorithmics] (JEA) and the [http://www.jair.org Journal of Artificial Intelligence Research] (JAIR). Besides Catherine McGeoch, well-known researchers in empirical algorithmics include Bernard Moret, Giuseppe F. Italiano, Holger H. Hoos, David S. Johnson, and Roberto Battiti.{{cite journal| journal=Ubiquity| volume=2011| issue=August| title=On experimental algorithmics: an interview with Catherine McGeoch and Bernard Moret| publisher=ACM Digital Library| year=2011| url=https://ubiquity.acm.org/article.cfm?id=2015997}}
Performance profiling in the design of complex algorithms
In the absence of empirical algorithmics, analyzing the complexity of an algorithm can involve various theoretical methods applicable to various situations in which the algorithm may be used.{{cite web| last=Grzegorz| first=Mirek| title=Big-O Ambiguity| publisher=performant code_| year=2018| url=http://performantcode.com/complexity/big-o-inaccuracy/}} Memory and cache considerations are often significant factors to be considered in the theoretical choice of a complex algorithm, or the approach to its optimization, for a given purpose.{{cite web| last=Kölker| first=Jonas| title=When does Big-O notation fail?| publisher=Stack Overflow| year=2009| url=https://stackoverflow.com/questions/941283/when-does-big-o-notation-fail}}{{cite web| last=Lemire| first=Daniel| title=Big-O notation and real-world performance| publisher=WordPress| year=2013| url=https://lemire.me/blog/2013/07/11/big-o-notation-and-real-world-performance/}} Performance profiling is a dynamic program analysis technique typically used for finding and analyzing bottlenecks in an entire application's code{{cite web| work=dotTrace 2018.1 Help| title=Finding Application Bottlenecks| publisher=JetBrains| year=2018| url=https://www.jetbrains.com/help/profiler/Studying_Profiling_Results__Finding_Application_Bottlenecks.html}}{{cite web| last=Shmeltzer| first=Shay| work=Oracle Technology Network JDeveloper documentation| title=Locating Bottlenecks in Your Code with the Event Profiler| publisher=Oracle Corp.| year=2005| url=http://www.oracle.com/technetwork/developer-tools/jdev/event-profiler-082901.html}}{{cite book| last1=Shen| first1=Du| last2=Poshyvanyk| first2=Denys| last3=Luo| first3=Qi| last4=Grechanik| first4=Mark| title=Proceedings of the 2015 International Symposium on Software Testing and Analysis| chapter=Automating performance bottleneck detection using search-based application profiling| pages=270–281| publisher=ACM Digital Library| year=2015| chapter-url=https://www.cs.uic.edu/~drmark/index_htm_files/ISSTA15-GAProf.pdf| doi=10.1145/2771783.2771816| isbn=9781450336208| s2cid=8625903}} or for analyzing an entire application to identify poorly performing code.{{ cite web| work=The Profile Learning Center| title=Performance & Memory Profiling and Code Coverage| publisher=SmartBear Software| year=2018| url=https://smartbear.com/learn/code-profiling/#_ga=2.227401693.1994316574.1526935817-1104429500.1526935817}} A profiler can reveal the code most relevant to an application's performance issues.{{cite web| last=Janssen| first=Thorben| work=Stackify Developer Tips, Tricks and Resources| title=11 Simple Java Performance Tuning Tips| year=2017| url=https://stackify.com/java-performance-tuning/}}
A profiler may help to determine when to choose one algorithm over another in a particular situation.{{cite book| last1=O'Sullivan| first1=Bryan| last2=Stewart| first2=Don| last3=Goerzen| first3=John| title=Real World Haskell| chapter=25. Profiling and optimization| publisher=O'Reilly Media| year=2008| chapter-url=http://book.realworldhaskell.org/read/profiling-and-optimization.html#id679553}} When an individual algorithm is profiled, as with complexity analysis, memory and cache considerations are often more significant than instruction counts or clock cycles; however, the profiler's findings can be considered in light of how the algorithm accesses data rather than the number of instructions it uses.{{cite web| last=Linden| first=Doug| publisher=Second Life Wiki| year=2007| title=Profiling and Optimization| url=http://wiki.secondlife.com/wiki/Profiling_and_Optimization}}
Profiling may provide intuitive insight into an algorithm's behavior{{cite web| last=Pattis| first=Richard E.| title=Analysis of Algorithms, Advanced Programming/Practicum, 15-200| publisher=School of Computer Science, Carnegie Mellon University| year=2007| url=https://www.cs.cmu.edu/~pattis/15-1XX/15-200/lectures/aa/lecture.html#Profiling}} by revealing performance findings as a visual representation.{{cite web| work=Advanced R| last=Wickham| first=Hadley| title=Optimising code| publisher=Chapman and Hall/CRC| year=2014| url=http://adv-r.had.co.nz/Profiling.html#measure-perf}} Performance profiling has been applied, for example, during the development of algorithms for matching wildcards. Early algorithms for matching wildcards, such as Rich Salz' wildmat algorithm,{{cite web| last=Salz| first=Rich| title=wildmat.c| publisher=GitHub| year=1991| url=https://github.com/trevor/transmission/blob/master/libtransmission/wildmat.c}} typically relied on recursion, a technique criticized on grounds of performance.{{cite web| last=Cantatore| first=Alessandro| title=Wildcard matching algorithms| year=2003| url=http://xoomer.virgilio.it/acantato/dev/wildcard/wildmatch.html}} The Krauss matching wildcards algorithm was developed based on an attempt to formulate a non-recursive alternative using test cases{{cite magazine| last=Krauss| first=Kirk| title=Matching Wildcards: An Algorithm| magazine=Dr. Dobb's Journal| year=2008| url=http://www.drdobbs.com/architecture-and-design/matching-wildcards-an-algorithm/210200888}} followed by optimizations suggested via performance profiling,{{cite magazine| last=Krauss| first=Kirk| title=Matching Wildcards: An Empirical Way to Tame an Algorithm| magazine=Dr. Dobb's Journal| year=2014| url=http://www.drdobbs.com/architecture-and-design/matching-wildcards-an-empirical-way-to-t/240169123}} resulting in a new algorithmic strategy conceived in light of the profiling along with other considerations.{{cite web| last=Krauss| first=Kirk| title=Matching Wildcards: An Improved Algorithm for Big Data| publisher=Develop for Performance| year=2018| url=http://www.developforperformance.com/MatchingWildcards_AnImprovedAlgorithmForBigData.html}} Profilers that collect data at the level of basic blocks{{cite web| last=Grehan| first=Rick| title=Code Profilers: Choosing a Tool for Analyzing Performance| publisher=Freescale Semiconductor| year=2005| url=https://www.nxp.com/docs/en/white-paper/CWCODEWP.pdf}} If, on the other hand, you need to step through your code with microscopic accuracy, fine-tuning individual machine instructions, then an active profiler with cycle-counting cannot be beat. or that rely on hardware assistance{{cite journal| last=Hough| first=Richard|display-authors=etal| journal=Proceedings of Workshop on Introspective Architecture| title=Cycle-Accurate Microarchitecture Performance Evaluation| publisher=Georgia Institute of Technology| year=2006| citeseerx=10.1.1.395.9306}} provide results that can be accurate enough to assist software developers in optimizing algorithms for a particular computer or situation.{{cite conference| work=IEEE International conference on Emerging trends in Computing, Communication and Nanotechnology| last1=Khamparia| first1=Aditya| last2=Banu| first2=Saira| title=Program Analysis with Dynamic Instrumentation Pin and Performance Tools| publisher=IEEE Xplore Digital Library| year=2013| url=https://www.researchgate.net/publication/259620907}} Performance profiling can aid developer understanding of the characteristics of complex algorithms applied in complex situations, such as coevolutionary algorithms applied to arbitrary test-based problems, and may help lead to design improvements.{{cite journal| last1=Jaskowski| first1=Wojciech| last2=Liskowski| first2=Pawel| last3=Szubert| first3=Marcin Grzegorz| last4=Krawiec| first4=Krzysztof| journal=Applied Mathematics and Computer Science| volume=26| pages=216| title=The performance profile: A multi-criteria performance evaluation method for test-based problems| publisher=De Gruyter| year=2016| url=https://www.degruyter.com/downloadpdf/j/amcs.2016.26.issue-1/amcs-2016-0015/amcs-2016-0015.pdf}}