Surrogate data

{{Short description|Time series data technique}}

Surrogate data, sometimes known as analogous data, usually refers to time series data produced using well-defined (linear) models, such as ARMA processes, that reproduce statistical properties of a measured data set, such as its autocorrelation structure.{{cite journal |title=Generating surrogate data for time series with several simultaneously measured variables |url=http://public.lanl.gov/jt/Papers/SurrogateMultiple.pdf |journal=Physical Review Letters |volume=73 |number=7 |pages=951–954 |year=1994 |author1=Prichard |author2=Theiler |doi=10.1103/physrevlett.73.951 |pmid=10057582 |arxiv=comp-gas/9405002 |bibcode=1994PhRvL..73..951P |s2cid=32748996 }} The resulting surrogate data can then be used, for example, to test for non-linear structure in the empirical data; this is called surrogate data testing.

Surrogate or analogous data also refers to data used to supplement available data from which a mathematical model is built. Under this definition, it may be generated (i.e., synthetic data) or transformed from another source.{{cite thesis |degree=M.Sc. |last=Kaefer |first=Paul E. |date=2015 |title=Transforming Analogous Time Series Data to Improve Natural Gas Demand Forecast Accuracy |publisher=Marquette University |url=http://epublications.marquette.edu/theses_open/320/ |access-date=2016-02-18 |archive-date=2016-03-12 |archive-url=https://web.archive.org/web/20160312044649/http://epublications.marquette.edu/theses_open/320/ |url-status=live }}

Uses

Surrogate data is used in environmental and laboratory settings, where study data from one source is used to estimate characteristics of another source.{{cite web |url=http://www.caslab.com/Surrogate_Data_Meaning/ |title=Surrogate Data Meaning |publisher=Columbia Analytical Services, Inc., now ALS Environmental |quote=What is Surrogate Data? Data from studies of test organisms or a test substance that are used to estimate the characteristics or effects on another organism or substance. |access-date=February 15, 2017 |archive-date=February 16, 2017 |archive-url=https://web.archive.org/web/20170216130237/http://www.caslab.com/Surrogate_Data_Meaning/ |url-status=live }} For example, it has been used to model population trends in animal species.{{cite journal |title=The Use of Surrogate Data in Demographic Population Viability Analysis: A Case Study of California Sea Lions |first1=Claudia J. |last1=Hernández-Camacho |first2=Victoria J. |last2=Bakker |first3=David |last3=Aurioles-Gamboa |first4=Jeff |last4=Laake |first5=Leah R. |last5=Gerber |author-link5=Leah Gerber|editor=Aaron W. Reed |journal=PLOS ONE |volume=10 |number=9 |date=September 2015 |doi=10.1371/journal.pone.0139158 |pmid=26413746 |page=e0139158|bibcode=2015PLoSO..1039158H |pmc=4587556 |doi-access=free }} It can also be used to model biodiversity, since gathering actual data on all species in a given area would be difficult.{{cite journal |title=Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas |first1=D.P. |last1=Faith |first2=P.A. |last2=Walker |journal=Biodiversity and Conservation |volume=5 |number=4 |year=1996 |publisher=Springer Nature |pages=399–415 |doi=10.1007/BF00056387|bibcode=1996BiCon...5..399F |s2cid=24066193 }}

Surrogate data may be used in forecasting. Data from similar series may be pooled to improve forecast accuracy.{{cite book |chapter=Forecasting Analogous Time Series |title=Principles of Forecasting: A Handbook for Researchers and Practitioners |first1=George T. |last1=Duncan |first2=Wilpen L. |last2=Gorr |first3=Janusz |last3=Szczypula |editor=J. Scott Armstrong |editor-link=J. Scott Armstrong |publisher=Kluwer Academic Publishers |year=2001 |pages=195–213 |isbn=0-7923-7930-6}} Use of surrogate data may enable a model to account for patterns not seen in historical data.{{cite conference |title=Using Surrogate Data to Mitigate the Risks of Natural Gas Forecasting on Unusual Days |url=https://forecasters.org/wp-content/uploads/gravity_forms/7-621289a708af3e7af65a7cd487aee6eb/2015/07/Kaefer_Paul_ISF2015.pdf |first1=Paul E. |last1=Kaefer |first2=Babatunde |last2=Ishola |first3=Ronald H. |last3=Brown |first4=George F. |last4=Corliss |conference=International Institute of Forecasters: 35th International Symposium on Forecasting |year=2015 |website=forecasters.org/isf |access-date=2022-07-20 |archive-date=2021-05-17 |archive-url=https://web.archive.org/web/20210517041956/https://forecasters.org/wp-content/uploads/gravity_forms/7-621289a708af3e7af65a7cd487aee6eb/2015/07/Kaefer_Paul_ISF2015.pdf |url-status=live }}

Another use of surrogate data is to test measured data for non-linearity. The term surrogate data testing refers to algorithms that compare a measured series with surrogates for this purpose.{{cite journal |first1=Thomas |last1=Schreiber |first2=Andreas |last2=Schmitz |title=Surrogate time series |journal=Physica D |volume=142 |issue=3–4 |pages=346–382 |year=1999 |doi=10.1016/s0167-2789(00)00043-9|arxiv=chao-dyn/9909037|bibcode=2000PhyD..142..346S |citeseerx=10.1.1.46.3999 |s2cid=13889229 }} These tests typically involve generating data, whereas surrogate data in general can be produced or gathered in many ways.
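As an illustration only, a surrogate data test can be sketched in Python with NumPy. The sketch assumes one common choice of surrogate (randomizing Fourier phases, which preserves the power spectrum and hence the autocorrelation) and one common choice of test statistic (time-reversal asymmetry). Neither choice is prescribed by the text above, and the function names are hypothetical.

```python
import numpy as np


def phase_randomized_surrogate(x, rng):
    """Return a series with the same power spectrum as x but random Fourier phases."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    phases[0] = 0.0  # leave the zero-frequency term (the mean) untouched
    if n % 2 == 0:
        phases[-1] = 0.0  # the Nyquist term must stay real when n is even
    # Rotating each Fourier coefficient changes its phase but not its magnitude.
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


def time_reversal_asymmetry(x, lag=1):
    """Skewness-like statistic of lagged differences; near zero for linear Gaussian data."""
    d = x[lag:] - x[:-lag]
    return np.mean(d**3) / np.mean(d**2) ** 1.5


def surrogate_test(x, n_surrogates=19, lag=1, seed=0):
    """Rank-based test: reject the linear null if the data statistic exceeds every surrogate's."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    observed = abs(time_reversal_asymmetry(x, lag))
    surrogate_stats = [
        abs(time_reversal_asymmetry(phase_randomized_surrogate(x, rng), lag))
        for _ in range(n_surrogates)
    ]
    rejected = bool(observed > max(surrogate_stats))
    return observed, surrogate_stats, rejected
```

With 19 surrogates, the data statistic exceeds all surrogate values by chance with probability 1/20 under the null hypothesis, so the sketch corresponds to a one-sided test at roughly the 5% level.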

Methods

One method of obtaining surrogate data is to find a source with similar conditions or parameters and use its data in modeling. Another method is to focus on patterns of the underlying system and search for a similar pattern in related data sources (for example, patterns in other related species or environmental areas).

Rather than using existing data from a separate source, surrogate data may be generated through statistical processes, which may involve random data generation using constraints of the model or system.
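A minimal sketch of model-based generation follows, assuming the simplest linear model, a first-order autoregressive (AR(1)) process, fitted to the measured series. The constraints carried over to the random data are the mean, variance, and lag-1 autocorrelation. The function name and the choice of AR(1) with Gaussian innovations are assumptions made for illustration, not a method taken from the sources cited here.

```python
import numpy as np


def ar1_surrogate(x, rng):
    """Simulate an AR(1) series matching the mean, variance, and lag-1 autocorrelation of x."""
    x = np.asarray(x, dtype=float)  # assumed non-constant
    mean = x.mean()
    centered = x - mean
    variance = centered.var()
    # The lag-1 autocorrelation serves as the estimate of the AR(1) coefficient.
    phi = np.dot(centered[1:], centered[:-1]) / np.dot(centered, centered)
    # Innovation scale chosen so the stationary variance equals the data variance.
    sigma = np.sqrt(variance * (1.0 - phi**2))
    surrogate = np.empty(len(x))
    surrogate[0] = rng.normal(0.0, np.sqrt(variance))  # start in the stationary distribution
    for t in range(1, len(x)):
        surrogate[t] = phi * surrogate[t - 1] + rng.normal(0.0, sigma)
    return surrogate + mean


# Usage: draw one surrogate for a synthetic "measured" series.
rng = np.random.default_rng(42)
measured = np.cumsum(rng.normal(size=500)) * 0.1 + rng.normal(size=500)
surrogate = ar1_surrogate(measured, rng)
```

Each call with a fresh random state yields a different realization that shares the fitted linear properties, which is what allows an ensemble of such series to represent a linear null hypothesis.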

References

{{Reflist|30em}}

Further reading

  • {{Cite journal | last1 = Schreiber | first1 = T. | last2 = Schmitz | first2 = A. | doi = 10.1103/PhysRevLett.77.635 | title = Improved Surrogate Data for Nonlinearity Tests | journal = Physical Review Letters | volume = 77 | issue = 4 | pages = 635–638 | year = 1996 | pmid = 10062864|bibcode = 1996PhRvL..77..635S | arxiv = chao-dyn/9909041 | s2cid = 13193081 }}

Category:Statistical data types

Category:Nonlinear time series analysis