Group method of data handling
Group method of data handling (GMDH) is a family of inductive algorithms for computer-based mathematical modeling of multi-parametric datasets that features fully automatic structural and parametric optimization of models.
GMDH is used in such fields as data mining, knowledge discovery, prediction, complex systems modeling, optimization and pattern recognition.{{cite book|last1=Madala|first1=H.R.|last2=Ivakhnenko|first2=O.G.|title=Inductive Learning Algorithms for Complex Systems Modeling|date=1994|publisher=CRC Press|location=Boca Raton|isbn=978-0849344381|url=http://articles.gmdh.net/theory/GMDHbook.zip|access-date=2019-11-17|archive-url=https://web.archive.org/web/20171231104312/http://articles.gmdh.net/theory/GMDHbook.zip|archive-date=2017-12-31|url-status=dead}} GMDH algorithms are characterized by an inductive procedure that sorts through gradually more complex polynomial models and selects the best solution by means of an external criterion. The last section of {{Cite journal |last=Farlow |first=Stanley J. |date=November 1981 |title=The GMDH Algorithm of Ivakhnenko |url=http://www.tandfonline.com/doi/abs/10.1080/00031305.1981.10479358 |journal=The American Statistician |language=en |volume=35 |issue=4 |pages=210–215 |doi=10.1080/00031305.1981.10479358 |issn=0003-1305}} contains a summary of the applications of GMDH in the 1970s.
Other names include "polynomial feedforward neural network",{{Cite journal |last1=Nikolaev |first1=N.Y. |last2=Iba |first2=H. |date=March 2003 |title=Learning polynomial feedforward neural networks by genetic programming and backpropagation |url=https://ieeexplore.ieee.org/document/1189632 |journal=IEEE Transactions on Neural Networks |language=en |volume=14 |issue=2 |pages=337–350 |doi=10.1109/TNN.2003.809405 |pmid=18238017 |issn=1045-9227}} or "self-organization of models". It was one of the first deep learning methods, used to train an eight-layer neural net in 1971.{{Cite journal|last=Ivakhnenko|first=Alexey|date=1971|title=Polynomial theory of complex systems|url=http://gmdh.net/articles/history/polynomial.pdf |journal=IEEE Transactions on Systems, Man, and Cybernetics |pages=364–378|doi=10.1109/TSMC.1971.4308320|volume=SMC-1|issue=4}}{{cite journal |last=Schmidhuber |first=Jürgen |year=2015 |title=Deep learning in neural networks: An overview |journal=Neural Networks |volume=61 |pages=85–117 |arxiv=1404.7828 |doi=10.1016/j.neunet.2014.09.003 |pmid=25462637 |s2cid=11715509}}
Mathematical content
= Polynomial regression =
This is the general problem of statistical modelling of data: Consider a dataset <math>\{(x^{(k)}, y^{(k)})\}_{k=1}^{N}</math>, with <math>N</math> points. Each point contains <math>n</math> observations <math>x_1, \dots, x_n</math>, and one target <math>y</math> to predict. How to best predict the target based on the observations?
First, we split the full dataset into two parts: a training set and a validation set. The training set would be used to fit more and more model parameters, and the validation set would be used to decide which parameters to include, and when to stop fitting completely.
The GMDH starts by considering a degree-2 polynomial in 2 variables. Suppose we want to predict the target using just the <math>x_i, x_j</math> parts of the observation, and using only degree-2 polynomials. Then the most we can do is
:<math>y \approx a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2</math>
where the parameters <math>a_0, \dots, a_5</math> are computed by linear regression. Now, the parameters depend on which pair <math>(x_i, x_j)</math> we have chosen, and we do not know which pair we should choose, so we choose all of them. That is, we perform all <math>\binom{n}{2}</math> such polynomial regressions, one for each pair <math>1 \le i < j \le n</math>, obtaining <math>\binom{n}{2}</math> polynomial models of the dataset.
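As an illustration, here is a minimal sketch in Python with NumPy (the function names are ours, not part of any GMDH library) of fitting one such pairwise quadratic model by ordinary least squares:

```python
import numpy as np

def fit_quadratic_pair(x_i, x_j, y):
    """Fit y ~ a0 + a1*x_i + a2*x_j + a3*x_i*x_j + a4*x_i^2 + a5*x_j^2
    by ordinary least squares; returns the six coefficients."""
    # One column of the design matrix per polynomial term.
    X = np.column_stack([np.ones_like(x_i), x_i, x_j,
                         x_i * x_j, x_i ** 2, x_j ** 2])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_quadratic_pair(coeffs, x_i, x_j):
    """Evaluate the fitted pairwise quadratic model."""
    X = np.column_stack([np.ones_like(x_i), x_i, x_j,
                         x_i * x_j, x_i ** 2, x_j ** 2])
    return X @ coeffs
```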
We do not want to accept all of these polynomial models, since that would be too many. To select only the best subset of models, we run each model on the validation dataset, and select the models whose mean squared error is below a threshold. We also write down the smallest mean squared error achieved, <math>\mathrm{minMSE}_1</math>.
Suppose that after this process, we have obtained a set of <math>n_1</math> models. We now run the models on the training dataset, to obtain, for every data point, a sequence of transformed observations <math>z_1, \dots, z_{n_1}</math>. The same algorithm can now be run again on these transformed observations.
Image:Combinatorial_GMDH_optimal_complexity.png
The algorithm continues, giving us <math>\mathrm{minMSE}_1, \mathrm{minMSE}_2, \dots</math>. As long as each <math>\mathrm{minMSE}_{k+1}</math> is smaller than the previous one, the process continues, giving us increasingly deep models. As soon as some <math>\mathrm{minMSE}_{k+1} \ge \mathrm{minMSE}_k</math>, the algorithm terminates. The last layer fitted (layer <math>k+1</math>) is discarded, as it has overfit the training set. The previous layers are output.
More sophisticated methods for deciding when to terminate are possible. For example, one might keep running the algorithm for several more steps, in the hope of passing a temporary rise in <math>\mathrm{minMSE}</math>.
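Putting the pieces together, the following sketch implements the layered procedure described above, reusing the helper functions from the previous sketch. It is a simplified reading of the algorithm: the per-layer error threshold is replaced by keeping a fixed number of best models, which is an assumption of ours.

```python
import itertools
import numpy as np

def gmdh_fit(X_train, y_train, X_val, y_val, keep=8, max_layers=10):
    """Multilayered GMDH sketch: each layer fits quadratic models on all
    column pairs, keeps the `keep` best by validation MSE, and feeds their
    outputs to the next layer. Stops once minMSE stops decreasing."""
    best_mse = np.inf
    layers = []  # each layer is a list of (i, j, coeffs) triples
    for _ in range(max_layers):
        candidates = []
        for i, j in itertools.combinations(range(X_train.shape[1]), 2):
            c = fit_quadratic_pair(X_train[:, i], X_train[:, j], y_train)
            pred = predict_quadratic_pair(c, X_val[:, i], X_val[:, j])
            candidates.append((np.mean((pred - y_val) ** 2), i, j, c))
        candidates.sort(key=lambda t: t[0])
        if candidates[0][0] >= best_mse:
            break                      # minMSE rose: discard this layer
        best_mse = candidates[0][0]
        kept = candidates[:keep]
        layers.append([(i, j, c) for _, i, j, c in kept])
        # The kept models' outputs become the next layer's observations.
        X_train = np.column_stack(
            [predict_quadratic_pair(c, X_train[:, i], X_train[:, j])
             for _, i, j, c in kept])
        X_val = np.column_stack(
            [predict_quadratic_pair(c, X_val[:, i], X_val[:, j])
             for _, i, j, c in kept])
    return layers, best_mse
```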
= In general =
Instead of a degree-2 polynomial in 2 variables, each unit may use higher-degree polynomials in more variables:
:<math>Y(x_1,\dots,x_n) = a_0 + \sum\limits_{i = 1}^n a_i x_i + \sum\limits_{i = 1}^n \sum\limits_{j = i}^n a_{i j} x_i x_j + \sum\limits_{i = 1}^n \sum\limits_{j = i}^n \sum\limits_{k = j}^n a_{i j k} x_i x_j x_k + \cdots</math> (1)
And more generally, a GMDH model with multiple inputs and one output is a subset of components of the base function (1):
:<math>Y(x_1,\dots,x_n) = a_0 + \sum\limits_{i = 1}^m a_i f_i</math>
where <math>f_i</math> are elementary functions dependent on different sets of inputs, <math>a_i</math> are coefficients and <math>m</math> is the number of base function components.
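To make the base function concrete, here is a short sketch (our own naming) that builds the design matrix containing all components of the base polynomial up to a given degree; the nested loops mirror the nested sums with <math>j \ge i</math>, <math>k \ge j</math>, and so on:

```python
from itertools import combinations_with_replacement

import numpy as np

def base_function_terms(X, degree):
    """Columns are all monomials of the inputs up to `degree`,
    plus the constant term: the components of base function (1)."""
    n_samples, n_vars = X.shape
    columns = [np.ones(n_samples)]                # the a_0 term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n_vars), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)
```

A GMDH model then corresponds to selecting a subset of these columns and fitting their coefficients by least squares.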
= External criteria =
External criteria are optimization objectives for the model, such as minimizing mean squared error on the validation set, as given above. The most common criteria, sketched in code after the list, are:
- Criterion of Regularity (CR) – least mean squares on a validation set.
- Least squares on a cross-validation set.
- Criterion of Minimum bias or Consistency – the squared difference between the estimated outputs (or coefficient vectors) of two models fitted on the A and B subsamples, divided by the squared predictions on the B subsample.
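As a concrete illustration, the first and third criteria might be computed as follows (a sketch under our own naming; `model_A` and `model_B` stand for prediction functions fitted on subsamples A and B):

```python
import numpy as np

def criterion_regularity(model, X_val, y_val):
    """CR: least mean squares of the model on a validation set."""
    return np.mean((model(X_val) - y_val) ** 2)

def criterion_minimum_bias(model_A, model_B, X_B):
    """Consistency: squared disagreement between models fitted on
    subsamples A and B, normalized by squared predictions on B."""
    pred_A, pred_B = model_A(X_B), model_B(X_B)
    return np.sum((pred_A - pred_B) ** 2) / np.sum(pred_B ** 2)
```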
Idea
Like linear regression, which fits a linear equation over data, GMDH fits arbitrarily high orders of polynomial equations over data.{{cite book |last1=Ivakhnenko |first1=O.G. |url=http://articles.gmdh.net/theory/bookNoiseIm.pdf |title=Pomekhoustojchivost' Modelirovanija (Noise Immunity of Modeling) |last2=Stepashko |first2=V.S. |date=1985 |publisher=Naukova Dumka |location=Kyiv |access-date=2019-11-18 |archive-url=https://web.archive.org/web/20171231104218/http://articles.gmdh.net/theory/bookNoiseIm.pdf |archive-date=2017-12-31 |url-status=dead}}{{cite book |last1=Ivakhnenko |first1=O.G. |url=https://archive.org/details/cyberneticsforec0000ivak |title=Cybernetics and Forecasting Techniques |last2=Lapa |first2=V.G. |date=1967 |publisher=American Elsevier |edition=Modern Analytic and Computational Methods in Science and Mathematics, v.8 |url-access=registration}}
To choose between models, two or more subsets of a data sample are used, similar to the train-validation-test split.
GMDH combined ideas from:{{Cite journal |last1=Ivakhnenko |first1=A.G. |last2=Savchenko |first2=E.A. |last3=Ivakhnenko |first3=G.A. |date=October 2003 |title=Problems of future GMDH algorithms development |url=http://www.tandfonline.com/doi/abs/10.1080/0232929032000115029 |journal=Systems Analysis Modelling Simulation |language=en |volume=43 |issue=10 |pages=1301–1309 |doi=10.1080/0232929032000115029 |issn=0232-9298}} black box modeling, successive genetic selection of pairwise features,Ivakhnenko, Aleksei G., and Grigorii A. Ivakhnenko. "Problems of further development of the group method of data handling algorithms. Part I." Pattern Recognition and Image Analysis c/c of raspoznavaniye obrazov i analiz izobrazhenii 10.2 (2000): 187-194. Gabor's principle of "freedom of decisions choice",{{cite book |last1=Gabor |first1=D. |title=Perspectives of Planning. Organization of Economic Cooperation and Development |date=1971 |publisher=Imp.Coll. |location=London}} and Beer's principle of external additions.{{cite book |last1=Beer |first1=S. |title=Cybernetics and Management |date=1959 |publisher=English Univ. Press |location=London}}
Inspired by an analogy between constructing a model out of noisy data and sending messages through a noisy channel,{{cite book |last1=Ivahnenko |first1=O.G. |url=http://articles.gmdh.net/theory/bookInductModel.pdf |title=Inductive Method of Models Self-organisation for Complex Systems |date=1982 |publisher=Naukova Dumka |location=Kyiv |access-date=2019-11-18 |archive-url=https://web.archive.org/web/20171231104130/http://articles.gmdh.net/theory/bookInductModel.pdf |archive-date=2017-12-31 |url-status=dead}} they proposed "noise-immune modelling": the higher the noise, the fewer parameters the optimal model must have, since the noisy channel does not allow more bits to be sent through.
The model is structured as a feedforward neural network, but without restrictions on the depth. The model structure is generated automatically by a procedure that imitates the process of biological selection with pairwise genetic features.
History
Image:Photo of Prof. Alexey G. Ivakhnenko.jpg
The method was originated in 1968 by Prof. Alexey G. Ivakhnenko at the Institute of Cybernetics in Kyiv.
The period 1968–1971 was characterized by application of the regularity criterion alone to problems of identification, pattern recognition and short-term forecasting. Polynomials, logical nets, Zadeh's fuzzy sets and Bayes probability formulas were used as reference functions. The authors were stimulated by the very high forecasting accuracy of the new approach. Noise immunity was not yet investigated.
During 1972–1975, the problem of modeling noisy data and data with an incomplete information basis was solved. Multicriteria selection and the use of additional a priori information to increase noise immunity were proposed. The best experiments showed that, with the definition of the optimal model extended by an additional criterion, the noise level could be ten times greater than the signal. The results were then improved using Shannon's theorem of general communication theory.
In 1976–1979, the convergence of multilayered GMDH algorithms was investigated. It was shown that some multilayered algorithms exhibit a "multilayerness error", analogous to the static error of control systems. In 1977 a solution of objective systems analysis problems by multilayered GMDH algorithms was proposed. It turned out that sorting out by an ensemble of criteria finds the single optimal system of equations, and thereby reveals the elements of a complex object and their main input and output variables.
During 1980–1988, many important theoretical results were obtained. It became clear that full physical models cannot be used for long-term forecasting. It was proved that the non-physical models of GMDH are more accurate for approximation and forecasting than the physical models of regression analysis. Two-level algorithms which use two different time scales for modeling were developed.
Since 1989, new algorithms (AC, OCC, PF) for non-parametric modeling of fuzzy objects, and SLP for expert systems, have been developed and investigated.{{cite journal |last1=Ivakhnenko |first1=O.G. |last2=Ivakhnenko |first2=G.A. |date=1995 |title=The Review of Problems Solvable by Algorithms of the Group Method of Data Handling (GMDH) |url=http://www.gmdh.net/articles/review/algorith.pdf |journal=Pattern Recognition and Image Analysis |volume=5 |issue=4 |pages=527–535 |citeseerx=10.1.1.19.2971 }} The present stage of GMDH development can be described as a blossoming of deep learning neural networks and parallel inductive algorithms for multiprocessor computers. Such procedures are now used in deep learning networks.{{cite journal |last1=Takao |first1=S. |last2=Kondo |first2=S. |last3=Ueno |first3=J. |last4=Kondo |first4=T. |date=2017 |title=Deep feedback GMDH-type neural network and its application to medical image analysis of MRI brain images |journal=Artificial Life and Robotics |volume=23 |issue=2 |pages=161–172 |doi=10.1007/s10015-017-0410-1 |s2cid=44190434}}
GMDH-type neural networks
There are many different ways to choose an order in which partial models are considered. The very first consideration order used in GMDH, originally called the multilayered inductive procedure, remains the most popular. It sorts through gradually more complex models generated from the base function, and the best model is indicated by the minimum of the external criterion characteristic. The multilayered procedure is equivalent to an artificial neural network with a polynomial activation function of neurons; therefore, the algorithm with this approach is usually referred to as a GMDH-type neural network or a polynomial neural network. Li showed that a GMDH-type neural network performed better than classical forecasting algorithms such as single exponential smoothing, double exponential smoothing, ARIMA and the back-propagation neural network.{{Cite journal|last1=Li|first1=Rita Yi Man |last2=Fong |first2=Simon |last3=Chong|first3=Kyle Weng Sang |date=2017 |title=Forecasting the REITs and stock indices: Group Method of Data Handling Neural Network approach |journal=Pacific Rim Property Research Journal |volume=23 |issue=2 |pages=123–160 |doi=10.1080/14445921.2016.1225149|s2cid=157150897 }}
Combinatorial GMDH
Another important approach to partial model consideration, which is becoming ever more popular, is combinatorial search, either limited or exhaustive. This approach has some advantages over polynomial neural networks, but requires considerable computational power and is thus not effective for objects with a large number of inputs. An important achievement of combinatorial GMDH is that it fully outperforms the linear regression approach if the noise level in the input data is greater than zero. It guarantees that the optimal model will be found during exhaustive sorting.
The basic combinatorial algorithm performs the following steps (a code sketch follows below):
- Divides the data sample into at least two samples A and B.
- Generates subsamples from A according to partial models of steadily increasing complexity.
- Estimates coefficients of the partial models at each layer of model complexity.
- Calculates the value of the external criterion for the models on sample B.
- Chooses the best model (or set of models), as indicated by the minimal value of the criterion.
- For the selected model of optimal complexity, recalculates the coefficients on the whole data sample.
In contrast to GMDH-type neural networks, the combinatorial algorithm usually does not stop at a certain level of complexity, because a point where the criterion value increases may be simply a local minimum (see Fig. 1).
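A minimal sketch of the basic combinatorial algorithm for linear partial models (exhaustive search over subsets of input variables; the names and simplifications are ours):

```python
import itertools
import numpy as np

def combi_gmdh(X_A, y_A, X_B, y_B, X_full, y_full):
    """Exhaustive COMBI sketch: fit every subset of input columns on
    sample A, score it by its MSE on sample B (the external criterion),
    then refit the winning subset on the whole data sample."""
    def design(X, cols):
        # Linear partial model with an intercept column.
        return np.column_stack([np.ones(len(X)), X[:, cols]])

    n = X_A.shape[1]
    best_crit, best_cols = np.inf, None
    for size in range(1, n + 1):                 # increasing complexity
        for subset in itertools.combinations(range(n), size):
            cols = list(subset)
            coeffs, *_ = np.linalg.lstsq(design(X_A, cols), y_A, rcond=None)
            crit = np.mean((design(X_B, cols) @ coeffs - y_B) ** 2)
            if crit < best_crit:
                best_crit, best_cols = crit, cols
    # Recalculate coefficients of the optimal model on the whole sample.
    coeffs, *_ = np.linalg.lstsq(design(X_full, best_cols), y_full, rcond=None)
    return best_cols, coeffs, best_crit
```

Note that the search is exponential in the number of inputs, which is why the approach is impractical for objects with many inputs, as observed above.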
Algorithms
- Combinatorial (COMBI)
- Multilayered Iterative (MIA)
- GN
- Objective System Analysis (OSA)
- Harmonical
- Two-level (ARIMAD)
- Multiplicative–Additive (MAA)
- Objective Computer Clusterization (OCC)
- Pointing Finger (PF) clusterization algorithm
- Analogues Complexing (AC)
- Harmonical Rediscretization
- Algorithm on the base of Multilayered Theory of Statistical Decisions (MTSD)
- Group of Adaptive Models Evolution (GAME)
Software implementations
- [https://web.archive.org/web/20080213145150/http://neuron.felk.cvut.cz/game/project.html FAKE GAME Project] — Open source. Cross-platform.
- [https://web.archive.org/web/20080418084252/http://research.guilan.ac.ir/gevom/ GEvom] — Free upon request for academic use. Windows-only.
- [https://gmdhsoftware.com/predictive-analytics-software GMDH Shell] — GMDH-based, predictive analytics and time series forecasting software. Free Academic Licensing and Free Trial version available. Windows-only.
- [http://www.knowledgeminer.eu/about.html KnowledgeMiner] — Commercial product. Mac OS X-only. Free Demo version available.
- [http://pnn.pnnsoft.com/index.html PNN Discovery client] — Commercial product.
- [http://sourceforge.net/projects/sciengyrpf/ Sciengy RPF!] — Freeware, Open source.
- [http://wgmdh.irb.hr/en/project/ wGMDH] — Weka plugin, Open source.
- [https://cran.r-project.org/web/packages/GMDH/ R Package] – Open source.
- [https://CRAN.R-project.org/package=GMDHreg R Package for regression tasks] – Open source.
- [https://github.com/kvoyager/GmdhPy/ Python library of MIA algorithm] — Open source.
- [https://pypi.org/project/gmdh Python library of basic GMDH algorithms (COMBI, MULTI, MIA, RIA)] — Open source.
References
{{reflist}}
Further reading
- A.G. Ivakhnenko. [http://www.gmdh.net/articles/history/heuristic.pdf Heuristic Self-Organization in Problems of Engineering Cybernetics], Automatica, vol. 6, 1970 — pp. 207–219.
- S.J. Farlow. Self-Organizing Methods in Modelling: GMDH Type Algorithms. New York, Basel: Marcel Dekker Inc., 1984, 350 p.
- H.R. Madala, A.G. Ivakhnenko. [http://gmdh.net/articles/theory/GMDHbook.pdf Inductive Learning Algorithms for Complex Systems Modeling]. CRC Press, Boca Raton, 1994.
External links
- [http://gmdh.net/articles/index.html Library of GMDH books and articles]
- [http://gmdh.net Group Method of Data Handling]
Category:Computational statistics
Category:Artificial neural networks
Category:Classification algorithms