UDC 519.68. DOI: 10.31772/2587-6066-2018-19-4-589-597
FILLING THE GAPS IN THE INPUT AND OUTPUT DATA USING THE ALGORITHM OF NONPARAMETRIC IDENTIFICATION
P. A. Osipov, Y. S. Osipova, A. V. Khorkush, P. E. Vdovykh, M. V. Verkhoturova
Siberian Federal University, Institute of Space and Information Technology, 26, Kirensky Str., Krasnoyarsk, 660074, Russian Federation
System identification, that is, determining the structure and parameters of a system from observations, is one of the main tasks of modern automatic control theory and engineering. The accuracy of the solution to an identification problem depends directly on the quality of the initial data (the sample of observations). However, the data may have various defects, in particular gaps. Gaps arise for a variety of reasons, such as the inability to observe a variable or the lack of the necessary measuring tools. The simplest way to handle such data is to exclude from the table every indicator (column) or object (row) that contains a gap. When the data contain many gaps, this approach reduces the accuracy of the model because the sample size shrinks. In this case the identification problem also becomes harder to solve, especially when the density of gaps is high, their location is irregular, and the data are scarce. The aim of the paper is to improve the accuracy of identifying discrete-continuous multidimensional processes from samples of observations with gaps. To achieve this aim, methods of mathematical statistics, data analysis, and mathematical modeling were used. The article describes an algorithm for nonparametric estimation of the regression curve of a discrete-continuous process, applied to filling gaps in the observation matrix, and builds a model based on this algorithm. Two computational experiments were carried out: the first with gaps in the output variable of the observation matrix, the second with gaps in the input variables. The experiments were run at different sample sizes. Conclusions are drawn from the results of the algorithm under these conditions.
The results of the work can be useful in creating control systems for multidimensional discrete-continuous processes.
Keywords: nonparametric identification, regression curve estimation, modeling, data analysis, data gaps.
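
The idea of filling gaps in the output variable with a nonparametric regression estimate can be illustrated by the following minimal sketch. It assumes an estimator of the Nadaraya-Watson type with a triangular kernel and a single fixed bandwidth; the abstract does not specify the kernel, the bandwidth selection rule, or the data, so the function names, the `bandwidth` value, and the toy sample below are all hypothetical.

```python
def triangular_kernel(z):
    """Triangular kernel: 1 - |z| on [-1, 1], zero elsewhere (an assumed choice)."""
    return max(0.0, 1.0 - abs(z))


def kernel_estimate(x_query, xs, ys, bandwidth):
    """Nadaraya-Watson type estimate of the output at the point x_query.

    xs is a list of input vectors with observed outputs ys.
    Returns None if no observation falls within the bandwidth.
    """
    num = 0.0
    den = 0.0
    for x_row, y in zip(xs, ys):
        # Product kernel over all input coordinates.
        w = 1.0
        for xq, xi in zip(x_query, x_row):
            w *= triangular_kernel((xq - xi) / bandwidth)
        num += w * y
        den += w
    return num / den if den > 0.0 else None


def fill_output_gaps(xs, ys, bandwidth):
    """Replace None entries of ys by kernel estimates built from complete rows."""
    known = [(x, y) for x, y in zip(xs, ys) if y is not None]
    known_x = [x for x, _ in known]
    known_y = [y for _, y in known]
    return [
        y if y is not None else kernel_estimate(x, known_x, known_y, bandwidth)
        for x, y in zip(xs, ys)
    ]


# Hypothetical toy sample: y = 2 * x with one gap in the output.
xs = [[0.0], [1.0], [2.0], [3.0], [4.0]]
ys = [0.0, 2.0, None, 6.0, 8.0]
filled = fill_output_gaps(xs, ys, bandwidth=1.5)
print(filled)
```

In this sketch a gap is left as `None` when no complete observation lies within the bandwidth, which makes the effect of a too-small bandwidth or a too-sparse sample visible instead of hiding it. Handling gaps in the input variables would need a separate step that this sketch does not cover.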

Osipov Pavel Andreevich – Master’s degree student, Institute of Space and Information Technology, Siberian Federal University. E-mail: uoo-ikit@mail.ru.

Osipova Jana Sergeevna – Master’s degree student, Institute of Space and Information Technology, Siberian Federal University. E-mail: yana_is_storm@mail.ru.

Khorkush Anatolii Vladimirovich – Master’s degree student, Institute of Space and Information Technology, Siberian Federal University. E-mail: cloha@mail.ru.

Vdovykh Polina Sergeevna – student, Institute of Space and Information Technology, Siberian Federal University. E-mail: polina.vdovykh@gmail.com.

Verkhoturova Mariya Vladimirovna – student, Institute of Space and Information Technology, Siberian Federal University. E-mail: adventuretime66@yandex.ru.

