UDK 616.37-002 DOI: 10.31772/2587-6066-2019-20-2-153-159
APPLIED CLASSIFICATION PROBLEMS USING RIDGE REGRESSION
Kononova N. V., Mangalova E. S., Stroev A. V., Cherdantsev D. V., Chubarova O. V.
Siberian Federal University, 79, Svobodny Av., 660041, Krasnoyarsk, Russian Federation; LLC "RD Science", 19, Kirova St., Krasnoyarsk, 660017, Russian Federation; Krasnoyarsk State Medical University named after Prof. V. F. Voino-Yasenetsky, 1, Partizana Zheleznyaka St., Krasnoyarsk, 660022, Russian Federation; Reshetnev Siberian State University of Science and Technology, 31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037, Russian Federation
The rapid development of measurement devices and technologies makes it possible to monitor the properties of objects of very different physical nature with very fine data discreteness. As a result, large amounts of data can be accumulated and put to good use in managing an object, a multiply connected system, or a technological enterprise. Nevertheless, regardless of the field of activity, tasks associated with small amounts of data remain; in such cases the dynamics of data accumulation is constrained by objective limitations of the external world and the environment. The present research concerns high-dimensional data with small sample sizes. In this setting the task of selecting informative features arises: it improves the quality of problem solving by eliminating "junk" features, speeds up decision making (since the running time of most algorithms depends on the dimension of the feature space), and simplifies the data collection procedure (uninformative data need not be collected). Since the number of features can be large, an exhaustive search over all feature subsets is impossible. Instead, for the selection of informative features we propose a two-stage random search algorithm based on a genetic algorithm: at the first stage, the search limits the number of features in a subset in order to reduce the feature space by eliminating "junk" features; at the second stage, the search proceeds without this limitation, but over the reduced feature set. The original problem statement is a supervised classification task in which the object class is determined by an expert. The values of an object's attributes vary depending on its state, which assigns the object to one class or another; that is, the statistics are shifted from class to class. Without loss of generality, a two-alternative formulation of the supervised classification task was used for the simulation experiments.
Data from the field of medical diagnostics of disease severity were used to generate the training samples.
Keywords: small samples, supervised classification, ridge-regression, quantile transformation, meta-classifier, significance of features, genetic algorithm.
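The two-stage scheme outlined in the abstract can be sketched as follows. This is an illustrative reconstruction only: the synthetic data, the genetic-algorithm operators (truncation selection, one-point crossover, bit-flip mutation), the ridge penalty, and all parameter values are assumptions, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic small-sample, high-dimensional two-class data (illustrative only):
# 60 objects, 40 features, of which only the first 5 carry class information.
n, d, d_informative = 60, 40, 5
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:d_informative] = 1.0
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def ridge_accuracy(mask, lam=1.0):
    """Training accuracy of a ridge-regression classifier on the masked features."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    Xs = X[:, idx]
    # Ridge solution: (X'X + lam*I) beta = X'y; threshold the response at 0.5.
    beta = np.linalg.solve(Xs.T @ Xs + lam * np.eye(idx.size), Xs.T @ y)
    return float(((Xs @ beta > 0.5) == (y > 0.5)).mean())

def ga_search(d, fitness, max_features=None, pop=30, gens=40, p_mut=0.05):
    """Genetic algorithm over binary feature masks of length d."""
    def clip(m):
        # Enforce the stage-1 cap on subset size by switching off random features.
        if max_features is not None and m.sum() > max_features:
            on = np.flatnonzero(m)
            m[rng.choice(on, m.sum() - max_features, replace=False)] = 0
        return m
    P = [clip(rng.integers(0, 2, d).astype(np.int8)) for _ in range(pop)]
    for _ in range(gens):
        scores = np.array([fitness(m) for m in P])
        order = np.argsort(scores)[::-1]
        elite = [P[i] for i in order[: pop // 2]]   # truncation selection
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.choice(len(elite), 2, replace=False)
            cut = rng.integers(1, d) if d > 1 else 1     # one-point crossover
            child = np.concatenate([elite[a][:cut], elite[b][cut:]])
            flip = rng.random(d) < p_mut                 # bit-flip mutation
            child = clip(np.where(flip, 1 - child, child).astype(np.int8))
            children.append(child)
        P = elite + children
    return max(P, key=fitness)

# Stage 1: GA with a cap on subset size, to discard "junk" features.
stage1 = ga_search(d, ridge_accuracy, max_features=10)
keep = np.flatnonzero(stage1)

# Stage 2: unconstrained GA over the reduced feature set only.
def fitness2(mask):
    full = np.zeros(d, dtype=np.int8)
    full[keep[mask.astype(bool)]] = 1
    return ridge_accuracy(full)

stage2 = ga_search(keep.size, fitness2)
selected = keep[stage2.astype(bool)]
print("selected features:", selected)
```

Stage 1 never evaluates subsets larger than the cap, which both keeps each fitness evaluation cheap and biases the search toward removing uninformative features; stage 2 then refines the choice on the reduced space without the constraint.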



Kononova Nadezhda Vladimirovna – Cand. Sc., associate professor; Informational Systems Department, Siberian Federal University. E-mail: koplyarovanv@mail.ru.

Mangalova Ekaterina Sergeevna – software developer; "RD Science" (Research. Development. Science.). E-mail: e.s.mangalova@hotmail.com.

Stroev Anton Vladimirovich – Postgraduate Student; Krasnoyarsk State Medical University named after Prof. V. F. Voino-Yasenetsky. E-mail: antoxa134@mail.ru.

Cherdantsev Dmitry Vladimirovich – Dr. Sc., Professor; Krasnoyarsk State Medical University named after Prof. V. F. Voino-Yasenetsky. E-mail: gs7@mail.ru.

Chubarova Olesya Victorovna – Cand. Sc., associate professor; System Analysis and Operations Research Department, Reshetnev Siberian State University of Science and Technology. E-mail: kuznetcova_o@mail.ru.
