UDK 004.021 Doi: 10.31772/2587-6066-2019-20-3-312-319
DIFFERENTIAL EVOLUTION IN THE DECISION TREE LEARNING ALGORITHM
S. A. Mitrofanov, E. S. Semenkin
Reshetnev Siberian State University of Science and Technology; 31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037, Russian Federation. E-mail: sergeimitrofanov95@gmail.com
Decision trees (DT) belong to the most effective classification methods. The main advantage of decision trees is a simple and user-friendly interpretation of the results obtained. But despite its well-known advantages the method has some disadvantages as well. One of them is that DT training on high-dimensional data is very time-consuming. The paper considers the way to reduce the DT learning process duration without losses of classification accuracy. There are different algorithms of DT training; the main of them being ID3 and CART algorithms. The paper proposes a modification of DT learning algorithms by means of the information criterion optimization for some selected attribute. The use of this modification allows avoiding optimization by means of enumeration search over the entire data set. The Separation Measure method is used to select the attribute. The method selects the attribute whose class-based averages are most distant from each other. Optimization of the selected attribute is carried out using the method of differential evolution, which is one of the evolutionary modeling methods designed to solve problems of multidimensional optimization. Self-configuring at the population level based on the probabilities of using mutation operator’s variants was applied for differential evolution. The classification problems were solved to compare standard DT learning algorithms with the modified ones. Algorithm efficiency refers to the percentage of correctly classified test sample objects. Statistical analysis based on Student's t-test was carried out to compare the efficiency of the algorithms. The analysis showed that the use of the proposed modification of the DT learning algorithm makes it possible to significantly speed up the training process without losses in the classification effectiveness.
decision tree, classification, optimization, Separation Measure, differential evolution, Population-Level Dynamic Probabilities, Success History Adaptation
References

1. Breiman L., Friedman J. H., Olshen R. A., Stone C. T. Classification and Regression Trees. Wadsworth. Belmont. California. 1984, 128 p.
2. Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning. Springer, 2009, 189 p.
3. Ross Quinlan J. C4.5: Programs for Machine learning. Morgan Kaufmann Publishers. 1993, 302 p.
4. Quinlan J. R. Induction of decision trees. Machine learning. 1986, No. 1(1), P. 81–106.
5. David L. Davies, Donald W. Bouldin. A Cluster Separation Measure. IEEE Transactions on Pattern
Analysis and Machine Intelligence. 1979, Vol. PAMI-1, Iss. 2, P. 224–227.
6. Storn R. On the usage of differential evolution for function optimization. Biennial Conference of the North American Fuzzy Information Processing Society (NAFIPS). 2009, P. 519–523.
7. Goldberg D. E. Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley. 1989, 432 p.
8. Qin A. K., Suganthan P. N. Self-adaptive differential evolution algorithm for numerical optimization. Proceedings
of the IEEE congress on evolutionary computation (CEC). 2005, P. 1785–1791.
9. Semenkin E. S., Semenkina M. E. Self-configuring Genetic Algorithm with Modified Uniform Crossover
Operator. Advances in Swarm Intelligence. Lecture Notes in Computer Science 7331. Springer-Verlag, Berlin Heidelberg.
2012, P. 414–421.
10. Semenkin E., Semenkina M. Spacecrafts' control systems effective variants choice with self-configuring
genetic algorithm. ICINCO 2012 – Proceedings of the 9th International Conference on Informatics in Control,
Automation and Robotics. 2012, P. 84–93.
11. Tanabe R., Fukunaga A. Success-history based parameter adaptation for Differential Evolution. IEEE Congress on Evolutionary Computation. Cancun. 2013, P. 71–78.
12. Machine Learning Repository. Available at: https://archive.ics.uci.edu/ml/index.php (accessed 19.08.2018).
13. Gmurman V. E. Teoriya veroyatnostey i matematicheskaya statistika [Probability theory and mathematical
statistics]. Moscow, Vysshaya shkola Publ., 2003, P. 303–304 (In Russ.)
14. Ayvazyan S. A., Enyukov I. S., Meshalkin L. D. Prikladnaya statistika: Osnovy modelirovaniya i pervichnaya
obrabotka dannykh [Applied Statistics: Basics of modeling and primary data processing]. Moscow,
Finansy i statistika Publ., 1983, 471 p. (In Russ.).
15. Rumshiskiy L. Z. Matematicheskaya obrabotka rezul’tatov eksperimenta [The mathematical processing of
the experimental results]. Moscow, Nauka Publ., 1971, 192 p. (In Russ.).


Mitrofanov Sergei Alexandrovich – Master student of the Department of System Analysis and Operations
Research; Reshetnev Siberian State University of Science and Technology. E-mail: sergeimitrofanov95@gmail.com.


Semenkin Evgeny Stanislavovich – Dr. Sc., Professor, Professor of the Department of System Analysis and Operations
Research; Reshetnev Siberian State University of Science and Technology. E-mail: eugenesemenkin@yandex.ru.


  DIFFERENTIAL EVOLUTION IN THE DECISION TREE LEARNING ALGORITHM