Data Preprocessing for Modeling Socioeconomic Systems in View of Uncertainty
Abstract
The paper addresses the issue of data preparation for contemporary models of socioeconomic systems. Such models focus mainly on the low level of determinism and the high fundamental complexity associated with the peculiarities of human behavior. The paper puts forward a technique of data preprocessing for the analysis and modeling of socioeconomic systems in view of uncertainty. Five basic approaches to data preprocessing are considered: maximum likelihood estimation, Bayesian data analysis, spline interpolation, probability distribution approximation, and an imprecise probabilistic model based on hierarchical intervals. For each method, we analyze its sources and assess its applicability to modeling, as well as its speed and accuracy in general data preprocessing. The methods of accounting for probabilities are found to differ significantly depending on the field of application. In many cases, the primary aim is optimization, and optimization techniques are tailored to a particular task. All methods assume that the probabilities of events are known when modeling begins, and they are often known a posteriori or given as approximating probability distributions. The proposed technique prioritizes the accuracy and speed of data preprocessing. Within this framework, a decision tree is introduced that supports an informed choice of approach based on expert opinion about the modeling problem. Future work is to develop formal criteria for selecting a data preprocessing method. The study may also be refined by testing the proposed approach on diverse socioeconomic systems and by elaborating variants of the decision tree for specific tasks.
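To make the contrast between two of the five approaches concrete, here is a minimal sketch (an illustration, not the paper's technique) that estimates the probability of a binary event, such as whether a surveyed household made a purchase, from a small sample. It assumes a Bernoulli model: maximum likelihood gives the sample frequency k/n, while Bayesian analysis with a Beta(a, b) prior gives the posterior Beta(a + k, b + n - k). The uniform prior a = b = 1, the function names, and the sample counts are hypothetical choices made for illustration.

```python
from fractions import Fraction


def mle_probability(successes: int, trials: int) -> Fraction:
    """Maximum likelihood estimate of a Bernoulli probability: k / n."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    return Fraction(successes, trials)


def beta_posterior(successes: int, trials: int,
                   a: int = 1, b: int = 1) -> tuple[Fraction, Fraction]:
    """Posterior mean and variance of a Bernoulli probability.

    Assumes a Beta(a, b) prior (a = b = 1 is the uniform prior), so the
    posterior is Beta(alpha, beta) with alpha = a + k, beta = b + n - k.
    """
    if trials < 0 or not 0 <= successes <= trials or a <= 0 or b <= 0:
        raise ValueError("invalid counts or prior parameters")
    alpha = a + successes
    beta = b + trials - successes
    total = alpha + beta
    mean = Fraction(alpha, total)
    # Variance of Beta(alpha, beta): alpha * beta / (total^2 * (total + 1))
    variance = Fraction(alpha * beta, total * total * (total + 1))
    return mean, variance


if __name__ == "__main__":
    # Hypothetical sparse sample: the event was observed 0 times in 5 trials.
    print(mle_probability(0, 5))  # 0 (MLE treats the event as impossible)
    mean, var = beta_posterior(0, 5)
    print(mean)  # 1/7
    print(var)   # 3/196
```

With sparse data the two estimates diverge sharply: maximum likelihood assigns zero probability to an event that simply has not been observed yet, whereas the Bayesian posterior keeps a nonzero mean and reports its own spread through the variance. Exact `Fraction` arithmetic is used only so that the results are easy to check by hand.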
Keywords
Data preprocessing, Modeling, Socioeconomic systems, Maximum likelihood estimation, Bayesian data analysis, Spline interpolation, Probability distribution approximation, Imprecise probabilistic model