Business intelligence systems and mathematical models for decision-making can achieve accurate and effective results only when the input data are highly reliable.
However, the data extracted from the available primary sources and gathered into a data mart may contain anomalies that analysts must identify and correct.
This chapter deals with the activities involved in creating a high-quality dataset for subsequent use in business intelligence and data mining analysis. Several techniques can be employed to reach this goal: data validation, to identify and remove anomalies and inconsistencies; data integration and transformation, to improve the accuracy and efficiency of learning algorithms; and data size reduction and discretization, to obtain a dataset with fewer attributes and records that remains as informative as the original.
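As a rough illustration of three of the techniques named above, the following minimal Python sketch validates, rescales, and discretizes a toy dataset. Everything in it is an assumption made for illustration: the records, the field names, the plausible age range of 0 to 120, the choice of median imputation for anomalous values, min-max scaling as the transformation, and equal-width binning as the discretization. It is one possible reading of these steps, not the chapter's own procedure.

```python
from statistics import median

# Hypothetical toy records: (customer_id, age, monthly_spend).
# None marks a missing value.
records = [
    (1, 34, 120.0),
    (2, None, 80.0),
    (3, 29, 95.0),
    (4, 430, 60.0),  # implausible age, treated here as an anomaly
    (5, 51, 200.0),
]


def validate(rows, min_age=0, max_age=120):
    """Data validation: replace missing or out-of-range ages
    with the median of the valid ages (an assumed repair rule)."""
    def is_valid(age):
        return age is not None and min_age <= age <= max_age

    fill = median(age for _, age, _ in rows if is_valid(age))
    return [
        (cid, age if is_valid(age) else fill, spend)
        for cid, age, spend in rows
    ]


def min_max(values):
    """Data transformation: rescale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins=3):
    """Discretization: map each value to the index of an
    equal-width interval between the minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


clean = validate(records)
ages = [age for _, age, _ in clean]
spend_scaled = min_max([spend for _, _, spend in clean])
age_bins = equal_width_bins(ages)
```

In this sketch the two anomalous ages are imputed rather than dropped, so the number of records is preserved; removing the offending records would be an equally reasonable validation policy, depending on the analysis.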