CONFABULATION BASED INFREQUENT WEIGHTED ITEMSET MINING USING FREQUENT PATTERN GROWTH
High utility itemset mining (HUIM) has emerged as an important research topic in data mining, with applications to retail-market data analysis, stock market prediction, and recommender systems. However, there are very few empirical studies that systematically compare the performance of state-of-the-art HUIM algorithms. In this paper, we present an experimental evaluation of major HUIM algorithms, using real-world and synthetic datasets to evaluate their performance. Our experiments show that EFIM and d2HUP are generally the top two performers in running time, while EFIM also consumes the least memory in most cases. To compare these two algorithms in depth, we use additional synthetic datasets with varying parameters to study the influence of the number of transactions, the number of distinct items, and the average transaction length on the running time and memory consumption of EFIM and d2HUP. We demonstrate that d2HUP is more efficient than EFIM in running time under low minimum utility values and on large sparse datasets; although EFIM is the fastest on dense real datasets, it is among the slowest algorithms on sparse datasets. We suggest that d2HUP should be chosen when a dataset is very sparse or the average transaction length is large, and running time is favoured over memory consumption. Finally, we compare d2HUP and EFIM with two more recent algorithms, HUI-Miner and ULB-Miner, and find that these two algorithms have moderate performance. This work has reference value for researchers and practitioners choosing the most appropriate HUIM algorithm for their specific applications.