Project Developer: Ashish D. Tiwari
Java, JSP, Data Mining, Data Warehousing, BE-Engineering (CO/IT), ME-Engineering (CO/IT), BCS, MCS, BCA, MCA, MCM, BSC Computer/IT, MSC Computer/IT, Diploma (CO/IT), IEEE-2016

FiDoop-Parallel Mining of Frequent Itemsets Using MapReduce


Existing parallel mining algorithms for frequent itemsets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. As a solution to this problem, we design a parallel frequent itemsets mining algorithm called FiDoop using the MapReduce programming model. To achieve compressed storage and avoid building conditional pattern bases, FiDoop incorporates the frequent items ultrametric tree rather than conventional FP trees. In FiDoop, three MapReduce jobs are implemented to complete the mining task. In the crucial third MapReduce job, the mappers independently decompose itemsets, the reducers perform combination operations by constructing small ultrametric trees, and these trees are then mined separately. We implement FiDoop on our in-house Hadoop cluster. We show that FiDoop on the cluster is sensitive to data distribution and dimensions, because itemsets with different lengths have different decomposition and construction costs. To improve FiDoop’s performance, we develop a workload balance metric to measure load balance across the cluster’s computing nodes. We also develop FiDoop-HD, an extension of FiDoop, to speed up the mining performance for high-dimensional data analysis. Extensive experiments using real-world celestial spectral data demonstrate that our proposed solution is efficient and scalable.
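To make the MapReduce style mentioned in the abstract more concrete, the following is a minimal sketch in plain Java of a map phase and a reduce phase that count item support and keep only the frequent items. It is an illustration only, not FiDoop's code: it uses no Hadoop API, and the class name `ItemSupportSketch`, the method names, and the `minSupport` parameter are all hypothetical. The abstract does not describe what each of the three jobs does internally, so treating support counting as a typical early step is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java that mimics a map phase and a reduce phase
// for counting item support. It is not FiDoop's code and uses no Hadoop API.
class ItemSupportSketch {

    // "Map" phase: emit an (item, 1) pair for every item in every transaction.
    static List<Map.Entry<String, Integer>> map(List<List<String>> transactions) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (List<String> transaction : transactions) {
            for (String item : transaction) {
                pairs.add(Map.entry(item, 1));
            }
        }
        return pairs;
    }

    // "Reduce" phase: sum the counts per item, then drop every item whose
    // total is below the (hypothetical) minimum support threshold.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs, int minSupport) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        counts.values().removeIf(count -> count < minSupport);
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> transactions = List.of(
            List.of("a", "b", "c"),
            List.of("a", "c"),
            List.of("a", "d"));
        // Items "b" and "d" occur only once, so they are filtered out.
        System.out.println(reduce(map(transactions), 2)); // prints {a=3, c=2}
    }
}
```

On a real cluster the map calls would run in parallel over separate input splits and the framework would group the pairs by key before the reduce phase; the sketch runs both phases sequentially in one process to keep the example self-contained.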

Proposed System:

In this paper, we introduced a metric to measure the load balance of FiDoop. As a future research direction, we will apply this metric to investigate advanced load-balancing strategies in the context of FiDoop. For example, we plan to implement a data-aware load-balancing scheme to substantially improve the load-balancing performance of FiDoop. In one of our previous studies, we addressed the data-placement issue in heterogeneous Hadoop clusters, where data are placed across nodes in such a way that each node has a balanced data-processing load. Our data-placement scheme is conducive to balancing the amount of data stored in each heterogeneous node to achieve improved data-processing performance. We will integrate FiDoop with the data-placement mechanism on heterogeneous clusters. One of the goals is to investigate the impact of the heterogeneous data-placement strategy on Hadoop-based parallel mining of frequent itemsets. In addition to performance issues, energy savings and thermal management will be among our future research interests. We will propose various approaches to improve the energy efficiency of FiDoop running on Hadoop clusters.
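The text above does not give the formula for the load-balance metric. As a rough illustration of what such a metric could look like, the sketch below scores a set of per-node workloads with normalized Shannon entropy: 1.0 means the work is spread perfectly evenly and 0.0 means one node holds all of it. The choice of normalized entropy, the class name `LoadBalanceSketch`, and the `balance` method are assumptions made for this example and should not be read as the definition used by FiDoop.

```java
// Illustrative only: one possible load-balance score (an assumption,
// not necessarily the metric defined for FiDoop).
class LoadBalanceSketch {

    // Returns the normalized Shannon entropy of the workload shares.
    // Loads are assumed to be non-negative (e.g. itemsets or bytes per node).
    static double balance(long[] loads) {
        if (loads.length < 2) {
            return 1.0; // a single node is trivially balanced
        }
        long total = 0;
        for (long load : loads) {
            total += load;
        }
        if (total == 0) {
            return 1.0; // no work at all: treat as balanced
        }
        double entropy = 0.0;
        for (long load : loads) {
            if (load > 0) {
                double share = (double) load / total;
                entropy -= share * Math.log(share);
            }
        }
        // Dividing by log(n) scales the score into the range [0, 1].
        return entropy / Math.log(loads.length);
    }

    public static void main(String[] args) {
        System.out.println(balance(new long[] {10, 10, 10, 10})); // evenly spread
        System.out.println(balance(new long[] {37, 1, 1, 1}));    // heavily skewed
        System.out.println(balance(new long[] {40, 0, 0, 0}));    // all on one node
    }
}
```

A data-aware scheme of the kind described above could, under these assumptions, compute such a score for a candidate assignment of itemsets to nodes and prefer assignments with a higher value.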
