ARPN Journal of Science and Technology

An Effective Method to Preprocess the Data in Web Usage Mining

Authors B. Uma Maheswari, P. Sumathi
ISSN 2225-7217
On Pages 277-282
Volume No. 3
Issue No. 3
Issue Date April 01, 2013
Publishing Date April 01, 2013
Keywords Data preprocessing, Web usage mining, Path completion algorithm, Data cleaning, User session identification.


The field of Web mining encompasses a wide array of issues, primarily aimed at deriving actionable knowledge from the Web, and draws researchers from information retrieval, database technologies, and artificial intelligence. Most data used for mining is collected from Web servers, clients, proxy servers, or server databases, all of which generate noisy data. Because Web mining is sensitive to noise, data cleaning methods are necessary. This paper proposes a data preprocessing system for web usage mining. Data preprocessing comprises data cleaning, user identification, session identification, and path completion. Inexact data in the web access log is mainly caused by local caching and proxy servers, which are used to improve performance and minimize network traffic: requests served from a cache never reach the server and are therefore missing from its log. The proposed method uses a path completion algorithm to preprocess the data. We collected data from our college website and preprocessed it using the proposed method. The proposed path completion algorithm efficiently appends the lost information and improves the consistency of the access data for subsequent web usage mining.
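
The four preprocessing stages named above can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes a simplified, hypothetical log record (a dict with ip, agent, time, url, referrer, and status fields), identifies users by IP address and user agent, splits sessions on an assumed 30-minute idle threshold, and completes paths with a common referrer-based heuristic that re-inserts pages presumably revisited from the browser cache.

```python
from datetime import datetime, timedelta

# Assumed values for illustration; neither is taken from the paper.
SESSION_TIMEOUT = timedelta(minutes=30)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def clean(records):
    """Data cleaning: keep successful page requests, drop embedded resources."""
    return [r for r in records
            if 200 <= r["status"] < 300
            and not r["url"].lower().endswith(IGNORED_SUFFIXES)]


def identify_users(records):
    """User identification: treat each (ip, agent) pair as one user."""
    users = {}
    for r in sorted(records, key=lambda r: r["time"]):
        users.setdefault((r["ip"], r["agent"]), []).append(r)
    return users


def split_sessions(user_records):
    """Session identification: start a new session after a long idle gap."""
    sessions = []
    for r in user_records:
        if sessions and r["time"] - sessions[-1][-1]["time"] <= SESSION_TIMEOUT:
            sessions[-1].append(r)
        else:
            sessions.append([r])
    return sessions


def complete_path(session):
    """Path completion: re-insert pages that were likely served from cache.

    If a request's referrer is an earlier page in the session rather than
    the page just visited, assume the user stepped back to that page and
    append the intermediate pages in reverse order.
    """
    path = []
    for r in session:
        ref = r["referrer"]
        if path and ref and ref != path[-1] and ref in path:
            # Index of the most recent occurrence of the referrer.
            idx = len(path) - 1 - path[::-1].index(ref)
            path.extend(reversed(path[idx:-1]))
        path.append(r["url"])
    return path
```

For a session logged as /a, /b, /c in which /c carries /a as its referrer, complete_path yields /a, /b, /a, /c: the return to /a never reached the server, so it is reconstructed from the referrer field.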

2013 ARPN Publishers