J.-P. Norguet, E. Zimányi, and R. Steinberger. Semantic Analysis of Web Site Audience by Integrating Web Usage Mining and Web Content Mining. In I.-H. Ting and H.-J. Wu, editors, Web Mining Applications in E-commerce and E-services, number 172 in Studies in Computational Intelligence, chapter 4, pages 65-80. Springer-Verlag, 2009.

Abstract

With the emergence of the World Wide Web, analyzing and improving Web communication has become essential to adapt the Web content to the visitors' expectations. Web communication analysis is traditionally performed by Web analytics software, which produces long lists of page-based audience metrics. These results suffer from page synonymy, page polysemy, page temporality, and page volatility. In addition, the metrics contain little semantics and are too detailed to be exploited by organization managers and chief editors, who need summarized and conceptual information to make high-level decisions. To obtain such metrics, we propose a method based on output page mining. Output page mining is a new kind of Web usage mining that sits between Web usage mining and Web content mining. In our method, we first collect the Web pages output by the Web server. Then, for a given taxonomy covering the Web site knowledge domain, we aggregate the term weights in the output pages using OLAP tools, in order to obtain topic-based metrics representing the audience of the Web site topics. To demonstrate how our approach solves the cited problems, we compute topic-based metrics with SQL Server OLAP Analysis Service and our prototype WASA for real Web sites. Finally, we compare our results against those obtained with Google Analytics, a popular Web analytics tool.
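
The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the taxonomy, the term-to-topic mapping, and the use of raw term frequency as the term weight are all assumptions made for illustration, and a plain Python roll-up stands in for the OLAP tools named above.

```python
from collections import defaultdict

# Hypothetical taxonomy: each topic maps to its parent (None marks a root).
TAXONOMY = {
    "research": None,
    "databases": "research",
    "olap": "databases",
    "teaching": None,
}

# Hypothetical mapping from terms to the taxonomy topic they belong to.
TERM_TOPIC = {"cube": "olap", "sql": "databases", "course": "teaching"}


def term_weights(page_text):
    """Count occurrences of taxonomy terms in one output page.

    Raw term frequency is an assumed weighting, chosen for simplicity.
    """
    weights = defaultdict(int)
    for token in page_text.lower().split():
        if token in TERM_TOPIC:
            weights[token] += 1
    return weights


def topic_metrics(output_pages):
    """Aggregate term weights over all output pages into topic metrics.

    Each term's weight is added to its own topic and to every ancestor
    topic, which mimics a roll-up along the taxonomy hierarchy.
    """
    metrics = defaultdict(int)
    for text in output_pages:
        for term, weight in term_weights(text).items():
            topic = TERM_TOPIC[term]
            while topic is not None:
                metrics[topic] += weight
                topic = TAXONOMY[topic]
    return dict(metrics)


if __name__ == "__main__":
    # Two hypothetical pages as they might be served to visitors.
    pages = ["SQL cube cube", "course sql"]
    print(topic_metrics(pages))
```

Because each served page contributes once per delivery, pages that are output more often weigh more heavily in the resulting topic metrics, which is what makes them audience metrics rather than content statistics.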


Updated: 2017-03-27