Academia.edu

Clickstream Analysis

353 papers
340 followers
About this topic
Clickstream analysis is the process of collecting, analyzing, and interpreting the sequence of clicks made by users on a website. This data helps in understanding user behavior, optimizing web design, and enhancing user experience by revealing patterns in navigation and interaction.

Key research themes

1. How can client interactivity in streaming media be characterized and modeled for diverse content domains?

This theme focuses on analyzing clickstreams in streaming media to understand client interactive behaviors such as pausing, rewinding, or jumping within media files. Such characterization aids in generating realistic synthetic workloads for performance evaluation of streaming systems and informs caching strategies. Addressing various content domains (education, entertainment audio/video) recognizes the differing interactive patterns, which matter for designing scalable and efficient streaming protocols.

Key finding: This paper presents a hierarchical workload characterization separating client session and interactive request levels across educational video, entertainment video, and entertainment audio workloads. Notably, it finds that...

2. What are effective data preprocessing and session reconstruction methods for accurate web log mining?

Data preprocessing transforms raw web server logs into analyzable formats crucial for valid mining of user clickstream patterns. Reconstruction of web visitor sessions involves identifying unique users and segmenting their navigational sequences despite complications like shared IPs and crawlers. Precise session delineation directly impacts the quantity and quality of extracted behavioral rules and pattern analyses, influencing web personalization and portal optimization.

Key finding: Through experiments on web server logs, the paper delineates essential steps for reconstructing individual visitor sessions under challenges posed by shared IPs, proxies, and crawlers. It finds that identification of visitors...
Key finding: This work highlights the critical role of session identification in web usage mining by defining sessions as sequences of requests from a single user for a singular navigation intent. It surveys methods addressing session...
Key finding: The paper demonstrates the application of systematic web log preprocessing steps (data cleaning, user and session identification, and path completion) to convert unstructured logs into navigational patterns. The use of...
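The preprocessing steps named in this theme (cleaning out crawlers, identifying visitors, and splitting their requests into sessions) can be sketched roughly as follows. This is a minimal illustration, not the method of any paper listed here: the 30-minute inactivity timeout, the use of the (IP, user agent) pair as a visitor key, the crawler substrings, and the function names are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import timedelta

# Assumed heuristics for this sketch; real studies tune or replace them.
SESSION_TIMEOUT = timedelta(minutes=30)
CRAWLER_MARKERS = ("bot", "crawler", "spider")


def is_crawler(user_agent):
    """Data cleaning: flag requests whose user agent looks like a crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def reconstruct_sessions(records, timeout=SESSION_TIMEOUT):
    """Group log records into sessions.

    records: iterable of (ip, user_agent, timestamp, url) tuples, where
    timestamp is a datetime. Returns a list of (visitor_key, [urls]).
    """
    # User identification: IP alone is ambiguous behind proxies, so the
    # user agent is added to the key (a common, imperfect heuristic).
    hits_by_visitor = defaultdict(list)
    for ip, user_agent, timestamp, url in records:
        if is_crawler(user_agent):
            continue
        hits_by_visitor[(ip, user_agent)].append((timestamp, url))

    # Session identification: start a new session after a long silence.
    sessions = []
    for visitor, hits in hits_by_visitor.items():
        hits.sort(key=lambda hit: hit[0])
        current = [hits[0][1]]
        last_seen = hits[0][0]
        for timestamp, url in hits[1:]:
            if timestamp - last_seen > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(url)
            last_seen = timestamp
        sessions.append((visitor, current))
    return sessions
```

Path completion (inferring pages served from a browser cache) is deliberately left out, since it needs the site's link structure.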

3. How can user browsing strategies be identified by combining quantitative clickstream data and qualitative ethnographic observations?

This research theme investigates the fusion of server-side clickstream logs with ethnographic data from direct user observation and surveys to uncover user web browsing strategies and their evolution over time. The combined approach overcomes the limitations of clickstream data alone, which lack insights into user intentions and interactions like use of the back button, as well as the scalability constraints of ethnographic studies. Understanding user strategies guides better website design and usability evaluation.

Key finding: Using a longitudinal study with university students, this paper uncovers browsing strategies by categorizing server log page visits into navigational patterns and corroborating these with survey and direct observation data....

4. What machine learning approaches improve modeling and prediction of user navigation behavior from clickstream data?

This theme explores the application of advanced machine learning models—including neural networks, clustering algorithms, and ensemble methods—to extract and predict meaningful user navigation and browsing patterns from clickstream data. Reducing model complexity while maintaining predictive accuracy is a key concern addressed through pattern extraction (e.g., longest repeating subsequences) and classifier use. Accurate modeling supports improved recommendation systems, personalization, and web service optimization.

Key finding: The study introduces a model complexity reduction approach by extracting longest repeating subsequences (LRS) from web surfing paths to focus on significant predictive patterns. Coupled with a weighted specificity...
Key finding: This paper proposes a web log mining method utilizing Adaptive Resonance Theory (ART) neural networks to process large, heterogeneous, and evolving web data effectively. The self-learning ART structure allows websites to...
Key finding: Focusing on IPTV subscribers, this study benchmarks seven ML models (including LogitBoost and others) to analyze channel surfing behavior from clickstream logs, identifying features like gender, peak hour, age, and genre as...

5. How can clickstream and traffic gap analysis differentiate between user think times and network-induced outages impacting Quality of Experience?

This area deals with analyzing network traffic to distinguish between natural user inactivity (think times) and disruptions caused by network problems that degrade streaming or browsing experience. The differentiation is critical for ISPs and service providers to identify network faults and improve QoE. Methodological innovations include ON-OFF modeling and wavelet-based criteria that operate on packet traffic flows without deep packet inspection.

Key finding: The paper presents a revised ON-OFF model differentiating deliberate user think times from network-induced outages by analyzing inter-packet traffic gaps using a wavelet-based criterion. Evaluations on live video streaming...
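The first step of any ON-OFF analysis, splitting a flow into active periods and the gaps between them, can be illustrated with a short sketch. It uses a plain fixed gap threshold, which is a simplification swapped in for illustration: it is not the wavelet-based criterion of the paper above and cannot by itself tell a think time from an outage. The function name and the 1-second default are assumptions.

```python
def on_off_periods(timestamps, gap_threshold=1.0):
    """Split sorted packet arrival times (seconds) into ON periods and OFF gaps.

    Any inter-packet gap longer than gap_threshold (an assumed value) ends
    the current ON period. Returns (on_periods, off_gaps), each a list of
    (start, end) tuples.
    """
    if not timestamps:
        return [], []
    on_periods, off_gaps = [], []
    start = previous = timestamps[0]
    for t in timestamps[1:]:
        if t - previous > gap_threshold:
            on_periods.append((start, previous))
            off_gaps.append((previous, t))
            start = t
        previous = t
    on_periods.append((start, previous))
    return on_periods, off_gaps
```

The resulting OFF gaps are the candidates that a classifier would then label as user think time or network-induced outage.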

6. Can machine learning applied to eye-tracking data improve understanding and classification of web user behaviors?

Eye-tracking offers granular quantitative data on visual attention during web interactions, but the interpretation of gaze trajectories into distinct behaviors remains challenging. This research theme investigates using advanced machine learning classifiers (LSTM, random forest, MLP) on scanpath data to discriminate among different web browsing tasks, offering a novel quantitative complement to traditional qualitative eye-tracking analyses and enhancing user experience evaluation.

Key finding: By testing six simulated web user tasks, this study applies LSTM, random forest, and MLP classifiers to scanpath eye-tracking data and achieves reliable differentiation among attentional, comparison, reading, and free surfing...

7. How can clickstream analytics support real-time personalized viewer profiling and recommendation in streaming environments?

This research investigates dynamic profiling of viewers based on continuous streaming of interaction data such as ratings and preferences, using incremental learning methods tailored for stream data. Accurate viewer models underpin personalized recommendation systems that adapt in real time, helping to improve prediction accuracy and viewer engagement in multimedia streaming platforms.

Key finding: This paper introduces an incremental matrix factorization method with stochastic gradient descent to update individual viewer profiles dynamically as new streaming interaction data arrive. Experiments with MovieLens datasets...
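The general idea of incremental matrix factorization with stochastic gradient descent can be sketched as below: each incoming rating triggers one gradient step on the latent vectors of the viewer and item involved, so profiles evolve without retraining on the full history. This is a generic textbook-style sketch, not the paper's algorithm; the class name, hyperparameters (number of factors, learning rate, regularization), and initialization are all assumptions.

```python
import random


class IncrementalMF:
    """Per-event SGD updates for a viewer x item latent factor model."""

    def __init__(self, n_factors=8, lr=0.05, reg=0.02, seed=0):
        self.n_factors = n_factors
        self.lr = lr          # learning rate (assumed value)
        self.reg = reg        # L2 regularization strength (assumed value)
        self.rng = random.Random(seed)
        self.viewer_factors = {}
        self.item_factors = {}

    def _new_vector(self):
        # Small random start so that gradients are non-zero.
        return [self.rng.gauss(0.0, 0.1) for _ in range(self.n_factors)]

    def predict(self, viewer, item):
        p = self.viewer_factors.get(viewer)
        q = self.item_factors.get(item)
        if p is None or q is None:
            return 0.0  # cold start: no information yet
        return sum(a * b for a, b in zip(p, q))

    def update(self, viewer, item, rating):
        """Apply one SGD step for a single observed rating; return the error."""
        if viewer not in self.viewer_factors:
            self.viewer_factors[viewer] = self._new_vector()
        if item not in self.item_factors:
            self.item_factors[item] = self._new_vector()
        p = self.viewer_factors[viewer]
        q = self.item_factors[item]
        error = rating - sum(a * b for a, b in zip(p, q))
        for f in range(self.n_factors):
            pf, qf = p[f], q[f]
            p[f] += self.lr * (error * qf - self.reg * pf)
            q[f] += self.lr * (error * pf - self.reg * qf)
        return error
```

In use, `update` would be called once per arriving interaction, and `predict` can be queried at any point in the stream to rank candidate items for a viewer.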

8. What metrics-based approaches can quantify and improve web usage patterns to enhance website performance and customer behavior understanding?

Metrics-based web analytics involve defining and leveraging quantitative indicators to measure website performance and visitor behavior. Proper metric selection and analysis enable identification of popular pages, behavioral transitions, and bottlenecks, guiding optimization efforts to improve user experience and business objectives.

Key finding: Using data from a university website as a case study, this paper defines and utilizes 15 web metrics (e.g., number of visitors, page views, bounce rates) over five months to analyze visitor behavior. It demonstrates how these...
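A few of the metrics mentioned here (visitors, page views, bounce rate) can be computed from sessionized data along these lines. This is a minimal sketch: the input shape, the function name, and the definition of a bounce as a single-page session are assumptions, and the paper's own 15 metrics are not reproduced.

```python
def summarize_sessions(sessions):
    """Compute basic web metrics.

    sessions: list of (visitor_id, [urls]) pairs, one per session.
    A bounce is taken to be a session with exactly one page view.
    """
    n_sessions = len(sessions)
    visitors = {visitor for visitor, _ in sessions}
    page_views = sum(len(pages) for _, pages in sessions)
    bounces = sum(1 for _, pages in sessions if len(pages) == 1)
    return {
        "visitors": len(visitors),
        "sessions": n_sessions,
        "page_views": page_views,
        "pages_per_session": page_views / n_sessions if n_sessions else 0.0,
        "bounce_rate": bounces / n_sessions if n_sessions else 0.0,
    }
```

Tracking these values per week or per month is what allows trends such as a rising bounce rate on a landing page to be spotted.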

9. How can differences in web browsing behavior across cultures be analyzed using mouse tracking as a proxy for eye-tracking?

This theme examines the validation and application of remote proxy-based mouse tracking to investigate culturally influenced browsing behaviors, comparing groups such as Chinese and European users. Mouse-tracking provides a scalable and less intrusive approach than eye-tracking, enabling large-scale behavioral comparisons that can inform culturally sensitive web design.

Key finding: Applying a proxy-based mouse tracker to capture cursor movements without disrupting user behavior, this study compares Chinese and European users’ search performance on websites with varied menu placements. It identifies...

All papers in Clickstream Analysis

Community detection in social networks is highly influenced by the noise, outliers, and choice of clustering parameter tuning methods. Graph autoencoder (GAE) models and their variants have been developed for community detection in the...
This paper assesses machine learning algorithms for predicting purchase intentions in real time in large trade ecosystems. Findings indicate that advanced models such as Gradient Boosting and Neural Networks far surpass the performance...
The exponential growth of Massive Open Online Courses (MOOCs) has expanded educational access but has also overwhelmed learners with an excessive number of choices, making it difficult to identify courses that align with their skills,...
Click-through rate (CTR) is the most common metric used to assess the performance of an online advert; another measure of an advert's performance is the user's post-click experience. In this paper, we describe the method we have implemented in...
Online social networking is the latest craze to capture the attention of the masses; people use these sites to communicate with their friends and family. These sites offer attractive means of social interaction and communication, but...
Purpose: This paper presents competing risks models and shows how dwell times can be applied to predict users’ online behavior. This information enables real-time personalization of Web content. Design/methodology/approach: This paper...
With the development of Massive Open Online Courses (MOOC) in recent years, discussion forums there have become one of the most important components for both students and instructors to widely exchange ideas. And actually MOOC forums play...
Web Usage mining is a very important tool to extract the hidden business intelligence data from large databases. The extracted information provides the organizations with the ability to produce results more effectively to improve their...
Following the approach described by Heckerman et al. ([5]), we present an application of Dependency Networks and Bayesian Networks to the analysis of a clickstream data set. Our target is to discover which paths are more often followed by...
Personalized recommender systems rely on each user's personal usage data in the system, in order to assist in decision making. However, privacy policies protecting users' rights prevent these highly personal data from being publicly...
Understanding how users behave when they connect to social networking sites creates opportunities for better interface design, richer studies of social interactions, and improved design of content distribution systems. In this paper, we...
Understanding how users navigate and interact when they connect to social networking sites creates opportunities for better interface design, richer studies of social interactions, and improved design of content distribution systems. In...
Self-explanation is designed to increase coherence by encouraging students to activate prior knowledge, generate inferences, and make causal connections. The current study used natural language processing to examine how readers'...
Video clickstream behaviors such as pause, forward, and backward offer great potential for educational data mining and learning analytics since students exhibit a significant amount of these behaviors in online courses. The purpose of...
Clickstream data, the record of web pages a user visits, has become a valuable asset for businesses aiming to enhance user experience and target advertising. However, the collection, storage, and analysis of clickstream data raise...
Online shopping caters to the needs of millions of users daily. Search, recommendations, personalization have become essential building blocks for serving customer needs. Efficacy of such systems is dependent on a thorough understanding...
The accelerated development of e-commerce has been a concern for businesspeople. Businesspeople should be able to gain customer interest in a variety of ways so that their companies can compete with others. Analyzing click-flow data will...
The total reliance on internet connectivity and World Wide Web (WWW) based services is forcing many organizations to look for alternative solutions for providing adequate access and response time to the demand of their ever increasing...
The use of a proxy server could help provide adequate access and response time to large numbers of World Wide Web (WWW) users requesting previously accessed pages. While some studies have reported performance increase due to the use of proxy...
To provide personalized services such as online-product recommendations, it is usually necessary to model clickstream behavior of users if implicit preferences are taken into account. To accomplish this, web log mining is a promising...
Due to the rapidly rising popularity of Massive Open Online Courses (MOOCs), there is a growing demand for scalable automated support technologies for student learning. Transferring traditional educational resources to online contexts has...
Today, Massive Open Online Courses (MOOCs) have the potential to enable free online education on an enormous scale. However, a concern often raised about MOOCs is the consistently high dropout rate of MOOC learners. Although many...
The lack of current network dynamics studies that evaluate the effects of new application and protocol deployment or long-term studies that observe the effect of incremental changes on the Internet, and the change in the overall stability...
This cross-lagged longitudinal study was conducted with 862 seventh and eighth graders (secondary school) in the province of Québec (Canada) to study the effects of two important perceptual variables (self-concept and individual interest)...
Sequential pattern mining is an important task in data mining. Its subproblem, clickstream pattern mining, is starting to attract more research due to the growth of the Internet and the need to analyze online customer behaviors. To date,...
Course objective: "Just because you haven't found your talent yet, doesn't mean you don't have one." -Kermit the Frog. Those of us born in the '70s or later share a set of childhood friends. You may not always think about them, but Kermit...
This study builds on prior research by leveraging natural language processing (NLP), click-stream analyses, and survey data to predict students' mathematics success and math identity (namely, self-concept, interest, and value of...
Self-regulated learning (SRL) is a critical component of mathematics problem-solving. Students skilled in SRL are more likely to effectively set goals, search for information, and direct their attention and cognitive process so that they...
Research shows that anxiety can disrupt learning processes, but few studies have examined anxiety's relationships to online learning behaviors. This study considers the interplay between students' anxiety about science and...
Previous studies have demonstrated strong links between students’ linguistic knowledge, their affective language patterns and their success in math. Other studies have shown that demographic and click-stream variables in online learning...
This study concentrates on the opportunities of developing email marketing performance based on testing the design of an email newsletter. Drawing from existing literature, the paper presents a model for testing email newsletter design....
Introduction: Clustering algorithms play a key role in grouping data objects based on their similarities. A popular method, K-means, works by repeatedly adjusting the center of each cluster until convergence is achieved. This method,...
Competition on e-commerce platforms is becoming increasingly fierce, due to the ease of online searching for comparing products and services. We examine how the sequential browsing behavior of consumers can enable targeted marketing...
With the rapid expansion of mobile, blended, and seamless learning, researchers claim two factors, lack of self-discipline and poor time management, adversely impact learning performance. In online educational environments, reduced social...
First and foremost, I would like to sincerely express my gratitude to my advisor Dr Andy W. H. Khong for his continual support during my PhD study. His guidance and encouragement, together with his willingness to unremittingly improve the...
Information retrieval on the web is a significant and complex activity in web mining. Because the number of websites on the Internet has increased enormously, the execution of the PageRank algorithm should be easy and faster in...
Co-browsing is a synchronous class of collaborative applications, which allows a group of users to surf the Web together. Such an application can be deployed in an education environment in several ways. One example of where it can be used...
Beds are important in several hospital operations decisions, such as admitting patients to a hospital room. Lack of information regarding bed effectiveness can lead to long wait times and even rejection of patients, which impedes hospital...
This study investigated how content and context features of headlines drive selective exposure when choosing between headlines of a monthly e-mail health newsletter in a naturalistic setting over a period of nine months. Study...
Recently data stream has been extensively explored due to its emergence in a great deal of applications such as sensor networks, web click streams and network flows. One of the most important challenges in data streams is concept change...