Blog post

Only time will tell: Using temporal multimodal analysis to predict learning

Jennifer K. Olsen, Postdoctoral Researcher at Swiss Federal Institute of Technology Lausanne

Kshitij Sharma, Senior Researcher at Norwegian University of Science and Technology

Nikol Rummel, Professor at Institute of Educational Research, Ruhr-Universität Bochum

Vincent Aleven, Professor at Carnegie Mellon University

30 Oct 2020

By analysing the learning process, one can understand how student behaviours relate to learning outcomes. When assessing the learning process in individual learning, researchers have shown that multimodal learning analytics provide better predictions than single data streams, because each unimodal measure contributes different information (Cukurova, Kent, & Luckin, 2019; Giannakos, Sharma, Pappas, Kostakos, & Velloso, 2019). Moreover, including the temporal aspects of the data, that is, data points collected at multiple time points (such as for each 10-second window) rather than only counts or averages, makes it possible to understand how changes in behaviour correlate with and affect learning (Csanadi, Eagan, Kollar, Shaffer, & Fischer, 2018), which may lead to better predictions. In our recent article, ‘Temporal analysis of multimodal data to predict collaborative learning outcomes’, published in the British Journal of Educational Technology (Olsen, Sharma, Rummel, & Aleven, 2020), we investigate how multimodal data can help us understand the temporal interrelationships among the variables from the collaborative process that explain learning. The work extends our understanding of multimodal learning analytics for collaborative learning beyond what unimodal data can provide, and systematically assesses the benefits of different data streams in a temporal analysis.

A systematic comparison of data streams

Multimodal data does not refer to a specific combination of data. Rather, it refers to any combination of multiple types of data. For example, multimodal data may consist of audio, gaze and log data, or of EEG and dialogue data, all collected from the same participants. On the other hand, if only one of these streams is collected, such as audio, this is unimodal data, even if multiple measures are derived from it, such as tempo or energy. Consequently, which data streams are combined, and how many, can affect how beneficial multimodal data turns out to be. We analysed multimodal data collected from 25 dyads of 9- to 11-year-old students as they collaborated using a fractions intelligent tutoring system, that is, a system that provides step-by-step, adaptive instructional support to students. Using data streams that spanned different time scales (Newell, 1990), in other words, that measured interactions at the biological, cognitive or social level, we investigated how different combinations of data streams affected the prediction of learning gains and post-test scores. Specifically, we assessed how gaze, tutor log, audio (speech at the signal level) and dialogue (speech at the content level) data relate to learning.
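To make the unimodal/multimodal distinction concrete, the short Python sketch below enumerates every non-empty combination of the four data streams named above. It is an illustration only: the helper `stream_combinations` is hypothetical, and this post does not state that every one of these subsets was tested in the study.

```python
from itertools import combinations

# The four data streams named in the post.
STREAMS = ("gaze", "log", "audio", "dialogue")


def stream_combinations(streams=STREAMS):
    """Return every non-empty subset of `streams`, ordered by size.

    Hypothetical helper for illustration; not code from the study.
    """
    return [
        combo
        for size in range(1, len(streams) + 1)
        for combo in combinations(streams, size)
    ]


combos = stream_combinations()
# A single stream is unimodal, however many measures it yields;
# any subset with two or more streams is multimodal.
unimodal = [c for c in combos if len(c) == 1]
multimodal = [c for c in combos if len(c) > 1]

print(len(combos), len(unimodal), len(multimodal))  # 15 4 11
```

With four streams there are 2^4 − 1 = 15 possible combinations, only four of which are unimodal, so a systematic comparison grows quickly as streams are added.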

Expanding data in time and type

When we removed the temporal aspect of the process data by using only counts or averages for the different measures, we found few relations between the process data and learning gains. However, our temporal analysis, in which we analysed each measure in 120-second windows, showed that these relationships do exist and may simply be masked when only counts and averages are used. Addressing the temporal aspect of the data therefore provided more information, although not equally for all data streams. The variables that are measured at a smaller time scale, such as the gaze and audio measures, provided a more accurate prediction of learning gains than the measures at a higher time scale, such as the log data.
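The difference between an aggregate and a windowed view can be sketched in a few lines of Python. The example is illustrative only: the helper `window_counts`, the 600-second session and both dyads' event timestamps (which could stand for, say, utterance onsets) are invented for this sketch and are not data or code from the study. Both dyads produce the same total and the same mean per window, yet their 120-second profiles differ completely, which is the kind of pattern a count or average hides.

```python
import math
from collections import Counter


def window_counts(timestamps, session_s, window_s=120):
    """Count events in consecutive fixed-length windows.

    timestamps: event times in seconds from the start of the session.
    session_s:  session length in seconds.
    window_s:   window length in seconds (120 s, as in the post).
    Hypothetical helper for illustration; not code from the study.
    """
    n_windows = math.ceil(session_s / window_s)
    # Integer division maps each event to its window; an event exactly at
    # the end of the session is kept in the last window.
    counts = Counter(min(int(t // window_s), n_windows - 1) for t in timestamps)
    # Counter returns 0 for windows with no events.
    return [counts[i] for i in range(n_windows)]


SESSION_S = 600  # invented 10-minute session
dyad_a = [30, 60, 150, 200, 270, 300, 400, 450, 500, 550]  # evenly spread
dyad_b = [490, 495, 500, 505, 510, 520, 530, 540, 560, 590]  # all at the end

for name, events in (("A", dyad_a), ("B", dyad_b)):
    profile = window_counts(events, SESSION_S)
    # Same total (10) and mean per window (2.0) for both dyads ...
    print(name, len(events), sum(profile) / len(profile), profile)
# A 10 2.0 [2, 2, 2, 2, 2]
# B 10 2.0 [0, 0, 0, 0, 10]
```

For continuous measures, such as audio energy, the same idea applies with a per-window mean in place of a per-window count.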


Just as expanding the analysis across time was beneficial, we also found benefits of expanding the data across types through multimodal analysis, supporting previous research (Vrzakova, Amon, Stewart, Duran, & D’Mello, 2020). However, there is a caveat. It is not enough simply to have multimodal data: some of our combinations of data streams actually predicted learning gains less accurately than unimodal data did. What data is combined matters. We saw that combining data streams from different time scales is beneficial for predicting learning gains. One explanation may be that different time scales provide information on different dimensions. The benefit may lie less in combining multimodal data for its own sake and more in the unique information each data stream brings, with time scale being one dimension to consider.

This blog post is based on the article ‘Temporal analysis of multimodal data to predict collaborative learning outcomes’ by Jennifer Olsen, Kshitij Sharma, Nikol Rummel and Vincent Aleven, published in the British Journal of Educational Technology. It has been made free-to-view for those without a subscription for a limited period, courtesy of our publisher, Wiley.

References

Csanadi, A., Eagan, B., Kollar, I., Shaffer, D. W., & Fischer, F. (2018). When coding-and-counting is not enough: Using epistemic network analysis (ENA) to analyze verbal data in CSCL research. International Journal of Computer-Supported Collaborative Learning, 13(4), 419–438. https://doi.org/10.1007/s11412-018-9292-z

Cukurova, M., Kent, C., & Luckin, R. (2019). Artificial intelligence and multimodal data in the service of human decision-making: A case study in debate tutoring. British Journal of Educational Technology, 50(6), 3032–3046. https://doi.org/10.1111/bjet.12829

Giannakos, M. N., Sharma, K., Pappas, I. O., Kostakos, V., & Velloso, E. (2019). Multimodal data as a means to understand the learning experience. International Journal of Information Management, 48, 108–119. https://doi.org/10.1016/j.ijinfomgt.2019.02.003

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Olsen, J. K., Sharma, K., Rummel, N., & Aleven, V. (2020). Temporal analysis of multimodal data to predict collaborative learning outcomes. British Journal of Educational Technology. https://doi.org/10.1111/bjet.12982

Vrzakova, H., Amon, M. J., Stewart, A., Duran, N. D., & D’Mello, S. K. (2020). Focused or stuck together: Multimodal patterns reveal triads’ performance in collaborative problem solving. In Proceedings of the 10th International Conference on Learning Analytics and Knowledge (LAK 2020) (pp. 295–304). Association for Computing Machinery. https://doi.org/10.1145/3375462.3375467

Jennifer K. Olsen, Dr

Postdoctoral Researcher at Swiss Federal Institute of Technology Lausanne

Dr Jennifer K. Olsen is a postdoctoral researcher at the Swiss Federal Institute of Technology Lausanne (EPFL) in the Computer Human Interaction for Learning and Instruction lab. Her research focuses on how collaboration can support learning and how to design educational technology to support these practices from both the learners’ and instructors’ perspectives.

Kshitij Sharma, Dr

Senior Researcher at Norwegian University of Science and Technology

Kshitij Sharma is a senior researcher in the Department of Computer Science at the Norwegian University of Science and Technology (NTNU). He received his PhD in computer science from the École polytechnique fédérale de Lausanne (EPFL). His research interests include eye-tracking, MOOCs, collaborative learning, applied machine learning, multimodal learning and statistics.

Nikol Rummel, Dr

Professor at Institute of Educational Research, Ruhr-Universität Bochum

Dr Nikol Rummel is a full professor in the Institute of Educational Research at the Ruhr-Universität Bochum, Germany. One of her main research interests is adaptive support for computer-supported collaborative learning (CSCL). Another focus of her work is developing methods for automated analyses of process data that combine multiple data sources.

Vincent Aleven, Dr

Professor at Carnegie Mellon University

Dr Vincent Aleven is a professor of human–computer interaction at Carnegie Mellon University. His research focuses on the design of innovative adaptive learning technologies. He has investigated extensively how such technologies can be most effective, with projects ranging from computer-based tutoring of help seeking, to a website with intelligent tutoring software for middle-school mathematics, to a real-time mixed-reality teacher awareness tool.
