Natural language processing: A tool for microgenetic analysis

Florence R. Sullivan, University of Massachusetts, Amherst

Microgenetic analysis is a Vygotskyan educational research method for uncovering student sense-making activity during collaborative learning. Investigating learning through microgenetic analysis requires close attention to the social interactions, speech acts and use of tools within the learning environment in order to understand the genesis of children’s conceptual development. While microgenetic analysis is a powerful educational research method, it is difficult to employ with large datasets; the close and detailed nature of the work limits its wide application. However, advances in artificial intelligence have led to computational methods that have the potential to support microgenetic analysis of larger datasets.

In our paper, ‘Exploring the potential of natural language processing to support microgenetic analysis of collaborative learning discussions’ (Sullivan & Keith, 2019), we provide a detailed account of how we deployed a natural language processing (NLP) approach known as parts of speech (POS) analysis to assist in microgenetic data analysis. POS refers to the grammatical role of a word in a sentence (noun, verb, adverb, preposition, and so on). We ground our use of the POS NLP method in Bakhtin’s (1986) theory of speech genres and Goffman’s (1974) notion of social frameworks. Bakhtin characterises speech genres as relatively stable types of utterances occurring within a particular sphere of human activity. Goffman, meanwhile, notes that participants in a specific, culturally recognisable activity share a social framework for the types of interaction that may unfold in that activity. This social framework helps to guide interaction.

In prior work (Sullivan, 2011), we developed a qualitative model of student problem-solving activity with robotics that we term the troubleshooting cycle (TSC) (writing, testing, discussing, debugging, re-writing, re-testing). The TSC is a relatively regular and stable feature of student activity while solving robotics problems. We argue that within this bounded sphere of robotics learning activity, the utterances that accompany the TSC activity will likewise be stable and specific. Our goal in this study was to identify grammatical clusters that perform specific types of ‘work’ within the group towards solving the robotics challenge.

The participants in our study were a group of three 12-year-old students in a sixth-grade science class. Our dataset consists of a 30-minute segment of collaborative problem solving we had already analysed by hand (Sullivan, 2011). Our research goal was to investigate the development of conceptual understanding of the role of a light sensor in solving a line-following robotics challenge. We sought to replicate our prior research findings with the aid of the new POS NLP method. To do so, we clustered words at the level of the bigram and the trigram. We selected these n-gram configurations because, arguably, they are the smallest levels at which complete utterances might be made. Halliday and Matthiessen (2014) point out that while the clause is the smallest semantic unit in the English language, clauses are made up of smaller grammatical units that also have meaning, including the nominal group, the verbal group, the adverbial group, and the prepositional phrase. Importantly, the theme of a clause will be carried by one of these smaller structural elements (p. 92).
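To make the n-gram step concrete, here is a minimal sketch in Python. It assumes each utterance has already been tagged with parts of speech. The example utterance, the Penn-Treebank-style tag names and the helper `pos_ngrams` are illustrative assumptions, not the pipeline used in the study.

```python
# A hand-tagged utterance as (word, POS tag) pairs.
# The words and tags are invented for illustration only.
tagged_utterance = [
    ("the", "DT"),
    ("sensor", "NN"),
    ("sees", "VBZ"),
    ("the", "DT"),
    ("line", "NN"),
]


def pos_ngrams(tagged, n):
    """Return every run of n consecutive POS tags in a tagged utterance."""
    tags = [tag for _, tag in tagged]
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


bigrams = pos_ngrams(tagged_utterance, 2)
trigrams = pos_ngrams(tagged_utterance, 3)

print(bigrams)
print(trigrams)
```

Working with sequences of tags rather than sequences of words means that utterances with different vocabulary but the same grammatical shape fall into the same n-gram, which is what allows grammatical patterns to be counted across a discussion.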

Through a deliberative process, we assigned problem-solving codes to specific POS bigrams and trigrams. We then developed a temporal view of how these n-grams clustered at specific times over the 30-minute segment. This allowed us to visually identify robust periods of problem-solving discussion, which we then subjected to deeper analysis. Through this deeper analysis, we identified a trajectory of improving understanding of the role and function of the light sensor in the activity, partially replicating prior results.
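One way to picture the coding and temporal-view steps is the rough Python sketch below. The code labels, the n-gram-to-code mapping, the timestamps and the one-minute window are all hypothetical choices made for illustration; the actual codes and procedure are described in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from POS n-grams to problem-solving codes.
CODE_FOR_NGRAM = {
    ("DT", "NN", "VBZ"): "describing",
    ("MD", "PRP", "VB"): "proposing",
    ("WRB", "VBZ"): "questioning",
}

# Invented observations: (seconds from start of segment, POS n-gram).
observations = [
    (12, ("DT", "NN", "VBZ")),
    (45, ("WRB", "VBZ")),
    (70, ("MD", "PRP", "VB")),
    (95, ("DT", "NN", "VBZ")),
    (100, ("NN", "NN")),  # no code assigned to this n-gram, so it is skipped
]


def code_timeline(observations, code_for_ngram, window_seconds=60):
    """Count coded n-grams within consecutive fixed-length time windows."""
    timeline = defaultdict(Counter)
    for seconds, ngram in observations:
        code = code_for_ngram.get(ngram)
        if code is not None:
            timeline[seconds // window_seconds][code] += 1
    # Convert to plain dicts, ordered by window index.
    return {window: dict(counts) for window, counts in sorted(timeline.items())}


timeline = code_timeline(observations, CODE_FOR_NGRAM)
for window, counts in timeline.items():
    print(f"minute {window}: {counts}")
```

Windows with high counts of coded n-grams would be the candidates for the closer, qualitative reading that microgenetic analysis requires.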

Our work demonstrates that AI techniques such as POS NLP can aid researchers in conducting microgenetic analysis and in extending the approach to larger datasets.


This blog is based on the article ‘Exploring the potential of natural language processing to support microgenetic analysis of collaborative learning discussions’ by Florence Sullivan and P. Kevin Keith, published in the British Journal of Educational Technology. It has been made free-to-view until 31 January 2020, courtesy of our publishing partners, Wiley.


References

Bakhtin, M. M. (1986). The problem of speech genres. In C. Emerson & M. Holquist (Eds.), Speech genres and other late essays (V. W. McGee, Trans., pp. 60–102). Austin, TX: University of Texas Press.

Goffman, E. (1974). Frame analysis: An essay on the organization of experience. New York, NY: Harper and Row.

Halliday, M. A. K., & Matthiessen, C. M. I. M. (2014). Halliday’s introduction to functional grammar (4th ed.). New York, NY: Routledge Press.

Sullivan, F. R. (2011). Serious and playful inquiry: Epistemological aspects of collaborative creativity. Journal of Educational Technology and Society, 14(1), 55–65.

Sullivan, F. R., & Keith, P. K. (2019). Exploring the potential of natural language processing to support microgenetic analysis of collaborative learning discussions. British Journal of Educational Technology, 50(6), 3047–3063. https://doi.org/10.1111/bjet.12875