
Can machine learning techniques analyze qualitative focus group data effectively?

Using data science techniques to enhance perceptual learning


More than ever before, researchers have access to a variety of broad, rich, educationally relevant text data from several sources, such as literature databases (e.g., ERIC [Education Resources Information Center]), open-ended responses from online courses/surveys, online discussion forums, transcribed audio of face-to-face classes or focus groups, digital essays, and social media.

This advance in data availability (coupled with emerging analytic techniques) can greatly increase the possibility for discovery of new patterns, efficiencies in research, and testing of new theories in educational contexts. For example, can emerging analytic techniques based in machine learning, such as topic modeling, be used to analyze qualitative focus group data effectively, and what limitations and/or recommendations can be identified?


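We supported Worcester Polytechnic Institute in extending the use of topic modeling to data collected from focus groups with teachers who implemented several mathematics technology interventions with their students: FH2T, a game-based perceptual learning intervention; DragonBox 12+ (DragonBox), a widely used game-based technology application; and 2 versions of ASSISTments (Immediate Feedback and Active Control).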

We compared the results of qualitative coding and topic modeling. Unlike the textual data typically used in text mining, the focus groups in this study involved dynamic human communications: teachers shared thoughts and experiences in a meaningful way while taking in, processing, and responding to both the facilitator and other participants. In this process, actors (in this case, teachers) continually alternate between the roles of speaker and listener, and listen even while speaking (Watzlawick et al., 1967*; DeVito, 2016*). As such, the patterns of communication constantly evolve, and the direction and depth of the information exchanged in a focus group can vary widely depending on the group of teachers and the skill of the moderator.

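The information used in this investigation was part of a larger randomized controlled trial (RCT) conducted in collaboration with Worcester Polytechnic Institute, the University of Maine, and Indiana University during the COVID-19 pandemic.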

The larger study was an RCT across 10 middle schools (9 in-person and 1 virtual), including 3,600+ 7th-grade students. The study tested the impact of 3 educational technology interventions on 7th-grade students' algebraic understanding across 4 conditions: (a) FH2T, (b) DragonBox, (c) Immediate Feedback, and (d) Active Control. The FH2T and DragonBox conditions represent use of game-based applications. Immediate Feedback entails problem sets using an online homework system, ASSISTments. For purposes of this study, the Active Control condition mimicked traditional homework assignments but still used technology.

As part of the larger study, teachers participated in focus groups to discuss (a) teachers' perceptions of students' reactions to the mathematics technologies, (b) challenges that teachers and students encountered while using the mathematics technologies, (c) impacts of the various applications on student learning, (d) the unique impact of the pandemic on instructing students, and (e) suggestions for changes or improvements to the mathematics technologies. Out of the 34 teachers who implemented the study, 16 (47%) participated in 1 of the 4 focus group sessions.

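In this investigation, we examined whether topic modeling could extract patterns from the teacher focus group data that were consistent with those found through more qualitative analytic approaches. Specific questions we explored in this study were: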

  • What themes emerged from the qualitative coding approach and the topic modeling approach?
  • What limitations does topic modeling have regarding the analysis of the focus group data?
  • What recommendations do we have for other researchers who may attempt to use topic modeling with data collected from focus groups?

Unlike other forms of text data (such as literature, books, or essays), in which one type of information (e.g., information about actors, information about data source, information about findings) is organized within a section, data generated from focus groups reflected dynamic human communications (i.e., participants share thoughts and experiences in a meaningful way while taking in, processing, and responding to the moderator and other participants). The facilitator can also steer participants back to the focus group questions or follow the direction the discussion takes, depending on the research questions posed.
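To make the idea of topic modeling on conversational data concrete, here is a minimal sketch in Python. It is not the study's pipeline: the article does not say which algorithm or preprocessing was used. The sketch assumes latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, treats each speaker turn as one document, and uses invented example turns (`TURNS`), a toy stopword list, and hypothetical function names (`tokenize`, `fit_lda`).

```python
import random
import re
from collections import Counter

# Hypothetical teacher turns, invented for illustration (not study data).
TURNS = [
    "students enjoyed the game and stayed engaged with the puzzles",
    "the game puzzles kept students engaged and motivated",
    "login problems and slow laptops made the technology hard to use",
    "slow internet and login problems frustrated students at home",
    "students liked the game but login problems interrupted the puzzles",
    "immediate feedback on homework problems helped students fix mistakes",
]
# Toy stopword list (an assumption; real analyses use a fuller list).
STOPWORDS = {"the", "and", "to", "with", "at", "on", "but", "made"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (topic_word, doc_topic, vocab), where topic_word[k] counts the
    words assigned to topic k and doc_topic[d][k] counts the tokens of
    document d assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word, doc_topic, vocab


docs = [tokenize(turn) for turn in TURNS]
topic_word, doc_topic, vocab = fit_lda(docs)
for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(5) if c > 0]
    print(f"topic {k}: {top_words}")
```

Treating each turn as a document is only one possible choice. Because focus group turns are short and respond to one another, a researcher might instead group turns by question or by session, and the resulting topics can differ noticeably depending on that decision.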


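The topic modeling results aligned closely with the qualitative coding results. The biggest difference between topic modeling and qualitative coding was in how themes were organized/categorized at the higher level (e.g., 5 themes vs. 3 themes). At the lower level, both approaches identified many of the same sub themes and, therefore, the results indicated that students responded similarly to the use of the different mathematics technologies. In examining the coding and narrative results of the two approaches (topic modeling and qualitative coding), the two researchers found that the topic modeling approach was less effective at capturing the nuanced information that qualitative coding was able to identify. For example, qualitative coding was better able to provide details about students' specific responses to each learning tool and, thus, was able to provide more nuanced findings than topic modeling. Similarly, qualitative coding was better able than topic modeling to identify differences among student groups and information specific to them (i.e., special education students and accelerated students).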
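One simple way to quantify how closely machine-assigned topics line up with human codes is to cross-tabulate the two and compute cluster purity. The sketch below is an illustration only: the study compared the two approaches through researcher review of coding and narrative results, not through this measure, and the code labels, topic numbers, and function names (`crosstab`, `purity`) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (human code, model topic) pairs, one per teacher turn.
# Invented for illustration; these are not results from the study.
PAIRS = [
    ("engagement", 0), ("engagement", 0), ("engagement", 1),
    ("tech_challenges", 1), ("tech_challenges", 1),
    ("learning_impact", 0), ("learning_impact", 2), ("learning_impact", 2),
]


def crosstab(pairs):
    """Count how often each human code appears within each model topic."""
    table = defaultdict(Counter)
    for code, topic in pairs:
        table[topic][code] += 1
    return table


def purity(table):
    """Share of turns whose human code is the most common code in their topic."""
    matched = sum(max(counts.values()) for counts in table.values())
    total = sum(sum(counts.values()) for counts in table.values())
    return matched / total


table = crosstab(PAIRS)
for topic in sorted(table):
    print(f"topic {topic}: {dict(table[topic])}")
print(f"purity = {purity(table):.2f}")
```

A high purity would only show that topics and codes partition the turns similarly. It would not capture the nuance described above, such as details specific to one learning tool or one student group, which is where qualitative coding was found to be stronger.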

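This study successfully extended the application of topic modeling to focus group data. Taken together, the findings suggest that topic modeling is a viable approach for multiple coding of focus group data. It can easily be used prior to qualitative analysis to identify nodes (i.e., topics), or concurrently with or after qualitative analysis, to identify strengths and weaknesses that may be inherent in the qualitative analysis. The benefit to qualitative coders is the speed of the technique: topic modeling allows coding to be done more quickly.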


DeVito, J.A. (2016). The interpersonal communication book (14th ed.). London, England: Pearson Education Limited.

Watzlawick, P., Bavelas, J.B., and Jackson, D.D. (1967). Pragmatics of human communication: A study of interactional patterns, pathologies, and paradoxes. New York, NY: W.W. Norton & Company.

