Phoster

Research and Development

Educational Multimodal Question-answering and Dialogue Systems

Introduction

Today, generative artificial-intelligence systems can create and revise charts, diagrams, graphs, images, and infographics via natural-language interaction (Dibia, 2023). In the future, conversational multimedia search engines and generative artificial-intelligence systems will, together, enable more extemporaneous and improvisational uses of data and visual media, including video content, during lectures, meetings, and presentations.

Let us imagine a lecture or presentation where a teacher or presenter could, in response to questions and discussion, very quickly create new slides or multimedia content by interacting conversationally with their artificial-intelligence assistants.

Conversational Search and Design

When human users or artificial-intelligence agents desire to search for and quickly retrieve existing multimedia resources, they will be able to make use of conversational multimedia search engines.

When human users or artificial-intelligence agents instead desire to create new multimedia resources, they will be able to make use of conversational multimedia design systems.

By using situational, conversational, task, and other contextual cues, artificial-intelligence systems will more often be able to retrieve or create the sought multimedia resources early in agentic conversational search and design processes.

Multimodal Question-answering and Dialogue Systems

Today, agentic artificial-intelligence systems can autonomously conduct research (Zhang, Li, Zhang, Jia, Wang, Guo, Liu, and Zhao, 2025) and produce textual reports.

In the future, these systems will also be able to produce multimedia reports, presentations, and video output in response to users' questions and dialogue.

Education and Pedagogy

Today, while preparing their lessons, teachers can search for, retrieve, segment, arrange, and edit video clips to suit the needs of their particular lessons; teachers can create and reuse both linear and interactive forms of video content.

With artificial-intelligence systems capable of providing real-time conversational search and design, teachers would be able to prepare content more easily before class lectures and to co-create educational multimedia content on the fly during those lectures, including in response to students' questions and classroom discussions.

By simplifying content production, the artificial-intelligence systems under discussion could also serve as tools for students completing homework assignments. Students could record video clips excerpted from their multimodal conversations and use them when creating video essays.

Students could also be assessed, e.g., with respect to their historical thinking and reasoning, during classroom discussions (Johnson, 2021) and while interacting with conversational artificial-intelligence systems (Yildirim-Erbasli, 2022).

Content Evaluation

How might multimedia content created by or co-created with artificial-intelligence systems and any voice-over narrations be evaluated?

With respect to evaluating the quality of digital learning resources, El Mhouti, Nasseh, and Erradi (2013) describe a way to construct an evaluation instrument covering four dimensions: academic, pedagogical, didactic, and technical quality.

With respect to evaluating the quality of artificial-intelligence-generated digital educational resources for university teaching and learning, Huang, Lv, Lu, and Tu (2025) propose a hierarchical model with four high-level dimensions: content, expression, user, and technical characteristics.
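To make the idea of a dimension-based evaluation instrument concrete, the following is a minimal sketch in Python of how ratings along four top-level dimensions might be combined into a single score. Only the dimension names come from the cited works; the equal weights, the 0-to-5 rating scale, and the function name weighted_quality_score are illustrative assumptions, and the sub-indicators of the published instruments are not reproduced here.

```python
# Top-level dimension names as listed in the two cited works.
# The equal weights below are an assumption made for illustration only.
EL_MHOUTI_DIMENSIONS = ("academic", "pedagogical", "didactic", "technical")
HUANG_DIMENSIONS = ("content", "expression", "user", "technical")


def equal_weights(dimensions):
    """Assign the same weight to every dimension (hypothetical default)."""
    return {name: 1.0 / len(dimensions) for name in dimensions}


def weighted_quality_score(ratings, weights, max_rating=5.0):
    """Combine per-dimension ratings into one score between 0 and 1.

    ratings: mapping of dimension name to a rating in [0, max_rating].
    weights: mapping of dimension name to a non-negative weight.
    """
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= ratings[name] <= max_rating:
            raise ValueError(f"rating for {name!r} is outside 0..{max_rating}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * ratings[name] / max_rating for name in weights)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical reviewer ratings for one AI-generated video segment.
    example = {"academic": 4, "pedagogical": 3, "didactic": 5, "technical": 2}
    score = weighted_quality_score(example, equal_weights(EL_MHOUTI_DIMENSIONS))
    print(f"overall quality: {score:.2f}")
```

In practice, the weights would be derived from the instrument's own methodology (for example, expert judgment) rather than set equal, and each dimension would be rated through its own indicators.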

Cognitive theories of multimedia learning can also be of use when evaluating educational video content (Mayer, 2002).

With respect to the future of multimedia learning, studies can be conducted in authentic learning venues such as classrooms and online courses and with new forms of media such as virtual reality, interactive simulations and games, animated pedagogical agents, instructional video, and narrated animation (Mayer, 2022).

Discussion

In the future, artificial-intelligence systems will be able to generate, in real time, high-quality educational multimedia content resembling History Channel programming, The Story of Maths, and Cosmos: A Spacetime Odyssey.

Let us imagine a university history laboratory some years from now. In this laboratory, there is a room. In this room, there is a student holding a remote-control device and standing in front of a wall-mounted video display. On this video display is content resembling that of a History Channel show. A narrator is speaking atop high-quality historical footage, depictions of historical figures, historical reenactments, interviews with historians and other experts, and more.

Next, the student presses a button on the remote-control device to activate its microphones and asks the wall-mounted video display a question. The video content, being generated in real time by an artificial-intelligence system, smoothly segues to answer the student's question.

The narrator of what at first appeared to be an educational television show is in fact an artificial-intelligence system that searches for, retrieves, selects, generates, and presents video content in real time for the student, weaving together both reused and freshly created content into seamless, responsive, personalized educational video.
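The choice between reusing existing content and creating new content can be illustrated with a small Python sketch. Everything in it is a hypothetical simplification: the Clip type, the keyword-overlap relevance measure, the threshold value, and the stub generator stand in for the multimedia search and generative components that such a system would actually require.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Clip:
    """A video segment, tagged with where it came from (hypothetical type)."""
    title: str
    keywords: frozenset
    origin: str  # "retrieved" or "generated"


def relevance(question: str, clip: Clip) -> float:
    """Jaccard overlap between question words and clip keywords.

    A toy stand-in for a real multimedia retrieval model.
    """
    words = {w.strip("?.,!").lower() for w in question.split()}
    words.discard("")
    union = words | clip.keywords
    return len(words & clip.keywords) / len(union) if union else 0.0


def select_or_generate(
    question: str,
    library: Sequence[Clip],
    generate: Callable[[str], Clip],
    threshold: float = 0.2,  # assumed cut-off, chosen only for illustration
) -> Clip:
    """Reuse the best existing clip if it is relevant enough, else generate one."""
    best = max(library, key=lambda clip: relevance(question, clip), default=None)
    if best is not None and relevance(question, best) >= threshold:
        return best
    return generate(question)


def stub_generator(question: str) -> Clip:
    """Placeholder for a generative video model; returns a labeled dummy clip."""
    return Clip(title=f"Generated answer: {question}", keywords=frozenset(), origin="generated")


if __name__ == "__main__":
    library = [
        Clip("The fall of Rome", frozenset({"rome", "fall", "empire"}), "retrieved"),
        Clip("The printing press", frozenset({"printing", "press", "gutenberg"}), "retrieved"),
    ]
    for q in ("Why did Rome fall?", "Who built the pyramids?"):
        clip = select_or_generate(q, library, stub_generator)
        print(f"{q} -> {clip.title} ({clip.origin})")
```

A full system would also need to sequence the selected and generated segments and smooth the transitions between them; this sketch covers only the per-question decision.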

Let us next imagine a university history classroom. A history teacher, holding a remote-control device, stands in front of a class and makes use of such a tool. The teacher instructs a multimodal artificial-intelligence system to retrieve and present video content on a larger screen for the entire class. This human-machine dialogue is, in part, extemporaneous and improvisational, arising from a dynamic classroom discussion about historical topics.

Conclusion

In the future, students will be able to interrupt, command, ask questions of, and engage in dialogues with artificial-intelligence systems that create educational video content for them in real time. Teachers will be able to conversationally co-create and share video content with classes, including in response to students' questions and dynamic discussions. More generally, during meetings, presenters will be able to conversationally co-create slides and other multimedia content, including in response to participants' questions and dynamic discussions.

Students, teachers, and presenters will be able to easily, quickly, and effectively use artificial-intelligence systems to retrieve and create multimedia content in the extemporaneous and improvisational situations that arise during classes, lectures, meetings, and presentations. This could greatly enhance communication, performance, and productivity.

Bibliography

Dibia, Victor. "LIDA: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models." arXiv preprint arXiv:2303.02927 (2023).

El Mhouti, Abderrahim, Azeddine Nasseh, and Mohamed Erradi. "How to evaluate the quality of digital learning resources." International Journal of Computer Science Research and Application 3, no. 03 (2013): 27-36.

Huang, Qian, Chunlan Lv, Li Lu, and Shuang Tu. "Evaluating the quality of AI-generated digital educational resources for university teaching and learning." Systems 13, no. 3 (2025): 174.

Johnson, Jennifer L. "Evidences of historical thinking in dialogic discussions." PhD diss., University of South Florida, 2021.

Lévesque, Stéphane, and Penney Clark. "Historical thinking: Definitions and educational applications." In The Wiley International Handbook of History Teaching and Learning (2018): 117-148.

Mayer, Richard E. "Multimedia learning." In Psychology of Learning and Motivation, vol. 41, pp. 85-139. Academic Press, 2002.

Mayer, Richard E. "The future of multimedia learning." The Journal of Applied Instructional Design 11, no. 4 (2022): 69-77.

Perry‐Kates, Adi, and Anat Cohen. "From viewers to participants: The evolution of learning through interactive video." Journal of Computer Assisted Learning 41, no. 3 (2025): e70061.

Van Drie, Jannet, and Carla Van Boxtel. "Historical reasoning: Towards a framework for analyzing students' reasoning about the past." Educational Psychology Review 20 (2008): 87-110.

Yildirim-Erbasli, Seyma Nur. "Conversation-based assessments: Measuring student learning with human-like communication." PhD diss., University of Alberta, 2022.

Zhang, Wenlin, Xiaopeng Li, Yingyi Zhang, Pengyue Jia, Yichao Wang, Huifeng Guo, Yong Liu, and Xiangyu Zhao. "Deep research: A survey of autonomous research agents." arXiv preprint arXiv:2508.12752 (2025).