Artificial-intelligence systems capable of asking, managing, and answering questions while incrementally processing and analyzing texts can enhance: story comprehension and prediction, automatic item generation, synthetic data generation, automatic text summarization, the design and engineering of instructional materials and technical documentation, and the writing of stories.
Historical approaches to computationally modeling the various interdependent and integrated components of reading have included modeling: eye movement, word identification, sentence processing, discourse representation, and overall reading architectures.
Computational models of reading architectures include: the Attention Shift Model, E-Z Reader, EMMA, SWIFT, Glenmore, SERIF, OB1-Reader, and Über-Reader (Reichle, 2021).
In 2006, the task-based relevance and content extraction model was developed (TRACE; Rouet, 2006). In 2011, a multi-document version was developed (MD-TRACE; Rouet & Britt, 2011). In 2017, a model was developed which expanded upon these, framing reading as problem-solving (RESOLV; Rouet, Britt, & Durik, 2017).
The RESOLV model argues that the activities of reading are more contextual than previous research indicated (Britt, Rouet, & Durik, 2017). RESOLV proposes two cognitive representations: the context model and the task model.
Readers' informational needs and contexts are dynamic and unfold during reading. Answering questions, whether posed by requesters or by readers themselves, can be a component of reading tasks. Readers' goals and subgoals could include acquiring the information needed to satisfy their dynamic and unfolding informational needs.
Of mathematics, Georg Cantor said that the art of asking questions is more valuable than solving problems.
In artificial intelligence, the capability to generate the right questions is highly sought after to reflect the ability to understand language, to gather new information, and to engage with users (Ko, Chen, Huang, Durrett, & Li, 2020).
With respect to reading comprehension, it is plausible to assume that, as content is processed, previous questions are answered and new ones are raised (Olson, Duffy, & Mack, 2017).
While it is not posited that those reading silently are consciously asking themselves questions as texts are processed, texts' contents are understood and added to growing mental representations, and existing informational needs, or questions, interact with arriving content to produce subsequent informational needs, or questions (Olson, Duffy, & Mack, 2017).
It would be impractical to maintain a list of pending questions during reading and to check each question every time that a new fact was encountered (Ram, 1991).
Questions should, then, be indexed in memory. As they are indexed, it is likely that readers would find answers to questions other than those they are primarily focused upon. Readers' informational needs or knowledge goals, then, can be satisfied opportunistically during reading (Ram, 1991).
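To make this notion of indexing concrete, the following is a minimal sketch, in Python, of a question index keyed by the concepts that questions mention; the class names, fields, and matching rule are illustrative assumptions rather than components of Ram's model. As new facts arrive, only those pending questions indexed under the facts' concepts are checked, allowing answers to be found opportunistically.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Question:
    text: str
    concepts: frozenset               # concepts under which the question is indexed
    answered_by: Optional[str] = None

class QuestionIndex:
    """Illustrative index of pending questions, keyed by concept."""
    def __init__(self):
        self._by_concept = defaultdict(set)
        self._pending = set()

    def pose(self, question: Question) -> None:
        self._pending.add(question)
        for concept in question.concepts:
            self._by_concept[concept].add(question)

    def notice_fact(self, fact: str, fact_concepts: set) -> list:
        """Check only the pending questions indexed under the fact's concepts."""
        answered = []
        for concept in fact_concepts:
            for question in list(self._by_concept.get(concept, ())):
                if question in self._pending and question.concepts <= fact_concepts:
                    question.answered_by = fact
                    self._pending.discard(question)
                    answered.append(question)
        return answered

index = QuestionIndex()
index.pose(Question("Why did the character leave?", frozenset({"character", "departure"})))
index.pose(Question("Where is the story set?", frozenset({"setting"})))
# A fact about the setting opportunistically answers only the second question.
print([q.text for q in index.notice_fact("The story is set in Vienna.", {"setting", "Vienna"})])
```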
Readers can use those questions which arise during reading to focus their inferencing (Ram, 1989). However, not all questions are equally important and not all answers are equally valuable to readers.
There are two types of heuristics for ascribing interestingness to content during reading: content-based and structure- or configuration-based (Ram, 1989). In content-based heuristics, some things are more interesting to a reader based upon the reader's goals. In structure- or configuration-based heuristics, some kinds of situations, e.g., expectation failures, are more interesting to a reader than others. In general, both of these heuristics must be combined to determine overall interestingness.
With respect to the focusing of attention, there are two ways to attribute value to factual statements during text or story processing: top-down and bottom-up (Ram, 1991). In the top-down way, facts which answer questions are worth focusing attention upon as they help to satisfy readers' informational needs or knowledge goals. In the bottom-up way, facts which raise new questions are worth focusing attention upon, in particular if they arise from gaps or inconsistencies in readers' knowledge.
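Building on the sketch above, the following hedged example assigns attention value to an arriving fact both top-down (it intersects pending questions) and bottom-up (it introduces unfamiliar concepts, suggesting knowledge gaps); the weights and the set-overlap scoring are assumptions made for illustration.

```python
def attention_value(fact_concepts, pending_questions, known_concepts,
                    answer_weight=1.0, novelty_weight=0.5):
    """Illustrative scoring of a fact's attention-worthiness.

    Top-down: facts addressing pending questions help satisfy knowledge goals.
    Bottom-up: facts introducing unfamiliar concepts suggest knowledge gaps
    and so raise new questions.
    """
    answered = sum(1 for question in pending_questions if question & fact_concepts)
    novel = len(fact_concepts - known_concepts)
    return answer_weight * answered + novelty_weight * novel

pending = [{"setting"}, {"character", "motive"}]     # concept sets of pending questions
known = {"character", "setting"}
print(attention_value({"motive", "betrayal"}, pending, known))   # top-down and bottom-up: 2.0
print(attention_value({"setting"}, pending, known))              # top-down only: 1.0
```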
Several models have been developed which describe cognitive processes involved in answering questions, whether from memory or from consulting external resources.
In 1978, Lehnert developed a computational model of question-answering, QUALM (Lehnert, 1978). It was implemented as a computer program that interoperated with two different story-comprehension systems.
In 1990, the QUEST model was developed (Graesser & Franklin, 1990). It comprised four procedural components: question interpretation, identification of relevant information sources, pragmatics, and convergence mechanisms.
Taxonomies of questions have encompassed: recognition, recall, comprehension, application, analysis, synthesis, and evaluation (Bloom, 1956).
Taxonomies of questions have encompassed: causal antecedent, goal orientation, enablement, causal consequent, verification, disjunctive, instrumental/procedural, concept completion, expectational, judgmental, quantification, feature specification, and request (Lehnert, 1978).
Later taxonomies have added: example, definition, comparison, and interpretation (Graesser & Person, 1994).
Also to be considered are guiding questions and inquisitive questions. Guiding questions are those provided by texts' authors to direct readers in their searches for understanding. Inquisitive questions arise from readers' curiosity.
Questions can also be described as being either: literal or inferential. Literal questions focus on information directly stated in the text, involving surface-level understandings. Inferential questions require readers to make inferences, involving deeper understandings and interpretations of the text.
Taxonomies of answers have encompassed: direct, indirect, partial, limiting, correcting, modifying, and accurate and exhaustive answers (Brożek, 2011).
Inferences during reading have been variously categorized as: automatic or strategic; online or offline; text-connecting, knowledge-based, or extratextual; local or global; coherence or elaborative; unconscious or conscious; bridging; intersentence or text-connecting, or gap-filling; coherence, elaborative, knowledge-based, or evaluative; and anaphoric, text-to-text, or background-to-text (Kispal, 2008).
Types of inferences during reading include: referential; filling in deleted information; inferring the meanings of words; inferring connotations of words or sentences; relating text to prior knowledge; inferences about the author; inferences about characters; inferences about the state of the world as depicted; confirming or disconfirming previous inferences; and drawing conclusions (Pressley & Afflerbach, 1995).
Types of inferences during story-reading include: referential; case structure role assignment; antecedent causal; superordinate goal; thematic; character emotion; causal consequence; instantiation noun category; instrument; subordinate goal action; state; reader's emotion; and author's intent (Graesser, Singer, & Trabasso, 1994).
Erotetic inferences are processes through which questions are inferred from assertions, questions, or combinations of assertions and questions.
Question decomposition, for example, transforming compound questions into simpler ones, is a type of erotetic inference.
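As an illustration only, the following naive sketch decomposes a compound "wh"-question into simpler questions by splitting on a conjunction; genuine erotetic inference would operate over semantic or logical representations of questions rather than surface strings.

```python
import re

def decompose(question: str) -> list:
    """Naive decomposition of a compound question into simpler questions.

    Splits a "wh"-question whose body conjoins two phrases with "and",
    e.g. "Who wrote the novel and when was it published?". This is a
    string-level stand-in for erotetic inference over question structure.
    """
    parts = re.split(r"\s+and\s+", question.rstrip("?"))
    if len(parts) < 2:
        return [question]
    return [part.strip().capitalize() + "?" for part in parts]

print(decompose("Who wrote the novel and when was it published?"))
# ['Who wrote the novel?', 'When was it published?']
```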
Bloom, Benjamin S., ed. Taxonomy of Educational Objectives: The Classification of Educational Goals. Handbook I: Cognitive Domain. David McKay, 1956.
Britt, M. Anne, Jean-François Rouet, and Amanda Durik. Literacy Beyond Text Comprehension: A Theory of Purposeful Reading. Routledge, 2017.
Brożek, Anna. Theory of Questions: Erotetics Through the Prism of Its Philosophical Background and Practical Applications. Rodopi, 2011.
Graesser, Arthur C., and Stanley P. Franklin. "QUEST: A cognitive model of question answering." Discourse Processes 13, no. 3 (1990): 279-303.
Graesser, Arthur C., and Natalie K. Person. "Question asking during tutoring." American Educational Research Journal 31, no. 1 (1994): 104-137.
Graesser, Arthur C., Murray Singer, and Tom Trabasso. "Constructing inferences during narrative text comprehension." Psychological Review 101, no. 3 (1994): 371.
Kispal, Anne. Effective Teaching of Inference Skills for Reading: Literature Review. Research Report DCSF-RR031. National Foundation for Educational Research. Nottingham, England: Department for Children, Schools and Families, 2008.
Ko, Wei-Jen, Te-yuan Chen, Yiyan Huang, Greg Durrett, and Junyi Jessy Li. "Inquisitive question generation for high level text comprehension." arXiv preprint arXiv:2010.01657 (2020).
Lehnert, Wendy G. The Process of Question Answering: A Computer Simulation of Cognition. John Wiley & Sons, 1978.
Olson, Gary M., Susan A. Duffy, and Robert L. Mack. "Question-asking as a component of text comprehension." In The Psychology of Questions, pp. 219-226. Routledge, 2017.
Pressley, Michael, and Peter Afflerbach. Verbal Protocols of Reading: The Nature of Constructively Responsive Reading. Routledge, 2012.
Ram, Ashwin. "Question-driven understanding: An integrated theory of story understanding, memory and learning." PhD diss., Yale University, 1989.
Ram, Ashwin. "A theory of questions and question asking." Journal of the Learning Sciences 1, no. 3-4 (1991): 273-318.
Reichle, Erik D. Computational Models of Reading: A Handbook. Oxford University Press, 2021.
Rouet, Jean-François. The Skills of Document Use: From Text Comprehension to Web-based Learning. Routledge, 2006.
Rouet, Jean-François, and M. Anne Britt. "Relevance processes in multiple document comprehension." Text Relevance and Learning from Text (2011): 19-52.
Rouet, Jean-François, M. Anne Britt, and Amanda M. Durik. "RESOLV: Readers' representation of reading contexts and tasks." Educational Psychologist 52, no. 3 (2017): 200-215.
Man-machine literature discussions about moral stories could provide value both to individuals (Goldenberg, 1992) and to artificial-intelligence systems (Sinatra, Graesser, Hu, Brawner, & Rus, 2019; Tong & Hu, 2024).
Self-improving adaptive instructional systems model users, both learners and experts. These models can range from generic, stereotype-based models to specific, highly adaptive ones.
User models can be of use for intelligently selecting, generating, distributing, and administering moral stories across populations to maximize value both for users and artificial-intelligence systems.
Beyond presenting users with adaptive and personalized sequences of questions about moral stories, artificial-intelligence systems could participate in more interesting, engaging, and enriching man-machine literature discussions.
While the discourse of reading groups has previously been explored (Peplow, Swann, Trimarco, & Whiteley, 2015), man-machine literature discussions and co-reading are comparably new terrain.
To increase the value of resultant educational data, techniques from opinion polling, survey design, and questionnaire construction could be of use during literature discussions. Pertinent topics would include avoiding leading or loaded questions and remaining mindful of question framing, context effects, and item-sequencing effects.
When artificial-intelligence systems generate moral stories, they could also generate agentic workflows, or "scripts", describing the processes with which to discuss the stories and how to intersperse questions or testlets. Discussion questions can be presented to readers midway through moral stories, e.g., at section or chapter boundaries, and upon stories' completion.
Agentic workflows, or "scripts", can include branching points. There could be multiple paths available both through them and through accompanying testlets.
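One way such scripts might be represented is sketched below: a minimal, assumed data structure in which steps of kind "read", "question", or "testlet" are linked, and branching points map learner responses to next steps. The step kinds, field names, and traversal logic are illustrative, not a prescribed format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class ScriptStep:
    """One step in an illustrative discussion script for a moral story."""
    kind: str                                                # "read", "question", or "testlet"
    content: str
    branches: Dict[str, str] = field(default_factory=dict)   # learner response -> next step id
    default_next: Optional[str] = None

@dataclass
class DiscussionScript:
    steps: Dict[str, ScriptStep]
    start: str

    def run(self, respond: Callable[[ScriptStep], str]) -> List[str]:
        """Walk the script, asking `respond` for learner input at branching points."""
        visited, current = [], self.start
        while current is not None:
            step = self.steps[current]
            visited.append(current)
            response = respond(step) if step.branches else None
            current = step.branches.get(response, step.default_next)
        return visited

script = DiscussionScript(
    steps={
        "ch1": ScriptStep("read", "Chapter 1", default_next="q1"),
        "q1": ScriptStep("question", "Was the protagonist's choice fair?",
                         branches={"yes": "probe_yes", "no": "probe_no"}),
        "probe_yes": ScriptStep("question", "Fair to whom?", default_next="end"),
        "probe_no": ScriptStep("question", "What would have been fairer?", default_next="end"),
        "end": ScriptStep("testlet", "Post-story testlet"),
    },
    start="ch1",
)
print(script.run(lambda step: "no"))   # ['ch1', 'q1', 'probe_no', 'end']
```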
Artificial-intelligence systems can generate stories in order to accomplish specified pedagogical objectives. These objectives should be preserved and should accompany generated story items, alongside other artifacts produced during story generation, as metadata to simplify story understanding, analysis, and evaluation.
Descriptions of intended audiences can also be provided to story generators. These input data would allow generated moral stories to be developmentally appropriate with respect to their subject matter, grammar, and vocabulary (Valentini, Weber, Salcido, Wright, Colunga, & Kann, 2023).
Moral stories present readers with situations in story contexts about which moral reasoning and discussion occur. Meanwhile, values can be both general and context-specific with respect to alignment (Liscio, van der Meer, Siebert, Jonker, & Murukannaiah, 2022).
Components could be created for modeling aspects of story comprehension during unfolding stories, i.e., while stories are still in progress. Predictions, then, could be made with respect to readers' responses to those moral situations occurring in those contexts presented by stories.
Different stories about identical moral themes can cause different distributions of responses and discussions.
Over the course of time, provided with adequate data, artificial-intelligence systems could discern and learn causal relationships between the types, meanings, structures, devices, forms, and effects of moral stories.
Artificial-intelligence systems can self-improve with respect to both the generation and execution of agentic workflows, or "scripts".
Components for the generation, understanding, analysis, and evaluation of moral stories can self-improve. In these regards, perhaps forms of A/B and multivariate testing could occur as systems exploited and explored variations in moral stories and literature discussions to achieve pedagogical objectives.
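As a hedged illustration of how such A/B testing might be wired into these systems, the sketch below deterministically assigns learners to story variants and aggregates a post-discussion outcome score per variant; the variant names, the hashing-based assignment, and the simulated scores are assumptions.

```python
import hashlib
import random
from collections import defaultdict
from statistics import mean

class StoryVariantExperiment:
    """Illustrative A/B test over variants of a generated moral story."""
    def __init__(self, variants, seed=0):
        self.variants = list(variants)
        self.outcomes = defaultdict(list)        # variant -> post-discussion scores
        self.rng = random.Random(seed)           # used only to simulate outcomes below

    def assign(self, learner_id: str) -> str:
        # Stable assignment: a given learner always sees the same variant.
        digest = int(hashlib.sha256(learner_id.encode()).hexdigest(), 16)
        return self.variants[digest % len(self.variants)]

    def record(self, variant: str, score: float) -> None:
        self.outcomes[variant].append(score)

    def summary(self):
        return {v: (len(s), round(mean(s), 2)) for v, s in self.outcomes.items() if s}

experiment = StoryVariantExperiment(["parable_v1", "parable_v2"])
for learner in ("learner-1", "learner-2", "learner-3", "learner-4"):
    variant = experiment.assign(learner)
    simulated_score = experiment.rng.uniform(0.4, 0.9)   # placeholder for a real outcome measure
    experiment.record(variant, simulated_score)
print(experiment.summary())
```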
In artificial intelligence, value-alignment challenges include how to align artificial-intelligence systems with sets of values and how to determine which values these should be (Gabriel, 2020).
Artificial-intelligence systems can learn from and be aligned to values from moral stories (Riedl & Harrison, 2016; Emelin, Le Bras, Hwang, Forbes, & Choi, 2020; Nahian, Tasrin, Frazier, Riedl, & Harrison, 2025).
Man-machine literature discussions about selected or generated moral stories can also provide value to artificial-intelligence systems. Research is underway into mining human tutorial discussions (Maharjan, Rus, & Gautam, 2018; Lin, Singh, Sha, Tan, Lang, Gašević, & Chen, 2022) and these techniques will be increasingly useful for analyzing and learning from transcripts of man-machine discussions.
Artificial-intelligence systems will be able to select and generate moral stories to engage in man-machine literature discussions, continuously learning from experts while tutoring learners.
Instead of attempting to train a morally absolutist artificial-intelligence system, systems could be trained to be increasingly capable of adopting a variety of ideological stances, positions, perspectives, schools of thought, and wisdom traditions. Resultant pluralist systems could, then, be prompted to perform moral reasoning and to engage in dialogue from described personas (Shanahan, McDonell, & Reynolds, 2023; Kovač, Portelas, Sawayama, Dominey, & Oudeyer, 2024).
In addition to single artificial-intelligence systems capable of performing many personas, components can be envisioned which can route descriptions of personas to those other models most capable of performing them.
Training, fine-tuning, and combinations of prompt ingredients can each contribute to the behavior of models performing personas.
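A minimal sketch of such routing follows, assuming a registry of models annotated with per-persona capability scores; the model names, tags, and scores are invented for illustration, and a production router would likely rely on benchmarked evaluations rather than a static table.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ModelProfile:
    name: str
    # Assumed capability per persona tag, e.g., from prior evaluations.
    capabilities: Dict[str, float]

class PersonaRouter:
    """Illustrative router from persona descriptions to capable models."""
    def __init__(self, models: List[ModelProfile]):
        self.models = models

    def route(self, persona_tags: List[str]) -> str:
        def score(model: ModelProfile) -> float:
            return sum(model.capabilities.get(tag, 0.0) for tag in persona_tags)
        return max(self.models, key=score).name

router = PersonaRouter([
    ModelProfile("model-a", {"stoic": 0.9, "utilitarian": 0.4}),
    ModelProfile("model-b", {"stoic": 0.3, "utilitarian": 0.8, "virtue-ethics": 0.7}),
])
print(router.route(["utilitarian", "virtue-ethics"]))   # "model-b" under these assumed scores
```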
Of all possible personas, certain ones might be more broadly popular, useful, and desired for study. Insights into which personas to prioritize can result from analyses of societies' political and ideological spaces (Morton, 1999).
Political science can support artificial-intelligence alignment efforts. Continuous empirical political analysis can be instrumental to the creation and alignment of prioritized personas which would "receive reflective endorsement despite widespread variation in people's moral beliefs" (Gabriel, 2020).
Multi-agent systems can be of use for: intelligent tutoring (Šarić-Grgić, Grubišić, Stankov, & Štula, 2019); representing personas capable of performing reasoning and dialogue from differing ideological stances, positions, perspectives, schools of thought, and wisdom traditions; contextual value alignment (Dognin, Rios, Luss, Padhi, Riemer, Liu, Sattigeri, Nagireddy, Varshney, & Bouneffouf, 2024); story generation (Huot, Amplayo, Palomaki, Jakobovits, Clark, & Lapata, 2024); literature discussions; and otherwise modeling and simulating both learners and experts.
Teams of specialized technical personnel could operate self-improving adaptive instructional systems, review automatically-generated story items, monitor unfolding performance metrics pertaining to story items, and monitor real-time analytics dashboards pertaining to the administering of testlets and to the literature discussions between artificial-intelligence systems and populations of learners and experts.
Learners' parents, teachers, teaching assistants, guidance counselors, and school administrators could be provided with means of engaging with educational artificial-intelligence systems, e.g., using multimodal dialogue enhanced by data visualizations and analytics dashboards.
Self-improving adaptive instructional systems can, increasingly, generate moral stories, scenarios, or cases and can select these to discuss with individuals and teams. Man-machine literature discussions about moral stories could provide value both to individuals and to artificial-intelligence systems.
Artificial-intelligence systems will be able to generate and select moral stories to engage in man-machine literature discussions, continuously learning from experts while tutoring learners.
One secondary benefit of the architectural approaches considered and discussed above is that, with the same components, users would be able to narrate real-world or hypothetical scenarios to artificial-intelligence systems, these serving as the stories for discussion, and to select or describe personas to interact with.
Dognin, Pierre, Jesus Rios, Ronny Luss, Inkit Padhi, Matthew D. Riemer, Miao Liu, Prasanna Sattigeri, Manish Nagireddy, Kush R. Varshney, and Djallel Bouneffouf. "Contextual moral value alignment through context-based aggregation." arXiv preprint arXiv:2403.12805 (2024).
Emelin, Denis, Ronan Le Bras, Jena D. Hwang, Maxwell Forbes, and Yejin Choi. "Moral stories: Situated reasoning about norms, intents, actions, and their consequences." arXiv preprint arXiv:2012.15738 (2020).
Gabriel, Iason. "Artificial intelligence, values, and alignment." Minds and Machines 30, no. 3 (2020): 411-437.
Goldenberg, Claude. "Instructional conversations: Promoting comprehension through discussion." The Reading Teacher 46, no. 4 (1992): 316-326.
Huot, Fantine, Reinald Kim Amplayo, Jennimaria Palomaki, Alice Shoshana Jakobovits, Elizabeth Clark, and Mirella Lapata. "Agents' room: Narrative generation through multi-step collaboration." arXiv preprint arXiv:2410.02603 (2024).
Kovač, Grgur, Rémy Portelas, Masataka Sawayama, Peter Ford Dominey, and Pierre-Yves Oudeyer. "Stick to your role! Stability of personal values expressed in large language models." arXiv preprint arXiv:2402.14846 (2024).
Lin, Jionghao, Shaveen Singh, Lele Sha, Wei Tan, David Lang, Dragan Gašević, and Guanliang Chen. "Is it a good move? Mining effective tutoring strategies from human–human tutorial dialogues." Future Generation Computer Systems 127 (2022): 194-207.
Liscio, Enrico, Michiel T. van der Meer, Luciano C. Siebert, Catholijn M. Jonker, and Pradeep K. Murukannaiah. "What values should an agent align with? An empirical comparison of general and context-specific values." Autonomous Agents and Multi-Agent Systems (2022).
Maharjan, Nabin, Vasile Rus, and Dipesh Gautam. "Discovering effective tutorial strategies in human tutorial sessions." In The Thirty-First International Flairs Conference (2018).
Morton, Rebecca B. Methods and Models: A Guide to the Empirical Analysis of Formal Models in Political Science. Cambridge University Press, 1999.
Nahian, Md Sultan Al, Tasmia Tasrin, Spencer Frazier, Mark Riedl, and Brent Harrison. "The Goofus & Gallant story corpus for practical value alignment." arXiv preprint arXiv:2501.09707 (2025).
Peplow, David, Joan Swann, Paola Trimarco, and Sara Whiteley. The Discourse of Reading Groups: Integrating Cognitive and Sociocultural Perspectives. Routledge, 2015.
Riedl, Mark O., and Brent Harrison. "Using stories to teach human values to artificial agents." In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence (2016).
Šarić-Grgić, Ines, Ani Grubišić, Slavomir Stankov, and Maja Štula. "An agent-based intelligent tutoring systems review." International Journal of Learning Technology 14, no. 2 (2019): 125-140.
Shanahan, Murray, Kyle McDonell, and Laria Reynolds. "Role-play with large language models." Nature 623, no. 7987 (2023): 493-498.
Sinatra, Anne M., Arthur C. Graesser, Xiangen Hu, Keith Brawner, and Vasile Rus, eds. Design Recommendations for Intelligent Tutoring Systems: Volume 7 - Self-improving Systems. US Army Research Laboratory, 2019.
Tong, Richard Jiarui, and Xiangen Hu. "Future of education with neuro-symbolic AI agents in self-improving adaptive instructional systems." Frontiers of Digital Education 1, no. 2 (2024): 198-212.
Valentini, Maria, Jennifer Weber, Jesus Salcido, Téa Wright, Eliana Colunga, and Katharina Kann. "On the automatic generation and simplification of children's stories." arXiv preprint arXiv:2310.18502 (2023).
Computational storyboarding builds upon traditional storyboarding techniques, combining elements from screenplays, storyboards, functions, diagrams, and animation.
Computational storyboards are intended to be of use as input for generative artificial-intelligence systems to create longer-form output video.
A motivating use case is simplifying the creation of educational videos, e.g., lecture videos. With computational storyboards, content creators could describe single-character stories where the main characters were tutors instructing audiences with respect to provided subject matter, utilizing boards or screens displaying synchronized multimedia content from textbooks, encyclopedia articles, or slideshow presentations.
A screenplay is a form of narration in which the movements, actions, expressions, and dialogue of characters are described in a certain format. Visual and cinematographic cues might also be given as well as scene descriptions and changes.
A storyboard is an organization technique consisting of illustrations or images (thumbnails) traditionally displayed in sequence. Storyboards have traditionally been used for pre-visualizing motion pictures, animations, motion graphics, and other interactive media sequences.
Storyboards' thumbnails have traditionally provided information about content layering, audio and sound effects, camera shots, character shots, transitions between scenes, and more.
In theory, nodes in diagrammatic computational storyboards could refer to other diagrams by URLs, weaving webs of interconnected diagrams. End-users could click on these referring nodes to expand them, loading referenced content from URL-addressable resources into diagrams.
Computational storyboard diagrams could be collaboratively editable, enabling wiki-like platforms for storyboards.
Functions would enable modularity and the reuse of storyboard content. Beyond referring to other diagrams by URLs, function-calling nodes in computational storyboard diagrams could refer to function-like diagrams by URLs while invoking and passing arguments to them.
With computational storyboarding functions, scenes’ characters, settings, props, actions, dialogue, and properties of these could all be parameterized.
Arguments provided to invoked functions could be in the form of multimedia content, structured objects, or text. Arguments and variables in functions could be used to create the prompts to be provided to generative artificial-intelligence systems including those prompts with which to generate thumbnails' images.
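The following sketch illustrates, under assumed names and a simple template format, how a parameterized storyboard function might render thumbnail-generation prompts from its arguments.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict, List

@dataclass
class ThumbnailNode:
    """A storyboard thumbnail whose image prompt is built from parameters."""
    prompt_template: str

    def render_prompt(self, arguments: Dict[str, str]) -> str:
        return Template(self.prompt_template).safe_substitute(arguments)

@dataclass
class StoryboardFunction:
    """An illustrative, parameterized storyboard function."""
    name: str
    parameters: List[str]
    thumbnails: List[ThumbnailNode]

    def invoke(self, **arguments: str) -> List[str]:
        missing = [p for p in self.parameters if p not in arguments]
        if missing:
            raise ValueError(f"missing arguments: {missing}")
        return [t.render_prompt(arguments) for t in self.thumbnails]

lecture_scene = StoryboardFunction(
    name="lecture_scene",
    parameters=["tutor", "topic", "board_content"],
    thumbnails=[
        ThumbnailNode("Wide shot: $tutor standing beside a screen showing $board_content."),
        ThumbnailNode("Close-up: $tutor explaining $topic, gesturing toward the screen."),
    ],
)
for prompt in lecture_scene.invoke(tutor="an animated owl professor",
                                   topic="the water cycle",
                                   board_content="a diagram of evaporation and rain"):
    print(prompt)
```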
Markers, resembling keykodes or timecodes, could be placed between thumbnails in computational storyboard diagrams. Alternatively, some or all of the thumbnails could be selected to serve as referenceable markers, keykodes, or timecodes in resultant video. With markers, content creators could refer to instants or intervals of video generated from invoked functions.
Components in computational storyboard diagrams could be annotated with metadata.
Functions, for instance, could be annotated with metadata describing one or more sample argument sequences. In this way, content creators could have options for generating thumbnails' images while designing.
With respect to computational storyboarding functions and their diagrams, there are two varieties of control-flow constructs to consider.
A first variety of control-flow construct would route execution at runtime to paths of subsequent thumbnails. Such branching could occur either based upon the evaluation of expressions involving input arguments and variables or upon asking questions of interoperating artificial-intelligence systems.
A second variety of control-flow construct would result in branching or interactive video output, with routes or paths to be selected by viewers during playback. Generated interactive video content could interface with playback environments, e.g., in Web browsers, to provide viewers with interactive features such as navigational menus and selectable options.
As computational storyboards were executed or run to generate video, execution contexts, building on the concept of "call stacks", could be utilized. Execution contexts would include nested frames, building on the concept of "stack frames", each of which would include the active nodes in functions' diagrams and the values of their input arguments and variables.
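A minimal sketch of such an execution context follows; the frame fields and operations are assumptions intended only to make the call-stack analogy concrete.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Frame:
    """One stack frame of an illustrative storyboard execution context."""
    function_name: str
    active_node: str
    bindings: Dict[str, Any] = field(default_factory=dict)

@dataclass
class ExecutionContext:
    frames: List[Frame] = field(default_factory=list)

    def call(self, function_name: str, entry_node: str, **arguments: Any) -> None:
        self.frames.append(Frame(function_name, entry_node, dict(arguments)))

    def advance(self, next_node: str) -> None:
        self.frames[-1].active_node = next_node

    def ret(self) -> Frame:
        return self.frames.pop()

    def snapshot(self) -> List[str]:
        return [f"{f.function_name}@{f.active_node}" for f in self.frames]

ctx = ExecutionContext()
ctx.call("main_storyboard", "scene_1")
ctx.call("lecture_scene", "thumb_1", tutor="owl professor", topic="water cycle")
ctx.advance("thumb_2")
print(ctx.snapshot())   # ['main_storyboard@scene_1', 'lecture_scene@thumb_2']
```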
In addition to computational storyboards' functions providing their diagrammatic contents with their input arguments and variables, functions could contain nodes for obtaining “random values” from specified numerical intervals or, perhaps, for randomly selecting from nodes in containers.
Random variation could, optionally, be utilized by content creators to vary resultant video.
In theory, beyond using "random values" to simply vary generated video contents, diagram nodes for providing "automatic values" could be used. These nodes would provide values, either scalars from intervals or selections from nodes in containers, intended to be optimized across multiple executions or runs as observations and data were collected.
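One way an "automatic value" node might be optimized across runs is sketched below using a simple epsilon-greedy rule over candidate values, with observed outcomes fed back after each run; the candidates, the outcome placeholder, and the choice of epsilon-greedy are assumptions rather than a prescribed mechanism.

```python
import random

class AutomaticValueNode:
    """Illustrative 'automatic value' node optimized across runs.

    Uses a simple epsilon-greedy rule: usually pick the candidate with the
    best observed average outcome, occasionally explore another candidate.
    """
    def __init__(self, candidates, epsilon=0.2, seed=0):
        self.candidates = list(candidates)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = {c: 0.0 for c in self.candidates}
        self.counts = {c: 0 for c in self.candidates}

    def select(self):
        untried = [c for c in self.candidates if self.counts[c] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.candidates)
        return max(self.candidates, key=lambda c: self.totals[c] / self.counts[c])

    def observe(self, candidate, outcome: float) -> None:
        self.totals[candidate] += outcome
        self.counts[candidate] += 1

pacing = AutomaticValueNode(["slow pacing", "brisk pacing"])
for run in range(20):
    choice = pacing.select()
    watch_time = 0.8 if choice == "brisk pacing" else 0.5   # placeholder observation
    pacing.observe(choice, watch_time)
print(pacing.counts)   # "brisk pacing" should dominate after a few runs
```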
As envisioned, developing and providing these components for computational storyboarding diagrams would simplify A/B testing and related techniques for content creators.
As considered, at least some computational storyboards’ thumbnails would have their images created by generative artificial-intelligence systems. Multimodal prompts, in these regards, could be varied including by using functions’ input arguments and variables.
A goal for computational storyboards is that generative artificial-intelligence systems could process them into longer-form video content.
Towards this goal, computational storyboards could provide materials beyond extensible thumbnails for generative artificial-intelligence systems. Notes about directing, cinematography, and characters or acting could be provided to systems. Multimedia materials with respect to characters, settings, props, and style could be provided to systems. Content intended to be synchronized and placed onto one or more display surfaces in generated video could also be provided.
Generated videos could utilize one or more tracks to enable features in playback environments. Transcripts or captions, for instance, alongside accompanying metadata track items, could be sent to viewers' artificial-intelligence assistants for these systems to be able to answer questions about videos’ contents.
With respect to generating video from computational storyboards, there could exist a "debugging" mode. Video generated in such a mode would contain extra metadata tracks providing objects that content creators could utilize to jump from points of interest in generated videos back into computational storyboards, resumed to the appropriate execution contexts.
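A hedged sketch of one such debugging-track cue follows, serializing an interval of generated video together with a snapshot of the execution context that produced it; the JSON field names are assumptions.

```python
import json

def debug_track_item(start_s: float, end_s: float, context_snapshot: list) -> str:
    """Serialize one illustrative debugging-track cue as JSON.

    The cue maps an interval of generated video back to the storyboard
    execution context that produced it, so creators can jump from a point
    of interest in the video into the storyboard, resumed appropriately.
    """
    return json.dumps({
        "start": start_s,
        "end": end_s,
        "execution_context": context_snapshot,   # e.g., frames as "function@node" strings
    })

cue = debug_track_item(12.0, 19.5, ["main_storyboard@scene_1", "lecture_scene@thumb_2"])
print(cue)
```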
In theory, existing video content could be processed into computational storyboards.
Envisioned computational storyboards build on traditional storyboarding techniques while intending to enable generative artificial-intelligence systems to create longer-form output video, e.g., educational video.
Today, research is underway into aiding and automating scientific and scholarly research processes. Zhang, Pearson, and Wang (2024) discuss automated scientific research in the form of literature reviews. Kang and Xiong (2024) have developed a benchmark for measuring artificial-intelligence systems’ capabilities with respect to conducting academic surveys.
In the not-too-distant future, artificial-intelligence systems will be capable of performing some historical research tasks. This kind of research is expounded upon by Schrag (2021) and has its own particular caveats and fallacies, including those listed by Fischer (1970).
Historical research should begin with questions. With respect to historical research, there are “who”, “what”, “where”, and “when” factual questions and also “why”, “how”, and “with-what-consequences” interpretive questions. Historians tend to explore factual questions while addressing overarching interpretive questions.
Historical research questions should be carefully framed and, in these regards, Fischer (1970) enumerates the following pertinent fallacies: the Baconian fallacy; many questions; false dichotomous questions; metaphysical questions; fictional questions; semantical questions; declarative questions; counterquestions; tautological questions; contradictory questions; and “potentially verifiable” questions.
In addition to these fallacies, one should also be wary of: deceptively-simple questions; impossible-to-answer questions; opinion questions; ethical questions; anachronistic questions; and non-historical questions.
When faced with problematic questions, history-educational question-answering systems could, instead of conducting automated historical research and answering them, provide relevant search engine results.
Technologies exist today for both manually and automatically fact-checking content using core sets of sources (e.g., history books, history textbooks, or encyclopedias), including when the content references a wider set of sources.
One example of such a technology is Citation Needed, a Web-browser extension developed by the Wikimedia Foundation’s Future Audiences team. It allows end-users to fact-check selections of content using Wikipedia articles as a core set of sources.
When content, or important assertions and claims therein, cannot be automatically corroborated by a core set of sources, systems could enqueue that content for more elaborate algorithms to process or for human personnel to review.
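The following sketch illustrates this escalation pattern under strong simplifications: corroboration is approximated by naive word overlap with a tiny core set of sources, standing in for real fact-checking models, and uncorroborated claims are appended to a review queue.

```python
from collections import deque

CORE_SOURCES = [
    "The treaty was signed in 1848.",
    "The city was founded on the river's east bank.",
]

def corroborated(claim: str, sources=CORE_SOURCES, threshold=0.6) -> bool:
    """Naive stand-in for corroboration: word overlap with any core source."""
    claim_words = set(claim.lower().split())
    for source in sources:
        overlap = len(claim_words & set(source.lower().split()))
        if claim_words and overlap / len(claim_words) >= threshold:
            return True
    return False

review_queue = deque()

def check_claims(claims):
    results = {}
    for claim in claims:
        results[claim] = corroborated(claim)
        if not results[claim]:
            review_queue.append(claim)   # escalate to richer algorithms or human review
    return results

print(check_claims([
    "The treaty was signed in 1848.",
    "The city was founded in 1900 on a hilltop.",
]))
print(list(review_queue))
```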
Multi-agent systems could contribute to and perform the same group processes through which encyclopedic articles are co-created (Kopf, 2022). Agents could co-write and revise answers to historical questions, historical essays, and long-form historical documents.
Interestingly, man-machine interactions, such as debate and consensus-building, could result in automatic modifications or revisions to systems’ output documents.
Narratives are critical to communicating historical knowledge (Munslow, 2018). Artificial-intelligence systems will, increasingly, be able to aid and automate the co-creation of research-based, multimodal historical stories and works of historical fiction.
Kindenberg (2024) recently compared artificial-intelligence generated and student-written historical narratives and found that artificial-intelligence generated stories tended to convey less emotion.
Past and present approaches to automatic story generation were surveyed by Alhussain and Azmi (2021). Research is unfolding with respect to multi-agent multimodal story generation (Arif, Arif, Khan, Haroon, Raza, & Athar, 2024; Huot, Amplayo, Palomaki, Jakobovits, Clark, & Lapata, 2024).
Recently, Laney and Dewan (2024) explored instructor-mediated man-machine interactions in educational structured forums, specifically class discussion boards. Artificial-intelligence agents supported teaching assistants in answering students’ questions.
In the near future, artificial-intelligence agents participating in structured forums will be able to follow end-users’ instructions to perform tasks and subtasks including answering historical questions and conducting historical research and writing.
Beyond idly awaiting questions and instructions from end-users, artificial-intelligence agents could proactively examine unfolding discussions to produce and provide suggestions with respect to how they might be of assistance.
Processes mediated and explicated by structured forums can be assessed and evaluated, including processes involving historical thinking and reasoning (Van Drie & Van Boxtel, 2008; Bertram, Weiss, Zachrich, & Ziai, 2021), the co-creation of documents (Kopf, 2022), debate (Ulrich, 1986), and consensus-building (Lehrer & Wagner, 2012).
In the not-too-distant future, artificial-intelligence technologies will be able to aid and to automate the assessment and evaluation of historical research, reasoning, discussion, and writing processes mediated and explicated by structured forums.
Alhussain, Arwa I., and Aqil M. Azmi. "Automatic story generation: A survey of approaches." ACM Computing Surveys (CSUR) 54, no. 5 (2021): 1-38.
Arif, Samee, Taimoor Arif, Aamina Jamal Khan, Muhammad Saad Haroon, Agha Ali Raza, and Awais Athar. "The art of storytelling: Multi-agent generative AI for dynamic multimodal narratives." arXiv preprint arXiv:2409.11261 (2024).
Bertram, Christiane, Zarah Weiss, Lisa Zachrich, and Ramon Ziai. "Artificial intelligence in history education. Linguistic content and complexity analyses of student writings in the CAHisT project (Computational assessment of historical thinking)." Computers and Education: Artificial Intelligence (2021): 100038.
Fischer, David Hackett. Historians' Fallacies: Toward a Logic of Historical Thought. Harper & Row, 1970.
Huot, Fantine, Reinald Kim Amplayo, Jennimaria Palomaki, Alice Shoshana Jakobovits, Elizabeth Clark, and Mirella Lapata. "Agents' room: Narrative generation through multi-step collaboration." arXiv preprint arXiv:2410.02603 (2024).
Kang, Hao, and Chenyan Xiong. "ResearchArena: Benchmarking LLMs' ability to collect and organize Information as research agents." arXiv preprint arXiv:2406.10291 (2024).
Kindenberg, Björn. "ChatGPT-generated and student-written historical narratives: A comparative analysis." Education Sciences 14, no. 5 (2024): 530.
Kopf, Susanne. A discursive perspective on Wikipedia: More than an encyclopaedia?. Springer Nature, 2022.
Laney, Mason, and Prasun Dewan. "Human-AI collaboration in a student discussion forum." In Companion Proceedings of the 29th International Conference on Intelligent User Interfaces, pp. 74-77. 2024.
Lehrer, Keith, and Carl Wagner. Rational consensus in science and society: A philosophical and mathematical study. Vol. 24. Springer Science & Business Media, 2012.
Munslow, Alun. Narrative and history. Bloomsbury Publishing, 2018.
Schrag, Zachary. The Princeton guide to historical research. Princeton University Press, 2021.
Ulrich, Walter. Judging academic debate. National Textbook Company, 1986.
Van Drie, Jannet, and Carla Van Boxtel. "Historical reasoning: Towards a framework for analyzing students’ reasoning about the past." Educational Psychology Review 20 (2008): 87-110.
Zhang, Starkson, Alfredo Pearson, and Zhenting Wang. "Autonomous generalist scientist: Towards and beyond human-level automatic research using foundation model-based AI agents and robots (a position)." (2024).
Attention span is the amount of time that learners can spend concentrating on tasks. Sustained attention develops through childhood and into adulthood, with a period of accelerated development occurring during early and middle childhood (Slattery, O’Callaghan, Ryan, Fortune, & McAvinue, 2022).
Attention training is a part of education. Learners are trained to remain focused on discussion topics for extended periods of time and to develop listening and analytical skills in the process. For over a century, there has been keen interest in improving children’s attention in educational contexts.
Adaptive instructional systems, a comparably recent development, adapt instruction based upon learners’ states of engagement, arousal, motivation, prior knowledge, anxiety, and engaged concentration (Sottilare & Goodwin, 2017).
A tutoring strategy for learners in states of engaged concentration might be to “do nothing” because they would already be in ideal states for learning. However, a longer-term tutoring strategy might be to strengthen learners’ capabilities to maintain their states of sustained attention and concentration.
How can adaptive instructional systems scheduling learners’ homework items contribute to increasing their attention spans and concentration?
Computer-administered, strategically and adaptively scheduled educational exercises and activities could be organized into gamified sprints, each sprint containing one or more stages. Learners would be encouraged to take a break from, or to conclude, their schoolwork only at the completion of a sprint stage and not in the middle of one.
To encourage the sustained, uninterrupted completion of schoolwork, the gamification could be such that a learner would have to repeat an entire sprint stage from the beginning – though not necessarily with the exact same items – if they didn’t finish it before taking a break or concluding. That is, checkpoints or save points could be provided only after sprint stages.
Adaptive instructional systems would control when sprints were presented to learners, the number of stages and items that would be in each, and their other properties including whether or not they would have countdown timers for successful completions. Informed by models of learners, adaptive instructional systems would be able to create and to utilize sprints to encourage capable learners to complete just one more item or just a few more items.
The goal posts for individual learners’ daily and weekly educational exercises and activities would be placed by adaptive instructional systems to be, on average, just a bit ahead of their comfort zones but within their performance capabilities.
What differentiates homework items from one another in terms of their attentional, concentrative, and other cognitive demands? For each item, for each learner, for a pace of progression, which cognitive reservoirs are depleted and to which extents? At which rates do individual learners’ various cognitive reservoirs replenish? Which cognitive reservoirs exist alongside redundant others, and which do not?
Cognitive load can be defined as a multidimensional construct representing the load that performing a particular task imposes on a learner’s cognitive system. The construct has a causal dimension reflecting the interaction between task and learner characteristics and an assessment dimension reflecting the measurable concepts of mental load, mental effort, and performance. Task characteristics that have been identified in previous research include task format, task complexity, uses of multimedia, time pressure, and the pacing of instruction (Paas, Tuovinen, Tabbers, & Van Gerven, 2003).
Xie and Salvendy (2000) distinguished between instantaneous load, peak load, accumulated load, average load, and overall load. Instantaneous load represents the dynamics of cognitive load, which fluctuate each moment that a learner works on a task. Peak load is the maximum value of instantaneous load while working on a task. Accumulated load is the total amount of load that a learner experiences during a task. Average load represents the mean intensity of load during the performance of a task. Overall load is the load experienced over the whole working procedure, reflecting the mapping of instantaneous, accumulated, and average load in the learner's brain.
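These distinctions can be made concrete over a sampled series of instantaneous-load values, as in the hedged sketch below; approximating accumulated load as a sum of samples, and the arbitrary units, are assumptions.

```python
def load_measures(instantaneous_load, dt=1.0):
    """Summarize a sampled instantaneous-load series (illustrative units).

    Peak load: maximum instantaneous value.
    Accumulated load: approximated here as the sum of samples times dt.
    Average load: mean intensity over the task.
    """
    peak = max(instantaneous_load)
    accumulated = sum(instantaneous_load) * dt
    average = accumulated / (len(instantaneous_load) * dt)
    return {"peak": peak, "accumulated": accumulated, "average": round(average, 2)}

# Load samples (arbitrary scale) taken once per minute while solving an item.
print(load_measures([3, 5, 8, 6, 4], dt=1.0))
# {'peak': 8, 'accumulated': 26.0, 'average': 5.2}
```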
Cognitive fatigue can be understood to be an “executive failure to maintain and optimize performance over acute but sustained cognitive effort resulting in performance that is lower and more variable than the individual’s optimal ability” (Holtzer, Shuman, Mahoney, Lipton, & Verghese, 2010). Cognitive fatigue typically develops gradually over time as a person engages in prolonged and demanding mental activities.
Cognitive fatigue may be assessed either subjectively or objectively. Subjective cognitive fatigue involves learners’ perceptions of their exhaustion. Objective cognitive fatigue is measured by changes in cognitive performance relative to a baseline (Karim, Pavel, Nikanfar, Hebri, Roy, Nambiappan, Jaiswal, Wylie, & Makedon, 2024).
While learners can express subjective cognitive fatigue to adaptive instructional systems at any point, considered here is the automatic detection of learners' instantaneous, accumulated, and overall cognitive load and of objective cognitive fatigue as learners progress through and complete strategically, adaptively scheduled homework items from one or more courses.
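As one illustrative operationalization, objective cognitive fatigue might be flagged when a learner's recent rolling-mean performance falls well below a per-learner baseline, as sketched below; the window size and drop threshold are assumptions.

```python
from statistics import mean

def objective_fatigue(recent_scores, baseline_scores, drop=0.15, window=3) -> bool:
    """Flag fatigue when the recent rolling mean falls well below baseline.

    `drop` is the fractional decline tolerated before flagging; `window` is
    the number of most recent items averaged. Both are illustrative values.
    """
    if len(recent_scores) < window or not baseline_scores:
        return False
    baseline = mean(baseline_scores)
    recent = mean(recent_scores[-window:])
    return recent < (1.0 - drop) * baseline

baseline = [0.82, 0.78, 0.85, 0.80]                            # earlier-session accuracy
print(objective_fatigue([0.81, 0.70, 0.66, 0.64], baseline))   # True
print(objective_fatigue([0.80, 0.79, 0.83], baseline))         # False
```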
There have been, thus far, three broad approaches to strengthening attention in educational contexts: attention network training, attention state training, and attention strategy training.
The first approach, attention network training – also referred to as cognitive training or brain training – involves the repetitive practice of cognitive tasks specifically thought to exercise neural networks related to attention.
Adaptive instructional systems could, in theory, intersperse attention-related cognitive tasks into learners’ multi-course, multi-objective schedules of homework items. However, a review of 14 attention network training intervention studies from 1999 to 2021 found that these cognitive tasks, these approaches, did not reliably improve sustained attention capacity (Slattery, O’Callaghan, Ryan, Fortune, & McAvinue, 2022).
The second approach, attention state training, involves practice designed to train brain states thought to influence attention and other networks. Attention state training may also involve networks but, importantly, does not include cognitive tasks specifically designed to train attentional networks.
Adaptive instructional systems could strategically schedule homework items and incorporate gamification, e.g., sprints, to contribute to the strengthening of attention span and concentration. Such techniques can be used in combination with other attention state training activities such as physical activity and meditation. Adaptive instructional systems could also intersperse meditative and mindfulness activities during learners’ homework activities.
The third approach, strategy training, focuses on practicing strategies that momentarily boost attention.
With respect to adaptive instructional systems and gamification, previous works include explorations into uses of: avatars, badges, progress bars, levels, narratives / stories, special effects, non-player characters, tasks / quests, timers, leaderboards, bonuses / rewards / trophies / collectibles, points, roles, virtual currencies, and maps (Ramadhan, Warnars, & Razak, 2023; Seaborn & Fels, 2015).
Adaptive instructional systems can incorporate gamification to strategically, adaptively schedule learners’ homework items from one or more courses, e.g., into sprints, to motivate learners to sustain attention and concentration.
Over the course of time, these processes should increase learners’ capabilities to their maximum potentials. As a result, learners’ performances in other areas for which sustained attention and concentration are prerequisites should also tend to improve.
Holtzer, Roee, Melissa Shuman, Jeannette R. Mahoney, Richard Lipton, and Joe Verghese. "Cognitive fatigue defined in the context of attention networks." Aging, Neuropsychology, and Cognition 18, no. 1 (2010): 108-128.
Karim, Enamul, Hamza R. Pavel, Sama Nikanfar, Aref Hebri, Ayon Roy, Harish R. Nambiappan, Ashish Jaiswal, Glenn R. Wylie, and Fillia Makedon. "Examining the landscape of cognitive fatigue detection: A comprehensive survey." Technologies 12, no. 3 (2024): 38.
Paas, Fred, Juhani E. Tuovinen, Huib Tabbers, and Pascal W. M. Van Gerven. "Cognitive load measurement as a means to advance cognitive load theory." Educational Psychologist 38, no. 1 (2003): 63-71.
Ramadhan, Arief, Harco L. H. S. Warnars, and Fariza H. A. Razak. "Combining intelligent tutoring systems and gamification: A systematic literature review." Education and Information Technologies (2023): 1-37.
Seaborn, Katie, and Deborah I. Fels. "Gamification in theory and action: A survey." International Journal of Human-computer Studies 74 (2015): 14-31.
Slattery, Éadaoin J., Eoin O’Callaghan, Patrick Ryan, Donal G. Fortune, and Laura P. McAvinue. "Popular interventions to enhance sustained attention in children and adolescents: A critical systematic review." Neuroscience and Biobehavioral Reviews 137 (2022): 104633.
Sottilare, Robert A., and Gregory A. Goodwin. "Adaptive instructional methods to accelerate learning and enhance learning capacity." In International Defense and Homeland Security Simulation Workshop of the I3M Conference. 2017.
Xie, Bin, and Gavriel Salvendy. "Prediction of mental workload in single and multiple tasks environments." International Journal of Cognitive Ergonomics 4, no. 3 (2000): 213-242.
Mathematics is the queen of the sciences and vital to nations’ education objectives. How can it be made more enjoyable and fun for learners of all ages?
In the United States, while mathematics curriculum varies across schools and districts, traditionally, high-school mathematics has been separated by topics, each topic typically lasting for an entire school year. Students might study algebra, geometry, trigonometry, and calculus as separate courses.
In nearly all other countries throughout the world, a more integrated approach is followed. In integrated approaches, high-school students take mathematics courses which cover a variety of mathematical topics.
In integrated and holistic curricular approaches, adaptive instructional systems for scheduling educational exercises and activities would seemingly have greater opportunities for making use of variety to alleviate the tedium of rote exercise.
Singmaster (1992) described recreational mathematics as being a treasury of problems which make mathematics more fun and he noted that, in medieval arithmetic texts, recreational questions were interspersed with more straightforward problems to provide breaks in the hard slog of learning.
How can the ancient art of interspersing fun and enjoyable items be analyzed and understood in a modern scientific manner?
Rowlett, Smith, Corner, O'Sullivan, and Waldock (2019) indicated that teaching using games has been shown to improve engagement and attitudes and that recreational mathematics has the potential to develop and expand mathematical skills, including problem-solving, and to deepen understanding.
Lopez-Morteo and Lopez (2007) indicated that uses of electronic learning environments for recreational mathematics learning objects positively affect student attitudes towards mathematics. They believed that such approaches have “the potential to promote the mathematics learning process, basically on its motivational aspects.”
McNamara, Jackson, and Graesser (2010) hypothesized that intelligent tutoring systems could be rendered more engaging to learners, and thus more effective in promoting learning, by incorporating motivational components. They examined "benefits of incorporating game-based components within established tutoring systems to improve motivational aspects."
They indicated several constructs related to and intertwined with motivation, including self-regulation, self-efficacy, interest, and engagement. In addition to modeling learners' mathematical proficiencies, learners could be modeled with respect to their affect, mood, self-efficacy, interest, engagement, flow (Csikszentmihalyi, 1988), and motivation.
Recreational mathematics puzzles and games, making mathematics more fun, can be scheduled and interspersed by adaptive instructional systems, e.g., educational recommender systems and intelligent tutoring systems, to alleviate the tedium of rote exercise and the slog of learning and to enhance affect, mood, self-efficacy, interest, engagement, flow, and motivation.
Csikszentmihalyi, Mihaly. "The flow experience and its significance for human psychology." Optimal Experience: Psychological Studies of Flow in Consciousness 2 (1988): 15-35.
Lopez-Morteo, Gabriel, and Gilberto Lopez. "Computer support for learning mathematics: A learning environment based on recreational learning objects." Computers & Education 48, no. 4 (2007): 618-641.
McNamara, Danielle S., G. Tanner Jackson, and Art Graesser. "Intelligent tutoring and games (ITaG)." In Gaming for Classroom-based learning: Digital Role Playing as a Motivator of Study, pp. 44-65. IGI Global, 2010.
Rowlett, Peter, Edward Smith, Alexander S. Corner, David O'Sullivan, and Jeff Waldock. "The potential of recreational mathematics to support the development of mathematical learning." International Journal of Mathematical Education in Science and Technology 50, no. 7 (2019): 972-986.
Singmaster, David. "The unreasonable utility of recreational mathematics." In Lecture for the First European Congress of Mathematics, Paris. 1992.
The challenge addressed here is that of ensuring that all applicable rules, laws, and regulations are loaded into artificial-intelligence agents' working memories as they encounter wide, potentially open-ended, sets of situations.
If agents are able to search for, retrieve, and load applicable rules, laws, and regulations into their working memories, they can align with these items and subsequently select actions in accordance with them.
Conversational search engines for rules, laws, and regulations could, through dialogue, ask questions about narrated situations to better retrieve applicable search results. Through dialogue, search results could be accompanied by explanation or argumentation connecting them to input situations.
The goal of artificial-intelligence alignment is to ensure that artificial intelligence systems are properly aligned with human values (Gabriel, 2020).
This goal can be phrased as: (1) agents doing what they are instructed to do, (2) agents doing what they are intended to do, (3) agents doing what humans' behavior reveals them to prefer, (4) agents doing what humans would, if rational and informed, want them to do, (5) agents doing what is in the best interests of humans, objectively, or (6) agents doing what they morally ought, as defined by human individuals and society.
"If law is leveraged as a set of methodologies for conveying and interpreting directives and a knowledge base of societal values, it can play a unique role in aligning AI with humans" (Nay, 2022).
Agents are expected to comply with rules, laws, and regulations. The number of rules, laws, and regulations is expected to be large. For each rule, law, and regulation, for each considered action, agents are expected to verify that that action is in compliance. Ideally, agents will be able to act in real-time while performing these computations.
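A minimal sketch of this retrieve-then-verify pattern follows; keyword-indexed rules and a forbidden-action check are crude stand-ins for legal search and legal reasoning, and the rule identifiers and situations are invented.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rule:
    identifier: str
    keywords: frozenset        # crude retrieval index standing in for legal search
    forbidden_action: str

RULES = [
    Rule("rule-001", frozenset({"road", "vehicle"}), "exceed_speed_limit"),
    Rule("rule-002", frozenset({"data", "personal"}), "share_without_consent"),
]

def retrieve_applicable(situation_terms: set, rules: List[Rule]) -> List[Rule]:
    """Load into 'working memory' only rules whose index overlaps the situation."""
    return [r for r in rules if r.keywords & situation_terms]

def compliant(action: str, situation_terms: set, rules: List[Rule]) -> bool:
    applicable = retrieve_applicable(situation_terms, rules)
    return all(action != r.forbidden_action for r in applicable)

situation = {"road", "vehicle", "night"}
print(compliant("exceed_speed_limit", situation, RULES))   # False
print(compliant("dim_headlights", situation, RULES))       # True
```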
"In any given matter, before legal reasoning can take place, the reasoning agent must first engage in a task of 'law search' to identify the legal knowledge – cases, statutes, or regulations – that bear on the questions being addressed. This task may seem straightforward or obvious, but upon inspection, it presents difficult problems of definition and is challenging to represent in a tractable formalization that can be computationally executed" (Dadgostari, Guim, Beling, Livermore, & Rockmore, 2021).
Legal search engines could be of use for agents to search for and retrieve those rules, laws, and regulations applicable to their internal states, world models, and working memory contents.
Conversational legal search engines could interface as agents participating in multi-agent systems.
Agents' internal states, world models, and working memory contents and the transcripts from multi-agent systems' dialogues could be evaluated to determine whether applicable rules, laws, and regulations were properly loaded and available.
Event logs involving agents' internal states, world models, and working memories could be created, these perhaps accompanying multi-agent transcripts or recordings of environments.
Development and operations processes for ensuring that applicable rules, laws, and regulations are loaded by agents and multi-agent systems could be increasingly computer-aided or automated.
Dadgostari, Faraz, Mauricio Guim, Peter A. Beling, Michael A. Livermore, and Daniel N. Rockmore. "Modeling law search as prediction." Artificial Intelligence and Law 29 (2021): 3-34.
Gabriel, Iason. "Artificial intelligence, values, and alignment." Minds and Machines 30, no. 3 (2020): 411-437.
Nay, John J. "Law informs code: A legal informatics approach to aligning artificial intelligence with humans." Northwestern Journal of Technology and Intellectual Property 20 (2022): 309.
Agents representing ideological stances, positions, perspectives, or schools of thought can serve in multi-agent systems which generate encyclopedic answers to end-users' complex questions.
Large language models can generate content while role-playing, or impersonating, characters and personas (Shanahan, McDonell, & Reynolds, 2023). They can be fine-tuned using the works of individual philosophers to subsequently generate virtually indistinguishable responses (Schwitzgebel, Schwitzgebel, & Strasser, 2023). They can generate content from specified stances, positions, perspectives, and schools of thought. They can also generate content aligned with the attitudes and opinions of described groups, sub-populations, or demographics of interest (Santurkar, Durmus, Ladhak, Lee, Liang, & Hashimoto, 2023).
When should agents be searched for, retrieved, reused, designed, created, or varied? Which agents should be consulted when generating encyclopedic answers to end-users’ complex questions? Which agents’ responses would prove most valuable to consolidate, summarize, or synthesize into resultant encyclopedic answers? Should it be anticipated that selected teams of agents will recur across questions?
Automatically and manually designed agents, beyond potentially differing in terms of their models, training, fine-tuning, and prompts, could be provided with differing libraries of documents and could weigh, rank, or prioritize these documents differently.
Should agents and each of their libraries of documents be logically consistent and ideologically coherent? How should agents synthesize multiple challenging, potentially conflicting documents on complex issues and the arguments in them? Will these capabilities, additionally or instead, be emergent capabilities of orchestrated multi-agent systems?
Processes and strategies from multiple-text comprehension, reading group discussions, the Socratic method, the dialectic method, consensus building, group decision-making, and synthesis writing are anticipated to be of use to manager, facilitator, or moderator agents orchestrating teams of other agents, some representing individuals, groups, stances, positions, perspectives, or schools of thought.
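The sketch below illustrates, in outline, a moderator orchestrating persona agents in rounds and handing their contributions to a synthesis step; the placeholder lambda agents stand in for language-model-backed agents, and the round-based protocol is one assumption among many possible orchestration strategies.

```python
from typing import Callable, Dict, List

Agent = Callable[[str, List[str]], str]   # (question, prior transcript) -> contribution

def moderate(question: str, agents: Dict[str, Agent],
             synthesize: Callable[[Dict[str, str]], str], rounds: int = 2) -> str:
    """Run a simple round-based discussion, then synthesize an answer."""
    transcript: List[str] = []
    latest: Dict[str, str] = {}
    for _ in range(rounds):
        for name, agent in agents.items():
            contribution = agent(question, transcript)
            latest[name] = contribution
            transcript.append(f"{name}: {contribution}")
    return synthesize(latest)

# Placeholder persona agents; real agents would call language models.
agents = {
    "consequentialist": lambda q, t: "Weigh the outcomes for all affected parties.",
    "deontologist": lambda q, t: "Ask whether the act violates a binding duty.",
}
synthesize = lambda latest: " / ".join(f"{k}: {v}" for k, v in latest.items())
print(moderate("Is the protagonist's lie justified?", agents, synthesize, rounds=1))
```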
Multiple-text comprehension results from processes and strategies with which readers make sense of complex topics or issues based on information presented in multiple texts. These processes and strategies are necessary when readers encounter multiple challenging, conflicting documents on complex issues (Anmarkrud, Bråten, & Strømsø, 2014; List & Alexander, 2017).
Reading group discussion strategies can enhance multiple readers’ comprehensions of texts. Transcripts from these multi-agent processes should prove valuable to consolidate, summarize, or synthesize (Goldenberg, 1992; Berne & Clark, 2008).
The principles and guidelines of the Socratic method include: the use of open-ended questions, clarifications of terms, providing examples and evidence, challenging arguments, summarization, drawing conclusions, and reflecting on the process. These key principles are realized through strategies such as: definition, generalization, induction, elenchus, hypothesis elimination, maieutics, dialectic, recollection, irony, and analogy (Chang, 2023).
The dialectic method involves dialogues between groups holding different points of view about subjects but wishing to arrive at truths through reasoned argumentation. With respect to multi-agent systems, formal, computational, and game-theoretic approaches have been and remain topics of ongoing research (Wells, 2007). The advancement of large-language-model-based agents has inspired a renewed interest in multi-agent argumentation and debate (Du, Li, Torralba, Tenenbaum, & Mordatch, 2023; Wang, Yue, & Sun, 2023; Wang, Du, Yu, Chen, Zhu, Chu, Yan, & Guan, 2023).
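A round-based debate loop in the spirit of this line of work might be sketched as follows; this is not a reproduction of any cited system, and the `Responder` abstraction and prompt wording are assumptions.

```python
from typing import Callable

Responder = Callable[[str], str]  # an agent reduced to a prompt-to-text function

def debate(question: str, agents: dict[str, Responder],
           rounds: int = 3) -> dict[str, str]:
    """Each agent answers, then revises after reading the other agents'
    latest answers; the final dictionary holds each agent's last position."""
    answers = {name: respond(question) for name, respond in agents.items()}
    for _ in range(rounds - 1):
        for name, respond in agents.items():
            others = "\n".join(f"{n}: {t}" for n, t in answers.items() if n != name)
            answers[name] = respond(
                f"{question}\n\nOther agents argued:\n{others}\n"
                f"Revise or defend your answer.")
    return answers
```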
Processes with which to build rational consensus and related decision-making procedures may be brought to bear during the orchestration of multi-agent systems (Lehrer & Wagner, 2012).
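A minimal worked sketch of the iterated weighted averaging at the heart of the Lehrer-Wagner model is given below; the concrete respect weights and opinion values are illustrative assumptions.

```python
def lehrer_wagner_consensus(weights: list[list[float]],
                            opinions: list[float],
                            iterations: int = 100) -> list[float]:
    """Iterated weighted averaging in the style of the Lehrer-Wagner model:
    each agent repeatedly replaces its opinion with a weighted average of all
    agents' opinions, using the respect weights it assigns to the others
    (each row of `weights` should sum to 1)."""
    current = list(opinions)
    for _ in range(iterations):
        current = [sum(w * o for w, o in zip(row, current)) for row in weights]
    return current

# Example: three agents' scores for an alternative converge toward a common
# value under mutual respect weights.
W = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.4, 0.5]]
print(lehrer_wagner_consensus(W, [0.9, 0.4, 0.1]))
```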
Synthesis writing is a set of processes and strategies through which the contents of multiple texts, including agent-generated contents, can be integrated into resultant output texts (Van Ockenburg, van Weijen, & Rijlaarsdam, 2019; Van Steendam, Vandermeulen, De Maeyer, Lesterhuis, Van den Bergh, & Rijlaarsdam, 2022). Argumentative synthesis writing combines intratextual and intertextual integration processes and strategies to generate texts from diverse sources, perspectives, and arguments (Mateos, Martín, Cuevas, Villalón, Martínez, & González-Lamas, 2018).
In particular when multi-agent systems encounter conflicting information in their libraries of documents, the best possible answer might not present a single alternative determined through group deliberation, but rather a list of top alternatives, each accompanied by its supporting justification and opposing argumentation. Manager, facilitator, or moderator agents orchestrating teams of other agents should be capable of detecting when these situations arise, that is, of determining when one alternative should prevail from unfolding group deliberation and when a list of top alternatives is, instead, the answer.
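One hypothetical way to represent such answers and to decide between the two cases is sketched below; the `support_score` field and the dominance margin are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Alternative:
    statement: str
    supporting_justification: list[str] = field(default_factory=list)
    opposing_argumentation: list[str] = field(default_factory=list)
    support_score: float = 0.0  # e.g., share of agents endorsing it

@dataclass
class Answer:
    """Either a single prevailing alternative or a ranked list of top
    alternatives, as the deliberation warrants."""
    alternatives: list[Alternative]

def resolve(alternatives: list[Alternative], margin: float = 0.25) -> Answer:
    """Hypothetical heuristic: return one alternative only when it clearly
    dominates the runner-up; otherwise return the ranked list."""
    ranked = sorted(alternatives, key=lambda a: a.support_score, reverse=True)
    if len(ranked) > 1 and ranked[0].support_score - ranked[1].support_score < margin:
        return Answer(alternatives=ranked)
    return Answer(alternatives=ranked[:1])
```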
Teams of agents could search for, retrieve and reuse, or generate new documents and multimedia subcomponents combining natural language, structured knowledge, source code, multimedia, charts, diagrams, and infographics.
Hypermedia encyclopedia articles tend to result from hypertext layouts containing multimedia subcomponents. Some of these multimedia subcomponents could result from generative computation upon input prompts.
Approaches to consider include the use of planners to orchestrate agentic systems that search for, retrieve and reuse, or generate new layouts and content-related plans (Bao, 2023; Qiao, Li, Zhang, He, Kang, Zhang, Yang, et al., 2023; Wu, Bansal, Zhang, Wu, Zhang, Zhu, Li, Jiang, Zhang, & Wang, 2023).
Macroplans could be provided to manager, facilitator, or moderator agents and to groups of subordinate agents. Agents in these teams could interact with one another either (1) by exchanging messages directly or (2) through forum software.
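The forum-style interaction mode might be represented minimally as below; the `Post` and `Thread` structures are assumptions introduced for illustration rather than features of any particular forum software.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    author: str                  # an agent or a person
    content: str
    reply_to: int | None = None  # index of the post being replied to

@dataclass
class Thread:
    """A forum-style alternative to direct message exchange: agents post into
    shared, structured threads instead of addressing one another pairwise."""
    topic: str
    posts: list[Post] = field(default_factory=list)

    def post(self, author: str, content: str, reply_to: int | None = None) -> int:
        self.posts.append(Post(author, content, reply_to))
        return len(self.posts) - 1
```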
The production, preservation, aggregation, analysis, and maintenance of citations to referenced materials through these processes could be subjects of continuing research (Gao, Yen, Yu, & Chen, 2023).
Specialized agents could be invoked to produce kinds of "computational-notebook cells" containing prompts from which multimedia subcomponents could be searched for, retrieved and reused, or generated (Dibia, 2023).
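A minimal sketch of such a prompt-bearing cell follows; the field names are assumptions, and the structure is not drawn from Dibia (2023).

```python
from dataclasses import dataclass

@dataclass
class PromptCell:
    """A notebook-cell-like unit holding the prompt from which a multimedia
    subcomponent (chart, diagram, infographic, code) is retrieved or generated;
    keeping the prompt makes the subcomponent regenerable and editable."""
    cell_id: str
    media_kind: str               # e.g., "chart", "diagram", "infographic", "code"
    prompt: str
    generator: str                # which specialized agent or tool to invoke
    output_ref: str | None = None # where the produced artifact is stored
```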
Furthermore, documents and their multimedia subcomponents, in particular those subcomponents generated from prompts, could be subsequently editable. Each subcomponent could have its own changelog, or revision history, and discussion area.
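One way such an editable subcomponent with its own revision history and discussion area might be modeled is sketched below, purely as an illustrative assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Revision:
    editor: str             # a person or an agent
    rationale: str          # explanation or justification for the change
    previous_content: str   # content before this revision, kept for rollback

@dataclass
class Subcomponent:
    """An editable document subcomponent carrying its own revision history
    and a pointer to its discussion area."""
    content: str
    revisions: list[Revision] = field(default_factory=list)
    discussion_thread_id: str | None = None

    def edit(self, editor: str, new_content: str, rationale: str) -> None:
        self.revisions.append(Revision(editor, rationale, self.content))
        self.content = new_content
```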
Transcripts of multi-agent processes could be preserved and accompany resultant documents and their subcomponents. These transcripts could be forum-based, having multiple threads of structured discussions, or could be more intricate.
People and artificial-intelligence agents could interact in these multi-threaded, structured discussion forums or collaboration spaces. These human-machine interactions could result in automatic updates to generated documents or to their multimedia subcomponents.
Automatically-generated content could include hyperlinks, context menu items, or other means of navigating from portions of content to any relevant argumentation or procedures in the accompanying multi-threaded, structured discussion forums or collaboration spaces.
Changelogs or revision histories could accompany documents and their subcomponents. People and artificial-intelligence agents could provide rationale, explanations, or justifications in them for modifications made to reusable, revisable documents and their subcomponents.
People could be provided with opportunities to give structured feedback or open-ended, natural-language comments about portions of documents, sections, paragraphs, sentences, or content selections, and about other document subcomponents. This feedback and these comments and annotations could be displayed only to those opting in to view them, or only quality-filtered subsets could be shown. When displayed, they could appear as expandable margin notes proximate to the relevant document content.
People desiring to provide feedback or comments about documents could additionally be provided with opportunities to interact with dialogue systems conducting contextual and adaptive surveys and opinion polls.
How should encyclopedic answers to end-users’ complex questions be evaluated? How should agents’ performances in coordinated dialogues, debates, and processes be evaluated? How should their contributions to collaborative document-generation processes be evaluated?
With evaluation frameworks and rubrics, components of automatically or manually designed and varied agents could be independently measured and compared. These kinds of scientific architectures could empower teams of humans to continuously improve multi-agent systems.
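A minimal rubric-scoring sketch, assuming criterion names echoing the qualities discussed below (e.g., verifiability, neutrality, comprehensiveness) and judge-provided per-criterion ratings, is given here; the weighting scheme is an assumption.

```python
from dataclasses import dataclass

@dataclass
class RubricCriterion:
    name: str        # e.g., "verifiability", "neutrality", "comprehensiveness"
    description: str
    weight: float

def score_answer(ratings: dict[str, float],
                 rubric: list[RubricCriterion]) -> float:
    """Weighted rubric score for one generated answer; `ratings` maps
    criterion names to per-criterion scores (e.g., 0-1 from judges)."""
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * ratings.get(c.name, 0.0) for c in rubric) / total_weight
```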
Large language models have been evaluated with respect to their exhibited moral beliefs (Scherrer, Shi, Feder, & Blei, 2023).
Algorithmic fidelity is defined to be the degree to which the complex patterns of relationships between ideas, attitudes, and socio-cultural contexts within a model accurately mirror those within a range of human sub-populations (Argyle, Busby, Fulda, Gubler, Rytting, & Wingate, 2023).
Value stability, the adherence to roles, characters, or personas during unfolding interactions, is argued to be another dimension of large language model comparison and evaluation alongside knowledge, model size, and speed (Kovač, Portelas, Sawayama, Dominey, & Oudeyer, 2024).
With respect to resultant encyclopedic answers, desired qualities include: verifiability and accuracy; objectivity and neutrality; and plurality, diversity, fairness, balance, and comprehensiveness with respect to relevant points of view (McGrady, 2020).
Socratic assistants have been explored for both moral enhancement and educational purposes (Lara & Deckers, 2020). The manager, facilitator, or moderator agents, discussed above, could coordinate teams comprised of artificial-intelligence agents, humans, or combinations of both.
Artificial-intelligence systems for facilitating group meetings and discussions have previously been researched in the form of group support systems (Bostrom, Anson, & Clawson, 1993).
Educational applications of the technologies under discussion include intelligent tutoring systems for teams (Sottilare, Burke, Salas, Sinatra, Johnston, & Gilbert, 2018) and artificial-intelligence-enhanced pedagogical discussion forums (Butcher, Read, Jensen, Morel, Nagurney, & Smith, 2020).
Artificial intelligence systems capable of debating with humans are a subject of ongoing research (Slonim, Bilu, Alzate, Bar-Haim, Bogin, Bonin, & Choshen, 2021).
Modular systems containing multiple interlocutors, each with their own distinct points of view reflecting their training in a diversity of concrete wisdom traditions, have been previously considered (Volkman & Gabriels, 2023).
Anmarkrud, Øistein, Ivar Bråten, and Helge I. Strømsø. "Multiple-documents literacy: Strategic processing, source awareness, and argumentation when reading multiple conflicting documents." Learning and Individual Differences 30 (2014): 64-76.
Argyle, Lisa P., Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, and David Wingate. "Out of one, many: Using language models to simulate human samples." Political Analysis 31, no. 3 (2023): 337-351.
Bao, Yunqian. "Towards automated generation of open domain Wikipedia articles." Master's thesis, University of Illinois at Urbana-Champaign, 2023.
Berne, Jennifer I., and Kathleen F. Clark. "Focusing literature discussion groups on comprehension strategies." The Reading Teacher 62, no. 1 (2008): 74-79.
Bostrom, Robert P., Robert Anson, and Vikki K. Clawson. "Group facilitation and group support systems." Group support systems: New perspectives 8 (1993): 146-168.
Butcher, Tamarin, Michelle Fulks Read, Ann Evans Jensen, Gwendolyn M. Morel, Alexander Nagurney, and Patrick A. Smith. "Using an AI-supported online discussion forum to deepen learning." In Handbook of research on online discussion-based teaching methods, pp. 380-408. IGI Global, 2020.
Chang, Edward Y. "Prompting large language models with the Socratic method." In 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0351-0360. IEEE, 2023.
Dibia, Victor. "Lida: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models." arXiv preprint arXiv:2303.02927 (2023).
Du, Yilun, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. "Improving factuality and reasoning in language models through multiagent debate." arXiv preprint arXiv:2305.14325 (2023).
Gao, Tianyu, Howard Yen, Jiatong Yu, and Danqi Chen. "Enabling large language models to generate text with citations." arXiv preprint arXiv:2305.14627 (2023).
Goldenberg, Claude. "Instructional conversations: Promoting comprehension through discussion." The Reading Teacher 46, no. 4 (1992): 316-326.
Kovač, Grgur, Rémy Portelas, Masataka Sawayama, Peter Ford Dominey, and Pierre-Yves Oudeyer. "Stick to your role! Stability of personal values expressed in large language models." arXiv preprint arXiv:2402.14846 (2024).
Lara, Francisco, and Jan Deckers. "Artificial intelligence as a Socratic assistant for moral enhancement." Neuroethics 13, no. 3 (2020): 275-287.
Lehrer, Keith, and Carl Wagner. Rational consensus in science and society: A philosophical and mathematical study. Vol. 24. Springer Science & Business Media, 2012.
List, Alexandra, and Patricia A. Alexander. "Analyzing and integrating models of multiple text comprehension." Educational Psychologist 52, no. 3 (2017): 143-147.
Mateos, Mar, Elena Martín, Isabel Cuevas, Ruth Villalón, Isabel Martínez, and Jara González-Lamas. "Improving written argumentative synthesis by teaching the integration of conflicting information from multiple sources." Cognition and Instruction 36, no. 2 (2018): 119-138.
McGrady, Ryan Douglas. "Consensus-based encyclopedic virtue: Wikipedia and the production of authority in encyclopedias." PhD diss., North Carolina State University, 2020.
Qiao, Bo, Liqun Li, Xu Zhang, Shilin He, Yu Kang, Chaoyun Zhang, Fangkai Yang, et al. "TaskWeaver: A code-first agent framework." arXiv preprint arXiv:2311.17541 (2023).
Santurkar, Shibani, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. "Whose opinions do language models reflect?." arXiv preprint arXiv:2303.17548 (2023).
Scherrer, Nino, Claudia Shi, Amir Feder, and David Blei. "Evaluating the moral beliefs encoded in LLMs." Advances in Neural Information Processing Systems 36 (2023).
Schwitzgebel, Eric, David Schwitzgebel, and Anna Strasser. "Creating a large language model of a philosopher." arXiv preprint arXiv:2302.01339 (2023).
Shanahan, Murray, Kyle McDonell, and Laria Reynolds. "Role-play with large language models." Nature 623, no. 7987 (2023): 493-498.
Slonim, Noam, Yonatan Bilu, Carlos Alzate, Roy Bar-Haim, Ben Bogin, Francesca Bonin, Leshem Choshen, et al. "An autonomous debating system." Nature 591, no. 7850 (2021): 379-384.
Sottilare, Robert A., C. Shawn Burke, Eduardo Salas, Anne M. Sinatra, Joan H. Johnston, and Stephen B. Gilbert. "Designing adaptive instruction for teams: A meta-analysis." International Journal of Artificial Intelligence in Education 28 (2018): 225-264.
Van Ockenburg, Liselore, Daphne van Weijen, and Gert Rijlaarsdam. "Learning to write synthesis texts: A review of intervention studies." Journal of Writing Research 10, no. 3 (2019): 401-428.
Van Steendam, Elke, Nina Vandermeulen, Sven De Maeyer, Marije Lesterhuis, Huub Van den Bergh, and Gert Rijlaarsdam. "How students perform synthesis tasks: An empirical study into dynamic process configurations." Journal of Educational Psychology 114, no. 8 (2022): 1773.
Volkman, Richard, and Katleen Gabriels. "AI moral enhancement: Upgrading the socio-technical system of moral engagement." Science and Engineering Ethics 29, no. 2 (2023): 11.
Wang, Boshi, Xiang Yue, and Huan Sun. "Can ChatGPT defend its belief in truth? Evaluating LLM reasoning via debate." In Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 11865-11881. 2023.
Wang, Haotian, Xiyuan Du, Weijiang Yu, Qianglong Chen, Kun Zhu, Zheng Chu, Lian Yan, and Yi Guan. "Apollo's oracle: Retrieval-augmented reasoning in multi-agent debates." arXiv preprint arXiv:2312.04854 (2023).
Wells, Simon. "Formal dialectical games in multiagent argumentation." PhD thesis, University of Dundee, 2007.
Wu, Qingyun, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. "AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework." arXiv preprint arXiv:2308.08155 (2023).
Personas for role-playing language agents fall into three distinct categories: demographic personas, character personas, and individualized personas (Chen, Wang, Xu, Tuan, Zhang, Shi, & Xie, 2024). Demographic personas focus on groups of people sharing common characteristics. Character personas represent well-established, widely recognized individuals. Individualized personas refer to digital profiles built and continuously updated from personalized user data.
Character profiling is the task of summarizing profiles for characters from fictional stories (Yuan, Yuan, Cui, Lin, Wang, Xu, Chen, & Yang, 2024).
A character profile encompasses attributes, relationships, events, and personality. The basic attributes of a character encompass gender, skills, talents, objectives, and background. A character’s interpersonal relationships are a vital aspect of their profile. Events cover the experiences that characters have been part of or impacted by, marking a critical dimension of their profile. Personality refers to the lasting set of characteristics and behaviors that form an individual’s unique way of adapting to life (Yuan, Yuan, Cui, Lin, Wang, Xu, Chen, & Yang, 2024).
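The four profile dimensions described by Yuan et al. (2024) might be carried in a structure such as the following; the field names and types are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class CharacterProfile:
    """Attributes, relationships, events, and personality, following the four
    dimensions described by Yuan et al. (2024)."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)     # gender, skills, objectives, background, ...
    relationships: dict[str, str] = field(default_factory=dict)  # other character -> relation
    events: list[str] = field(default_factory=list)              # experiences the character was part of or impacted by
    personality: list[str] = field(default_factory=list)         # lasting traits and behaviors
```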
Other studies into persona-based decision-making similarly involve descriptions of characters, composed of the characters’ basic situations and storylines, and characters’ memories of current scenes, which can offer more detail (Xu, Wang, Chen, Yuan, Yuan, Liang, Chen, Dong, & Xiao, 2024).
The evaluation of role-playing language agents has two primary categories of criteria: role-playing capability evaluation and persona fidelity evaluation. Role-playing capability evaluation concerns aspects such as anthropomorphic ability, attractiveness, and usefulness, which encompass more granular dimensions including conversation ability, engagement, persona consistency, emotion understanding, theory of mind, and problem-solving ability. Persona fidelity evaluation concentrates on whether individual agents faithfully replicate their intended personas, including their knowledge, linguistic habits, personality, beliefs, and decision-making (Chen, Wang, Xu, Tuan, Zhang, Shi, & Xie, 2024).
With respect to the evaluation of character profiling, there are internal and external evaluations. Internal evaluation involves a factual-consistency examination, comparing model-summarized character profiles with reference profiles. External evaluation involves motivation recognition, assessing whether the summarized character profiles enhance models' understanding of characters and whether model-generated profiles effectively aid in comprehending the motivations behind characters' decisions (Yuan, Yuan, Cui, Lin, Wang, Xu, Chen, & Yang, 2024).
Chen, Jiangjie, Xintao Wang, Rui Xu, Siyu Yuan, Yikai Zhang, Wei Shi, Jian Xie et al. "From persona to personalization: A survey on role-playing language agents." arXiv preprint arXiv:2404.18231 (2024).
Xu, Rui, Xintao Wang, Jiangjie Chen, Siyu Yuan, Xinfeng Yuan, Jiaqing Liang, Zulong Chen, Xiaoqing Dong, and Yanghua Xiao. "Character is destiny: Can large language models simulate persona-driven decisions in role-playing?." arXiv preprint arXiv:2404.12138 (2024).
Yuan, Xinfeng, Siyu Yuan, Yuhan Cui, Tianhe Lin, Xintao Wang, Rui Xu, Jiangjie Chen, and Deqing Yang. "Evaluating character understanding of large language models via character profiling from fictional works." arXiv preprint arXiv:2404.12726 (2024).