Research and Development

Multi-agent Question-answering Systems


Agents representing ideological stances, positions, perspectives, or schools of thought can serve in multi-agent systems which generate encyclopedic answers to end-users' complex questions.

Agent Design, Reuse and Selection

Large language models can generate content while role-playing, or impersonating, characters and personas (Shanahan, McDonell, & Reynolds, 2023). They can be fine-tuned using the works of individual philosophers to subsequently generate virtually indistinguishable responses (Schwitzgebel, Schwitzgebel, & Strasser, 2023). They can generate content from specified stances, positions, perspectives, and schools of thought. They can also generate content aligned with the attitudes and opinions of described groups, sub-populations, or demographics of interest (Santurkar, Durmus, Ladhak, Lee, Liang, & Hashimoto, 2023).

When should agents be searched for, retrieved, reused, designed, created, or varied? Which agents should be consulted when generating encyclopedic answers to end-users’ complex questions? Which agents’ responses would prove most valuable to consolidate, summarize, or synthesize into resultant encyclopedic answers? Should it be anticipated that selected teams of agents will recur across questions?

Automatically and manually designed agents, beyond potentially differing in terms of their models, training, fine-tuning, and prompts, could be provided with differing libraries of documents and could weigh, rank, or prioritize these documents differently.
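To make this concrete, one could represent such an agent as a persona prompt plus a weighted document library. The sketch below is a hypothetical specification, not an implementation from the cited work; the class name, fields, and example documents are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class PerspectiveAgent:
    """Hypothetical agent specification: a persona plus a weighted document library."""
    name: str
    system_prompt: str                                        # stance, position, or school of thought
    library: dict[str, float] = field(default_factory=dict)   # document id -> priority weight

    def ranked_documents(self) -> list[str]:
        """Return the agent's documents ordered by its own priority weights."""
        return sorted(self.library, key=self.library.get, reverse=True)

# Two agents sharing a model could still differ in prompt, library, and weighting.
utilitarian = PerspectiveAgent(
    name="utilitarian",
    system_prompt="Answer from a utilitarian perspective.",
    library={"mill_1863": 0.9, "bentham_1789": 0.8, "kant_1785": 0.2},
)
print(utilitarian.ranked_documents())  # ['mill_1863', 'bentham_1789', 'kant_1785']
```

Under this representation, reuse and variation reduce to retrieving, copying, and perturbing these specifications.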

Should agents and their libraries of documents be logically consistent and ideologically coherent? How will agents synthesize multiple challenging, potentially conflicting documents on complex issues and the arguments in them? Will these capabilities, additionally or instead, be emergent capabilities of orchestrated multi-agent systems?

Multi-agent Orchestration

Processes and strategies from multiple-text comprehension, reading group discussions, the Socratic method, the dialectic method, consensus building, group decision-making, and synthesis writing are anticipated to be of use to manager, facilitator, or moderator agents orchestrating teams of other agents, some representing individuals, groups, stances, positions, perspectives, or schools of thought.

Multiple-text comprehension results from processes and strategies with which readers make sense of complex topics or issues based on information presented in multiple texts. These processes and strategies are necessary when readers encounter multiple challenging, conflicting documents on complex issues (Anmarkrud, Bråten, & Strømsø, 2014; List & Alexander, 2017).

Reading-group discussion strategies can enhance multiple readers’ comprehension of texts. Transcripts from analogous multi-agent discussion processes should prove valuable to consolidate, summarize, or synthesize (Goldenberg, 1992; Berne & Clark, 2008).

The principles and guidelines of the Socratic method include: the use of open-ended questions, clarifications of terms, providing examples and evidence, challenging arguments, summarization, drawing conclusions, and reflecting on the process. These key principles are realized through strategies such as: definition, generalization, induction, elenchus, hypothesis elimination, maieutics, dialectic, recollection, irony, and analogy (Chang, 2023).
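As an illustration, the principles above could be scheduled as an ordered cycle of moderator moves. The move templates and scheduling function below are hypothetical, a minimal sketch rather than a method from the cited work:

```python
# The principles listed above, as an ordered cycle of moderator moves.
SOCRATIC_MOVES = [
    "Pose an open-ended question about {topic}.",
    "Ask participants to clarify their terms.",
    "Request examples and evidence.",
    "Challenge the strongest argument so far.",
    "Summarize the discussion.",
    "Draw a tentative conclusion.",
    "Reflect on how the discussion proceeded.",
]

def moderator_prompt(turn: int, topic: str) -> str:
    """Select the Socratic move for a given turn (hypothetical scheduling policy)."""
    return SOCRATIC_MOVES[turn % len(SOCRATIC_MOVES)].format(topic=topic)

print(moderator_prompt(0, "justice"))  # Pose an open-ended question about justice.
```

In practice a moderator agent would choose moves adaptively rather than cyclically; the fixed rotation merely shows how the principles could drive prompting.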

The dialectic method involves dialogues between groups holding different points of view about subjects but wishing to arrive at truths through reasoned argumentation. With respect to multi-agent systems, formal, computational, and game-theoretic approaches have been and remain topics of ongoing research (Wells, 2007). The advancement of large-language-model-based agents has inspired a renewed interest in multi-agent argumentation and debate (Du, Li, Torralba, Tenenbaum, & Mordatch, 2023; Wang, Yue, & Sun, 2023; Wang, Du, Yu, Chen, Zhu, Chu, Yan, & Guan, 2023).
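A minimal round-based debate loop in this style, where each agent answers and then revises after seeing the others' answers, could be sketched as follows. The stub agents and prompt wording are assumptions for illustration; real systems would substitute model calls:

```python
from typing import Callable

def debate(agents: dict[str, Callable[[str], str]],
           question: str, rounds: int = 2) -> dict[str, str]:
    """Round-based multi-agent debate: each agent answers the question,
    then repeatedly revises after seeing the other agents' current answers."""
    answers = {name: agent(question) for name, agent in agents.items()}
    for _ in range(rounds):
        for name, agent in agents.items():
            others = "\n".join(f"{n}: {a}" for n, a in answers.items() if n != name)
            prompt = f"{question}\nOther agents responded:\n{others}\nRevise your answer."
            answers[name] = agent(prompt)
    return answers

# Stub "models" standing in for large-language-model calls (hypothetical).
stub_agents = {
    "kantian": lambda prompt: "Duty-based reasoning applies.",
    "utilitarian": lambda prompt: "Weigh aggregate welfare.",
}
final = debate(stub_agents, "Is lying ever permissible?", rounds=1)
```

The final answers, or the full transcript of intermediate revisions, could then be consolidated by a manager agent.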

Processes used to build rational consensus, and related group decision-making procedures, may be brought to bear during the orchestration of multi-agent systems (Lehrer & Wagner, 2012).

Synthesis writing is a set of processes and strategies through which the contents of multiple texts, including agent-generated contents, can be integrated into resultant output texts (Van Ockenburg, van Weijen, & Rijlaarsdam, 2019; Van Steendam, Vandermeulen, De Maeyer, Lesterhuis, Van den Bergh, & Rijlaarsdam, 2022). Argumentative synthesis writing combines intratextual and intertextual integration processes and strategies to generate texts from diverse sources, perspectives, and arguments (Mateos, Martín, Cuevas, Villalón, Martínez, & González-Lamas, 2018).
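An intertextual-integration step could be realized by assembling the per-agent responses into a single synthesis prompt for a consolidating agent. The function below is an illustrative sketch; its prompt wording is an assumption:

```python
def synthesis_prompt(question: str, responses: dict[str, str]) -> str:
    """Assemble a synthesis-writing prompt: the consolidating agent receives
    each perspective's response and is asked to integrate them."""
    sources = "\n\n".join(f"[{name}]\n{text}" for name, text in responses.items())
    return (f"Question: {question}\n\n"
            f"Source responses:\n{sources}\n\n"
            "Write a single answer that integrates these perspectives, "
            "noting points of agreement and disagreement.")

prompt = synthesis_prompt("Is lying ever permissible?",
                          {"stoic": "Virtue forbids it.", "consequentialist": "It depends on outcomes."})
```

Intratextual integration, i.e., coherence within the resulting text, would then fall to the generating model or to subsequent revision passes.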

Document Generation

Teams of agents could search for, retrieve, reuse, or generate new documents and subcomponents combining natural language, structured knowledge, source code, multimedia, charts, diagrams, and infographics.

Consider computational notebooks, e.g., Mathematica and Jupyter notebooks, alongside hypermedia encyclopedia articles. These formats are related: encyclopedia articles arrange their subcomponents in layouts, and those subcomponents could be the results of generative computations described in computational-notebook cells.

Building upon previous research into the automatic generation of open-domain encyclopedia articles while considering these relationships, a preliminary approach is presented here to utilize planners to orchestrate agentic systems to search for, retrieve, reuse, or generate new layouts and content-related macroplans (Bao, 2023; Qiao, Li, Zhang, He, Kang, Zhang, Yang, et al., 2023; Wu, Bansal, Zhang, Wu, Zhang, Zhu, Li, Jiang, Zhang, & Wang, 2023).

Content-related macroplans would be provided to subordinate agentic systems orchestrated by the aforementioned manager, facilitator, or moderator agents. Individual agents in these subordinate teams would either: (1) interact with one another via natural language with these interactions then consolidated, summarized, or synthesized, or (2) otherwise engage with one another on collaborative software platforms such that their structured interactions would automatically result in natural-language content.
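A content-related macroplan and its dispatch to subordinate teams could be sketched as below. The data structure and the `run_team` callable are hypothetical stand-ins; in practice `run_team` would launch an orchestrated multi-agent discussion and return its consolidated draft:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SectionPlan:
    """One entry of a hypothetical content-related macroplan."""
    heading: str
    instructions: str
    team: list[str]          # names of the agents assigned to this section

def execute_macroplan(plan: list[SectionPlan],
                      run_team: Callable[[list[str], str], str]) -> dict[str, str]:
    """Dispatch each planned section to its subordinate team and collect drafts."""
    return {s.heading: run_team(s.team, s.instructions) for s in plan}

# Stub team runner standing in for an orchestrated discussion (hypothetical).
run_team = lambda team, instructions: f"draft by {','.join(team)}"
plan = [SectionPlan("History", "Cover the topic's origins.", ["historian"])]
drafts = execute_macroplan(plan, run_team)
```

A planner would produce `plan` itself, and the manager, facilitator, or moderator agents would implement `run_team`.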

The production, preservation, aggregation, analysis, and maintenance of citations to referenced materials through these processes would be subjects of continuing research (Gao, Yen, Yu, & Chen, 2023).

Specialized agents would be invoked to produce kinds of computational-notebook cells positioned in layouts from which document subcomponents, e.g., multimedia, charts, diagrams, and infographics, could be searched for, retrieved, reused, or generated (Dibia, 2023).
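For instance, a specialized charting agent could emit code cells in the Jupyter nbformat cell shape, whose execution would render a chart subcomponent. The cell-building function and the matplotlib source it emits are illustrative assumptions, not a prescribed interface:

```python
def chart_cell(data_expr: str, title: str) -> dict:
    """Build a code cell (nbformat-shaped dict) whose execution would render
    a chart subcomponent; a specialized agent would generate this source."""
    source = (
        "import matplotlib.pyplot as plt\n"
        f"plt.plot({data_expr})\n"
        f"plt.title({title!r})\n"
        "plt.show()\n"
    )
    return {"cell_type": "code", "source": source,
            "metadata": {"subcomponent": "chart"}, "outputs": []}

# Cells are positioned within a notebook mirroring the article's layout.
notebook = {"nbformat": 4, "nbformat_minor": 5, "metadata": {},
            "cells": [chart_cell("[1, 4, 9]", "Example chart")]}
```

Analogous cell builders could cover diagrams, infographics, and other multimedia subcomponents.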

Documents and their individual subcomponents would be subsequently editable, each having one or more accompanying computational-notebook cells, a changelog or revision history, and a discussion forum or other collaborative space. This would enhance the subsequent reusability and revisability of generated documents and their subcomponents while enabling man-machine collaboration scenarios.
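A subcomponent carrying the accompanying structures just described, generating cells, a revision history with rationale, and a discussion space, could be modeled as below. The class and field names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Subcomponent:
    """Hypothetical editable document subcomponent with its accompanying
    notebook cells, revision history, and discussion thread."""
    content: str
    cells: list[str] = field(default_factory=list)                   # generating cell sources
    revisions: list[tuple[str, str]] = field(default_factory=list)   # (author, rationale)
    discussion: list[str] = field(default_factory=list)              # forum messages

    def revise(self, new_content: str, author: str, rationale: str) -> None:
        """Record who changed the content and why, then apply the change."""
        self.revisions.append((author, rationale))
        self.content = new_content

intro = Subcomponent(content="Initial draft.")
intro.revise("Improved draft.", "agent-7", "Sharpened the opening sentence.")
```

Both human editors and artificial-intelligence agents could call `revise`, supporting the collaboration scenarios discussed next.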

Man-machine Collaboration

Transcripts of multi-agent processes could be preserved and accompany resultant documents and their subcomponents. These transcripts could be forum-based, having multiple threads of structured discussions, or could be more intricate.

People and artificial-intelligence agents could interact in these multi-threaded, structured discussion forums or collaboration spaces. Man-machine interactions could potentially result in automatic updates to documents or their subcomponents.

Automatically-generated content could include hyperlinks, context menu items, or other means of navigating from portions of content to any relevant argumentation or procedures in the accompanying multi-threaded, structured discussion forums or collaboration spaces.

Changelogs or revision histories could accompany documents and their subcomponents. People and artificial-intelligence agents could provide rationale, explanations, or justifications in them for modifications made to reusable, revisable documents and their subcomponents.

People could be provided with opportunities to provide structured feedback or open-ended, natural-language comments about portions of documents, sections, paragraphs, sentences, or content selections, and about other document subcomponents. This feedback, along with comments and annotations, or quality-filtered subsets thereof, could be displayed only to those opting into viewing it. When displayed, these items could appear as expandable margin notes proximate to the relevant document content.

People desiring to provide feedback or comments about documents could additionally be provided with opportunities to interact with dialogue systems conducting contextual, adaptive surveys and opinion polls.


Evaluation

How should encyclopedic answers to end-users’ complex questions be evaluated? How should agents’ performances in coordinated dialogues, debates, and processes be evaluated? How should their contributions to collaborative document-generation processes be evaluated?

With evaluation frameworks and rubrics, components of automatically or manually designed and varied agents could be independently measured and compared. These kinds of scientific architectures could empower teams of humans to continuously improve multi-agent systems.
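One simple rubric-based measurement, a weighted aggregation over scored criteria, could be sketched as follows. The criteria, weights, and aggregation scheme are illustrative assumptions, not taken from the cited evaluation literature:

```python
def rubric_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Aggregate per-criterion scores in [0, 1] by normalized weights
    (one hypothetical rubric scheme among many possible)."""
    total = sum(weights.values())
    return sum(scores[criterion] * w for criterion, w in weights.items()) / total

answer_score = rubric_score(
    {"verifiability": 0.9, "neutrality": 0.7, "comprehensiveness": 0.8},
    {"verifiability": 2.0, "neutrality": 1.0, "comprehensiveness": 1.0},
)
```

Scoring the same rubric across automatically and manually varied agents would enable the independent comparisons described above.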

Large language models have been evaluated with respect to their exhibited moral beliefs (Scherrer, Shi, Feder, & Blei, 2023).

Algorithmic fidelity is defined to be the degree to which the complex patterns of relationships between ideas, attitudes, and socio-cultural contexts within a model accurately mirror those within a range of human sub-populations (Argyle, Busby, Fulda, Gubler, Rytting, & Wingate, 2023).

Value stability, the adherence to roles, characters, or personas during unfolding interactions, is argued to be another dimension of large language model comparison and evaluation alongside knowledge, model size, and speed (Kovač, Portelas, Sawayama, Dominey, & Oudeyer, 2024).

With respect to resultant encyclopedic answers, desired qualities include: verifiability, accuracy, objectivity, neutrality, plurality, diversity, fairness, balance, and comprehensiveness with respect to relevant points of view (McGrady, 2020).

Related Work

Socratic assistants have been explored for both moral enhancement and educational purposes (Lara & Deckers, 2020). The manager, facilitator, or moderator agents discussed above could coordinate teams composed of artificial-intelligence agents, humans, or both.

Artificial intelligence systems for facilitation with respect to group meetings and discussions have been previously researched in the form of group support systems (Bostrom, Anson, & Clawson, 1993).

Educational applications of the technologies under discussion include intelligent tutoring systems for teams (Sottilare, Burke, Salas, Sinatra, Johnston, & Gilbert, 2018) and artificial-intelligence-enhanced pedagogical discussion forums (Butcher, Read, Jensen, Morel, Nagurney, & Smith, 2020).

Artificial intelligence systems capable of debating with humans are a subject of ongoing research (Slonim, Bilu, Alzate, Bar-Haim, Bogin, Bonin, & Choshen, 2021).

Modular systems containing multiple interlocutors, each with their own distinct points of view reflecting their training in a diversity of concrete wisdom traditions, have been previously considered (Volkman & Gabriels, 2023).


References

Anmarkrud, Øistein, Ivar Bråten, and Helge I. Strømsø. "Multiple-documents literacy: Strategic processing, source awareness, and argumentation when reading multiple conflicting documents." Learning and Individual Differences 30 (2014): 64-76.

Argyle, Lisa P., Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, and David Wingate. "Out of one, many: Using language models to simulate human samples." Political Analysis 31, no. 3 (2023): 337-351.

Bao, Yunqian. "Towards automated generation of open domain Wikipedia articles." Master's thesis, University of Illinois at Urbana-Champaign, 2023.

Berne, Jennifer I., and Kathleen F. Clark. "Focusing literature discussion groups on comprehension strategies." The Reading Teacher 62, no. 1 (2008): 74-79.

Bostrom, Robert P., Robert Anson, and Vikki K. Clawson. "Group facilitation and group support systems." Group support systems: New perspectives 8 (1993): 146-168.

Butcher, Tamarin, Michelle Fulks Read, Ann Evans Jensen, Gwendolyn M. Morel, Alexander Nagurney, and Patrick A. Smith. "Using an AI-supported online discussion forum to deepen learning." In Handbook of research on online discussion-based teaching methods, pp. 380-408. IGI Global, 2020.

Chang, Edward Y. "Prompting large language models with the Socratic method." In 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0351-0360. IEEE, 2023.

Dibia, Victor. "Lida: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models." arXiv preprint arXiv:2303.02927 (2023).

Du, Yilun, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. "Improving factuality and reasoning in language models through multiagent debate." arXiv preprint arXiv:2305.14325 (2023).

Gao, Tianyu, Howard Yen, Jiatong Yu, and Danqi Chen. "Enabling large language models to generate text with citations." arXiv preprint arXiv:2305.14627 (2023).

Goldenberg, Claude. "Instructional conversations: Promoting comprehension through discussion." The Reading Teacher 46, no. 4 (1992): 316-326.

Kovač, Grgur, Rémy Portelas, Masataka Sawayama, Peter Ford Dominey, and Pierre-Yves Oudeyer. "Stick to your role! Stability of personal values expressed in large language models." arXiv preprint arXiv:2402.14846 (2024).

Lara, Francisco, and Jan Deckers. "Artificial intelligence as a Socratic assistant for moral enhancement." Neuroethics 13, no. 3 (2020): 275-287.

Lehrer, Keith, and Carl Wagner. Rational consensus in science and society: A philosophical and mathematical study. Vol. 24. Springer Science & Business Media, 2012.

List, Alexandra, and Patricia A. Alexander. "Analyzing and integrating models of multiple text comprehension." Educational Psychologist 52, no. 3 (2017): 143-147.

Mateos, Mar, Elena Martín, Isabel Cuevas, Ruth Villalón, Isabel Martínez, and Jara González-Lamas. "Improving written argumentative synthesis by teaching the integration of conflicting information from multiple sources." Cognition and Instruction 36, no. 2 (2018): 119-138.

McGrady, Ryan Douglas. Consensus-based encyclopedic virtue: Wikipedia and the production of authority in encyclopedias. North Carolina State University, 2020.

Qiao, Bo, Liqun Li, Xu Zhang, Shilin He, Yu Kang, Chaoyun Zhang, Fangkai Yang, et al. "TaskWeaver: A code-first agent framework." arXiv preprint arXiv:2311.17541 (2023).

Santurkar, Shibani, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. "Whose opinions do language models reflect?" arXiv preprint arXiv:2303.17548 (2023).

Scherrer, Nino, Claudia Shi, Amir Feder, and David Blei. "Evaluating the moral beliefs encoded in LLMs." Advances in Neural Information Processing Systems 36 (2023).

Schwitzgebel, Eric, David Schwitzgebel, and Anna Strasser. "Creating a large language model of a philosopher." arXiv preprint arXiv:2302.01339 (2023).

Shanahan, Murray, Kyle McDonell, and Laria Reynolds. "Role-play with large language models." Nature 623, no. 7987 (2023): 493-498.

Slonim, Noam, Yonatan Bilu, Carlos Alzate, Roy Bar-Haim, Ben Bogin, Francesca Bonin, Leshem Choshen, et al. "An autonomous debating system." Nature 591, no. 7850 (2021): 379-384.

Sottilare, Robert A., C. Shawn Burke, Eduardo Salas, Anne M. Sinatra, Joan H. Johnston, and Stephen B. Gilbert. "Designing adaptive instruction for teams: A meta-analysis." International Journal of Artificial Intelligence in Education 28 (2018): 225-264.

Van Ockenburg, Liselore, Daphne van Weijen, and Gert Rijlaarsdam. "Learning to write synthesis texts: A review of intervention studies." Journal of Writing Research 10, no. 3 (2019): 401-428.

Van Steendam, Elke, Nina Vandermeulen, Sven De Maeyer, Marije Lesterhuis, Huub Van den Bergh, and Gert Rijlaarsdam. "How students perform synthesis tasks: An empirical study into dynamic process configurations." Journal of Educational Psychology 114, no. 8 (2022): 1773.

Volkman, Richard, and Katleen Gabriels. "AI moral enhancement: Upgrading the socio-technical system of moral engagement." Science and Engineering Ethics 29, no. 2 (2023): 11.

Wang, Boshi, Xiang Yue, and Huan Sun. "Can ChatGPT defend its belief in truth? Evaluating LLM reasoning via debate." In Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 11865-11881. 2023.

Wang, Haotian, Xiyuan Du, Weijiang Yu, Qianglong Chen, Kun Zhu, Zheng Chu, Lian Yan, and Yi Guan. "Apollo's oracle: Retrieval-augmented reasoning in multi-agent debates." arXiv preprint arXiv:2312.04854 (2023).

Wells, Simon. "Formal dialectical games in multiagent argumentation." PhD thesis, University of Dundee, 2007.

Wu, Qingyun, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. "AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework." arXiv preprint arXiv:2308.08155 (2023).