Research Statement

Thoughtful AI

Technological artifacts are not neutral; they shape how we act, how we see each other, how we think, what we value [@5096; @4942; @178]. This is particularly true of technologies that mediate how people interact with each other, such as media and now generative AI, which injects computational judgments into how we compose our communications and how we perceive others’ communications with us. By shaping how we think about each other, these technologies risk leading us to wrong one another, for example by dishonoring our email recipients and educators through computer-generated simulacra of empathetic and thoughtful prose. But my work embodies a hope that it doesn’t have to be that way: shaped rightly, AI could help us do justice towards one another through thoughtful communication, perspective-taking, and reflective strategy.1

The Dark Side of AI Productivity

Narratives around intelligent systems (especially generative AI) have typically centered on speed and scale: how can we get more done in less time? But productivity-enhancing systems such as text generators have been found to nudge their users toward the systems’ own biases (see my work [@1969] and others’ [@5053; @5027]), increase the risk of errors [@56], and risk homogenizing creative expression [@4964; @4908], among other challenges.

My dissertation work inaugurated the study of how predictive text entry systems might affect the content of written communication. Although predictive text was already ubiquitous on smartphones and starting to be deployed in other interfaces, academic research on it had only evaluated its effects on productivity (e.g., text entry speed and accuracy). No one (to my knowledge) had questioned what effect it might have on the content of writing. But once I started looking, I found many ways that predictive text nudges writers to conform to the system’s suggestions (word choice [@1960], level of detail [@1796], stance [@1969]).

Since suggestions can preempt human thinking, self-reported reflection is insufficient to document how writing with suggestions compares with what people would have written counterfactually without the suggestions. So I ran controlled experiments using repeatable writing tasks and devised quantitative measures to tease out these conformity effects. In the years since I published this work, many other researchers have replicated and extended these findings [@4964; @5101; @1877; @5053]. To run these experiments, I designed and implemented a custom mobile predictive-text keyboard, including an innovative phrase-suggestion interface, and wrote inference code to generate within-word and next-word predictions from n-gram and LSTM language models (including one configuration where the predictions were conditioned on image embeddings).
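To make the keyboard’s prediction task concrete, here is a minimal sketch of next-word and within-word prediction using a toy bigram model. All names here are illustrative; the deployed keyboard used much larger n-gram and LSTM language models, and this stub omits smoothing and scoring details.

```python
from collections import Counter, defaultdict

def train_bigram(corpus_words):
    """Count word-pair frequencies to form a toy bigram model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus_words, corpus_words[1:]):
        counts[prev][nxt] += 1
    return counts

def suggest(counts, prev_word, prefix="", k=3):
    """Top-k next-word predictions given the previous word.
    A non-empty prefix turns this into within-word completion:
    only candidates starting with the typed characters survive."""
    candidates = counts.get(prev_word, Counter())
    ranked = [w for w, _ in candidates.most_common() if w.startswith(prefix)]
    return ranked[:k]

corpus = "the cat sat on the mat and the cat ran".split()
model = train_bigram(corpus)
print(suggest(model, "the"))        # next-word suggestions after "the"
print(suggest(model, "the", "c"))   # within-word completion of "c..."
```

The same interface shape (context in, ranked suggestions out) carries over when the backend is an LSTM or an image-conditioned model; only the scoring function changes.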

Supporting Thoughtful Writing through Alternative Interactions

Reflecting on my dissertation findings, I conjectured that the problematic nature of the predictive interactions I’d studied was mediated through an implicit value of fast content production instead of cognitive engagement. By presenting suggestions that humans could accept as their own with minimal thought or interaction, these prediction-based interfaces embody a value of quantity over thoughtfulness. Although disciplined people can still use these systems in thoughtful ways, the default is cognitive disengagement; human participation becomes passive. We must study how AI can help us think not less but better.

The problem of cognitive disengagement also occurs in AI decision-support systems: people over-rely on the recommendations, leading to lower performance and less human learning. However, one intervention has been shown to be effective at maintaining cognitive engagement in AI-assisted decision-making: hiding the prediction and showing only the AI explanation [@22].

Thought Embers: suggestions that require writers to think

What if AI helped us come up with questions, not answers, as Licklider suggested back in 1960 [@5097]? Can presenting fragments of ideas, rather than complete and usable ideas, help writers retain agency over their work? In Summer 2024, I worked with a team of 4 students (3 undergraduate, 1 high school) to study how the type of information suggested by an AI writing assistant affects writers’ cognitive engagement with the suggestion and how they appropriate that suggestion in their draft. We developed a Microsoft Word sidebar that offered next-sentence suggestions expressed in four different ways: in addition to predictive-text-style examples they could use verbatim (e.g., by copy-and-paste), we also allowed writers to request questions that the next sentence might answer, vocabulary they might use, and rhetorical moves (such as giving examples or considering counterarguments) that their next sentence might engage with. Students conceptualized these interventions, designed prompting and post-processing approaches for the NLP backend, and designed and implemented the sidebar frontend.

In a pilot study (N=8), writers found questions and rhetorical moves to be useful. Although writers chose to request examples more often, they often rejected the suggested text. Overall, they rated the Questions suggestion type as most compelling (desirable) in post-task surveys, followed by Examples. These preliminary results suggest that writers welcomed AI suggestions that could not be inserted verbatim into their documents but instead required further thought. Overall, by offering intentionally-incomplete suggestions like Questions or Rhetorical Moves, AI systems might become better cognitive partners for writers, enriching thinking rather than circumventing it. The students have presented this work at internal venues; we are designing a follow-up experiment to build on these findings for broader publication.

Interaction-Required Content Suggestions

The prior project manipulated the type of AI-generated content but used conventional interaction techniques; what if we presented typical types of content, but through interaction modalities that invite thinking? For example, consider a revision task, like a writer trying to adapt their work for a non-expert audience. An LLM can be used in at least 3 different ways for a task like this: directly generating the complete revised document (requiring no direct interaction), providing the revision as predictive text for the writer one word at a time (showing incremental contextual alternatives and requiring choices at each step), or showing the writer’s original document annotated with alternative words the LLM might generate if generating that document (focusing human attention and requiring the writer to make any edits themselves).

In work in progress, my undergraduate research team is using prototypes we built of these interactions to study how the interaction paradigms embodied in these three interfaces affect the writer’s sense of authorship and ownership, how helpful the AI system is for achieving their goal, and other factors such as the writer’s awareness of audience and sensitivity to AI bias. To support these interactions efficiently, we needed to implement custom inference code that interfaces with open-weights LLMs at a lower level than is typical. Specifically, I used the model’s forward pass to compute single-step next-token predictive distributions rather than a batch-generated response, then manually managed the KV cache to generate lookahead tokens.
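The decoding pattern can be sketched abstractly. In the sketch below, a hand-written stub stands in for the transformer forward step (a real open-weights model would instead return logits plus updated key/value tensors); the token names and the tiny lookup table are invented purely for illustration. What the sketch shows is the control flow: one forward pass per new token, with the cached context carried forward rather than re-encoding the whole prefix.

```python
def forward(token, cache):
    """Stand-in for one transformer forward step: consume a single new
    token plus the cached state for the preceding context; return a
    next-token distribution and the extended cache."""
    context = cache + [token]
    # Toy deterministic "distribution" keyed on context length,
    # so the example is self-contained and reproducible.
    table = {1: {"b": 0.9, "c": 0.1}, 2: {"c": 0.8, "a": 0.2}}
    return table.get(len(context), {"a": 1.0}), context

# Single-step prediction: one forward pass over the newest token only.
dist, cache = forward("a", [])
best = max(dist, key=dist.get)

# Lookahead: greedily extend a few tokens, carrying the cache forward
# so each step costs one incremental forward pass.
lookahead = []
for _ in range(2):
    lookahead.append(best)
    dist, cache = forward(best, cache)
    best = max(dist, key=dist.get)
```

With a real model, `forward` corresponds to calling the model on a single token with the accumulated past key/value cache, which is what makes per-keystroke distributions cheap enough for interactive use.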

Supporting Reflection on Writing

Textfocals: AI Views for Revising

Could AI help writers think about their text from the perspective of their readers and revise to serve them better? This work builds on theories of writing process and reflection for revision [@4814; @4813]. I worked with 7 undergraduate researchers in Summer 2023 to develop Textfocals, a Microsoft Word sidebar add-in that provides writers LLM-powered tools to reflect on their writing and set revision goals. Built-in tools provide several types of summaries and questions, but writers can also customize the prompts.

A formative user study with Textfocals [@50] yielded promising evidence that this approach could help people develop underdeveloped ideas, cater to the rhetorical audience, and clarify their writing. However, the students running the study and analyzing the findings identified interaction design challenges related to document navigation and scoping, prompt engineering, and context management, leading to substantial reworks of the interaction design. Our work shows the breadth of the design space of writing support interfaces powered by generative AI that maintain authorship integrity.

Reflective Dialogue

An advanced undergraduate student is currently working with me and another co-advisor to refine the kind of reflection-promoting interactions that we started exploring in Textfocals. The current work focuses on helping the writer actively reflect on their rhetorical situation (who the writer is addressing, why, and how) and their writing. The AI-powered system can then help the writer set revision goals grounded in their rhetorical situation and consider multiple approaches to achieving those goals. The student is specifically interested in how using speech as a modality for reflection might affect the quality of a reflective dialogue.

Although I try to give students specific and reasonably scoped tasks, I also try to involve them at least lightly in the whole research process (from design to writing to reviewing) and let them experience the adventure of uncertain research and face it with courage. The student who is doing this project is an example of the success of that strategy: the vision of the project motivated him enough to start a follow-up project on his own initiative.

Future Directions

My team is currently finalizing a submission of our add-in to the Microsoft Word add-in store; we now call it Thoughtful (since it includes both drafting and revision features), and we are planning a longitudinal study of its real-world use. Other near-term plans include a follow-up study on Thought Embers, a study of how interaction-required suggestions affect ownership in revision tasks, and a study on modality of interaction with AI in reflection for text revision. All these aim for publication in archival venues such as ACM CHI, UIST, or IUI.

My work is currently funded by an NSF CRII grant (like NSF CAREER grants but for primarily-undergraduate institutions), and I use a mix of NSF ACCESS cyberinfrastructure and a small GPU workstation on campus. I plan to apply for more funding next year when the CRII grant runs out.

Visions for Cross-Disciplinary Collaboration

I am particularly interested in three promising areas for collaboration:

  1. Working with writing and rhetoric scholars to study how AI-assisted tools might explicitly scaffold writing processes that instructors want to emphasize. This collaboration would build on our work on Textfocals while bringing in deeper theoretical frameworks.

  2. Working with education scholars to develop and study AI systems that help instructors better understand their learners and make student-centered revisions to their teaching and materials. I have already started to explore some ideas, but I look forward to engaging these questions more deeply with domain experts. For example:

    • I have started building LMS extensions that give feedback on instructor content (not student content, for privacy reasons) to help instructors take the perspective of students: predicting what clarifying questions students will have about assignments, quiz questions, logistics, or policies.

    • I have done some formative work with a language education scholar on how we might build fine-grained models of learner knowledge to adaptively differentiate instruction within classes. Current high-fidelity language modeling approaches aim to mimic highly competent linguistic behavior. High-fidelity modeling of non-expert (student) behavior could enable precisely differentiated instruction (e.g., personalized glosses in language education) where each learner stays exactly in their zone of proximal development within an authentic learning community (such as a classroom).

  3. Working with organizational behavior and communication scholars to explore how thoughtful AI (reflection-promoting interfaces) might improve group decision-making and communication. Examples could include:

    • Aiding in reflective revision of organizational communication materials (perhaps as add-ins for other software like Outlook and PowerPoint). For example, for an organizational email, the system could encourage the sender to consider how various stakeholders will perceive the decision being communicated.

    • Reflective/critical AI help for leaders and facilitators: flagging potentially excluded stakeholders or unvoiced concerns, or suggesting interventions aimed at increasing mutual understanding.

Conclusion

We can—and must—build AI to help people think better about other people. To do so, we need cross-disciplinary theoretical grounding, intentional co-design of AI and HCI, and empirical study both in the lab and through deployed systems. My research agenda welcomes students and experts in many disciplines to contribute to building a more caring world through thoughtful AI.

References

Footnotes

  1. See https://kenarnold.org/projects.html for papers and demos of the projects mentioned here.↩︎