Transparency for Generated Text

Lightweight marks for AI-generated text
Tags: ml, llm, nlp

Author: Ken Arnold

Published: February 17, 2023

The norm of transparency suggests that readers should be able to see that some text was generated using an AI system. But how should we indicate that text has been automatically generated? Some have proposed fancy watermarks, but these have the potential for false positives, can be gamed, and aren’t actually visible to humans without a special detection tool. I think it’s better to make it so easy for people to mark generated text that marking becomes the default.

A few dimensions in the design space:

  1. How visible?
    • do you need to copy the text into a different tool to see if it’s autogen?
    • could you notice it from close study, e.g., slight differences in typography or little extra markers over some letters?
    • maybe there’s a different style, like faded out or smaller or italic or lighter font weight?
  2. How encoded?
    • Formatting often doesn’t copy-paste, especially in text messaging. But if the actual text uses some less-visible or more intrusive encoding, the display widget could translate that into something less intrusive.
    • Unicode symbols are a good option: there are lots of invisible marks in Unicode (search for “Unicode steganography”). Perhaps just add a zero-width non-joiner character to the end of each generated word?
  3. What gets marked?
    • Do we mark every generated character?
    • Every generated word?
    • Some generated words?
    • Proportional to how much information the model added (i.e., its total logprob?)
  4. How does this relate to co-writing?
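
To make the “proportional to how much information the model added” idea from point 3 concrete, here’s a rough sketch: mark only the words where the model’s per-word logprob is below some threshold, i.e., the words it contributed the most information to. The logprob values and threshold below are made up for illustration; a real system would read them off the model.

```python
ZWNJ = "\u200c"  # zero-width non-joiner, invisible in most renderings

def mark_by_information(word_logprobs, threshold=-1.0):
    """Mark only the words where the model added the most information.

    word_logprobs: (word, logprob) pairs. A lower logprob means higher
    surprisal, i.e., the model contributed more information, so those
    words get an invisible mark appended. The threshold is arbitrary.
    """
    return " ".join(
        word + ZWNJ if logprob < threshold else word
        for word, logprob in word_logprobs
    )

# Hypothetical per-word logprobs, purely for illustration.
sample = [("The", -0.1), ("quick", -2.3), ("brown", -0.4), ("fox", -1.8)]
print(repr(mark_by_information(sample)))
# 'The quick\u200c brown fox\u200c'
```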

For example, this is what GitHub Copilot generated when I started this blog post.

my_text = "Some people have proposed fancy watermarks, " 

completion = """but I think that's a bit too much. I think we should just add a simple line at the end of the text, like this:

> This text was generated by a machine learning model.

I think this is a good compromise between transparency and aesthetics. It's not too intrusive, and it's clear that the text was generated by a machine learning model.

## Why is this important?

I think it's important to be transparent about the fact that some text has been generated by a machine learning model. This is because machine learning models are not perfect, and they can make mistakes. For example, a machine learning model might generate text that is offensive or inappropriate. If we don't indicate that the text was generated by a machine learning model, then people might think that the text was written by a human, and they might be offended or upset by it.
"""

import re

One option to show that the text was autogenerated: a simple combining mark (here, a combining grave accent, U+0300, over the first letter of each generated word):

print(my_text + re.sub(r'(?<=\b\w)', '\u0300', completion))
Some people have proposed fancy watermarks, b̀ut Ì t̀hink t̀hat's̀ à b̀it t̀oo m̀uch. Ì t̀hink ẁe s̀hould j̀ust àdd à s̀imple l̀ine àt t̀he ènd òf t̀he t̀ext, l̀ike t̀his:

> T̀his t̀ext ẁas g̀enerated b̀y à m̀achine l̀earning m̀odel.

Ì t̀hink t̀his ìs à g̀ood c̀ompromise b̀etween t̀ransparency ànd àesthetics. Ìt's̀ ǹot t̀oo ìntrusive, ànd ìt's̀ c̀lear t̀hat t̀he t̀ext ẁas g̀enerated b̀y à m̀achine l̀earning m̀odel.

## Ẁhy ìs t̀his ìmportant?

Ì t̀hink ìt's̀ ìmportant t̀o b̀e t̀ransparent àbout t̀he f̀act t̀hat s̀ome t̀ext h̀as b̀een g̀enerated b̀y à m̀achine l̀earning m̀odel. T̀his ìs b̀ecause m̀achine l̀earning m̀odels àre ǹot p̀erfect, ànd t̀hey c̀an m̀ake m̀istakes. F̀or èxample, à m̀achine l̀earning m̀odel m̀ight g̀enerate t̀ext t̀hat ìs òffensive òr ìnappropriate. Ìf ẁe d̀on't̀ ìndicate t̀hat t̀he t̀ext ẁas g̀enerated b̀y à m̀achine l̀earning m̀odel, t̀hen p̀eople m̀ight t̀hink t̀hat t̀he t̀ext ẁas ẁritten b̀y à h̀uman, ànd t̀hey m̀ight b̀e òffended òr ùpset b̀y ìt.
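
One caveat for the combining-mark approach: the marks are trivial to strip, deliberately or accidentally, and naive stripping also destroys legitimate accents. A quick sketch using Unicode normalization:

```python
import unicodedata

def strip_combining_marks(text):
    """Decompose, then drop all combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_combining_marks("b\u0300ut I\u0300 t\u0300hink"))  # but I think
print(strip_combining_marks("café"))  # cafe -- legitimate accents go too
```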

Another option: interpuncts (·) replacing the spaces after generated words.

after_word_space_re = re.compile(r'(?<=[\w,.?]) ')
print(my_text + after_word_space_re.sub('\u00b7', completion))
Some people have proposed fancy watermarks, but·I·think·that's·a·bit·too·much.·I·think·we·should·just·add·a·simple·line·at·the·end·of·the·text,·like·this:

> This·text·was·generated·by·a·machine·learning·model.

I·think·this·is·a·good·compromise·between·transparency·and·aesthetics.·It's·not·too·intrusive,·and·it's·clear·that·the·text·was·generated·by·a·machine·learning·model.

## Why·is·this·important?

I·think·it's·important·to·be·transparent·about·the·fact·that·some·text·has·been·generated·by·a·machine·learning·model.·This·is·because·machine·learning·models·are·not·perfect,·and·they·can·make·mistakes.·For·example,·a·machine·learning·model·might·generate·text·that·is·offensive·or·inappropriate.·If·we·don't·indicate·that·the·text·was·generated·by·a·machine·learning·model,·then·people·might·think·that·the·text·was·written·by·a·human,·and·they·might·be·offended·or·upset·by·it.

Even if we use something invisible to mark the text, a viewer could translate those markings into visible styles. For example, we could use zero-width non-joiners (U+200C) after generated words. (This code doesn’t mark punctuation, which may not be desirable.)

# encode the generated text with zero-width non-joiners after words.
encoded_text = re.sub(r'(?<=\w\b)', '\u200c', completion)
# Note that you can't see the difference in the normal printout
print(encoded_text[:30])
# but you can see it if you look inside
print(repr(encoded_text[:30]))
but‌ I‌ think‌ that‌'s‌ a‌ bit
"but\u200c I\u200c think\u200c that\u200c's\u200c a\u200c bit"

But a viewer could then decode those marks and show the generated words in a lighter color, like this:

from IPython.display import HTML
def format_generated(match):
    word = match.group(1)
    return f'<span style="color: #999999;">{word}</span>'

def pre_line(x): return f'<div style="white-space: pre-line;">{x}</div>'
HTML(pre_line(re.sub(r'\b(\w+)\u200c', format_generated, my_text + encoded_text)))
Some people have proposed fancy watermarks, but I think that's a bit too much. I think we should just add a simple line at the end of the text, like this:

> This text was generated by a machine learning model.

I think this is a good compromise between transparency and aesthetics. It's not too intrusive, and it's clear that the text was generated by a machine learning model.

## Why is this important?

I think it's important to be transparent about the fact that some text has been generated by a machine learning model. This is because machine learning models are not perfect, and they can make mistakes. For example, a machine learning model might generate text that is offensive or inappropriate. If we don't indicate that the text was generated by a machine learning model, then people might think that the text was written by a human, and they might be offended or upset by it.
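
Circling back to the first design dimension (visibility): with an encoding like this, checking whether text carries generation marks doesn’t require a statistical detector; any tool can count the marks or strip them. A minimal sketch, assuming the zero-width non-joiner convention above:

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner

def count_marked_words(text):
    """Count words followed by the invisible generation mark."""
    return len(re.findall(r"\w" + ZWNJ, text))

def strip_marks(text):
    """Remove the marks, recovering the plain text."""
    return text.replace(ZWNJ, "")

marked = "but\u200c I\u200c think\u200c that's a bit"
print(count_marked_words(marked))  # 3
print(strip_marks(marked))         # but I think that's a bit
```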