Making Any Wikipedia Topic Dystopian
I once read a Tweet that said, “you can make any Wikipedia article dystopian by changing it to the past tense.” Dystopian Wikipedia sounded fun and fitting for our quasi-dystopian times, so I built Dystopedia as a proof-of-concept. Try the demo on Hugging Face Spaces, and keep reading to learn how I made it.
Building Dystopedia
Since the Tweet uses the Wikipedia page for water as an example, I got started by fetching the first seven sentences of that article using the wikipedia library. Text from the Wikipedia article is copyrighted by Wikipedia and shared here under CC BY-SA 3.0.
import wikipedia
result = wikipedia.search("water", results=1)
summary = wikipedia.summary(
    result[0],
    sentences=7,
    auto_suggest=False,
    redirect=True
)
print(summary)
Water is an inorganic compound with the chemical formula H2O. It is a transparent, tasteless, odorless, and nearly colorless chemical substance, which is the main constituent of Earth's hydrosphere and the fluids of all known living organisms (in which it acts as a solvent). It is vital for all known forms of life, despite not providing food, energy or organic micronutrients. Its chemical formula, H2O, indicates that each of its molecules contains one oxygen and two hydrogen atoms, connected by covalent bonds. The hydrogen atoms are attached to the oxygen atom at an angle of 104.45°. "Water" is also the name of the liquid state of H2O at standard temperature and pressure.
Because Earth's environment is relatively close to water's triple point, a number of natural states of water exist on earth. It forms precipitation in the form of rain and aerosols in the form of fog.
Next, I parsed and tagged the summary using spaCy and converted present tense verbs to the past tense using LemmInflect. According to Wikipedia's style guide, articles are written in the present tense by default.
import lemminflect
import spacy
nlp = spacy.load("en_core_web_lg")
def make_past_tense(token):
    # VBP and VBZ are the Penn Treebank tags for present tense verbs
    if token.tag_ in ("VBP", "VBZ"):
        return f'{token._.inflect("VBD")} '
    return token.text_with_ws
doc = nlp(summary)
dystopian_summary = "".join(
    [make_past_tense(token) for token in doc]
)
print(dystopian_summary)
Water was an inorganic compound with the chemical formula H2O. It was a transparent, tasteless, odorless, and nearly colorless chemical substance, which was the main constituent of Earth's hydrosphere and the fluids of all known living organisms (in which it acted as a solvent). It was vital for all known forms of life, despite not providing food, energy or organic micronutrients. Its chemical formula, H2O, indicated that each of its molecules contained one oxygen and two hydrogen atoms, connected by covalent bonds. The hydrogen atoms was attached to the oxygen atom at an angle of 104.45°. "Water" was also the name of the liquid state of H2O at standard temperature and pressure.
Because Earth's environment was relatively close to water's triple point, a number of natural states of water existed on earth. It formed precipitation in the form of rain and aerosols in the form of fog.
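One rough edge in the output above is "The hydrogen atoms was attached", which should read "were attached". Both "is" and "are" come out as "was", presumably because a single past form of "be" is chosen regardless of the subject. As a minimal sketch of one possible workaround, the helper below special-cases the forms of "be" before falling back to a general inflector. The lookup table and the function name are hypothetical and are not part of LemmInflect or Dystopedia.

```python
# Hypothetical lookup table (not part of LemmInflect): past forms of "be"
# chosen by the surface form, so that plural "are" becomes "were".
PAST_OF_BE = {"am": "was", "is": "was", "are": "were"}


def past_of_be(word, fallback=None):
    """Return the past form of a present-tense form of "be".

    Any other word is passed to `fallback` (for example, a wrapper around
    the general inflector) or returned unchanged if no fallback is given.
    """
    past = PAST_OF_BE.get(word.lower())
    if past is None:
        return fallback(word) if fallback else word
    # Preserve a leading capital, as in "Are" -> "Were".
    return past.capitalize() if word[0].isupper() else past


print(past_of_be("are"))
print(past_of_be("is"))
```

In the post's pipeline, a check like this could run inside make_past_tense before the call to token._.inflect.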
So far, so good! But what if I only changed the articles for topics with positive connotations instead of making all articles past tense? What if nothing "good" existed anymore, but anything "bad" or "neutral" were left unaltered?
Using a transformers pipeline, I created a function to determine whether a given text was positive. Its underlying model is DistilBERT base uncased finetuned SST-2, which was the default sentiment analysis model in transformers when I built Dystopedia.
from transformers import pipeline
sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    revision="af0f99b"
)
def is_positive(text):
    return sentiment_analyzer(text)[0]["label"] == "POSITIVE"
for word in ("good", "bad"):
    if is_positive(word):
        print(f'"{word}" was labelled positive')
    else:
        print(f'"{word}" was not labelled positive')
"good" was labelled positive
"bad" was not labelled positive
To my surprise, the sentiment analyzer didn't classify the summary for water as positive.
print(is_positive(summary))
False
Hmm, why was this the case? What if we analyzed the summary sentence by sentence?
Well, the first sentence was "positive."
sentences = list(doc.sents)
print(sentences[0])
print(is_positive(sentences[0].text))
Water is an inorganic compound with the chemical formula H2O.
True
But the second wasn't.
print(sentences[1])
print(is_positive(sentences[1].text))
It is a transparent, tasteless, odorless, and nearly colorless chemical substance, which is the main constituent of Earth's hydrosphere and the fluids of all known living organisms (in which it acts as a solvent).
False
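The two checks above can be generalized into a small helper that reports what share of a summary's sentences is labelled positive. This is only a sketch: the helper name is hypothetical, and the classifier is passed in as an argument so that is_positive could be used in its place. The stub classifier below is a stand-in that lets the example run without loading a model, and its rule is made up for illustration.

```python
def positive_share(sentences, classify):
    """Return the fraction of sentences that `classify` labels positive."""
    labels = [bool(classify(sentence)) for sentence in sentences]
    return sum(labels) / len(labels) if labels else 0.0


# Stand-in classifier for illustration only; in the post this role is
# played by is_positive, which calls the sentiment analysis pipeline.
def stub_classify(sentence):
    return "tasteless" not in sentence


sentences = [
    "Water is an inorganic compound with the chemical formula H2O.",
    "It is a transparent, tasteless, odorless, and nearly colorless chemical substance.",
]
print(positive_share(sentences, stub_classify))
```

With spaCy, the sentences could come from [sent.text for sent in doc.sents].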
What about the term "water" on its own?
"Positive."
print(is_positive("water"))
True
From this, and for my purposes, it seemed more reliable to determine whether a topic was positive using a term rather than a summary.
def make_dystopian(term, text):
    doc = nlp(text)
    if is_positive(term):
        return "".join([make_past_tense(token) for token in doc])
    return doc.text
for word in ("good", "bad"):
    print(make_dystopian(word, f"{word.capitalize()} things exist."))
Good things existed .
Bad things exist.
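The stray space in "Good things existed ." comes from make_past_tense always appending a space after the inflected verb, even when the verb is followed directly by punctuation. spaCy tokens expose their original trailing whitespace as whitespace_, so one option is to append that instead. The sketch below mimics the idea with plain tuples of text, tag, and trailing whitespace. The one-entry lookup table and the function name are hypothetical stand-ins for LemmInflect and for the post's function, so the example runs on its own.

```python
# Hypothetical stand-in for the inflector; the post uses LemmInflect.
PAST_FORMS = {"exist": "existed"}


def past_tense_keep_ws(text, tag, whitespace):
    """Inflect present-tense verbs and keep the token's own trailing whitespace."""
    if tag in ("VBP", "VBZ"):
        return PAST_FORMS.get(text, text) + whitespace
    return text + whitespace


# Each tuple mirrors token.text, token.tag_, and token.whitespace_ in spaCy.
tokens = [
    ("Good", "JJ", " "),
    ("things", "NNS", " "),
    ("exist", "VBP", ""),
    (".", ".", ""),
]
print("".join(past_tense_keep_ws(*token) for token in tokens))
```

In the post's function, the equivalent change would be to return token._.inflect("VBD") + token.whitespace_ in the verb branch.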
Now that I'd worked out a general approach, I could build a quick demo using Gradio as a web interface.
I created a function that searches for a term on Wikipedia and retrieves the first sentence of the matching article. It then makes the sentence dystopian, or raises a helpful error message if something goes wrong.
import gradio as gr
def get_dystopian_summary(term):
    if term == "":
        return term
    results = wikipedia.search(term, results=1)
    if len(results) == 0:
        raise gr.Error(
            f'Could not find an article on the term "{term}". '
            'Try searching for a different topic.'
        )
    try:
        # summary() is the call that can land on a disambiguation page
        summary = wikipedia.summary(
            results[0],
            sentences=1,
            auto_suggest=False,
            redirect=True
        )
    except wikipedia.exceptions.DisambiguationError as e:
        raise gr.Error(str(e))
    return make_dystopian(term, summary)
print(get_dystopian_summary("water"))
Water was an inorganic compound with the chemical formula H2O. It was a transparent, tasteless, odorless, and nearly colorless chemical substance, which was the main constituent of Earth's hydrosphere and the fluids of all known living organisms (in which it acted as a solvent).
Finally, I wrapped the function in a Gradio interface to create a demo.
def launch_demo(**kwargs):
    title = "Dystopedia"
    description = (
        "Make any Wikipedia topic dystopian. Inspired by "
        "[this Tweet](https://twitter.com/lbcyber/status/1115015586243862528). "
        "Dystopedia uses [DistilBERT base uncased finetuned SST-2](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) "
        "for sentiment analysis and is subject to its limitations and biases."
    )
    examples = ["joy", "hope", "peace", "Earth", "water", "food"]
    gr.Interface(
        fn=get_dystopian_summary,
        inputs=gr.Textbox(label="term", placeholder="Enter a term...", max_lines=1),
        outputs=gr.Textbox(label="description"),
        title=title,
        description=description,
        examples=examples,
        cache_examples=True,
        allow_flagging="never",
    ).launch(**kwargs)
# If running remotely on Binder, set `share=True` in `launch_demo()`
launch_demo(show_error=True)
What’s Next?
Dystopedia uses an incredibly unsophisticated algorithm to transform a text from present to past tense: for each token in the text, if the token is a present tense verb, convert it to the past tense; otherwise, leave the token unchanged. Unfortunately, since this approach doesn't consider auxiliary verbs, it doesn't work for articles about upcoming events or for sentences with complex verb phrases.
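The rule described above can be traced on hand-tagged sentences to see where it falls short. In this sketch the Penn Treebank tags are written by hand rather than produced by spaCy, and the lookup table is a hypothetical stand-in for LemmInflect. The modal "will" (MD) and the base form "take" (VB) don't match the present tense tags, so a sentence about a future event passes through unchanged.

```python
PRESENT_TAGS = ("VBP", "VBZ")
# Hypothetical stand-in for the inflector; the post uses LemmInflect.
PAST_FORMS = {"takes": "took", "take": "took"}


def to_past(tagged_tokens):
    """Apply the token-by-token rule: inflect only present-tense verbs."""
    words = [
        PAST_FORMS.get(text, text) if tag in PRESENT_TAGS else text
        for text, tag in tagged_tokens
    ]
    return " ".join(words)


present = [("The", "DT"), ("festival", "NN"), ("takes", "VBZ"),
           ("place", "NN"), ("annually", "RB")]
future = [("The", "DT"), ("festival", "NN"), ("will", "MD"),
          ("take", "VB"), ("place", "NN"), ("annually", "RB")]

print(to_past(present))
print(to_past(future))
```

Handling the second sentence would need a rule that looks at the whole verb phrase rather than at one token at a time.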
Dystopedia was an afternoon project; I don't plan to keep building it. However, I plan on learning more about controllable text generation and sophisticated techniques for morphological inflection.
Resources
- Try Dystopedia on Hugging Face Spaces
- Dystopedia’s source code
- Companion notebook for this blog post
- spaCy 101: Everything you need to know
- Transformers Pipelines API reference
- Gradio Quickstart