Here is a quick summary of what the script does:
- Takes a text file containing a collection of dreams and quantifies their contents. This gives the frequency of words related to categories like cognitive processes, positive/negative emotions, social interactions, sensory experiences etc.
- Compares these scores to those derived from dreams of the general population. This makes the scores meaningful by showing which aspects of an individual's dream life are statistically different from the baselines found by researchers.
- Graphically plots changes in the content of dreams over time. This can be used to explore the temporal development of specific themes (e.g. changes in emotional valence over several months/years).
- Plots a network of names (people, places), where nodes represent names and links represent the occurrence of these entities together in the same dream.
Below I describe each of these functionalities and present examples of insights from an individual's dream diary. If you want to try it for yourself, the script can be found on GitHub, along with the example diary. This is my first time sharing code, so all feedback is welcome. The script can also be tweaked to work with a normal diary or expressive writing, in which case baselines from the general population can be added for comparison. I suggest having a corpus of at least 100 diary entries to work with.
Dream Content
To analyse the content of dreams, the script uses the Linguistic
Inquiry and Word Count (LIWC) system. This was developed by psychologist James Pennebaker and colleagues to automatically analyse text samples and calculate the
frequencies of words used in different domains. The intention was to identify sets of
words that reflect basic emotional and cognitive dimensions. The latest version of LIWC (2015) has over 70 categories including emotions,
cognition, personal concerns, work and leisure activities, as well as grammar
and vocabulary dimensions like the use of pronouns and verbs. Each category is
triggered when the input text features a relevant word. For instance, the words
“dish”, “eat” or “pizza” will contribute to the score for the “ingestion”
category, and the words “worried” or “fearful” will count towards the “anxiety”
category. Some words belong to more than one category. For example, the word
“cried” will increment the scores for categories such as “sadness”, “negative
emotion” and “past focus”. These scores are expressed as the percentage of
words in the text sample that are related to each category.
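To make the scoring concrete, here is a minimal sketch of this kind of dictionary-based counting. It is not the script's actual code: the tiny dictionary below is a made-up stand-in built from the example words above (the real LIWC dictionary is proprietary and far larger), and the names `TOY_DICTIONARY` and `category_percentages` are hypothetical.

```python
import re
from collections import Counter

# Toy stand-in for a LIWC-style dictionary (illustration only).
# Note that a word may belong to more than one category.
TOY_DICTIONARY = {
    "ingestion": {"dish", "eat", "pizza"},
    "anxiety": {"worried", "fearful"},
    "sadness": {"cried"},
    "negative emotion": {"worried", "fearful", "cried"},
    "past focus": {"cried"},
}


def category_percentages(text, dictionary):
    """Return, per category, the percentage of words in `text` that match it."""
    # Lowercase the text and keep simple alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocabulary in dictionary.items():
            if word in vocabulary:
                counts[category] += 1
    total = len(words)
    return {
        category: (100.0 * counts[category] / total if total else 0.0)
        for category in dictionary
    }


sample = "I cried because I was worried about the pizza"
scores = category_percentages(sample, TOY_DICTIONARY)
for category, percentage in scores.items():
    print(f"{category}: {percentage:.1f}%")
```

In this sample, "cried" counts towards three categories at once, which is why the category percentages need not add up to 100.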
Comparison to Baselines
Researchers have applied the LIWC system
of analysis to study different kinds of text
samples (e.g. blogs, novels, expressive writing). Most relevant here are the recent
results of psychologists Kelly Bulkeley and Mark Graves (2018), who used
LIWC to study the linguistic properties of dreams. Using over five thousand
dream reports from a diverse collection of people, the authors identified
baseline rates for the usage of each LIWC category. These baselines enable us to
compare the linguistic contents of our personal dreams with those of a “normal”
person, to highlight distinguishing aspects that may be unique to us as individuals.
In order to quantify the difference between a person's scores and population baselines, the script uses measurements known as Z-scores. A Z-score expresses how far an individual's value lies from the population average, in units of the population's standard deviation, so it takes the variability of scores in the population into account. If the absolute value of a Z-score exceeds 1.96, the individual's value differs significantly from the population average at the 5% level (two-tailed), assuming the scores are roughly normally distributed.
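As a rough illustration of the comparison, the sketch below computes a Z-score per category and ranks the categories by how far they deviate from the baseline. The numbers are invented for the example and are not taken from Bulkeley & Graves (2018); the function and variable names are hypothetical and may differ from the script's.

```python
def z_score(value, baseline_mean, baseline_sd):
    """Distance of `value` from the baseline mean, in standard deviations."""
    return (value - baseline_mean) / baseline_sd


# Invented example data: category -> (individual %, baseline mean %, baseline SD).
example_data = {
    "you": (3.0, 2.0, 0.4),
    "i": (6.0, 8.0, 1.6),
    "function": (50.0, 52.0, 4.0),
}

z_scores = {
    category: z_score(value, mean, sd)
    for category, (value, mean, sd) in example_data.items()
}

# Rank categories by absolute deviation and flag those beyond the 1.96 threshold.
ranked = sorted(z_scores, key=lambda c: abs(z_scores[c]), reverse=True)
for category in ranked[:10]:
    z = z_scores[category]
    flag = "significant" if abs(z) > 1.96 else "within threshold"
    print(f"{category}: z = {z:+.2f} ({flag})")
```

Ranking by the absolute value treats scores that are unusually high and unusually low the same way, which matches the idea of showing the categories with the greatest difference in either direction.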
Below is an example output produced using a publicly available diary containing 315 dreams from a woman named Merri. I scraped her diary from the DreamBank repository managed by Adam Schneider & William Domhoff.
The plot displays a selection of only the top 10 LIWC categories that show the greatest difference between the individual's dream diary and the population baselines. The height of the bars corresponds to the degree to which the category scores are different from the population average (either higher or lower). The red dashed line marks the 95% threshold (an absolute Z-score of 1.96), which is exceeded when the difference is statistically significant.
From the above plot we can see that Merri's use of words generally lies within the expected thresholds. Still, some interesting differences can be observed. For example, she uses fewer function words, more 2nd person pronouns ("you") and fewer 1st person singulars ("i") compared to the general population. For more insight into what this means, check out this great TED talk on the psychology of pronouns.
Temporal Analysis
To examine changes in dream content over time, the script splits the diary into quarterly chunks (3 months each) and calculates the LIWC scores for every period. It then identifies and plots the categories that show the greatest change over time.
Merri's dream diary shows a general decline in 2nd person pronouns ("you"), a decline in conjunctions ("conj") and a gradual increase in filler words over time.
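The quarterly split described above could be sketched roughly as follows. This is an illustration under assumptions rather than the script's own code: it assumes each diary entry comes with a date, and it measures "greatest change" as the spread between a category's highest and lowest quarterly score, which may not be the criterion the script uses. All names here are hypothetical.

```python
from collections import defaultdict
from datetime import date


def quarter_of(day):
    """Map a date to a (year, quarter) pair, with quarters numbered 1 to 4."""
    return (day.year, (day.month - 1) // 3 + 1)


def group_by_quarter(entries):
    """Group (date, text) entries into chronologically ordered quarters."""
    groups = defaultdict(list)
    for day, text in entries:
        groups[quarter_of(day)].append(text)
    return dict(sorted(groups.items()))


def largest_changes(scores_by_quarter, top_n=3):
    """Rank categories by the spread (max minus min) of their quarterly scores."""
    categories = set().union(*(s.keys() for s in scores_by_quarter.values()))
    spread = {}
    for category in categories:
        values = [s.get(category, 0.0) for s in scores_by_quarter.values()]
        spread[category] = max(values) - min(values)
    return sorted(spread, key=spread.get, reverse=True)[:top_n]


# Invented entries and scores, for illustration only.
entries = [
    (date(2000, 1, 15), "dream one"),
    (date(2000, 3, 2), "dream two"),
    (date(2000, 4, 20), "dream three"),
]
quarters = group_by_quarter(entries)

scores_by_quarter = {
    (2000, 1): {"you": 3.0, "conj": 6.0, "filler": 0.1},
    (2000, 2): {"you": 2.0, "conj": 5.5, "filler": 0.2},
}
changing = largest_changes(scores_by_quarter, top_n=2)
print(list(quarters), changing)
```

In practice the per-quarter scores would come from running the LIWC scoring on the combined text of each quarter.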
If there is a specific category that interests you that isn't captured in the automated plot, it can be plotted separately like so:
Extraction of Names
Dream diaries often include references to people or places by their names. The named entity recognition functions in Python's NLTK library can automatically extract all the names mentioned in a diary. I used these to make some network visualisations.
In the image below, names are represented by nodes and the links show which names have appeared together in a dream. This can give a sense of whether certain people or places play a central role or have a tendency to cluster together. The script produces a very basic network plot like this:
This particular plot shows only names that occur in at least 3 dreams and co-occur with their neighbours beyond a certain level (these thresholds can be adjusted). We see hubs formed around Merri's two siblings, Dora and Rudy, whose names frequently get mentioned alongside other characters and place names.
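For readers curious how such a network can be assembled, here is a minimal sketch of the counting step. It assumes the names have already been extracted into one list per dream, uses made-up names, and picks arbitrary default thresholds; the function name `cooccurrence_edges` and its parameters are hypothetical and not taken from the script.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(names_per_dream, min_dreams=3, min_cooccurrence=2):
    """Count how often pairs of names share a dream, keeping only frequent ones."""
    name_counts = Counter()
    pair_counts = Counter()
    for names in names_per_dream:
        # Count each name once per dream; sorting keeps pair keys consistent.
        unique = sorted(set(names))
        name_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    # Keep names that appear in enough dreams.
    frequent = {name for name, count in name_counts.items() if count >= min_dreams}
    # Keep links that are strong enough and connect two frequent names.
    return {
        pair: count
        for pair, count in pair_counts.items()
        if count >= min_cooccurrence and pair[0] in frequent and pair[1] in frequent
    }


# Invented example: each inner list holds the names found in one dream.
dreams = [
    ["Anna", "Ben"],
    ["Anna", "Ben", "London"],
    ["Anna", "London"],
    ["Ben"],
]
edges = cooccurrence_edges(dreams)
print(edges)
```

The resulting dictionary maps each pair of names to the number of dreams they share, so it can be handed to a graph library for drawing, with the count used as the link weight.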
There is a lot that can be improved in the visualisations and analyses. Nonetheless, I hope this post gives a sense of some of the dream insights that can be automated with Python.
I'd love to know if this project will be useful or interesting to some of you. If you have any feedback or suggestions please let me know!
GitHub repository: https://github.com/mpriestley/dream-analysis
References:
Bulkeley, K., & Graves, M. (2018). Using the LIWC program to study dreams. Dreaming, 28(1), 43.
Pennebaker, J.W., Boyd, R.L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015. Austin, TX: University of Texas at Austin.
Sean Rife's PsyLex functions for reading a LIWC dictionary in Python: https://github.com/seanrife/psyLex
DreamBank repository of dream reports: http://www.dreambank.net/




