Introduction
In June 2017, I joined the LiterAIry project, which PhD student Sarah Sterman and undergraduate student Vivian Liu had been working on for a few months. Our shared interest in literature led us to ask: how can we visualize writing style and teach a machine about stylometry?

Style is a core component of writing, shaping how audiences interpret and engage with literary works. It is the literary element that describes the ways that the author uses words – word choice, sentence structure, figurative language, and sentence arrangement all work together to establish mood, images, and meaning in the text.

In phase 1 of the project, we conducted crowdsourced studies to understand how non-experts relate to and conceptualize style, in order to ground our approach to developing style-based tools. We then used the crowdsourced data to train a machine learning model to distinguish style. In phase 2 of the project, we designed and implemented a style visualization application and conducted several user studies.

Phase 1
During the summer, we worked on phase 1 of our project: initial probes into both expert and non-expert understandings of style, in order to ground our approach to developing style-based tools.

We conducted crowdsourced studies in which workers were asked to select the pairs of excerpts with the most similar plot, the most similar mood, and the most similar style from a set of three. I helped to design the questions and workflow. 

After we got the results from Mechanical Turk, I used R to evaluate the data and perform statistical analysis. We filtered out workers who chose the same option for every question, and analyzed the correlation between workers' answers and their background information.
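
The filtering step can be sketched as follows. This is a minimal Python illustration, not the original R analysis: the (worker, question, choice) tuple layout, the function name drop_straightliners, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def drop_straightliners(responses):
    """Remove workers who gave the same answer to every question.

    `responses` is a list of (worker_id, question_id, choice) tuples.
    This layout is hypothetical, chosen only for illustration.
    """
    choices_by_worker = defaultdict(set)
    answers_by_worker = defaultdict(int)
    for worker_id, _question_id, choice in responses:
        choices_by_worker[worker_id].add(choice)
        answers_by_worker[worker_id] += 1

    # Flag a worker only if they answered more than one question
    # and used a single distinct choice throughout.
    flagged = {
        worker
        for worker, choices in choices_by_worker.items()
        if len(choices) == 1 and answers_by_worker[worker] > 1
    }
    kept = [row for row in responses if row[0] not in flagged]
    return kept, flagged


# Made-up sample data: w1 always picks "A", w2 varies.
responses = [
    ("w1", "q1", "A"), ("w1", "q2", "A"), ("w1", "q3", "A"),
    ("w2", "q1", "A"), ("w2", "q2", "C"), ("w2", "q3", "B"),
]
kept, flagged = drop_straightliners(responses)
```

A rule like this is deliberately conservative: it only removes workers whose answers carry no information at all, and leaves subtler quality checks to the later statistical analysis.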

Vivian had built a natural language feature analyzer with spaCy and other NLP libraries. Building on that, I implemented and tested functions to extract textual features that are commonly used in the authorship analysis literature. I also built the feature generation pipeline.
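
To give a sense of what such features look like, here is a minimal sketch using only the Python standard library. It is not the project's analyzer (which was built on spaCy): the regex tokenizer, the short function-word list, and the feature names are assumptions chosen for illustration.

```python
import re
from collections import Counter

# A tiny, illustrative function-word list; real stylometry work
# typically uses a much longer one.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def style_features(text):
    """Return a dict of simple stylometric features for `text`."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    features = {
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(counts) / total,
        "mean_word_length": sum(len(t) for t in tokens) / total,
    }
    # Relative frequency of each function word.
    for word in FUNCTION_WORDS:
        features[f"freq_{word}"] = counts[word] / total
    return features


features = style_features("The cat sat on the mat. It was the best of times.")
```

Function-word frequencies are a common choice in authorship analysis because they depend less on a text's topic than content words do.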

For another week, I worked on improving on our existing SVM classifier: I implemented a Random Forest with Python's scikit-learn to compare accuracy, and extracted the most informative features from the results.

Phase 2
As we geared up for the DIS 2018 submission deadline, we started to work on the application part of the project. While Vivian and Sarah worked on the visualizations of textual features and the user interface, I worked on data collection and the database for our system.

To populate our collection of literature data, I used the Requests library and BeautifulSoup to scrape and process the top 100 books from Project Gutenberg. I also designed the schema and implemented the database for our system with sqlite3.
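
As an illustration of the database side, here is a minimal sqlite3 sketch. The schema below is hypothetical, not the one we actually used: the table names, the columns, and the helper functions build_db and add_book are assumptions, and the feature value is a placeholder rather than a measured result.

```python
import sqlite3

# Hypothetical schema: one row per book, one row per (book, feature) pair.
SCHEMA = """
CREATE TABLE books (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    author TEXT NOT NULL
);
CREATE TABLE features (
    book_id INTEGER NOT NULL REFERENCES books(id),
    name    TEXT NOT NULL,
    value   REAL NOT NULL,
    PRIMARY KEY (book_id, name)
);
"""


def build_db(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_book(conn, title, author, features):
    """Insert a book and its feature values; return the new book id."""
    cur = conn.execute(
        "INSERT INTO books (title, author) VALUES (?, ?)", (title, author)
    )
    book_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO features (book_id, name, value) VALUES (?, ?, ?)",
        [(book_id, name, value) for name, value in features.items()],
    )
    conn.commit()
    return book_id


conn = build_db()
# The value 20.0 is a placeholder, not a real measurement.
book_id = add_book(
    conn, "Pride and Prejudice", "Jane Austen", {"mean_sentence_length": 20.0}
)
rows = conn.execute(
    "SELECT b.title, f.name, f.value "
    "FROM books b JOIN features f ON f.book_id = b.id"
).fetchall()
```

Storing features as name/value rows, rather than one column per feature, means a new feature can be added without changing the schema.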

This is what our visualizations and tool looked like by the end of this phase:
Phase 2.5
Unfortunately, our submission to DIS 2018 was rejected. One of the main problems was our visualizations. I then started using D3.js to design and implement visualizations that fit the features more closely and also matched human perception of them. For example, the variation in sentence length shouldn't be visualized as a single value, because a bare number wouldn't mean much to a reader.
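
The sentence-length point can be made concrete with a small sketch. This is Python rather than D3.js, and it only prepares the data a visualization might draw from: the naive punctuation-based sentence splitter, the function name sentence_lengths, and the sample text are assumptions for illustration.

```python
import re
from statistics import mean, pstdev


def sentence_lengths(text):
    """Return the word count of each sentence, in reading order.

    Sentences are split naively on ., ! or ? followed by whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]


# Made-up sample text with short, long, and one-word sentences.
text = (
    "It was late. The rain had not stopped since morning, "
    "and nobody came. Silence."
)
lengths = sentence_lengths(text)

# A single summary number hides the rhythm of the passage...
summary = {"mean": mean(lengths), "std": pstdev(lengths)}

# ...while the per-sentence series keeps it, e.g. as one bar per sentence.
for n in lengths:
    print("#" * n)
```

Plotting the series, one mark per sentence, lets a reader see where the prose speeds up and slows down, which a standard deviation alone cannot show.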
Reflection
This was my first research experience in HCI, and it inspired me to pursue this field. I realized that I am interested in using computing power to explore, analyze, and understand art and human creativity. I learned to ask questions that I never would have asked before, and learned how to answer them. Looking back, I feel immensely grateful and lucky to have been part of it.

Sarah was a great mentor who always encouraged me to keep going and to try out my crazy ideas. She valued my input and offered me guidance along the way. Vivian was the best co-worker I could ever ask for, and she always recommended fantastic books to me. Every week, we found ourselves working on problems that few had tackled before, and learning algorithms and techniques that were new to us. Moreover, everyone at Hybrid Ecologies was an amazing researcher, engineer, and, most importantly, artist. I truly learned the spirit of research from them.