Workshop on text mining, web scraping, and machine learning for humanities research

Benoit Berthelier, a literary scholar who uses text mining techniques to analyze North Korean documents, will be hosting a workshop on some of his methods. Join us in learning about text mining, web scraping, and machine learning for humanities research. More info below.

RSVP here.

Fri, March 9, 2018
2:00 PM – 4:00 PM PST
Seuss Room

Workshop: Python for Digital Humanities with Benoit Berthelier


This workshop will give an overview of how Python can be used to supplement research in the humanities. Python is a free, open-source, cross-platform, and beginner-friendly programming language widely used for computational research in academia and industry. The workshop will present different ways in which digital humanists can develop their own tools and integrate computational methods into their scholarly work using Python. It will feature several practical use cases drawn from the speaker’s past and current research projects, such as using computer vision to annotate and analyze propaganda posters, developing natural language processing algorithms for multilingual discourse analysis, and automating the cataloging of data for a digital archive.

The goal of the workshop is to demonstrate how Python can be used:

  • As a “quick and dirty” tool to automate or speed up a number of basic research tasks.
  • As a way to develop, explore and test research hypotheses.
  • As a back-end tool for online multimedia digital humanities projects.
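To illustrate the first, “quick and dirty” use case, here is a minimal sketch of automated cataloging for a digital archive. It assumes the archive is simply a folder of files on disk. The function name `catalog_directory` and the CSV columns (`path`, `extension`, `size_bytes`, `sha256`) are hypothetical choices for this example and are not the speaker's actual tooling. It uses only the standard library, so it should run on Python 3.6 as well as in an Anaconda install.

```python
import csv
import hashlib
from pathlib import Path


def catalog_directory(root, out_csv):
    """Write one CSV row per file found under `root` and return the rows.

    Hypothetical helper for illustration: the column names are assumptions,
    not a standard archival schema.
    """
    root = Path(root)
    rows = []
    # rglob("*") walks the folder recursively; sorting keeps the catalog stable across runs
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "extension": path.suffix.lower(),
            "size_bytes": path.stat().st_size,
            # a content hash lets you detect duplicate or silently altered files later
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    with open(str(out_csv), "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "extension", "size_bytes", "sha256"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For example, `catalog_directory("archive", "catalog.csv")` writes one row per file in `archive`. Writing the CSV outside the scanned folder keeps the catalog from listing itself on a later run.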

Some of the techniques the workshop will cover:

  • Data collection (crawling, scraping and algorithmic data cleansing)
  • Data mining using natural language processing and/or computer vision
  • Machine learning for digital humanities
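The data collection and data mining steps listed above can be sketched with the standard library alone: fetch a page, strip it down to its visible text, normalize that text, and count word frequencies. This is a minimal sketch and not the workshop's material. The helper names (`fetch`, `extract_text`, `clean`, `word_frequencies`) are hypothetical. The `\w+` tokenizer is a deliberate simplification: Korean attaches particles to nouns, so a real analysis of Korean text would need a proper morphological analyzer.

```python
import re
import unicodedata
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a page and decode it using the charset the server declares (UTF-8 fallback)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # how deep we are inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def clean(text):
    """Normalize Unicode to NFC, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def word_frequencies(text, stopwords=frozenset()):
    # \w+ matches Unicode letters and digits, so Hangul words are kept intact
    tokens = re.findall(r"\w+", text)
    return Counter(t for t in tokens if t not in stopwords)
```

A typical pipeline would be `word_frequencies(clean(extract_text(fetch(url))), stopwords={"the", "a"})`. In practice, many projects swap in third-party libraries such as requests or BeautifulSoup for the first two steps.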

Please download and install Python 3.6 or, preferably, Anaconda (a distribution that bundles Python with common data science packages) before the workshop. Participants must bring a laptop.

Benoit Berthelier is a post-doctoral scholar at the University of California, San Diego. His research focuses on the literature, culture and societies of the Koreas. His current project looks at the politics of technology and data on the Korean peninsula.
