Corpora Resources


As part of the Vistola project on correspondence, I have created, with help from students, a few corpora of letters or email messages. Most are annotated with lemmas and parts of speech in addition to the metadata associated with the letters.

A few other annotated, non-correspondence texts can also be found in the BAAL workshop materials.

Between Elizabeth Barrett and Robert Browning

  • Letters semi-automatically annotated with To/From, dates, tokens, sentences, parts of speech, lemmas, named entities. 

From Michelangelo

  • Letters (in Italian, 2.5MB) automatically annotated with to, place, dates, sentences, tokens, lemmas, and parts of speech. Includes the Python part-of-speech tagging script.

From Ambrose Bierce

  • All letters (348K), semi-automatically annotated with To, Destination, Date, some text structure, tokens, sentences, parts of speech, lemmas.

Tema della settimana 

  • 101 email messages (plain text) from me to a group of friends, mainly in Italian. All messages, plus metadata (77.5K). Version 0.2, released 15 April 2014.
© Chris Culy 2010-2016