As part of the Vistola project on correspondence, I have created, with help from students, a few corpora of letters and email messages. Most are annotated with lemmas and parts of speech in addition to the metadata associated with each letter.
A few other annotated, non-correspondence texts can also be found in the BAAL workshop materials.
Between Elizabeth Barrett and Robert Browning
- Letters semi-automatically annotated with To/From, dates, tokens, sentences, parts of speech, lemmas, and named entities.
- Letters (in Italian, 2.5MB) automatically annotated with To, place, dates, sentences, tokens, lemmas, and parts of speech. Includes the Python part-of-speech tagging script.
From Ambrose Bierce
- All letters (348K), semi-automatically annotated with To, Destination, Date, some text structure, tokens, sentences, parts of speech, and lemmas.
Tema della settimana
- 101 email messages (plain text) from me to a group of friends, mainly in Italian. All messages, plus metadata (77.5K). Version 0.2, released 15 April 2014.