
Free Resources

I have created a variety of freely available visualization tools. Also included here is SLASH/A, a tool developed by two of my former students. In addition to the visualization tools, there are a few corpora, most with simple inline annotations of lemma and part of speech.

Visualization tools

DoubleTreeJS is a compact, interactive view of keyword in context (KWIC) and concordance information. [The original DoubleTree, in Java, is available at EURAC, or from this local copy.]
KWICis is a modern concordance (keyword in context = KWIC) visualization that is interactive and designed for structured data.
Slash/A is an n-gram viewer for corpora with dated documents. It was created by two former students of mine: Slava Todorova and Maria Chinkina.
ProD is an experimental visualization for tree(-like) structures (e.g. constituent structures, dependency structures).
Extended Linguistic Dependency Diagrams (xLDDs) is a visualization tool specialized for the graphical presentation of linguistic dependency structures and the dynamic interaction with these visualizations. Download local copy
Structured Parallel Coordinates is an interactive visualization for corpus query results and ranked data. It is a specialized version of Parallel Coordinates. Download local copy
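
For readers unfamiliar with the keyword in context (KWIC) idea behind the concordance tools above, here is a minimal sketch in Python. It is not the code behind DoubleTreeJS or KWICis; the function name `kwic`, the `width` parameter, and the sample sentence are all hypothetical and chosen only for illustration.

```python
def kwic(tokens, keyword, width=3):
    """Return (left, hit, right) context triples for each occurrence of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]      # up to `width` tokens before the hit
            right = tokens[i + 1:i + 1 + width]     # up to `width` tokens after the hit
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


# Hypothetical sample text, split on whitespace as a stand-in for real tokenization.
text = "the cat sat on the mat and the dog sat by the door"
lines = kwic(text.split(), "sat", width=2)

for left, hit, right in lines:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>12} [{hit}] {right}")
```

A concordance viewer builds on this kind of aligned listing, adding sorting, filtering, and interaction.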

Corpora

Letters

All the letters corpora are annotated with information about the letters (author and/or addressee, date, etc.), as well as with token, lemma, and part of speech information (all automatically generated). The Barrett-Browning letters also have (some) named entities annotated, while the Bierce and Michelangelo letters have some additional structural annotations (e.g. salutation and closing, paragraphs). All corpora come in an XML version. The Barrett-Browning letters are in Text-Corpus Format (TCF), while the Bierce and Michelangelo letters use custom formats (DTDs provided). In addition, the Barrett-Browning and Michelangelo letters come in other formats: individual letters as XML and all the letters as a tab-delimited "vertical file". The letters are freely available under a Creative Commons License.
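
As a rough illustration of working with a tab-delimited vertical file, the following Python sketch reads one token per line and counts lemmas. The column order (token, lemma, part of speech), the sample rows, and the name `read_vertical` are assumptions made for this example; check the downloaded files for the actual layout before relying on it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: one token per line; columns ASSUMED to be token, lemma, POS.
SAMPLE = "I\tI\tPRP\nwrote\twrite\tVBD\nletters\tletter\tNNS\n"


def read_vertical(handle):
    """Yield (token, lemma, pos) tuples from a tab-delimited vertical file."""
    # QUOTE_NONE keeps quotation-mark tokens from being treated as CSV quoting.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank lines and any lines with fewer than three columns
        yield row[0], row[1], row[2]


rows = list(read_vertical(io.StringIO(SAMPLE)))
lemma_counts = Counter(lemma for _, lemma, _ in rows)
print(lemma_counts.most_common(3))
```

To read a real file, one would pass an open file handle (for example `open(path, encoding="utf-8", newline="")`) in place of the `io.StringIO` sample.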

  • Letters between Elizabeth Barrett and Robert Browning (1845-1846)
    Letters semi-automatically annotated with To/From, dates, tokens, sentences, parts of speech, lemmas, named entities. These letters were prepared with help from students at the University of Tübingen, especially Eyal Schejter.
    Download page
  • Letters from Ambrose Bierce to a variety of people (1892-1913)
    All letters (348K), semi-automatically annotated with To, Destination, Date, some text structure, tokens, sentences, parts of speech, lemmas.
    Download
  • Letters from Michelangelo Buonarroti to a variety of people (1497-1524)
    Letters (in Italian, 2.5MB) automatically annotated with to, place, dates, sentences, tokens, lemmas, and parts of speech. Includes the Python part of speech tagging script.
    Download

Email

Tema della settimana
101 email messages (77.5K, plain text) from me to a group of friends, mainly in non-native Italian.
Download messages, plus metadata

Journals

Journal of a Trip to California by the Overland Route Across the Plains, by E. S. Ingalls.
[coming soon]