Manual

To browse through the corpus and see which meaning representations are assigned to texts, go to the explorer, a tool for browsing the parallel corpus. It shows one document (a sentence or a short text) at a time, in at least two languages, one of which is always English. To see a translation, click on German, Dutch or Italian. To find out more about the semantic analysis of a text, select one of the following five tabs:

  • raw
    the document in its raw format;
  • tokens
    the result of segmentation: the text split into word and sentence tokens;
  • alignment
    sentence alignment for languages other than English;
  • sentences
    syntactic and semantic analysis for each sentence;
  • discourse
    the meaning representation for the entire text.

Each document has a unique identifier, consisting of a two-digit part (ranging from 00 to 99) and a four-digit document number (ranging from 0000 to 9999).

There are three tabs with extra information about the analysed text:

  • metadata
    source of the document, language, terms of use;
  • bits of wisdom
    individual (manual) annotations that correct the machine output;
  • warnings
    all warnings produced by the semantic technology pipeline producing the initial analysis.
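
If you handle document identifiers in your own scripts, the scheme described above (a two-digit part and a four-digit document number) can be checked mechanically. The sketch below is only an illustration: the `DocumentId` type, the `parse_document_id` function and the `/` separator between the two components are assumptions made for this example, not something the explorer prescribes.

```python
import re
from typing import NamedTuple


class DocumentId(NamedTuple):
    """Hypothetical container for an identifier: part 00-99, document 0000-9999."""
    part: int
    document: int

    def __str__(self) -> str:
        # Zero-pad both components to their fixed widths.
        return f"{self.part:02d}/{self.document:04d}"


# Assumption: the two components are written as "PP/DDDD".
_ID_PATTERN = re.compile(r"([0-9]{2})/([0-9]{4})")


def parse_document_id(text: str) -> DocumentId:
    """Parse a string such as '00/0030' into its part and document number."""
    match = _ID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid document identifier: {text!r}")
    return DocumentId(int(match.group(1)), int(match.group(2)))
```

For instance, `parse_document_id("00/0030")` yields part 0 and document 30, and converting the result back with `str()` restores the zero-padded form.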

Documents are sorted by size. You can select a different document by clicking the icons at the top (previous, random, or next). Documents are reprocessed whenever the models or annotations are updated. It is also possible to force reprocessing by clicking the circular icon at the top of the screen.

The sentences environment is where it all happens. This is the most exciting part, where the semantic analysis of the words (lexical semantics) and the sentences (compositional semantics) is shown. You can select the layers of analysis that you want to see for a sentence:

  • sem
    the semantic tag (part-of-speech tagging for semanticists);
  • sym
    the non-logical symbol (basically: lemmatisation and normalisation);
  • sns
    the WordNet synset of which the word is a member;
  • rol
    the VerbNet roles selected for a word with a functional category;
  • ref
    information about the antecedent of a referring expression;
  • cat
    the supertag, a.k.a. lexical category in Combinatory Categorial Grammar (CCG);
  • drs
    the lexical semantics in the format of a discourse representation structure.

At this point you probably think we are particularly fond of three-letter acronyms. Well, that's true. We think they are great.
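
To make the layer selection concrete, here is a minimal sketch of how the seven layers listed above could be modelled outside the explorer. Everything in it is an assumption made for illustration: the `LAYERS` table, the `select_layers` function and the dictionary-per-token representation are hypothetical and do not describe how the explorer stores its data.

```python
from typing import Dict, Iterable

# The seven annotation layers, keyed by their three-letter abbreviation.
LAYERS: Dict[str, str] = {
    "sem": "semantic tag",
    "sym": "non-logical symbol (lemmatisation and normalisation)",
    "sns": "WordNet synset",
    "rol": "VerbNet roles",
    "ref": "antecedent of a referring expression",
    "cat": "supertag (lexical category)",
    "drs": "lexical semantics as a discourse representation structure",
}


def select_layers(annotation: Dict[str, str], wanted: Iterable[str]) -> Dict[str, str]:
    """Return only the requested layers of one token's annotation.

    Raises ValueError if a requested name is not one of the seven layers.
    Layers that the token does not carry are silently skipped.
    """
    names = list(wanted)
    unknown = [name for name in names if name not in LAYERS]
    if unknown:
        raise ValueError(f"unknown layer(s): {', '.join(unknown)}")
    return {name: annotation[name] for name in names if name in annotation}
```

With a token annotated as `{"sym": "sleep", "cat": "S\\NP"}` (made-up values), calling `select_layers(token, ["sym"])` keeps only the `sym` entry, mirroring what happens when you tick a single layer in the sentences tab.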