Software

Here is an overview of the natural language processing software currently used for the automatic annotation of the Parallel Meaning Bank (PMB):

  • We use Elephant, a statistical tokenizer that treats word and sentence segmentation as a sequence labeling task, to segment words and sentences in the PMB;

  • We use morpha for the morphological analysis of English;

  • For semantic tagging, we use the TnT tagger;

  • The EasyCCG parser is employed for syntactic parsing;

  • The pipeline further includes Boxer, which produces semantic representations, Discourse Representation Structures (DRSs), on top of the CCG parse trees;

  • For calculating semantic similarity between semantic representations of sentences in different languages, we use Counter;

  • To speed up our pipeline, which processes documents continuously, we use Viasock;

  • We run our pipeline with Produce;

  • Annotations produced by a GWAP (game with a purpose) are also added to the PMB;

  • Neural models for DRS parsing are available in a GitHub repository.
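Treating tokenization as sequence labeling, as Elephant does, means a model assigns one tag to every character, and tokens and sentences are then read off the tag sequence. The sketch below covers only that decoding step, not the statistical model. It assumes an S/T/I/O tag set (sentence-initial token start, token start, inside token, outside); the function `decode_segmentation` is hypothetical and is not Elephant's actual interface.

```python
def decode_segmentation(text: str, tags: str) -> list[list[str]]:
    """Turn one tag per character into sentences of tokens.

    Hypothetical tag set, assumed for this sketch:
      S  first character of a sentence-initial token
      T  first character of any other token
      I  character that continues the most recent token
      O  character outside every token (e.g. whitespace)
    """
    if len(text) != len(tags):
        raise ValueError("text and tags must have the same length")
    sentences: list[list[str]] = []
    for char, tag in zip(text, tags):
        if tag == "S":
            # Open a new sentence whose first token starts here.
            sentences.append([char])
        elif tag == "T":
            # Start a new token; open a sentence if none exists yet.
            if not sentences:
                sentences.append([])
            sentences[-1].append(char)
        elif tag == "I":
            if not sentences or not sentences[-1]:
                raise ValueError("'I' tag with no open token")
            sentences[-1][-1] += char
        elif tag != "O":
            raise ValueError(f"unknown tag: {tag!r}")
        # 'O' characters are simply skipped.
    return sentences


print(decode_segmentation("Hi there. Bye.", "SIOTIIIITOSIIT"))
# [['Hi', 'there', '.'], ['Bye', '.']]
```

Because the sentence boundary is just another tag (S versus T), one labeling pass yields both the word and the sentence segmentation.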
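Comparing two DRSs, for example the DRS of an English sentence and that of its translation, comes down to finding the variable renaming under which the most clauses coincide and then reporting precision, recall and F1 over clauses. To our understanding, Counter searches for that renaming with Smatch-style hill-climbing; the sketch below swaps in exhaustive search, which is exact but only feasible for very small DRSs. The clause tuples, the variable-naming rule and the function `clause_f_score` are assumptions made for illustration, not Counter's input format or API.

```python
import re
from itertools import permutations

Clause = tuple[str, ...]

# Assumption: a term is a variable if it is one lowercase letter plus digits
# (b1, x1, e1, ...); everything else (REF, dog, n.01, Agent) is a constant.
VARIABLE = re.compile(r"[a-z]\d+")


def variables(clauses: list[Clause]) -> list[str]:
    return sorted({t for c in clauses for t in c if VARIABLE.fullmatch(t)})


def clause_f_score(system: list[Clause], gold: list[Clause]) -> tuple[float, float, float]:
    """Precision, recall and F1 under the best injective variable mapping.

    Exhaustive search over mappings; assumes neither list has duplicate clauses.
    """
    if not system or not gold:
        return 0.0, 0.0, 0.0
    gold_set = set(gold)
    sys_vars = variables(system)
    # None means "leave this system variable unmapped".
    candidates = variables(gold) + [None] * len(sys_vars)
    best = 0
    for targets in permutations(candidates, len(sys_vars)):
        # Unmapped variables get a "?" prefix so they never match a gold term.
        mapping = {v: (t if t is not None else "?" + v) for v, t in zip(sys_vars, targets)}
        renamed = {tuple(mapping.get(term, term) for term in c) for c in system}
        best = max(best, len(renamed & gold_set))
    precision = best / len(system)
    recall = best / len(gold)
    f1 = 2 * best / (len(system) + len(gold))  # equals 2PR / (P + R)
    return precision, recall, f1


gold = [("b1", "REF", "x1"), ("b1", "dog", "n.01", "x1"), ("b1", "REF", "e1"),
        ("b1", "bark", "v.01", "e1"), ("b1", "Agent", "e1", "x1")]
system = [("b2", "REF", "x5"), ("b2", "dog", "n.01", "x5"), ("b2", "REF", "e3"),
          ("b2", "run", "v.01", "e3"), ("b2", "Agent", "e3", "x5")]
print(clause_f_score(system, gold))  # (0.8, 0.8, 0.8)
```

Here the mapping b2→b1, e3→e1, x5→x1 aligns four of the five clauses; only the predicate (run versus bark) differs. Exhaustive search grows factorially with the number of variables, which is why a heuristic search such as hill-climbing is the practical choice for full-size DRSs.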