Software

Here is an overview of the natural language processing software currently used for the automatic annotation of the Parallel Meaning Bank:

  • We use elephant, a statistical tool, for word and sentence segmentation;

  • We use morpha for the morphological analysis of English;

  • We use a semantic tagger developed in-house and based on deep learning;

  • The EasyCCG parser is employed for syntactic parsing;

  • Boxer, also part of the pipeline, produces semantic representations (Discourse Representation Structures, DRSs) on top of the CCG parse trees;

  • To compute the similarity between the semantic representations of sentences in different languages, we use D-match.
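
The annotation steps listed above can be pictured as a chain of stages, each adding one layer to a shared record. The sketch below is only an illustration of that idea in Python. It does not call elephant, morpha, the semantic tagger, EasyCCG, or Boxer; every function name, the record layout, and the "UNK" label are hypothetical placeholders, and the stage order assumes the list order reflects the processing order (the source states only that Boxer runs on top of the CCG parse trees).

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage takes the annotation record so far and returns it with one more layer.
Record = Dict[str, Any]
Stage = Callable[[Record], Record]


def segment(doc: Record) -> Record:
    # Placeholder for segmentation: a naive whitespace split, not the real tool.
    return {**doc, "tokens": doc["text"].split()}


def analyse_morphology(doc: Record) -> Record:
    # Placeholder for morphological analysis: lowercasing stands in for lemmas.
    return {**doc, "lemmas": [t.lower() for t in doc["tokens"]]}


def tag_semantics(doc: Record) -> Record:
    # Placeholder for semantic tagging: "UNK" is a made-up label.
    return {**doc, "semtags": ["UNK" for _ in doc["tokens"]]}


def parse_syntax(doc: Record) -> Record:
    # Placeholder for CCG parsing: no parse tree is actually built here.
    return {**doc, "parse": None}


def build_drs(doc: Record) -> Record:
    # Placeholder for the semantic representation built on top of the parse.
    return {**doc, "drs": None}


# Assumed stage order, following the order of the list above.
PIPELINE: List[Tuple[str, Stage]] = [
    ("segmentation", segment),
    ("morphology", analyse_morphology),
    ("semantic tagging", tag_semantics),
    ("syntactic parsing", parse_syntax),
    ("semantic representation", build_drs),
]


def annotate(text: str) -> Record:
    """Run every stage in order and return the accumulated annotation layers."""
    doc: Record = {"text": text}
    for _name, stage in PIPELINE:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    result = annotate("Tom sleeps .")
    print(list(result))
```

Keeping each layer under its own key means a later stage can read what earlier stages produced without overwriting it, which is one simple way to model layered annotation.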