Here is an overview of the natural language processing software currently used for the automatic annotation of the Parallel Meaning Bank:
- Elephant is the statistical tool used in the PMB for word and sentence segmentation.
- We use morpha for the morphological analysis of English.
- For semantic tagging, we use the TNT Tagger.
- The EasyCCG parser is employed for syntactic parsing.
- Further included in the pipeline is Boxer, which produces semantic representations (Discourse Representation Structures, DRSs) on top of the CCG parse trees.
- For calculating semantic similarity between semantic representations of sentences in different languages, we use D-match.
- To speed up the continuous processing of documents in our pipeline, we use Viasock.
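
To illustrate how tools like these can be chained, here is a minimal sketch of a staged pipeline in which each step adds a new annotation layer to a document. It does not call any of the tools listed above: the names `segment`, `analyse_morphology`, and `run_pipeline` are hypothetical, the stage bodies are trivial placeholders, and the stage order is an assumption based on the order of the list rather than on the actual PMB configuration.

```python
from typing import Callable, Dict, List

# A document is a dictionary of annotation layers (hypothetical layout).
Document = Dict[str, object]
Stage = Callable[[Document], Document]


def segment(doc: Document) -> Document:
    # Placeholder for a segmenter such as Elephant: a naive whitespace
    # split, NOT the real statistical model.
    return {**doc, "tokens": str(doc["raw"]).split()}


def analyse_morphology(doc: Document) -> Document:
    # Placeholder for a morphological analyser such as morpha:
    # lowercasing stands in for real lemmatisation.
    tokens: List[str] = list(doc["tokens"])  # type: ignore[arg-type]
    return {**doc, "lemmas": [t.lower() for t in tokens]}


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Apply each stage in order; every stage returns a new document
    # that carries the earlier layers plus its own.
    for stage in stages:
        doc = stage(doc)
    return doc


# Assumed order: segmentation first, then morphological analysis.
STAGES: List[Stage] = [segment, analyse_morphology]
```

Calling `run_pipeline({"raw": "Tom sleeps"}, STAGES)` returns a document with `raw`, `tokens`, and `lemmas` layers. Further stages (semantic tagging, parsing, semantic representation) would be added to `STAGES` in the same way.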