Software
Here is an overview of the natural language processing software currently used for the automatic annotation of the Parallel Meaning Bank (PMB):
- One of the first statistical tokenizers, elephant, is the tool we use for segmenting words and sentences in the PMB;
- We use morpha for the morphological analysis of English;
- For semantic tagging, we use the TNT Tagger;
- The EasyCCG parser is employed for syntactic parsing;
- Also included in the pipeline is Boxer, which produces semantic representations (Discourse Representation Structures, DRS) on top of the CCG parse trees;
- For calculating semantic similarity between semantic representations of sentences in different languages, we use Counter;
- To speed up our pipeline of continuously processing documents, we use Viasock;
- We run our pipeline by using Produce;
- Some models for neural DRS parsing are available here.
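
To make the flow of data through the annotation tools easier to picture, here is a minimal Python sketch of a layered pipeline. It assumes the stages run in the order they are listed above, which the list itself does not state. Every function below is a hypothetical placeholder that only marks where a tool such as elephant, morpha, the TNT Tagger, EasyCCG or Boxer would run; none of them calls the real software or reflects its actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    """A raw text plus the annotation layers added by each stage."""
    raw: str
    layers: Dict[str, object] = field(default_factory=dict)


# Hypothetical stand-ins: each marks the slot of a real tool but does
# NOT call it. The toy logic only keeps the example runnable.

def tokenize(doc: Document) -> None:
    # Slot for elephant (word and sentence segmentation).
    doc.layers["tokens"] = doc.raw.split()


def analyse_morphology(doc: Document) -> None:
    # Slot for morpha (morphological analysis of English).
    doc.layers["lemmas"] = [t.lower() for t in doc.layers["tokens"]]


def tag_semantics(doc: Document) -> None:
    # Slot for the TNT Tagger (semantic tagging); "UNK" is a dummy tag.
    doc.layers["semtags"] = ["UNK"] * len(doc.layers["tokens"])


def parse_syntax(doc: Document) -> None:
    # Slot for EasyCCG (syntactic parsing); no tree is built here.
    doc.layers["ccg"] = None


def build_drs(doc: Document) -> None:
    # Slot for Boxer (DRS built on top of the CCG parse tree).
    doc.layers["drs"] = None


# Assumed stage order: the order of the list above.
PIPELINE: List[Tuple[str, Callable[[Document], None]]] = [
    ("tokens", tokenize),
    ("lemmas", analyse_morphology),
    ("semtags", tag_semantics),
    ("ccg", parse_syntax),
    ("drs", build_drs),
]


def run_pipeline(raw: str) -> Document:
    """Apply every stage in order; each stage adds one layer."""
    doc = Document(raw)
    for _name, stage in PIPELINE:
        stage(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline("Tom sings .")
    print(list(result.layers))
```

Keeping each layer under its own key means a later stage can read the output of an earlier one, and a single layer can be recomputed without touching the rest.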