Part of speech tagging

Since the Middle Low German component of the CHLG is a collaboration with the Referenzkorpus Mittelniederdeutsch / Niederrheinisch (1200-1650) (ReN), part-of-speech tagging uses the HiNTS tagset (Barteld et al. 2018).

Compared with the rather broad standard tagset of the original Penn annotation scheme, the HiNTS tagset makes much more fine-grained distinctions. Full documentation of how the HiNTS tagset was applied in the ReN corpus is given in the ReN Annotationshandbuch, Part 1 (in German).

The HiNTS tags employ recurring labels which encode values for the features of lexical identity, articleness and position/headedness, as follows:

| Feature | Value | Label within the tag |
|---|---|---|
| lexical identity | definite | D |
| | indefinite | I |
| | negative | NEG |
| | possessive | POS |
| | relative | REL |
| | wh | W |
| articleness | article-like | ART |
| | not article-like | |
| position/headedness | pre-head | A |
| | post-head | N |
| | head | S |
| | head of nominal predicate | D |

These labels are combined to produce individual POS-tags, for example:

  • DDARTA: a determiner which is definite, article-like and precedes its head noun, e.g. dat land ‘that land’
  • DIARTA: a determiner which is indefinite, article-like and precedes its head noun, e.g. eyn kind ‘a child’
  • ADJA: an adjective which precedes its head noun, e.g. grot wunder ‘great wonder’
  • ADJN: an adjective which follows its head noun, e.g. juncvrowe fin ‘maiden fine’
  • ADJS: an adjective which is a head, e.g. Se was nicht de beste ‘She was not the best’
  • ADJD: an adjective which is the head of a nominal predicate, e.g. De neppe weren grot unde runt ‘the bowls were big and round’
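
The way the labels combine in determiner tags can be illustrated with a short Python sketch. It is not part of the ReN or CHLG tooling. The function name `decompose_determiner_tag` is hypothetical, and the tag layout it assumes (a leading `D` for the determiner class, then the lexical identity label, an optional `ART`, and a final position label) is inferred from the examples above rather than taken from the HiNTS documentation.

```python
import re

# Label values are taken from the feature table; the tag layout is an assumption.
IDENTITY = {
    "D": "definite",
    "I": "indefinite",
    "NEG": "negative",
    "POS": "possessive",
    "REL": "relative",
    "W": "wh",
}
POSITION = {
    "A": "pre-head",
    "N": "post-head",
    "S": "head",
    "D": "head of nominal predicate",
}

# Assumed layout: "D" (determiner) + identity label + optional "ART" + position label.
DETERMINER_TAG = re.compile(r"D(NEG|POS|REL|D|I|W)(ART)?([ANSD])")


def decompose_determiner_tag(tag):
    """Return the feature values encoded in a determiner tag, or None if it does not fit."""
    match = DETERMINER_TAG.fullmatch(tag)
    if match is None:
        return None
    identity, art, position = match.groups()
    return {
        "lexical identity": IDENTITY[identity],
        "article-like": art is not None,
        "position": POSITION[position],
    }


print(decompose_determiner_tag("DDARTA"))
print(decompose_determiner_tag("DIARTA"))
```

Tags outside the determiner class, such as ADJA, return `None` here, since only their final position label follows the same scheme.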

 

Here, we provide an overview of how some of the key HiNTS tags correspond to the more standard Penn tagset:

| Penn tag | HiNTS tag | Comment |
|---|---|---|
| ADJ | ADJA | attributive adjective pre-head |
| | ADJN | attributive adjective post-head |
| | ADJS | substantival adjective as head |
| | ADJD | adjective as head of nominal predicate |
| ADV | AVD | adverb |
| | ADJV | adverbial adjective |
| | AVKO | conjunctional adjective |
| CONJ | KON | coordinating conjunction |
| D | DDARTA | definite article pre-head |
| | DDARTN | definite article post-head |
| | DIARTA | indefinite article pre-head |
| | DIARTN | indefinite article post-head |
| | DDA | demonstrative article pre-head |
| | DDN | demonstrative article post-head |
| | DPDS | demonstrative pronoun |
| FW | FM | foreign material |
| INTJ | ITJ | interjection |
| N | NA | common noun |
| NPR | NE | proper noun |
| NEG | PTKNEG | negation particle |
| NUM | CARDA | attributive numeral pre-head |
| | CARDN | attributive numeral post-head |
| | CARDS | substantival numeral as head |
| | CARDD | numeral as head of nominal predicate |
| P | APPR | preposition |
| | APPO | postposition |
| PRO | PPER | personal pronoun |
| PRO-RFL | PRF | reflexive personal pronoun |
| PRO$ | DPOSA | possessive adjective pre-head |
| | DPOSN | possessive adjective post-head |
| | DPOSS | substantival possessive as head |
| | DPOSD | possessive as head of nominal predicate |
| PTCL | PTKA | particle with adjective or adverb |
| | PTKN | particle with nominals |
| RP | PTKVZ | verbal particle |
| TO | PTKZU | infinitival particle |
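
For readers who want to collapse HiNTS tags to their Penn equivalents programmatically, the correspondence can be stored as a lookup table. The Python sketch below transcribes a selection of rows from the table above. The names `HINTS_TO_PENN` and `to_penn` are hypothetical and do not refer to any existing CHLG or ReN tool.

```python
# A selection of rows from the correspondence table, keyed by HiNTS tag.
HINTS_TO_PENN = {
    "ADJA": "ADJ", "ADJN": "ADJ", "ADJS": "ADJ", "ADJD": "ADJ",
    "AVD": "ADV",
    "KON": "CONJ",
    "DDARTA": "D", "DDARTN": "D", "DIARTA": "D", "DIARTN": "D",
    "DDA": "D", "DDN": "D",
    "FM": "FW",
    "ITJ": "INTJ",
    "NA": "N",
    "NE": "NPR",
    "PTKNEG": "NEG",
    "CARDA": "NUM", "CARDN": "NUM", "CARDS": "NUM", "CARDD": "NUM",
    "APPR": "P", "APPO": "P",
    "PPER": "PRO",
    "PRF": "PRO-RFL",
    "DPOSA": "PRO$", "DPOSN": "PRO$", "DPOSD": "PRO$",
    "PTKA": "PTCL", "PTKN": "PTCL",
    "PTKVZ": "RP",
    "PTKZU": "TO",
}


def to_penn(hints_tag, default=None):
    """Look up the Penn tag for a HiNTS tag; unknown tags yield `default`."""
    return HINTS_TO_PENN.get(hints_tag, default)


print(to_penn("DDARTA"))
print(to_penn("PTKZU"))
```

Because the mapping is many-to-one, it only works in the HiNTS-to-Penn direction: a Penn tag such as ADJ cannot be converted back to a single HiNTS tag without further information.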

 

Verbs are treated along two different dimensions in the HiNTS tagset:

  • There is a three-way distinction between lexical verbs (VV), modal verbs (VM) and HAVE, BE, BECOME and DO as auxiliaries (VA)
  • There is a five-way distinction between finite verbs (FIN), infinitives (INF), imperatives (IMP), present participles (PS) and past/passive participles (PP)

These are combined as follows:

| Broad category | Tag | Comment |
|---|---|---|
| lexical verbs | VVFIN | finite |
| | VVINF | infinitive |
| | VVIMP | imperative |
| | VVPS | present participle |
| | VVPP | past/passive participle |
| modal verbs | VMFIN | finite |
| | VMINF | infinitive |
| | VMIMP | imperative |
| | VMPS | present participle |
| | VMPP | past/passive participle |
| auxiliaries (HAVE/BE/BECOME/DO) | VAFIN | finite |
| | VAINF | infinitive |
| | VAIMP | imperative |
| | VAPS | present participle |
| | VAPP | past/passive participle |
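
Since each verb tag is simply a verb-class prefix followed by a form suffix, the full inventory can be generated, and any verb tag split back into its two dimensions, with a few lines of Python. This is an illustrative sketch only. The names `VERB_CLASSES`, `VERB_FORMS`, `VERB_TAGS` and `split_verb_tag` are hypothetical, and the code assumes that every class combines with every form, as the description above suggests.

```python
from itertools import product

# The two dimensions described above: verb class prefix and form suffix.
VERB_CLASSES = {
    "VV": "lexical verb",
    "VM": "modal verb",
    "VA": "auxiliary",
}
VERB_FORMS = {
    "FIN": "finite",
    "INF": "infinitive",
    "IMP": "imperative",
    "PS": "present participle",
    "PP": "past/passive participle",
}

# Every class combines with every form: 3 x 5 = 15 tags.
VERB_TAGS = [cls + form for cls, form in product(VERB_CLASSES, VERB_FORMS)]


def split_verb_tag(tag):
    """Return (verb class, form) for a verb tag, or None for any other tag."""
    cls, form = tag[:2], tag[2:]
    if cls in VERB_CLASSES and form in VERB_FORMS:
        return VERB_CLASSES[cls], VERB_FORMS[form]
    return None


print(VERB_TAGS)
print(split_verb_tag("VAFIN"))
```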