Lemmatisation

With the exception of two texts, oldenburg and soest, which currently lack lemmas, the Middle Low German component of the CHLG is lemmatised.


The annotation distinguishes two labels: ORTHO, which dominates the orthographic form of the word as it appears in the text, and LEMMA, which sits within the META information and dominates the lemma corresponding to that ORTHO form:

( (IP-MAT (NP-SBJ (PPER (META (CASE nom)
                              (GENDER masc)
                              (LEMMA hē)  ← lemma
                              (NUMBER sg)
                              (PERSON 3))
                        (ORTHO He)))      ← orthographic form
          (VVFIN (META (LEMMA hebben)  ← lemma
                       (MOOD ind)
                       (MORPHO-CLASS weak)
                       (NUMBER sg)
                       (PERSON 3)
                       (TENSE past))
                 (ORTHO hadde))        ← orthographic form
          (NP-OB1 (DIARTA (META (CASE akk)
                                (GENDER masc-neut)
                                (LEMMA ēn)     ← lemma
                                (NUMBER sg))
                          (ORTHO eyn))         ← orthographic form
                  (NA (META (CASE akk)
                            (GENDER masc-neut)
                            (LEMMA dēl)      ← lemma
                            (NUMBER sg))
                      (ORTHO deyl))          ← orthographic form
                  (NP-COM (NP-POS (DPOSA (META (CASE gen)
                                               (GENDER neut)
                                               (LEMMA sīn)   ← lemma
                                               (NUMBER sg))
                                         (ORTHO sines))      ← orthographic form
                                  (NA (META (CASE gen)
                                            (GENDER neut)
                                            (LEMMA hēre)  ← lemma
                                            (NUMBER sg))
                                      (ORTHO heren)))     ← orthographic form
                          (NA (META (CASE akk)
                                    (GENDER fem)
                                    (LEMMA Kraft) ← lemma
                                    (NUMBER sg))
                              (ORTHO kraf)))))    ← orthographic form
 )
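
The ORTHO/LEMMA pairing shown above can be read off the bracketed tree mechanically. The following is a minimal sketch in Python, not part of the CHLG tooling: it assumes the tree is available as a plain string with balanced brackets and without the "←" annotations, which are added here for illustration only. The function names parse and ortho_lemma_pairs are hypothetical.

```python
import re

def parse(text):
    """Parse bracketed tree text into nested lists of strings.

    Assumes balanced brackets; no error recovery is attempted.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new node
        elif tok == ")":
            node = stack.pop()        # close the node and attach it to its parent
            stack[-1].append(node)
        else:
            stack[-1].append(tok)     # label or leaf value
    return stack[0]

def ortho_lemma_pairs(node):
    """Yield (ortho, lemma) for every node with both a META and an ORTHO child.

    The lemma is None if the META block contains no LEMMA entry.
    """
    if not isinstance(node, list):
        return
    children = [c for c in node if isinstance(c, list)]
    meta = next((c for c in children if c and c[0] == "META"), None)
    ortho = next((c for c in children if c and c[0] == "ORTHO"), None)
    if meta is not None and ortho is not None and len(ortho) > 1:
        lemma = next(
            (f[1] for f in meta[1:]
             if isinstance(f, list) and len(f) > 1 and f[0] == "LEMMA"),
            None,
        )
        yield ortho[1], lemma
    for child in children:
        yield from ortho_lemma_pairs(child)

# Shortened version of the example tree above
sample = """
( (IP-MAT (NP-SBJ (PPER (META (CASE nom)
                              (LEMMA hē))
                        (ORTHO He)))
          (VVFIN (META (LEMMA hebben))
                 (ORTHO hadde)))
 )
"""
print(list(ortho_lemma_pairs(parse(sample))))
# [('He', 'hē'), ('hadde', 'hebben')]
```

Because the pairing is made per terminal node, the order of the entries inside META does not matter.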


Where lemmas are not present (e.g. for proper names), the symbol # is used as a placeholder:

( (FRAG (PP (APPR (META (CASE dat)
                        (LEMMA van))
                  (ORTHO Van))
            (NP (NE (NE (META (CASE dat)
                              (GENDER masc)
                              (LEMMA #)     ← absent lemma
                              (NUMBER sg))
                        (ORTHO flosse))
                    (KON (META (LEMMA unde))
                         (ORTHO vnde))
                    (NE (META (CASE dat)
                              (GENDER fem)
                              (LEMMA #)     ← absent lemma
                              (NUMBER sg))
                        (ORTHO blankflosse))))))
 )
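
When working with the lemmas, the placeholder # usually has to be filtered out so that it is not counted as a lemma in its own right. The following is a lightweight sketch in Python, under two assumptions that are not stated in the corpus documentation: that within each terminal the META block (and hence LEMMA) precedes ORTHO, as in the examples above, and that a terminal with no LEMMA entry at all should be treated like one with the placeholder. The function name unlemmatised_forms is hypothetical.

```python
import re

PLACEHOLDER = "#"  # symbol used in the corpus where no lemma is present

def unlemmatised_forms(tree_text):
    """Return the ORTHO forms whose lemma is the placeholder or missing.

    Scans LEMMA and ORTHO leaves in document order and pairs each ORTHO
    with the LEMMA seen since the previous ORTHO (assumption: META comes
    before ORTHO within a terminal).
    """
    leaves = re.findall(r"\((LEMMA|ORTHO)\s+([^\s()]+)\)", tree_text)
    missing = []
    lemma = None
    for label, value in leaves:
        if label == "LEMMA":
            lemma = value
        else:  # ORTHO closes the current terminal
            if lemma is None or lemma == PLACEHOLDER:
                missing.append(value)
            lemma = None
    return missing

# Shortened version of the example tree above
sample = """
( (FRAG (PP (APPR (META (CASE dat)
                        (LEMMA van))
                  (ORTHO Van))
            (NP (NE (META (LEMMA #))
                    (ORTHO flosse)))))
 )
"""
print(unlemmatised_forms(sample))
# ['flosse']
```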


The lemmas for oldenburg and soest will be added in a later version.