

The Corpus of Historical Low German (CHLG) is a project to create a treebank of the Middle Low German (MLG) and Old Low German (Old Saxon) languages.

The Old Saxon component was released in the form of the HeliPaD and follows the standards of the Penn Parsed Corpora of Historical English in both POS-tagging and parsing (cf. Walkden 2016). The HeliPaD is documented and downloadable here.

The Middle Low German component adds syntactic annotation to a strategic selection of texts from the DFG-funded project Referenzkorpus Mittelniederdeutsch / Niederrheinisch (1200-1650) (ReN). The annotation follows the Penn standards, with adaptations (cf. Booth et al. 2020). We have been allowed to co-operate with ReN on the part-of-speech and morphological tagging of a number of texts. Because of this collaboration, parts of speech in the Middle Low German component are tagged using the HiNTS tagset (Barteld et al. 2018).

A first version of the Middle Low German component was released in December 2020. It was built with generous funding (nearly €400,000) from the Hercules Foundation (2014-2015, grant number AUGE 13/02) and the Flemish Research Foundation (FWO) (2015-2020, grant number G0F2614N). This version is documented and searchable via this website, as any future versions will be.




We would like to thank the following organizations and individuals for their support and advice:

  • The Referenzkorpus Mittelniederdeutsch/Niederrheinisch project team
    • Christian Fischer, Norbert Nagel and Robert Peters (Universität Münster)
    • Ingrid Schröder, Sarah Ihden, Katharina Dreessen, Fabian Barteld (Universität Hamburg)
  • University of Cambridge
  • Universiteit Gent
  • University of Manchester
  • Our former research assistants: Julia Kolkmann (Manchester), James Cormack, Jamie Douglas, Helen Etheridge, Elaine Oliver, Claire Richardson, Sophie Shephard, Sadie Smith (Cambridge)
  • David Willis (University of Cambridge), Luc de Grauwe (Universiteit Gent), Joel Wallenberg (University of Newcastle), Kersti Börjars (University of Manchester), Susan Pintzuk and Ann Taylor (University of York)
  • The various funding bodies (Sheila Watts: Newton Trust, British Academy Small Research Grant; Anne Breitbarth: FWO post-doctoral fellowship 2012-2015 and Hercules/FWO grant 2014-now; George Walkden: AHRC, University of Manchester; Hannah Booth: FWO post-doctoral fellowship 2021-2024)


Barteld, Fabian, Sarah Ihden, Katharina Dreessen & Ingrid Schröder. 2018. HiNTS: A Tagset for Middle Low German. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 3940–3945. Miyazaki, Japan: European Language Resources Association (ELRA).

Booth, Hannah, Anne Breitbarth, Aaron Ecay & Melissa Farasyn. 2020. A Penn-style treebank of Middle Low German. In Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC 2020), 766–775. Marseille, France: European Language Resources Association (ELRA).

Walkden, George. 2016. The HeliPaD: A parsed corpus of Old Saxon. International Journal of Corpus Linguistics 21(4), 559–571.