
Add German/Dutch prefixes and German title/degree suffixes #191

Merged: derek73 merged 2 commits into master from add-german-dutch-prefixes-titles on Jul 1, 2026

derek73 (Owner) commented Jul 1, 2026

Summary

Closes #18 (Thomas Bachem's 68-test gist of German/Dutch names and international degrees, open for 11 years).

  • Adds prefixes: aan, aen, auf, dem, freiherr, freiherrin, heer, het, op, te, tho, thoe, vande, vd.
  • Adds titles/suffixes: Dipl.-Ing., FH-Prof., Gräfin, Me., PD, Priv.-Doz., RA, Univ.Prof., WP, ba, bsc, meng, stb, MdB/MdL/MdEP/MdA/MdHB/MdBB.
  • Fixes join_on_conjunctions() to register a conjunction-merged piece (e.g. von + und + zu) as a prefix too, mirroring the existing title-handling, so multi-word prefix chains like German "von und zu" bridge correctly into the last name instead of getting stranded in the middle name.

This takes the gist's suite from 21/68 passing to 46/68.
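To illustrate the join_on_conjunctions() change, here is a toy model. It is a simplified sketch, not nameparser's actual code: the prefix and conjunction sets, the toy_join_on_conjunctions and parse helpers, and the register_prefix switch are all hypothetical. It assumes the last name is the final piece plus any run of prefixes directly before it, and that a merged piece becomes a prefix when both of its neighbours are prefixes.

```python
# Toy model (hypothetical, NOT nameparser's implementation) of joining on
# conjunctions and then bridging prefixes into the last name.
PREFIXES = {"von", "zu", "van", "der", "de"}  # tiny illustrative excerpt
CONJUNCTIONS = {"und", "and", "y"}


def toy_join_on_conjunctions(pieces, prefixes, register_prefix=True):
    """Merge every 'left <conjunction> right' triple into a single piece.

    With register_prefix=True (the behaviour the fix adds, in this model),
    a merged piece whose neighbours are both prefixes is itself recorded
    as a prefix, e.g. "von und zu".
    """
    pieces = list(pieces)
    i = 1
    while i < len(pieces) - 1:
        if pieces[i].lower() in CONJUNCTIONS:
            left, right = pieces[i - 1], pieces[i + 1]
            merged = f"{left} {pieces[i]} {right}"
            if register_prefix and left.lower() in prefixes and right.lower() in prefixes:
                prefixes.add(merged.lower())
            # Index i now holds the piece after the merge, so a following
            # conjunction extends the chain on the next iteration.
            pieces[i - 1:i + 2] = [merged]
        else:
            i += 1
    return pieces


def parse(name, register_prefix=True):
    """Return (first, middle, last) under the toy rules above."""
    prefixes = set(PREFIXES)  # per-call copy so registrations don't leak
    pieces = toy_join_on_conjunctions(name.split(), prefixes, register_prefix)
    if len(pieces) == 1:
        return pieces[0], "", ""
    first, rest = pieces[0], pieces[1:]
    # Last name = final piece plus any run of prefixes directly before it.
    j = len(rest) - 1
    while j > 0 and rest[j - 1].lower() in prefixes:
        j -= 1
    return first, " ".join(rest[:j]), " ".join(rest[j:])


print(parse("Karl Theodor von und zu Guttenberg"))
# ('Karl', 'Theodor', 'von und zu Guttenberg')
print(parse("Karl Theodor von und zu Guttenberg", register_prefix=False))
# ('Karl', 'Theodor von und zu', 'Guttenberg')
```

The second call reproduces the pre-fix symptom in this model: the merged "von und zu" is not a prefix, so the chain is stranded in the middle name.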

Deliberately not included, with reasoning verified by test:

  • to, in, an, then, ten as global prefixes: these are common Korean/Vietnamese given-name syllables in the middle-token position (e.g. Park In Hwan), and adding them breaks parses of those names that are currently correct, not merely ambiguous ones.
  • bare v as a prefix (for German "v. Kloppenheim"): collides with ordinary Western middle initials (John V. Smith breaks).
  • Several remaining gist failures expect a suffix for what this library correctly parses as a leading title (e.g. Mag., RA, Dipl.-Ing.). That matches the existing conventions for Dr./MD/PhD, so these were not changed.
  • Multi-token suffix/title continuations such as Dr. rer. nat., LL. M. and M. Sc. need new joining logic beyond config additions, so they are out of scope here.
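The two prefix exclusions can be illustrated with an even simpler toy rule. This is a hypothetical sketch, not the library's logic: split_name and the BASE set are illustrative, the last name is taken to be the final token plus any run of prefix tokens directly before it, and a trailing period is ignored when matching so that "V." matches "v".

```python
# Hypothetical toy rule, not nameparser's code: the last name is the final
# token plus any run of prefix tokens directly before it ("V." matches "v").
def split_name(name, prefixes):
    tokens = name.split()
    first, rest = tokens[0], tokens[1:]
    j = len(rest) - 1
    while j > 0 and rest[j - 1].lower().rstrip(".") in prefixes:
        j -= 1
    return first, " ".join(rest[:j]), " ".join(rest[j:])


BASE = {"van", "von", "de"}  # illustrative excerpt only

# "in" as a global prefix pulls a given-name syllable into the last name.
print(split_name("Park In Hwan", BASE))           # ('Park', 'In', 'Hwan')
print(split_name("Park In Hwan", BASE | {"in"}))  # ('Park', '', 'In Hwan')

# Bare "v" as a prefix swallows an ordinary middle initial.
print(split_name("John V. Smith", BASE))          # ('John', 'V.', 'Smith')
print(split_name("John V. Smith", BASE | {"v"}))  # ('John', '', 'V. Smith')
```

Because the rule is position-based and global, a short prefix cannot distinguish "v. Kloppenheim" from "V. Smith" without extra context, which is why config additions alone are not enough here.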

Test plan

  • python -m pytest tests/: 990 passed, 4 skipped, 22 xfailed (no regressions)
  • Added 3 new regression tests in tests/test_conjunctions.py covering the join_on_conjunctions prefix-bridging fix
  • Manually verified that entries shared with existing constants don't collide (vd is already a suffix acronym with a different meaning, as is ra; freiherr is already a leading title)
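Surfacing the shared entries is a set intersection; deciding whether a shared entry actually collides (different meaning or position) remains a manual judgment, as in the check above. A sketch, assuming the new entries are compared lowercased; the EXISTING sets and the shared_entries helper are hypothetical stand-ins that contain only the overlaps named above, not the library's real constants.

```python
# New entries from the summary, lowercased for comparison.
NEW_PREFIXES = {"aan", "aen", "auf", "dem", "freiherr", "freiherrin", "heer",
                "het", "op", "te", "tho", "thoe", "vande", "vd"}
NEW_TITLES = {"dipl.-ing.", "fh-prof.", "gräfin", "me.", "pd", "priv.-doz.",
              "ra", "univ.prof.", "wp", "ba", "bsc", "meng", "stb",
              "mdb", "mdl", "mdep", "mda", "mdhb", "mdbb"}

# HYPOTHETICAL excerpts of pre-existing constants: only the overlaps the PR
# mentions are listed, not the library's real sets.
EXISTING = {
    "suffix_acronyms": {"vd", "ra"},
    "titles": {"freiherr"},
}


def shared_entries(new, existing):
    """Map each existing constant name to the sorted new entries it already holds."""
    report = {}
    for name, values in existing.items():
        overlap = new & values
        if overlap:
            report[name] = sorted(overlap)
    return report


report = shared_entries(NEW_PREFIXES | NEW_TITLES, EXISTING)
for constant, entries in sorted(report.items()):
    print(f"{constant}: {', '.join(entries)}")
# suffix_acronyms: ra, vd
# titles: freiherr
```

Each reported entry is then reviewed by hand, since the same string can legitimately mean different things as a prefix, a title, or a suffix.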

🤖 Generated with Claude Code

derek73 and others added 2 commits July 1, 2026 02:44
Closes #18. Adds prefixes (aan, aen, auf, dem, freiherr, freiherrin,
heer, het, op, te, tho, thoe, vande, vd) and titles/suffixes (Dipl.-Ing.,
FH-Prof., Gräfin, Me., PD, Priv.-Doz., RA, Univ.Prof., WP, ba, bsc, meng,
stb, MdB/MdL/MdEP/MdA/MdHB/MdBB) that don't collide with existing
English-language parsing.

Also fixes join_on_conjunctions() to register a conjunction-merged piece
(e.g. "von" + "und" + "zu") as a prefix too, mirroring the existing
title-handling, so multi-word prefix chains like German "von und zu"
bridge correctly into the last name instead of getting stranded in the
middle name.

Deliberately left out short, high-frequency English words (to, in, an,
then, ten) that collide with common Korean/Vietnamese given-name
syllables in the middle-token position, and bare "v" as a prefix, which
collides with ordinary Western middle initials.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

Covers two gaps flagged in review: a merged piece that's registered as
both a title and a prefix ("freiherr"), and a chain with more than one
non-contiguous conjunction bridging prefixes into the last name.
derek73 added this to the v1.3.0 milestone Jul 1, 2026
derek73 self-assigned this Jul 1, 2026
derek73 merged commit 8c75919 into master Jul 1, 2026
8 checks passed
derek73 deleted the add-german-dutch-prefixes-titles branch July 1, 2026 09:54

Development

Successfully merging this pull request may close these issues.

Additional unit tests mostly for German & Dutch names + more degrees
