Treat leading period-abbreviations as titles (#109) #196
Merged
…109) Add period_abbreviation regex and is_leading_title() helper. In the no-comma parse path, an unrecognized multi-letter token ending in a period (e.g. "Major.") that appears before the first name is set is now parsed as title instead of first. Update test_suffix_in_parenthesis_with_period, which documented the old behavior as a known limitation, to match.
…path (#109) Complete the leading-title wiring across all three parse_full_name() paths. Update test_brute_force test16-18, which documented the old behavior for "Doe, John. A. Kenneth..." (the unrecognized "John." was parsed as the first name); "John." is now recognized as a leading title, as in the no-comma and suffix-comma paths.
…ocs (#109) Add tests for the leading-period-abbreviation feature's remaining untested boundaries per PR #196 review: digit/apostrophe exclusion, case-insensitivity, and interaction with parenthetical nicknames. Also tighten doc wording that implied only the literal first token in a name could become a title, when the rule applies to the whole leading title run (chained abbreviations included).
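To picture how one check can serve all three `parse_full_name()` paths, here is a deliberately simplified toy parser. It is a sketch under stated assumptions, not nameparser's code: `toy_parse()`, `split_given()`, `KNOWN_TITLES`, and `KNOWN_SUFFIXES` are hypothetical, only the regex is taken from the PR, and nicknames, conjunctions, and compound last names are ignored. Each path hands its given-name tokens to the same loop. Because the leading-title check runs only while no first name is set, chained abbreviations form a single title run and a period-word after the first name stays a middle name.

```python
from __future__ import annotations

import re

# Regex quoted in the PR; the title/suffix sets below are hypothetical stand-ins.
period_abbreviation = re.compile(r"^[^\W\d_]{2,}\.$")
KNOWN_TITLES = {"dr", "mr", "mrs", "ms", "prof"}
KNOWN_SUFFIXES = {"jr", "sr", "ii", "iii", "phd"}


def is_leading_title(piece: str) -> bool:
    """Known title, or an unrecognized multi-letter word ending in a period."""
    known = piece.lower().rstrip(".") in KNOWN_TITLES
    return known or bool(period_abbreviation.match(piece))


def split_given(tokens: list[str]) -> tuple[list[str], str | None, list[str]]:
    """Split given-name tokens into (titles, first, middles)."""
    titles: list[str] = []
    first: str | None = None
    middles: list[str] = []
    for tok in tokens:
        if first is None and is_leading_title(tok):
            titles.append(tok)  # still inside the leading title run
        elif first is None:
            first = tok  # first non-title token ends the run
        else:
            middles.append(tok)  # period-words after the first name stay middle names
    return titles, first, middles


def toy_parse(full_name: str) -> dict[str, str | None]:
    """Route a name through one of three simplified comma formats."""
    parts = [p.strip() for p in full_name.split(",")]
    suffix = None
    if len(parts) == 1:
        *given, last = parts[0].split()  # no-comma: "Title First Middle Last"
    elif len(parts) == 2 and parts[1].lower().rstrip(".") in KNOWN_SUFFIXES:
        *given, last = parts[0].split()  # suffix-comma: "Title First Last, Suffix"
        suffix = parts[1]
    elif len(parts) == 2:
        last, given = parts[0], parts[1].split()  # lastname-comma: "Last, Title First"
    else:
        raise ValueError("toy parser handles at most one comma")
    titles, first, middles = split_given(given)
    return {
        "title": " ".join(titles),
        "first": first,
        "middle": " ".join(middles),
        "last": last,
        "suffix": suffix,
    }


for name in ["Major. John Smith", "Maj. Gen. John Smith",
             "Major. John Smith, Jr.", "Doe, John. A. Kenneth",
             "John Major. Smith"]:
    print(name, "->", toy_parse(name))
```

In this toy, "Doe, John. A. Kenneth" yields title "John.", first "A." (a single-letter initial, so not a title), and middle "Kenneth"; the real library's handling of the full test input may differ.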
Summary
"Major."), appearing before the first name is set, is now parsed astitleinstead offirst, across all three parse formats (no-comma, suffix-comma, lastname-comma).period_abbreviationregex (^[^\W\d_]{2,}\.$) and anis_leading_title()helper that composes it with the existingis_title()check, without mutatingC.titlesor any other Constants collection — so the periodless form (e.g."Major") is never affected in later parses."J.") and internal-period abbreviations ("E.T.") are unaffected; a period-word after the first name is still parsed as a middle name.test_suffix_in_parenthesis_with_period,test_brute_force.test16-18) had their expectations updated because they exercised the same leading-title code path with the old (now superseded) behavior — seedocs/release_log.rstfor the versioning note.Closes #109.
Test plan
- `uv run pytest`: 1070 passed, 22 xfailed
- `uv run mypy nameparser/`: clean
- `uv run ruff check nameparser/ tests/`: clean
- New tests in `tests/test_titles.py` cover the happy path across all three parse formats, chained leading abbreviations, exclusions (single-letter initial, internal-period), post-first-name placement, and interaction with known titles and middle initials.