MJPN-Cluster v1.0:
A Two-Rule Procedure for Family-Level Grouping
of Moroccan Jewish Phonetic Roots
DHSS Hub, Open University of Israel · Yahasra.org
§Abstract
Moroccan Jewish surnames present a distinctive challenge for genealogical search: a single family is routinely written across dozens of orthographic, phonetic, and morphological variants spanning Latin, Hebrew, French, and Judeo-Arabic spelling traditions (אוקנין, OUAKNINE, AKNIN, OKNIN). The MJPN consonant-skeleton normalizer (v3.4.5) collapses much of this variation into a stable phonetic root, but residual family-level fragmentation remains: vowel-initial variants such as AKNN, IKNN, OKNN, and KNN survive normalization, and patronymic compounds such as ABTBLBNRSH (ABITBOL-BENARROCH) are split from their head families.
We present MJPN-Cluster v1.0, a deterministic two-rule procedure that operates on the MJPN root set R and produces a partition into family-level equivalence classes through union-find closure of (P) patronymic prefix peeling and (C) compound surname suffix matching. The procedure has no tunable parameters beyond a 19-element prefix list and a single suffix-length floor, requires no LLM at runtime, and runs in milliseconds on a corpus of several thousand roots. The method was originally derived by structural decomposition of an offline LLM oracle clustering on the Yahasra cemetery corpus.
§Procedure overview
The algorithm has four steps. (1) The input is a finite set R of MJPN consonant-skeleton roots. (2) Rule P scans each root for a peelable patronymic or definite-article prefix from a fixed list, trying longer prefixes first, and unifies the prefixed form with its bare anchor whenever the anchor is itself attested in R and is at least k = 3 characters long. (3) Rule C scans each root for a proper suffix that is itself an attested root in R, longest suffix first, with a length floor of k = 3 on both the suffix and the leading element that precedes it; compound roots that share an anchor therefore fall into the same class. (4) After both rules have been applied to every root, union-find transitive closure yields the final partition; the canonical representative of each class is the lexicographically smallest root.
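The two rules can be illustrated on single roots, before any closure is applied. The sketch below lists the link each rule would propose for a small toy root set. The helper names rule_p_link and rule_c_link, the shortened prefix tuple, and the root BNRSH (assumed here as the skeleton of BENARROCH) are illustrative assumptions rather than part of the published procedure.

```python
# Illustrative subset of the prefix list (assumption: shortened for the demo).
PREFIXES = ("BEN", "BN", "HA", "H", "A", "I", "O")
K = 3  # length floor for anchors and leading elements

def rule_p_link(root, roots):
    """Rule P sketch: peel one prefix, longest first; return the attested anchor or None."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        rest = root[len(p):]
        if root.startswith(p) and len(rest) >= K and rest in roots:
            return rest
    return None

def rule_c_link(root, roots):
    """Rule C sketch: return the longest attested proper suffix of length >= K
    whose leading element is also >= K characters long, or None."""
    for n in range(len(root) - K, K - 1, -1):  # n is the suffix length
        if root[-n:] in roots:
            return root[-n:]
    return None

# Toy root set; BNRSH is an assumed root used only for illustration.
toy_roots = {"KNN", "AKNN", "IKNN", "OKNN", "BNRSH", "ABTBLBNRSH"}
for r in sorted(toy_roots):
    print(r, "| P ->", rule_p_link(r, toy_roots), "| C ->", rule_c_link(r, toy_roots))
```

On this toy set, AKNN, IKNN, and OKNN each link to KNN through Rule P, and ABTBLBNRSH links to BNRSH through Rule C; KNN and BNRSH propose no links of their own.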
§Reference implementation
The full reference implementation is 80 lines of Python, requires only the standard library, and is reproduced below. The full paper, including worked examples for the OUAKNINE, HARROCH/BENARROCH, and BITTON families, a discussion of limitations, and the full Beider-grounded PREFIX_SET, is available below.
PREFIX_SET = frozenset({
    "BN", "BEN", "BAR", "BAT", "BU", "BOU",
    "EL", "AL", "A", "O", "I", "U", "E",
    "H", "HA", "L", "LB", "M", "B",
})

def mjpn_cluster(roots, prefix_set=PREFIX_SET, k=3):
    """Map every root in `roots` to the representative of its family class."""
    R = set(roots)
    parent = {r: r for r in R}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # The lexicographically smaller root stays the representative.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    # Try longer prefixes first, so that "BEN" is tested before "B".
    sorted_prefixes = sorted(prefix_set, key=lambda p: -len(p))
    for r in R:  # Rule P
        for p in sorted_prefixes:
            if r.startswith(p) and len(r) > len(p):
                rest = r[len(p):]
                if len(rest) >= k and rest in R:
                    union(r, rest)
                    break
    for r in R:  # Rule C
        # n is the suffix length, longest first; both parts stay >= k long.
        for n in range(len(r) - k, k - 1, -1):
            s = r[-n:]
            if s in R and s != r and len(r[:-n]) >= k:
                union(r, s)
                break
    return {r: find(r) for r in R}
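The function above returns a flat root-to-representative mapping. A consumer that needs whole family classes, for example to widen a search, has to invert that mapping. The following is a minimal sketch of one way to do so. The helper names families and same_family are hypothetical, and the mapping literal is written out by hand so that the snippet runs on its own; it is what mjpn_cluster yields for the toy root set {KNN, AKNN, IKNN, OKNN, BNRSH, ABTBLBNRSH}, where BNRSH is an assumed root used only for illustration.

```python
from collections import defaultdict

# Hand-written stand-in for mjpn_cluster(...) on the toy root set
# (assumption: BNRSH is included purely for illustration).
mapping = {
    "KNN": "AKNN", "AKNN": "AKNN", "IKNN": "AKNN", "OKNN": "AKNN",
    "BNRSH": "ABTBLBNRSH", "ABTBLBNRSH": "ABTBLBNRSH",
}

def families(mapping):
    """Invert root -> representative into representative -> sorted member list."""
    classes = defaultdict(list)
    for root, rep in mapping.items():
        classes[rep].append(root)
    return {rep: sorted(members) for rep, members in classes.items()}

def same_family(root, mapping):
    """Return every root in the same class as `root`, or [] if `root` is unknown."""
    rep = mapping.get(root)
    if rep is None:
        return []
    return sorted(r for r, other in mapping.items() if other == rep)

print(families(mapping))
print(same_family("OKNN", mapping))  # -> ['AKNN', 'IKNN', 'KNN', 'OKNN']
```

Because the representative is the lexicographically smallest member, it serves as a stable class key but is not necessarily the most familiar spelling of the family name; a display layer would likely need its own label for each class.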
Ouaknine, Y. S. (in publication). MJPN-Cluster v1.0: A Two-Rule Procedure for Family-Level Grouping of Moroccan Jewish Phonetic Roots. DHSS Hub, Open University of Israel; Yahasra.org. Retrieved from yahasra.org/papers/mjpn-cluster
§Live deployment
The procedure is live in production on yahasra.org as of April 2026. A search for Cohen now also retrieves Hacohen burials; a search for Ouaknine reaches the Aknin, Iknin, Oknin, and Awknin spelling variants. Each result panel includes a brief etymology explaining the family link.