Yahasra Working Papers · Onomastics & Computation

Working paper · In publication · v1.0 · April 2026

MJPN-Cluster v1.0:
A Two-Rule Procedure for Family-Level Grouping
of Moroccan Jewish Phonetic Roots

Yohanan S. Ouaknine

DHSS Hub, Open University of Israel · Yahasra.org


§Abstract

Moroccan Jewish surnames present a distinctive challenge for genealogical search: a single family is routinely written across dozens of orthographic, phonetic, and morphological variants spanning Latin, Hebrew, French, and Judeo-Arabic spelling traditions (אוקנין, OUAKNINE, AKNIN, OKNIN). The MJPN consonant-skeleton normalizer (v3.4.5) collapses much of this variation into a stable phonetic root, but residual family-level fragmentation remains: vowel-initial variants such as AKNN, IKNN, OKNN, and KNN survive normalization, and patronymic compounds such as ABTBLBNRSH (ABITBOL-BENARROCH) are split from their head families.

We present MJPN-Cluster v1.0, a deterministic two-rule procedure that operates on the MJPN root set R and produces a partition into family-level equivalence classes through union-find closure of (P) patronymic prefix peeling and (C) compound surname suffix matching. The procedure has no tunable parameters beyond a 19-element prefix list and a single suffix-length floor, requires no LLM at runtime, and runs in milliseconds on a corpus of several thousand roots. The method was originally derived by structural decomposition of an offline LLM oracle clustering on the Yahasra cemetery corpus.


§Procedure overview

The algorithm has four steps. The input is a finite set R of MJPN consonant-skeleton roots. Rule P scans each root for a peelable patronymic or definite-article prefix from a fixed list, trying longer prefixes first, and unifies the prefixed form with its bare anchor whenever the anchor is at least k = 3 characters long and is itself attested in R. Rule C scans each root for a proper suffix that is itself an attested anchor, again with a floor of k = 3 on both the suffix and the head that remains in front of it; two compounds that share such a suffix therefore fall into the same class. After both rules have been applied exhaustively, union-find transitive closure yields the final partition; the canonical representative of each class is its lexicographically smallest root.
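As a small illustration of Rule P in isolation, the sketch below peels a prefix from the vowel-initial variants named in the abstract. The helper name `peel_anchor` and the reduced list `TOY_PREFIXES` are assumptions made for this example only; it is not the reference implementation.

```python
# Illustrative subset of the prefix list (assumption); the procedure itself
# uses the full 19-element set.
TOY_PREFIXES = ("BEN", "BN", "B", "EL", "A", "I", "O")


def peel_anchor(root, attested, prefixes=TOY_PREFIXES, k=3):
    """Return the attested bare anchor under `root`, or None (hypothetical helper)."""
    # Longer prefixes are tried first, so "BEN" and "BN" are tested before "B".
    for p in sorted(prefixes, key=len, reverse=True):
        rest = root[len(p):]
        # The anchor must be at least k characters long and present in the root set.
        if root.startswith(p) and len(rest) >= k and rest in attested:
            return rest
    return None


roots = {"AKNN", "IKNN", "OKNN", "KNN"}
anchors = {r: peel_anchor(r, roots) for r in sorted(roots)}
print(anchors)
```

Under these assumptions the three vowel-initial forms all resolve to KNN, while KNN itself carries no listed prefix and has no anchor of its own.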

Figure 1. MJPN-Cluster v1.0 procedure. Stages shown: input root set, Rule P patronymic peel, Rule C compound suffix, union-find closure, output partition. Input is a root set R produced by MJPN v3.4.5; Rules P and C are applied independently and produce union-find merges; the closure step yields a final partition of R into family-level equivalence classes.
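Rule C can be illustrated the same way on the compound cited in the abstract. This is a minimal sketch: the helper name `compound_anchor` is hypothetical, and treating BNRSH and ABTBL as attested roots is an assumption made for the example.

```python
def compound_anchor(root, attested, k=3):
    """Return the longest attested proper suffix of `root`, or None (hypothetical helper)."""
    # n is the suffix length. Starting at len(root) - k leaves a head of at
    # least k characters; stopping at k keeps the suffix at least k long.
    for n in range(len(root) - k, k - 1, -1):
        suffix = root[-n:]
        if suffix in attested:
            return suffix
    return None


# ABTBLBNRSH is the compound from the abstract; the other two roots are
# assumed to be attested for the purpose of this example.
roots = {"ABTBLBNRSH", "BNRSH", "ABTBL"}
matched = compound_anchor("ABTBLBNRSH", roots)
unmatched = compound_anchor("BNRSH", roots)
print(matched, unmatched)
```

In this toy set the compound is linked to its attested suffix BNRSH. A five-character root such as BNRSH is too short to split into a head and a suffix of three characters each, so it yields no match.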

§Reference implementation

The reference implementation is 80 lines of Python and requires only the standard library. Its core routine is reproduced below; the downloadable module adds four self-tests. The full paper, which includes worked examples for the OUAKNINE, HARROCH/BENARROCH, and BITTON families, a discussion of limitations, and the full Beider-grounded PREFIX_SET, is available for download at the end of this section.

PREFIX_SET = frozenset({
    "BN", "BEN", "BAR", "BAT", "BU", "BOU",
    "EL", "AL", "A", "O", "I", "U", "E",
    "H", "HA", "L", "LB", "M", "B",
})

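def mjpn_cluster(roots, prefix_set=PREFIX_SET, k=3):
    """Map each root to the canonical representative of its family class."""
    R = set(roots)
    parent = {r: r for r in R}                  # union-find forest, one tree per root

    def find(x):
        # Follow parent links to the tree root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Attach the larger representative under the smaller one, so each
        # class is always represented by its lexicographically smallest root.
        ra, rb = find(a), find(b)
        if ra != rb:
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    # Longest prefixes first, so "BEN" is tried before "B".
    sorted_prefixes = sorted(prefix_set, key=lambda p: -len(p))

    for r in R:                                 # Rule P
        for p in sorted_prefixes:
            if r.startswith(p) and len(r) > len(p):
                rest = r[len(p):]
                # Merge only if the bare anchor is long enough and attested.
                if len(rest) >= k and rest in R:
                    union(r, rest)
                    break

    for r in R:                                 # Rule C
        # n is the suffix length, from longest to shortest; the range bounds
        # keep both the suffix and the head r[:-n] at k characters or more.
        for n in range(len(r) - k, k - 1, -1):
            s = r[-n:]
            if s in R and s != r and len(r[:-n]) >= k:
                union(r, s)
                break

    return {r: find(r) for r in R}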
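The function returns a dictionary from each root to its class representative. The sketch below shows one way such a mapping might be sanity-checked against the stated property that each class is represented by its lexicographically smallest root. The helper `check_partition` is hypothetical, and the mapping is hand-written for illustration (BTN is an assumed singleton root), not output taken from the corpus.

```python
from collections import defaultdict


def check_partition(mapping):
    """Group roots by representative and verify two invariants (hypothetical helper)."""
    classes = defaultdict(set)
    for root, rep in mapping.items():
        classes[rep].add(root)
    for rep, members in classes.items():
        # A representative must map to itself.
        assert mapping.get(rep) == rep, f"{rep} is not its own representative"
        # A representative must be the smallest root of its class.
        assert rep == min(members), f"{rep} is not the smallest root in its class"
    return {rep: sorted(members) for rep, members in classes.items()}


# Hand-written mapping in the root -> representative shape (illustrative data).
mapping = {"AKNN": "AKNN", "IKNN": "AKNN", "OKNN": "AKNN", "KNN": "AKNN", "BTN": "BTN"}
families = check_partition(mapping)
print(families)
```

A check of this kind is cheap enough to run after every re-clustering, since it makes a single pass over the mapping.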

Full paper (.docx): 7 pages, including worked examples, limitations, and the Beider-grounded prefix list.
Reference implementation (.py): 80 lines, standard library only. Drop-in module with four self-tests.
§How to cite

Ouaknine, Y. S. (in publication). MJPN-Cluster v1.0: A Two-Rule Procedure for Family-Level Grouping of Moroccan Jewish Phonetic Roots. DHSS Hub, Open University of Israel; Yahasra.org. Retrieved from yahasra.org/papers/mjpn-cluster

@unpublished{ouaknine2026mjpncluster,
  author = {Ouaknine, Yohanan S.},
  title  = {{MJPN-Cluster v1.0: A Two-Rule Procedure for Family-Level Grouping of Moroccan Jewish Phonetic Roots}},
  year   = {2026},
  note   = {Working paper, in publication. DHSS Hub, Open University of Israel; Yahasra.org},
  url    = {https://yahasra.org/papers/mjpn-cluster}
}

§Live deployment

The procedure is live in production on yahasra.org as of April 2026. A search for Cohen now also retrieves Hacohen burials; a search for Ouaknine reaches the Aknin, Iknin, Oknin, and Awknin spelling variants. Each result panel includes a brief etymology explaining the family link.
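The production lookup is not described here, so the following is only a sketch of how a root-to-representative mapping could drive this kind of query expansion. The function names `build_family_index` and `expand_query` are hypothetical, the mapping is hand-written, and the query is assumed to have been normalized to its MJPN root already.

```python
def build_family_index(mapping):
    """Map every root to the sorted tuple of roots that share its representative."""
    by_rep = {}
    for root, rep in mapping.items():
        by_rep.setdefault(rep, []).append(root)
    return {root: tuple(sorted(by_rep[rep])) for root, rep in mapping.items()}


def expand_query(query_root, index):
    """Return every root to search for; an unknown root falls back to itself."""
    return index.get(query_root, (query_root,))


# Hand-written mapping for the vowel-initial variants named in the abstract.
mapping = {"AKNN": "AKNN", "IKNN": "AKNN", "OKNN": "AKNN", "KNN": "AKNN"}
index = build_family_index(mapping)
expanded = expand_query("OKNN", index)
print(expanded)
```

Because the index is precomputed once per clustering run, expansion at query time is a single dictionary lookup.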