PhiloBERTA: A Transformer-Based Cross-Lingual Analysis of Greek and Latin Lexicons | De Litteris II

Abstract

PhiloBERTA is a cross-lingual transformer model that measures the semantic distance between the ancient Greek and Latin lexicons. From a small, carefully chosen set of term pairs drawn from the classical corpus, it computes contextual embeddings and compares them with an angular similarity metric, asking a precise version of an old philological question: when a Greek philosophical concept passed into Latin, how much of its meaning survived the crossing?

The answer the model returns is quantitative and surprisingly sharp. Etymologically related pairs preserve their meaning far more tightly than control pairs, and the effect is strongest for abstract philosophical concepts such as ἐπιστήμη / scientia and δικαιοσύνη / iustitia. The etymological set clusters around a mean similarity of 0.814 with a standard deviation of only 0.003, against 0.780 ± 0.023 for the controls (t = 3.219, p = 0.012). The dispersion, more than the mean, is the result: meaning was not merely transmitted but transmitted systematically.

Keywords

computational philology
cross-lingual embeddings
Ancient Greek
Latin
transformer models
angular similarity
semantic preservation
philosophical lexicon

Authors

Rumi A. Allbert, Wolfram Institute

Makai L. Allbert, Ralston College

Preprint & Code

Preprint: arXiv:2503.05265

Code: RumiAllbert/PhiloBERTA

πάντες ἄνθρωποι τοῦ εἰδέναι ὀρέγονται φύσει.

All human beings by nature desire to know.

— Aristotle, Metaphysics 980a21

Graecia capta ferum victorem cepit et artis intulit agresti Latio.

Captive Greece took captive her savage conqueror, and brought the arts into rustic Latium.

— Horace, Epistles II.1.156

Introduction

When a Greek philosophical term entered Latin, something was carried across and something, perhaps, was left behind. ἐπιστήμη became scientia; δικαιοσύνη became iustitia; ψυχή became anima. Generations of philologists have read these crossings as acts of interpretation, not mere substitution. This paper asks whether the question can also be made quantitative: can a modern language model measure how much of a concept's meaning survived the passage from one classical language to the other?

The difficulty is real and threefold. Parallel corpora for Greek and Latin are scarce; the vocabulary is heavily polysemous and bent by genre; and meaning itself drifts across the centuries that separate the texts. Terms like logos in Greek or anima in Latin accreted new senses with every philosophical school that took them up. General models trained on modern, high-resource languages do not transfer well to this setting, which is why computational work on classical languages has had to build its own tools (Craig et al., 2023; Perrone et al., 2021; McGillivray et al., 2019).

Contextual transformer embeddings opened a path. Multilingual models have been used for sentence alignment of Greek translations into Latin, for lemmatization, and for semantic retrieval (Craig et al., 2023; Yousef et al., 2022; Krahn et al., 2023b), while embedding-based and Bayesian methods have tracked diachronic lexical change in both languages (Perrone et al., 2021; Sprugnoli et al., 2020; McGillivray et al., 2019). But these two lines of work rarely meet, and the philosophical domain, dense with conceptual polysemy and intertextual allusion, has been largely passed over. Recent domain-specific transformers such as the allusion detector of Riemenschneider and Frank (2023) show the promise of the approach without yet addressing diachronic meaning or philosophical semantics directly. PhiloBERTA is built to occupy exactly that gap.

The Two Lexicons

Verba Migrantia

The formal object is a similarity score between a Greek term g and a Latin term l. Rather than compare a single embedding for each, the model averages over many contexts in which each term actually occurs, computing the mean similarity across every pairing of Greek and Latin context vectors. Averaging over contexts is what lets the score absorb polysemy instead of being defeated by it: a word that means several things contributes all of them, weighted by how often each sense appears.
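The averaging described above can be written compactly. This is a sketch of the notation only: the symbols C_g and C_l (the sets of context vectors collected for g and l) and sim (whatever pairwise similarity is applied to two vectors) are assumed here for illustration and are not taken from the paper.

```latex
S(g, l) \;=\; \frac{1}{|C_g|\,|C_l|}
\sum_{\mathbf{u} \in C_g} \sum_{\mathbf{v} \in C_l}
\operatorname{sim}(\mathbf{u}, \mathbf{v})
```

Under this form, a sense that accounts for a given fraction of a term's contexts contributes to the same fraction of the summands, which is the frequency weighting the paragraph describes.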

The case for contextual embeddings over older static ones is not merely stylistic. On the model's own evaluation, moving from static FastText vectors to contextual BERT representations cuts the out-of-vocabulary rate from 18.7% to 4.2%, lifts part-of-speech accuracy from 76.3 to 85.1, and more than doubles the cross-lingual correlation (0.31 → 0.68). Static spaces, trained separately for each language, simply never align; the contextual model shares parameters and so shares a geometry.

The deeper trick addresses the scarcity of parallel Greek–Latin text. The model is trained by multilingual knowledge distillation (Reimers and Gurevych, 2020; Krahn et al., 2023a), using English as a pivot: an English teacher model's representations guide the student so that Greek and Latin are aligned through English rather than directly. The authors report that this reduces the parallel data required by 83% against direct alignment. Genre is handled as a source of variance in its own right — an analysis of variance attributes roughly 63% of the spread in similarity to genre rather than time (F = 8.92, p = 0.002) — and missing genre metadata is modelled with adversarial dropout during training rather than simply ignored.
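The distillation objective of Reimers and Gurevych (2020) can be sketched in a few lines. Whether PhiloBERTA uses exactly this form is an assumption; the function names, the toy lookup tables standing in for real encoders, and the two-dimensional vectors are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    teacher: Callable[[str], Vector],
    student: Callable[[str], Vector],
    pairs: Sequence[Tuple[str, str]],
) -> float:
    """Average loss over (English sentence, ancient-language translation) pairs.

    For each pair the student is pulled toward the teacher's embedding of the
    English sentence twice: once on the English text, once on its translation.
    """
    total = 0.0
    for english, translation in pairs:
        target = teacher(english)
        total += mse(target, student(english)) + mse(target, student(translation))
    return total / len(pairs)


# Toy stand-ins for real encoders (hypothetical embeddings, not model output).
teacher_table = {"knowledge": [1.0, 0.0]}
student_table = {"knowledge": [1.0, 0.0], "scientia": [0.0, 0.0]}
pairs = [("knowledge", "scientia")]

loss = distillation_loss(teacher_table.__getitem__, student_table.__getitem__, pairs)
print(loss)
```

Because Greek and Latin sentences are each pulled toward the same English targets, the two languages end up near each other without any Greek–Latin pair appearing in the training data, which is the sense in which English acts as a pivot.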

III

The Model

PhiloBERTA

PhiloBERTA is a compact pipeline built on a familiar base. A multilingual tokenizer segments both Greek and Latin into subwords; a transformer core — bert-base-multilingual-cased, 768-dimensional, twelve attention heads, about 110 million parameters — produces contextual representations; a small temporal projection layer (a 768 → 768 linear map) absorbs diachronic variation; and the result is a cross-lingual embedding in which Greek and Latin share one semantic space. Figure 1 sets out the four stages.

Figure 1. PhiloBERTA's architecture. Ancient Greek and Latin enter through a shared multilingual tokenizer; the transformer core generates contextual representations; a temporal projection layer handles diachronic drift; and the aligned cross-lingual embeddings are what the similarity metric finally compares. The core is an off-the-shelf multilingual BERT, deliberately so: the novelty is in how it is trained and read, not in a bespoke backbone.

Against a plain multilingual BERT baseline the gains are consistent rather than marginal: cross-lingual accuracy rises from 0.78 to 0.92 (+18%), intra-language cohesion from 0.71 to 0.85 (+20%), and genre robustness from 0.63 to 0.89 (+41%). The temporal objective also guards against a characteristic failure of modern-trained models — semantic anachronism. A contemporary multilingual sentence model maps ἄτομον to the modern atom rather than to the Latin individualis that the ancient sense demands; PhiloBERTA's temporal masking reduces such errors by 37%. The lesson, which recurs in classical NLP, is that training data drawn from the wrong epoch quietly imports the wrong meanings (Riemenschneider and Frank, 2023).
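The percentages quoted against the baseline are relative changes, not differences in percentage points. A quick check of that arithmetic, using only the before and after figures reported above (the function and dictionary names are illustrative):

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline`."""
    return (improved - baseline) / baseline * 100.0


# (baseline, PhiloBERTA) values as reported in the text.
reported = {
    "cross-lingual accuracy": (0.78, 0.92),
    "intra-language cohesion": (0.71, 0.85),
    "genre robustness": (0.63, 0.89),
}

gains = {name: round(relative_gain(b, i)) for name, (b, i) in reported.items()}
print(gains)
```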

Measuring Nearness

De Similitudine

Two embeddings can be compared in several ways; the choice matters more than it looks. PhiloBERTA uses angular similarity (one minus the arccosine of the cosine similarity divided by π, which places the score on the unit interval) rather than raw cosine similarity. The reason is empirical: embedding magnitudes vary far more in ancient texts than in modern ones (a magnitude variance of 0.18 against 0.05), and the angular form is insensitive to magnitude, reading only the direction in which a meaning points. It is, in effect, a measure built for a noisy corpus.
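A minimal sketch of the metric, assuming the standard definition of angular similarity as one minus the angle between the vectors divided by π. The function names are illustrative and the inputs are plain Python sequences rather than model output.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("similarity is undefined for a zero vector")
    # Clamp so floating-point rounding cannot push acos outside its domain.
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))


def angular_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - angle/pi: 1.0 for parallel, 0.5 for orthogonal, 0.0 for opposite vectors."""
    return 1.0 - math.acos(cosine_similarity(u, v)) / math.pi


same_direction = angular_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = angular_similarity([1.0, 0.0], [0.0, 1.0])
opposite = angular_similarity([1.0, 0.0], [-1.0, 0.0])
```

Cosine similarity is itself unaffected by vector length, so the angular form inherits that property. What the arccosine adds is a score that is linear in the angle, which spreads out values that raw cosine compresses near 1.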

The data set is small by design and large in care. Ten term pairs are studied — five etymologically related, five controls. For each term the model draws fifty contextual windows from the Perseus corpus, segmented with CLTK's sentence tokenizer for Greek and Latin, balanced so that no term is starved of context. The texts are sampled from the canonical authors of each tradition: Plato, Aristotle, Homer, Thucydides, and Herodotus on the Greek side; Seneca, Cicero, Virgil, Lucretius, and Tacitus on the Latin. Each term's embedding is the average of its fifty context vectors, taken at the model's [CLS] position; the cross-lingual score is the angular similarity between the two averages.
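The scoring step in the last sentence above (average the context vectors for each term, then compare the two averages) can be sketched as follows. The vectors here are toy values; in the study each would be a 768-dimensional [CLS] vector, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(contexts: Sequence[Vector]) -> List[float]:
    """Component-wise average of a term's context vectors."""
    if not contexts:
        raise ValueError("need at least one context vector")
    dim = len(contexts[0])
    return [sum(vec[i] for vec in contexts) / len(contexts) for i in range(dim)]


def angular_similarity(u: Vector, v: Vector) -> float:
    """1 - angle/pi, with the cosine clamped against rounding error."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - math.acos(max(-1.0, min(1.0, dot / norms))) / math.pi


def term_pair_score(greek_contexts: Sequence[Vector], latin_contexts: Sequence[Vector]) -> float:
    """Angular similarity between the two terms' averaged embeddings."""
    return angular_similarity(mean_vector(greek_contexts), mean_vector(latin_contexts))


# Toy example: both averages point along the diagonal, so the score is 1.
greek_contexts = [[1.0, 0.0], [0.0, 1.0]]
latin_contexts = [[1.0, 1.0], [3.0, 3.0]]
score = term_pair_score(greek_contexts, latin_contexts)
```

Averaging first and comparing once costs a single similarity computation per term pair; it also means the score reflects each term's central tendency rather than the spread of its individual contexts.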

The control pairs are the quiet hero of the design. Without them, a high similarity score proves nothing — any two reasonable embeddings are somewhat alike. It is the gap between etymological and control pairs, and the difference in how tightly each clusters, that turns a pleasant number into evidence.

Results

Quod Invenimus

The central finding is a clean separation between the two pair types. Etymologically related pairs reach a mean similarity of 0.814 with a standard deviation of 0.003; control pairs sit lower and far looser, at 0.780 ± 0.023. A two-sample t-test gives t = 3.219, p = 0.012 — the difference is unlikely to be an accident of sampling. Figure 2 shows the two distributions; the story is in their shape as much as their position.
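The test statistic can be approximately recovered from the summary figures alone. The sketch below assumes five scores per group (from the ten-pair design) and a pooled-variance Student's t-test; neither detail is stated explicitly, and the function names are illustrative. The p-value is obtained by numerically integrating the t density so that only the standard library is needed.

```python
import math


def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance t statistic and degrees of freedom from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the density over [0, |t|]."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3.0  # probability mass between 0 and |t|
    return 1.0 - 2.0 * central


t_stat, df = two_sample_t(0.814, 0.003, 5, 0.780, 0.023, 5)
p_from_summary = two_sided_p(t_stat, df)
p_for_reported_t = two_sided_p(3.219, df)
```

The rounded summary statistics give a t of roughly 3.28 rather than 3.219, a gap that rounding of the means and standard deviations could plausibly account for. With 8 degrees of freedom the reported t corresponds to a two-sided p close to 0.012, which is consistent with the assumed design.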

Figure 2. Density of cross-lingual similarity for the two pair types, drawn to a common peak height so that width carries the variance. The etymological pairs (gold) form a narrow spike at 0.814; the control pairs (slate) spread broadly around 0.780. The rug beneath the axis marks the underlying ten scores. The annotation reports the two-sample test (t = 3.219, p = 0.012).

Table 1 makes the comparison numeric. What is most telling is not the 0.034 gain in the mean but the collapse in spread: the standard deviation of the etymological pairs is nearly an order of magnitude smaller than that of the controls (0.003 against 0.023). A tight cluster at a high value is the signature of a process, not a coincidence.

Metric	Etymological	Control	Δ
Mean similarity	0.814	0.780	+0.034
Standard deviation	0.003	0.023	−0.020
Maximum similarity	0.820	0.800	+0.020
Minimum similarity	0.811	0.741	+0.070

Table 1. Summary statistics for the two pair types. The etymological set is not merely higher in mean similarity but nearly an order of magnitude tighter in spread (σ = 0.003 against 0.023); the dispersion, more than the mean, is what argues for systematic rather than accidental preservation.

The preservation is also uneven, and the unevenness is interpretable. The strongest alignments fall on the most abstract philosophical concepts: ἐπιστήμη / scientia at 0.820, δικαιοσύνη / iustitia and ἀλήθεια / veritas at 0.814, ψυχή / anima at 0.812. Figure 3 lays the five pairs out as a radar, where the etymological set traces a near-regular pentagon and the control set buckles inward — most sharply at justice.
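The per-pair scores can be checked against the summary row of Table 1. Four scores are given in the text; the fifth is taken to be 0.811, the minimum listed in the table, which is an inference rather than a reported value. Both the sample and the population standard deviation are computed, since the text does not say which was used.

```python
import statistics

# Four reported scores plus one inferred from the table minimum (assumption).
etymological_scores = [0.820, 0.814, 0.814, 0.812, 0.811]

mean_score = statistics.mean(etymological_scores)
sample_sd = statistics.stdev(etymological_scores)       # n - 1 denominator
population_sd = statistics.pstdev(etymological_scores)  # n denominator

print(round(mean_score, 3), round(sample_sd, 3), round(population_sd, 3))
```

Under that assumption both estimates of the spread round to the reported 0.003, and the mean rounds to 0.814.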

Figure 3. The five term pairs as a radar. Each spoke is a Greek term; radial distance is cross-lingual similarity. The etymological polygon (gold) is almost uniform, the geometric face of σ = 0.003, while the control polygon (slate) is irregular, collapsing at δικαιοσύνη. Regularity here is the evidence: systematic preservation looks like a shape that barely deviates from its circumscribing circle.

Resolved term by term in Figure 4, the same pattern recurs: the abstract concepts on the left hold their cross-lingual alignment while the etymological advantage over the control widens for terms whose mappings between the languages were less formalized. The error bars are 95% confidence intervals from ten thousand bootstrap resamples; the gaps that matter clear them.
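A percentile bootstrap of the kind mentioned above can be sketched as follows. What exactly is resampled in the study (for example, per-context similarities for each term) is not stated, so the input list here is hypothetical, as are the function name and the fixed seed.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_ci(
    values: Sequence[float],
    n_resamples: int = 10_000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical similarity scores for one term, for illustration only.
scores = [0.80, 0.81, 0.82, 0.83, 0.79, 0.81, 0.82, 0.80]
lower, upper = bootstrap_ci(scores)
```

The percentile method makes no distributional assumption about the scores, which suits a setting with few observations per term; the cost is that very small samples yield intervals that can be too narrow.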

Figure 4. Term-by-term similarity, etymological against control, for the five Greek terms. The vertical axis is zoomed to the occupied band (it begins at 0.70) so the differences are legible; the floor is marked on the figure. Abstract concepts (left) show the strongest and steadiest cross-lingual preservation.

Three conclusions follow. First, systematic preservation: the extremely low variance of the etymological pairs (σ = 0.003) points to a structured transfer of meaning rather than random convergence. Second, an abstract-concept advantage: philosophical vocabulary, lacking everyday equivalents, was preserved more faithfully than concrete terms. Third, statistical robustness: the significance (p = 0.012) together with the tightness of the etymological cluster gives the pattern evidential weight beyond any single pair.

Semantic Bridges

Pontes Significationis

The most significant number in the study is the smallest: σ = 0.003. A spread that tight, around a high mean and significantly apart from the controls, is hard to explain as chance. It suggests that the passage of these concepts from Greek into Latin was a structured process — that translators were not improvising independently but converging on shared solutions for the core vocabulary of philosophy.

Yet the preservation is concept-dependent, and that nuance is where the interpretation lives. Abstract terms — central to philosophical discourse, and often without ready equivalents in ordinary speech — appear to have been translated with more deliberation and less drift than concrete ones. This fits the conceptual-metaphor tradition (Lakoff and Johnson, 1980; Gaber, 2024): if abstract concepts are understood through stable underlying metaphors, the stability of the metaphor may be exactly what survives the crossing into another language.

For intellectual history, the implication is a measurable kind of continuity. A high, regular alignment in the key philosophical concepts supports the long-held picture of a sustained transmission of ideas from Greek thought into the Roman world (Slisli, 2024; Delignon, 2025; Arzhanov, 2024) — Horace's Graecia capta rendered as a number. The model measures similarity, not identity: ancient translators were creative adapters, not word-for-word copyists. What the results suggest is that beneath that creative adaptation a core semantic kernel was, for the specialized vocabulary of philosophy, reliably preserved. The natural next step is to ask whether different schools — Stoic against Epicurean — preserved their terms differently, a comparison this framework is built to support.

VII

Limits & Horizons

The honesty of the result depends on naming what it does not show. Four limits deserve to be stated plainly.

The term set is small and curated.

Ten pairs, five of them controls, were chosen to represent key philosophical concepts; they are not a sample of the lexicon. The pattern is sharp, but its generality is a hypothesis until the analysis is scaled to a larger and automatically selected set of terms (Copeland, 2024).

The corpus carries its own biases.

Everything rests on the digitized Perseus texts, which are extensive but incomplete. Whatever is over- or under-represented there is over- or under-represented in the embeddings, and the result inherits those tilts.

English is a pivot, and pivots leak.

Aligning Greek and Latin through an English teacher is what makes training feasible (Reimers and Gurevych, 2020), but a modern pivot can introduce subtle modern bias. Multiple pivots, or a method for direct Greek–Latin alignment, would test how much the choice of bridge shapes the view from it.

Time is handled only obliquely.

Diachronic variation is absorbed through genre-conditioning rather than an explicit temporal model — a pragmatic choice, since precise dating is unavailable for most classical works. An explicit temporal embedding would let the evolution of these relationships be tracked rather than averaged over (Perrone et al., 2021; McGillivray et al., 2019). Beyond it lie multimodal evidence from manuscripts and the school-by-school comparisons the discussion gestures toward.

VIII

Conclusion

PhiloBERTA shows that a transformer model, properly adapted for the ancient languages, can distinguish etymologically related Greek–Latin term pairs from controls with statistical significance (p = 0.012), and that the etymological pairs preserve their meaning with remarkable consistency (σ = 0.003), most of all for abstract philosophical concepts like ἐπιστήμη / scientia and δικαιοσύνη / iustitia.

The contribution is less a single number than a method. By joining cross-lingual alignment to a tolerance for diachronic and genre variation, the model turns a qualitative claim about the transmission of philosophy — that Greek concepts passed into Latin with their cores intact — into something that can be measured, contested, and extended at scale. The arts that captive Greece brought into rustic Latium left a trace precise enough, it turns out, to read off the geometry of an embedding space.

Cite this paper

Plain

Rumi A. Allbert and Makai L. Allbert (2025). PhiloBERTA: A Transformer-Based Cross-Lingual Analysis of Greek and Latin Lexicons. arXiv preprint arXiv:2503.05265. https://arxiv.org/abs/2503.05265

BibTeX

@article{allbert2025philoberta-a-transformer-based-cross-lingual-analysis-of-greek-and-latin-lexicons,
  author    = {Rumi A. Allbert and Makai L. Allbert},
  title     = {PhiloBERTA: A Transformer-Based Cross-Lingual Analysis of Greek and Latin Lexicons},
  journal   = {arXiv preprint arXiv:2503.05265},
  year      = {2025},
  url       = {https://arxiv.org/abs/2503.05265},
}


References

Yury Arzhanov (2024). Porphyry in Syriac: The Treatise "On Principles and Matter" and its Place in the Greek, Latin, and Syriac Philosophical Traditions. Google Books
Rita Copeland (2024). Translating a Philosophical Style: Thomas Usk’s Boethian Prose. Google Books
Caroline Craig, Kartik Goyal, Gregory R. Crane, Farnoosh Shamsian, and David A. Smith (2023). Testing the limits of neural sentence alignment models on classical Greek and Latin texts and translations. Workshop on Computational Humanities Research
Bénédicte Delignon (2025). Aristotle’s Poetics in Horace’s Epistle to the Pisones: Transmission, cultural transfer, and auctorial rereading. Brill
Fadi Gaber (2024). Ancient Greek proverbs in Diogenianus: A semantic study. Classical Papers
John Krahn and Collaborators (2023a). Knowledge distillation for ancient language alignment. Computational Linguistics Journal 45(2), 123–145
Kevin Krahn, Derrick Tate, and Andrew C. Lamicela (2023b). Sentence embedding models for Ancient Greek using multilingual knowledge distillation. arXiv preprint
George Lakoff and Mark Johnson (1980). Metaphors We Live By. University of Chicago Press, Chicago, IL
Barbara McGillivray, Simon Hengchen, Viivi Lähteenoja, Marco Palma, and Alessandro Vatri (2019). A computational approach to lexical polysemy in Ancient Greek. Digital Scholarship in the Humanities
Valerio Perrone, Simon Hengchen, Marco Palma, Alessandro Vatri, Jim Q. Smith, and Barbara McGillivray (2021). Lexical semantic change for Ancient Greek and Latin. arXiv preprint arXiv:2101.09069
Nils Reimers and Iryna Gurevych (2020). Making monolingual sentence embeddings multilingual using knowledge distillation. Proceedings of EMNLP, 4512–4525. doi:10.18653/v1/2020.emnlp-main.365
Frederick Riemenschneider and Anette Frank (2023). Graecia capta ferum victorem cepit: Detecting Latin allusions to Ancient Greek literature. arXiv preprint
Fouad Slisli (2024). A metaphor to build empires: Imitatio and the politics of representation in European humanism. International Journal of Euro-Mediterranean Studies
Rachele Sprugnoli, Giovanni Moretti, and Marco Passarotti (2020). Building and comparing lemma embeddings for Latin: Classical Latin versus Thomas Aquinas. Italian Journal of Computational Linguistics
Tariq Yousef, Chiara Palladino, Farnoosh Shamsian, Anise d’Orange Ferreira, and Michel Ferreira dos Reis (2022). An automatic model and gold standard for translation alignment of Ancient Greek. Proceedings of LREC
