The genetic history of the Iberian Peninsula has always been contested terrain. For over a decade, the dominant narrative described a progressive layering of ancestries — Mesolithic hunter-gatherers, Neolithic farmers from Anatolia, and a final Bronze Age wave of steppe-derived pastoralists — converging toward a distinctly Iberian genetic identity, with haplogroup oscillations treated as gradual and regionally coherent. That narrative is now structurally broken. The release of aligned sequencing data from 10,016 ancient individuals (ENA Project PRJEB106907), generated as part of Akbari et al. (2026), is not just the largest single data release in the history of paleogenomics — it is the dataset that forces a fundamental methodological reckoning.
This blog is launched specifically in response to that disruption. Its central argument is this: the haplogroups widely considered "characteristically Iberian" are far less autochthonous than the canonical literature suggests, and understanding why requires abandoning the linear, pulse-migration models that dominate archaeogenetics in favour of non-linear population dynamics frameworks capable of capturing minority lineage interactions, genetic drift in structured meta-populations, and the dilution of low-frequency introgression signals across the last five centuries. If that sounds like a methodological quarrel, it is — but it is also a historical one, with direct consequences for how we understand who lived in Iberia, when they arrived, where they came from, and how they were related to people living thousands of kilometres away at the same moment.
Before diving into the data, however, a structural complaint must be registered openly.
⚠️ The 10,000-Sample Embargo: Science Held Hostage
The publication of PRJEB106907 comes after years during which an estimated ten thousand ancient human genomes — processed, aligned and quality-controlled at public cost — were withheld from the scientific community under institutional embargo arrangements that serve the publication strategies of large academic consortia rather than the advancement of knowledge. This is not a minor administrative inconvenience. It is a structural act of sabotage against independent research in population genetics. Every month of embargo represents:
Preprints and doctoral theses blocked from releasing sequenced data already deposited internally
Independent researchers denied the ability to test, replicate, or extend findings using primary data
A systematic concentration of interpretive authority in the hands of the same groups that control data access
A direct subsidy to narratives that benefit from not being tested against the full dataset
We welcome the expectation that 2026 will see a wave of preprints, theses, and supplementary ENA depositions releasing dozens of samples that have been analytically mature for years. The dataset now accessible at PRJEB106907 — including anonymised samples from contexts previously known only through grey literature and unpublished conference presentations — must be treated as common scientific heritage. This blog will operate on the principle that all data in the public record is subject to independent reanalysis, regardless of whether the original depositors have published their own interpretation.
🧬 What the Data Actually Show: "Iberian" Haplogroups Were Never Exclusively Iberian
The received narrative of Iberian haplogroup genesis runs roughly as follows: R1b-DF27, the dominant male lineage in modern Iberians (above 40% of males across most of the Peninsula and close to 70% among Basques), is a local Bronze Age consolidation of steppe-derived R1b-M269 that diversified in situ after the Bell Beaker phenomenon. Similarly, mtDNA haplogroups H1j1, U5b1f1a, and related Franco-Cantabrian refugium lineages are described as having been "already present in the Iberian Peninsula before the Bronze Age" and are treated as markers of deep Iberian continuity.
What the ancient DNA record shows, when examined without editorial simplification, is considerably more complex.
The Y-chromosome replacement was catastrophic and rapid. Bronze Age Iberian males show a virtual elimination of pre-existing Chalcolithic lineages (primarily I2a-M423 and G2a-P15) and their substitution with R1b-M269 subclades linked to steppe ancestry — a bottleneck event that is statistically extreme and inconsistent with gradual diffusion. The replacement signal at the Y-chromosome is among the most dramatic in the entire European aDNA record.
But autosomal steppe ancestry remained markedly lower in Iberia than in Central Europe. This is the central paradox: Iberian Bell Beaker males show near-total Y-chromosome replacement, yet their autosomal genomes retain a strong pre-steppe Chalcolithic Iberian component, far more so than contemporary Bell Beaker individuals in Britain or Scandinavia. This is the signature of male-biased migration — a small number of steppe-affiliated men reproducing preferentially with local women — not of demographic substitution.
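A back-of-the-envelope sketch makes the asymmetry concrete. It assumes an idealised extreme that the ancient DNA record does not literally support: a single incoming male line whose sons pair, in every generation, with women of entirely local ancestry. The function name and the generation counts are illustrative assumptions, not estimates from any dataset.

```python
def incoming_autosomal_fraction(generations: int, founder_fraction: float = 1.0) -> float:
    """Expected share of incoming autosomal ancestry in the male line after
    `generations` of pairing with women who carry only local ancestry.

    Assumption (illustrative): each child inherits half of its autosomes
    from each parent, and mothers carry no incoming ancestry at all.
    """
    return founder_fraction * 0.5 ** generations


# The Y chromosome is passed father to son unchanged, so the paternal
# haplogroup stays "incoming" while the autosomal share halves each generation.
for g in range(6):
    print(f"generation {g}: Y lineage incoming, "
          f"autosomal incoming share = {incoming_autosomal_fraction(g):.3f}")
```

Under these toy assumptions the paternal marker and the genome-wide signal diverge within a handful of generations, which is the qualitative pattern described above. Real populations sit somewhere between this extreme and full demographic replacement.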
The Northeast Iberian Iron Age aDNA record tells a different story again. Recent doctoral work (Ruiz de la Cuesta Aguirre, UAB, 2025) combining 73 individuals revealed that while most paternal haplogroups have apparent local roots, signals of long-distance female-mediated gene flow from North Africa, the Central and Eastern Mediterranean, and Central Europe are consistently detectable. The maternal lineages tell a fundamentally different story from the paternal ones, and that asymmetry is the key to everything.
The Bell Beaker genetic isolation paradox. Iberia and Central Europe maintained markedly distinct genetic profiles through the Chalcolithic despite extensive material-culture exchange — a decoupling of cultural and genetic diffusion that linear migration models cannot explain without invoking drift or reproductive isolation. Culture moved; most people did not.
🏺 The Bronze Age Flux-and-Reflux: When "Iberian" Lineages Were Already Elsewhere
One of the most consequential — and least discussed — revelations emerging from the high-coverage aDNA literature of the last two years is that several maternal and paternal haplogroups treated as diagnostic of Iberian autochthony were already circulating across Western and Central Europe at the onset of the Bronze Age, with confirmed presence in northern Italy and Central European contexts contemporaneous with, or prior to, their supposed Iberian florescence.
The best-documented case is mtDNA haplogroup U5b1f1a, long described as a Franco-Cantabrian autochthonous lineage and one of the signature haplotypes of the Busturialdea Basque population. A male individual carrying this haplogroup has now been recovered from an Early Bronze Age context near Parma, northern Italy. That result is consistent with the broader mobilisation of Western European gene pools during the Bell Beaker horizon, but incompatible with the narrative of deep Iberian-Cantabrian regional restriction. Furthermore, a branch of U5b1f1a is currently attested in Germany bearing derived mutations absent in any sampled Basque individual, implying a divergence event that predates the supposed consolidation of this lineage as an Iberian endemic.
Similarly, H1j1 — described by multiple studies as exclusive to Franco-Cantabrian and Basque-speaking populations — now presents at least two derived markers in an ancient German branch that are not present in any modern Basque sample in the public record. This implies either parallel derivation or a pre-Iberian common ancestral range extending deep into Central Europe, rendering H1j1's "Basque exclusivity" a sampling artefact rather than a genuine phylogeographic boundary.
These findings are not isolated anomalies. They form part of a coherent pattern in which R1b-P312 subclades, including the precursors of DF27, did not originate in Iberia and then expand northward, but were circulating as a mobile meta-population across Western Europe, with northern Italy serving as a documented transit zone. The R1b-P312 clade splits into three main sub-branches: U152, predominant in northern Italy and the Alps; L21, dominant in Ireland and the British Isles; and DF27, numerically dominant in the Iberian Peninsula. The near-coincident age estimates for all three sub-branches (~4,200 years BP) argue strongly against an Iberian origin for DF27 specifically. They instead support a rapid radiation from a single node, most likely in the Lower Rhine or northern Alpine zone, with subsequent parallel clade consolidation in each terminal region.
Several R1b-DF27 lineages demonstrably passed through northern Italy before establishing in NW Iberia: the shared IBD architecture between aDNA samples from Po Valley Early Bronze Age contexts and contemporaneous or slightly later Iberian Atlantic Bronze Age individuals is not explicable by single-pulse migration from the Rhine corridor directly into Iberia, but requires at minimum one intermediate population node on the Mediterranean coastal arc. The geographic signature of R1b-DF27's highest modern diversity in NE Iberia (particularly the Basque Country and Catalonia) is thus more plausibly interpreted as a secondary radiation centre driven by local drift amplification in small patrilocal founder communities, not as a primary origin.
🌐 Long-Distance IBD Sharing: The Kinship Network That No Admixture Model Can See
This brings us to what we consider the most fundamental methodological failure of the current archaeogenetics paradigm. Admixture-proportion-centric analyses — f-statistics, qpAdm, ADMIXTURE block decomposition — are designed to detect ancestry at population scale over many generations, not to resolve the individual-level mobility and short-range kinship that characterise the Bronze Age record. When we shift to IBD segment-length distribution analysis, a radically different picture emerges.
Long IBD segments (>8 cM) decay as a function of recombination time. Under a simple Poisson crossover model, a 20 cM segment survives a single meiosis intact with probability e^(-0.2) ≈ 0.82, so it has a ~50% chance of surviving about 3.5 meioses and a ~37% chance of surviving five. Individuals sharing such segments are therefore most plausibly related within roughly 5–7 generations. Applied to Copper and Early Bronze Age Western Eurasians, IBD analysis recovers kinship connections spanning thousands of kilometres (between Iberian Atlantic Bronze Age individuals and contemporaneous Po Valley or Rhine corridor samples) that are entirely invisible to f-statistics at typical aDNA SNP array densities.
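The decay of long segments can be checked with a few lines of arithmetic. This is a minimal sketch assuming the simplest textbook model: crossovers fall as a Poisson process at a rate of one per Morgan per meiosis, with no interference. The function names are hypothetical and chosen for this illustration.

```python
import math


def survival_probability(length_cm: float, meioses: int) -> float:
    """Probability that a segment of `length_cm` centimorgans passes through
    `meioses` meioses without any crossover falling inside it.

    Assumption: Poisson crossovers, one expected per Morgan per meiosis.
    """
    return math.exp(-(length_cm / 100.0) * meioses)


def meioses_for_half_survival(length_cm: float) -> float:
    """Number of meioses after which the survival probability drops to 50%."""
    return math.log(2) / (length_cm / 100.0)


for m in (1, 3, 5, 7, 10):
    print(f"20 cM segment, {m:2d} meioses: {survival_probability(20, m):.3f}")
print(f"half-survival point for 20 cM: {meioses_for_half_survival(20):.2f} meioses")
```

The same two functions can be reused for the 8 cM threshold, where survival is much slower to decay, which is why shorter segments reach further back in time.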
The Bronze Age landscape of Western Eurasia was not organised into sealed demographic compartments. It was traversed by structured networks of intermarriage, exchange, and mobile elite groups whose genealogical traces are recoverable through IBD but are systematically erased by admixture averaging. Not all R1b-P312 groups formed patrilocal settlements and expanded their clades in situ. Some R1b-DF27 lineages show clear evidence of having transited through northern Italy before establishing in NW Iberia — a mobility pattern that makes no sense under standard Bell Beaker diffusion models, but is entirely coherent within a kinship network framework where individual-level movement across 1,000+ km corridors was normal for Bronze Age elites.
A critical additional point: the apparent "localness" of Iberian haplogroups is at least partially an artefact of sampling density. We have far more Iberian aDNA samples than we have northern Italian or Provençal Early Bronze Age samples, creating an observational bias that systematically inflates the apparent regional specificity of lineages that were in reality circulating across a much larger geographic range. The Akbari et al. (2026) dataset, by massively increasing sample counts from under-represented Western and Central European zones, will directly test this inference.
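The sampling argument can be put in numbers. The sketch below assumes independent draws from a population in which a lineage sits at a fixed frequency; the frequency and the two sample sizes are invented for illustration and do not describe any real dataset.

```python
def detection_probability(frequency: float, n_samples: int) -> float:
    """Probability of observing a lineage at least once in `n_samples`
    independent draws when its true population frequency is `frequency`."""
    return 1.0 - (1.0 - frequency) ** n_samples


# Hypothetical scenario: the same lineage at 2% frequency in two regions,
# one densely sampled and one sparsely sampled.
for label, n in (("densely sampled region", 150), ("sparsely sampled region", 15)):
    print(f"{label} (n={n}): P(seen at least once) = "
          f"{detection_probability(0.02, n):.2f}")
```

With identical true frequencies, the lineage is almost certain to appear in the dense sample and is more likely than not to be missed in the sparse one, which is how a widely distributed lineage can come to look regionally specific.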
🛡️ The Late Bronze Age Consolidation: When Flux Became Fixity
It is only from approximately 1300–1000 BCE — the Late Bronze Age — that the large R1b clades begin to generate the patrilocal clan structures that produce the regionally coherent haplogroup geographies we observe today. The evidence for this transition is now substantial:
Y-chromosome diversity collapses in Atlantic Bronze Age Iberian contexts, consistent with strongly patrilineal descent reckoning and male-biased inheritance of social capital — land, cattle, and metal wealth
Isotopic evidence from Atlantic Bronze Age burial assemblages in Spain, Portugal, and Brittany documents high male philopatry alongside female exogamy, the demographic signature of patrilocal clans where women move but men remain
IBD network analysis of Late Bronze Age Central European communities demonstrates that male kin clusters persist across multiple cemetery phases while female IBD connections extend to distant regions, confirming sex-biased dispersal at the community scale
mtDNA diversity remains high through the Late Bronze Age and Iron Age precisely because female-mediated gene flow continued to import distant lineages, while Y-chromosome diversity declined — which is exactly why H1j1 and U5b1f1a persist at moderate frequencies despite the overwhelming R1b paternal sweep
The implication for modern haplogroup distributions is critical: the regional haplogroup "signatures" we observe today in Iberia are not ancient autochthonous features but products of a ~1,000-year drift-amplification process during the Late Bronze Age, when small patrilocal groups fixed particular R1b sub-branches stochastically through founder effects and clan endogamy. The narrative of "deep Iberian roots" for these lineages compresses what was in reality a dynamic, non-equilibrium system into a false image of continuity. The patriarchal regional structures that emerged from this Late Bronze Age consolidation have persisted for more than 2,000 years — but their deep-time origin is far more mobile and far more pan-European than the standard literature acknowledges.
📉 The Methodological Crisis: Why Linear Models Fail Iberian Data
The dominant analytical toolkit in Iberian archaeogenetics relies on four pillars: (1) qpAdm for modelling ancestry proportions as weighted mixtures of reference populations; (2) f3/D-statistics (ADMIXTOOLS) for detecting admixture signals; (3) PCA projections to visualise genetic affinity; and (4) haplogroup assignment as a proxy for ancestry. Each of these methods encodes linearity assumptions — that populations can be decomposed into discrete source components whose proportions sum to unity, and that admixture dynamics operate through a small number of pulses between static sources.
Recent simulation work has demonstrated that both qpAdm and f3-statistics fail systematically under complex demographic models involving multiple overlapping admixture events, unsampled ghost populations, and varying drift intensities. An additional layer of methodological concern is the identification of systematic allelic bias in in-solution hybridisation enrichment assays (Daicel Arbor Biosciences Prime Plus 1240k kit), which produces artefactual excess shared drift that corrupts within-continent relationships in PCA and f-statistics — a bias relevant to a substantial fraction of samples now in the public record.
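The linearity assumption, and the way an unsampled source can hide inside it, can be shown with a toy example. This is not qpAdm, which operates on f4-statistics with a formal rank test; it is a plain least-squares fit on invented allele frequencies, and every number and name below is an assumption made for illustration.

```python
def fit_two_source_mixture(target, source_a, source_b):
    """Least-squares weight w such that target ≈ w*source_a + (1-w)*source_b.
    The two weights sum to one by construction."""
    num = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    return num / den


def squared_residual(target, source_a, source_b, w):
    """Sum of squared differences between the target and the fitted mixture."""
    return sum((t - (w * a + (1 - w) * b)) ** 2
               for t, a, b in zip(target, source_a, source_b))


# Invented allele frequencies at five loci.
source_a = [0.10, 0.80, 0.30, 0.60, 0.20]
source_b = [0.70, 0.20, 0.50, 0.10, 0.90]
ghost = [0.40, 0.50, 0.90, 0.95, 0.05]   # unsampled third source

# Case 1: the target really is a 60/40 mix of the two sampled sources.
clean = [0.6 * a + 0.4 * b for a, b in zip(source_a, source_b)]
w_clean = fit_two_source_mixture(clean, source_a, source_b)

# Case 2: 30% of the target comes from the ghost source.
mixed = [0.4 * a + 0.3 * b + 0.3 * g for a, b, g in zip(source_a, source_b, ghost)]
w_ghost = fit_two_source_mixture(mixed, source_a, source_b)

print(f"clean: w_a = {w_clean:.2f}, residual = "
      f"{squared_residual(clean, source_a, source_b, w_clean):.4f}")
print(f"ghost: w_a = {w_ghost:.2f}, residual = "
      f"{squared_residual(mixed, source_a, source_b, w_ghost):.4f}")
```

In the second case the fit still returns two tidy weights that sum to one, and only the residual hints that a third contributor is missing. With noisy, low-coverage data that hint is easy to lose.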
The failure of linear superposition is particularly acute for Iberian Bronze-to-Iron Age dynamics because:
Effective population sizes in localised Iberian communities were small enough that genetic drift could rapidly fix or eliminate low-frequency introgressed lineages within 10–15 generations
Sex-biased dispersal produces systematically different inheritance dynamics for uniparental versus autosomal markers, violating the homogeneous mixing assumption of all standard admixture estimators
Minority population interactions — small-scale gene flow events from North African, Phoenician, Greek, and Central European sources documented archaeologically — generate signals that are statistically invisible to qpAdm at typical aDNA coverage depths but are recoverable through IBD length distribution analysis
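The first point in the list above, rapid loss of low-frequency lineages under drift, can be explored with a small simulation. This is a minimal sketch of a haploid Wright-Fisher model for a uniparental lineage, assuming constant population size, no selection, and no further gene flow. The population sizes, starting frequency, and replicate count are illustrative assumptions.

```python
import random


def lineage_lost_fraction(n_carriers, start_freq, generations, replicates, seed=1):
    """Fraction of replicate populations in which a lineage that starts at
    `start_freq` has been lost after at most `generations` generations.

    Each generation, every one of the `n_carriers` individuals draws its
    parent at random from the previous generation (haploid Wright-Fisher).
    """
    rng = random.Random(seed)
    lost = 0
    for _ in range(replicates):
        count = round(n_carriers * start_freq)
        for _ in range(generations):
            p = count / n_carriers
            count = sum(1 for _ in range(n_carriers) if rng.random() < p)
            if count == 0 or count == n_carriers:
                break  # lineage lost or fixed; nothing further can change
        lost += (count == 0)
    return lost / replicates


small = lineage_lost_fraction(n_carriers=25, start_freq=0.04, generations=15, replicates=500)
large = lineage_lost_fraction(n_carriers=500, start_freq=0.04, generations=15, replicates=500)
print(f"N=25:  lineage lost in {small:.0%} of replicates")
print(f"N=500: lineage lost in {large:.0%} of replicates")
```

The same starting frequency behaves very differently in the two settings: in the small community the introgressed lineage usually vanishes within the window, leaving nothing for a population-level admixture estimator to find.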
🏛️ From Olalde 2019 to Oteo-Garcia 2025: The Medieval Palimpsest
The 2019 landmarks, Olalde et al. (Science, 8,000 years of Iberian genomic history) and Bycroft et al. (Nature Communications, fine-scale population structure of Iberia), generated a compelling but profoundly simplified narrative: a progressive layering of ancestries converging toward a distinctly Iberian genetic identity. What neither study could capture was the sheer combinatorial unpredictability of Iberian genetic diversity between ~50 CE and ~1600 CE.
It took Oteo-Garcia et al. (2025, Genome Biology) to deliver the missing piece. Analysing medieval genomes from eastern Iberia across four chronological phases, the study demonstrated:
Pre-Islamic gene flow from multiple Mediterranean regions was already consolidating a pan-Mediterranean genetic background in late antiquity — making the Roman Empire the actual engine of Iberian genetic diversification
North African ancestry appears sporadically in late antiquity but becomes consolidated and demographically significant during the Umayyad and post-Umayyad Islamic period
North African ancestry persisted in Christian cemeteries until the 17th century, demonstrating that the Reconquista had minimal genetic impact on the resident population — it was a political and religious transfer, not a demographic replacement
The Expulsion of the Moriscos (1609–1614 CE) — the forced removal of approximately 300,000 individuals — was the single most decisive event in collapsing the North African genetic signal in eastern Iberia, creating the sharp discontinuity that modern surveys had wrongly attributed to gradual post-Islamic fading
Evidence of active slave trafficking from North Africa introduces a further source of non-linear, non-gradual gene flow that no pulse-admixture model can represent faithfully
The implications are radical: any haplogroup of North African or Near Eastern affinity found in Iberian medieval contexts cannot be classified as residual Islamic-period ancestry by default. It may represent Roman-era immigration, Visigothic-period Mediterranean connectivity, pre-Islamic trade networks, slave trade circuits, or Morisco demography — each with a completely distinct genealogical and historical meaning.
🔬 Anomalous Haplogroup Combinations: Preliminary Observations from PRJEB106907
The following observations concern anonymised individuals from Iberian archaeological contexts accessible through PRJEB106907, including material from Carrión province sites. These profiles were extracted without access to original metadata, a constraint that keeps the genetic calls independent of archaeological expectations and so reduces, though it cannot eliminate, confirmation bias.
Profile Type A — Eastern Mediterranean paternal lineage with Franco-Cantabrian maternal background:
At least one male individual carries a Y-chromosome haplogroup in the E-V12 cluster (a branch of E-M78), a clade documented in the northern Levant (~1878–1637 BCE), Iron Age Achziv (~900–400 BCE), Imperial Rome (~2nd century CE), and Medieval central Italy. This paternal lineage is combined with a maternal haplogroup characteristic of the Franco-Cantabrian refugium and an autosomal admixture profile mixing North African (Berber-affiliated) and Iberian Atlantic components. This combination is what one would predict for an individual from Muwallad elite lineages who retained pre-Islamic maternal ancestry while acquiring Eastern Mediterranean paternal ancestry through the Islamic genealogical system, or for an individual whose family was shaped by the slave trade circuits documented by Oteo-Garcia et al. (2025).
Profile Type B — South Caucasian J1 with Basque-proximal autosomal profile:
A second male individual presents a Y-chromosome haplogroup J1 of South Caucasian clade affiliation combined with an autosomal profile mapping close to modern Franco-Cantabrian populations, with a detectable minor component of North Sea / Atlantic British-affiliated ancestry. The J1 paternal lineage is genealogically inconsistent with Iberian pre-Islamic populations under standard models, but is entirely consistent with Muwallad elite lineages of the Banu Qasi type — dominant in the Ebro valley through the 9th and early 10th centuries, documented as having Arab paternal ancestry (acquired through real or fictive genealogical conversion) superimposed on an existing Vascone autosomal and maternal background. The Banu Qasi held close alliance ties to the Vascone nobility, a genealogical structure that would generate exactly this combination of a non-Iberian J1 paternal haplogroup with a Franco-Cantabrian-proximal autosomal profile. The minor British-affiliated autosomal component may reflect Visigothic heritage or post-Roman Germanic settlement in the Upper Ebro corridor.
These profiles are published in anonymised form at casperhub.dev and are currently awaiting haplogroup tree placement confirmation at FTDNA. They are noted here as research hypotheses pending metadata confirmation.
🧪 PLINK IBD Analysis: First-Degree Kinship Between Anonymised Samples
A critical methodological result from the raw data analysis. Applying standard PLINK v1.9 IBD estimation (parameters: --genome --min 0.185) to VCF files generated from the original BAM alignments of PRJEB106907, two male individuals sharing identical Y-chromosome haplogroup (E-BY15909, a subclade of E-V12/E-M78) and identical mtDNA haplogroup (H1j1) yielded an IBD coefficient of:
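This IBD signature, with virtually all shared segments in the Z₂ state (identical by descent at both alleles), is the pattern expected for monozygotic twins or a duplicated sample. A first-degree pair would normally look different: parent and offspring sit near Z₁ = 1, and full siblings near Z₀ = 0.25, Z₁ = 0.5, Z₂ = 0.25, both with PI_HAT around 0.5. Because PLINK's method-of-moments estimator depends on allele frequencies that cannot be estimated reliably from two low-coverage samples, the Z estimates are unstable and a first-degree relationship cannot be excluded. Working without metadata, all three interpretations remain formally open. If these represent two distinct individuals from the same burial context, the finding documents kinship of at least the first degree between two Iberian medieval males with an unusual paternal haplogroup, a result with direct implications for reconstructing social structure, patrilineal kin groups, and family burial practices.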
The analytical pipeline:
# Step 1: BAM → diploid genotype calls at the 1240k positions.
# Note: bcftools call -m emits diploid genotypes; it does not perform a
# pseudo-haploid random allele draw. reference.fa is a placeholder for the
# reference the BAMs were aligned to; sites_1240k.txt is assumed to be a
# tab-delimited CHROM/POS list.
bcftools mpileup -Ou -f reference.fa -T sites_1240k.txt -q 30 -Q 30 \
    sample_A.bam sample_B.bam | \
    bcftools call -m -Oz -o merged_calls.vcf.gz
# Step 2: PLINK relatedness estimation
plink --vcf merged_calls.vcf.gz \
    --genome --min 0.185 \
    --allow-extra-chr \
    --out ibd_output
# Step 3: Parse output (column 10 of the .genome file is PI_HAT); keep the header row
awk 'NR == 1 || $10 > 0.45' ibd_output.genome

This analysis was conducted intentionally without access to sample metadata, to demonstrate that IBD-based kinship inference does not depend on archaeological context. That is a methodologically important point for the many samples in PRJEB106907 deposited with minimal site-level metadata.
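To read a row of the .genome output, it helps to compare the estimated Z₀, Z₁, Z₂ values with their theoretical expectations. The sketch below is an illustrative helper, not part of PLINK: the nearest-match rule and the function names are assumptions, and real aDNA estimates are noisy enough that the label it returns should be treated as a starting hypothesis only.

```python
# Theoretical IBD-state probabilities (Z0, Z1, Z2) for common relationships.
EXPECTED_Z = {
    "duplicate or monozygotic twin": (0.0, 0.0, 1.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "second-degree": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def pi_hat(z1: float, z2: float) -> float:
    """Proportion of the genome shared IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


def closest_relationship(z0: float, z1: float, z2: float) -> str:
    """Return the relationship whose expected (Z0, Z1, Z2) is nearest,
    by squared Euclidean distance, to the observed values."""
    def distance(expected):
        e0, e1, e2 = expected
        return (z0 - e0) ** 2 + (z1 - e1) ** 2 + (z2 - e2) ** 2
    return min(EXPECTED_Z, key=lambda name: distance(EXPECTED_Z[name]))


# Hypothetical observed values, chosen only to exercise the function.
for z in ((0.01, 0.02, 0.97), (0.02, 0.95, 0.03), (0.24, 0.51, 0.25)):
    print(z, "PI_HAT =", round(pi_hat(z[1], z[2]), 3), "->", closest_relationship(*z))
```

Note that parent-offspring and full-sibling pairs share the same PI_HAT of 0.5, so the single coefficient is not enough to tell them apart; the three Z values are.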
⚙️ The Analytical Framework: Non-Linear Population Dynamics
This blog treats Iberian genetic history as a non-linear dynamical system rather than a series of discrete admixture pulses. The core components are:
1. Time-stratified allele frequency trajectories — analysed using the Wright-Fisher diffusion framework extended to accommodate multiple ancestry components. The key departure from standard selection scans is that I treat demographic processes (admixture, founder effects, structured gene flow) as primary drivers of frequency change, not selection alone.
2. IBD network analysis — leveraging whole-genome data from higher-coverage samples in the Akbari et al. (2026) dataset to reconstruct haplotype sharing networks. IBD-based inference recovers low-level admixture events invisible to f-statistics.
3. Non-parametric population structure modelling — moving beyond ADMIXTURE's hard assignment model toward probabilistic mixture models (SOURCEFIND, SparseMix) that allow for uncertainty in source population composition and time-varying mixing proportions.
4. Haplogroup frequency as a population-level emergent property, not a primary datum — R1b-DF27's present-day frequency in Iberia is treated not as evidence of ancient Iberian origin, but as the outcome of a stochastic bottleneck followed by demographic expansion under Late Bronze Age social conditions that can be modelled explicitly.
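The first component above can be sketched as a discrete-generation Wright-Fisher model in which gene flow, not selection, drives the expected frequency change. This is a simplified stand-in for the diffusion framework named in the list, with a single external source at constant frequency; all parameter values and function names are illustrative assumptions.

```python
import random


def expected_after_gene_flow(p_local, p_source, migration_rate):
    """Expected allele frequency after one generation of gene flow, before drift."""
    return (1.0 - migration_rate) * p_local + migration_rate * p_source


def simulate_trajectory(n_chromosomes, p_start, p_source, migration_rate,
                        generations, seed=0):
    """Allele frequency trajectory under gene flow followed by binomial drift."""
    rng = random.Random(seed)
    p = p_start
    trajectory = [p]
    for _ in range(generations):
        p_expected = expected_after_gene_flow(p, p_source, migration_rate)
        # Drift: resample n_chromosomes copies from the post-migration pool.
        copies = sum(1 for _ in range(n_chromosomes) if rng.random() < p_expected)
        p = copies / n_chromosomes
        trajectory.append(p)
    return trajectory


traj = simulate_trajectory(n_chromosomes=200, p_start=0.05, p_source=0.60,
                           migration_rate=0.02, generations=40)
print([round(p, 2) for p in traj[::10]])
```

Running the simulation with several seeds shows how much trajectories diverge even under identical demographic parameters, which is the reason a single observed frequency is treated here as an outcome to be modelled rather than as direct evidence of origin.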
🛠️ Software Suite
Tier 1 — Data Ingestion and QC
Tier 2 — Standard Population Genetics (with corrections)
Tier 3 — Non-Linear and Deep Genomic Tools
📋 What This Blog Will Cover
This is the first post in a series. Subsequent posts will address:
Full re-analysis of Akbari et al. (2026) Iberian samples — coverage statistics, haplogroup calls, PCA placement, and qpAdm models against published reference panels
R1b-DF27 and the myth of Iberian genetic distinctiveness: a time-stratified frequency trajectory analysis, building on the original R1b-DF27 bottleneck post
The North African signal in Iberian aDNA: persistent, underreported, analytically challenging
Iron Age Iberia as a mosaic — integrating the NE Iberian Iron Age aDNA dataset with new Akbari et al. samples
Banu Qasi and the Muwallad genetic layer — the Morisco expulsion as a genetic event
Methodological critique of 1240k capture bias and its effects on published Iberian aDNA studies
A simulation-based evaluation of qpAdm on Iberian Bronze-to-Iron Age demographic models
All data, scripts, and supplementary analyses will be deposited on GitHub. Reproducibility is non-negotiable.
Iberian ancient DNA — foundational
Olalde et al. (2019). The genomic history of the Iberian Peninsula over the past 8,000 years. Science 363(6432):1230–1234.
Olalde et al. (2018). The Beaker phenomenon and the genomic transformation of northwest Europe. Nature.
Villalba-Mouco et al. (2021). Genomic transformation and social organization during the Copper Age–Bronze Age transition in southern Iberia. Science Advances.
Bycroft et al. (2019). Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula. Nature Communications 10:551.
Western European ancient DNA — recent (2024–2026)
Oteo-Garcia et al. (2025). Medieval genomes from eastern Iberia illuminate the role of Morisco mass deportations on the genetic landscape of Spain. Genome Biology.
Ruiz de la Cuesta Aguirre, D. (2025). Unraveling the genetic history of Northeastern Iberians. Doctoral thesis, Universitat Autònoma de Barcelona.
Ruiz de la Cuesta Aguirre, D. et al. (2025). Mitochondrial DNA diversity in Northeast Iberians during the Iron Age. Portal Recerca UAB.
Carrión, P. et al. (2024). Disparate demographic impacts of the Roman colonization and the Migration Period in the Iberian Peninsula. Preprint.
Olalde, I. et al. (2026). Lasting Lower Rhine–Meuse forager ancestry shaped Bell Beaker genetic structure in the Rhine–Meuse delta.
Parasayan, O. et al. (2024). Late Neolithic collective burial reveals admixture dynamics during the third millennium BCE in western Europe. Science Advances 10:eadl2468.
Methodological
Williams, M. P. et al. (2024). Testing times: disentangling admixture histories in recent and complex demographies using ancient DNA.
Simon, A. & Coop, G. (2024). The contribution of gene flow, selection, and genetic drift to five thousand years of human allele frequency change. Proceedings of the National Academy of Sciences.
Davidson, R. et al. (2023). Allelic bias when performing in-solution enrichment of ancient human DNA. Molecular Ecology Resources.
Ringbauer, H. et al. (2024). Accurate detection of identity-by-descent segments in human ancient DNA (ancIBD). Nature Genetics 56:143–151.
Li, R. et al. (2025). clusIBD: robust detection of identity-by-descent segments in ancient DNA.
R1b phylogeny
García-Fernández, C. et al. (2022). Y-chromosome target enrichment reveals rapid expansion of haplogroup R1b-DF27 in Iberia during the Bronze Age transition. Scientific Reports 12:20708.
Solé-Morata et al. (2017). Analysis of the R1b-DF27 haplogroup shows that a large fraction of Iberian Y-chromosome lineages originated recently in situ. Scientific Reports 7:7341.