Friday, 24 April 2026

From Population Genetics to Genealogical Archaeology




The past decade of ancient DNA research has transformed how we think about population history in western Eurasia. The latest large-scale releases push this transformation one step further: they do not simply refine our estimates of ancestry components or haplogroup frequencies, they change the natural unit of analysis. Instead of treating “populations” as homogeneous blocks that rise, mix and disappear, we can now start to read prehistory through the partially recoverable genealogies of real individuals and families.

This shift is not only conceptual; it is forced by the data themselves. When tens of thousands of ancient genomes are aligned and jointly analysed, with improved tools for detecting identity-by-descent (IBD) segments and close kinship in low-coverage material, the picture that emerges is dense, clustered and highly structured at the scale of lineages. What we see first are not abstract populations but networks of related individuals, family clusters tied to concrete sites, and repeated reappearances of the same lineages in neighbouring valleys and regions over a few centuries. Any serious interpretation of ancient DNA now has to start from this genealogical texture.

From populations to lineages

Classical population genetics provided the first robust language for talking about the human past: admixture proportions, effective population sizes, clines, isolation-by-distance. For a long time, this framework was the only one available, and early ancient DNA studies naturally adopted it. A small number of individuals were used to stand in for whole “cultures,” “migrations” or “ethnic groups,” and haplogroups were often treated as diagnostic markers of populations or identities.

With the current density of sampling, this approach has become increasingly fragile. Densely sampled genome-wide data now reveal that:

  • Many lineages once thought of as “local” or “diagnostic” for a given region already circulated widely across western and central Europe before they became frequent in their supposed heartlands.

  • Within a single archaeological culture, we can find multiple partially overlapping clusters of kin, sometimes coexisting in the same cemetery but representing different upstream lineages.

  • Temporal series from the same micro-region show that local genealogical continuity can be strong even when ancestry proportions and cultural labels change around it.

In other words, when we look closely, populations dissolve into overlapping genealogical networks. The same is true for haplogroups: what matters is not only the presence of a label (a particular Y-chromosome or mitochondrial clade), but how that label is embedded inside real family structures, how it expands, fragments or disappears over time, and how it interacts with other lineages in the same landscape.

Why family clusters matter

The key enabling development is the ability to detect IBD segments and close kinship in ancient genomes at scale. New methods can identify relatives up to several degrees of separation, even in noisy, low-coverage data, and can map extended family structures within and across sites. This opens up a new level of resolution: instead of inferring “gene flow” in the abstract, we can point to concrete genealogical ties between communities.

This has several consequences:

  • Archaeological cemeteries that once looked like homogeneous population samples can now be recognised as structured collections of related individuals, often dominated by a few extended families.

  • Regional patterns in haplogroup frequencies may partly reflect the over-representation of particular kindreds in the available data, rather than neutral, random sampling of the underlying population.

  • Connections between sites that share rare subclades are no longer just suggestive: they can be supported by overlapping IBD segments and consistent genealogical links.

In this context, treating subclades as direct proxies for entire populations or identities becomes misleading. A subclade is better understood as a lineage label that may be carried by one or several overlapping family clusters, whose demographic success or failure is often driven by local, contingent dynamics: marriage patterns, social hierarchies, bottlenecks and founder effects at the scale of villages and valleys. A genealogical perspective forces us to honour these micro-level processes instead of collapsing them into coarse population labels.
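The over-representation effect described above can be made concrete with a small sketch. It compares raw haplogroup frequencies, where every sequenced individual counts once, with frequencies where each family cluster contributes each haplogroup only once. The sample records, cluster IDs and function names are hypothetical, and the correction itself is an illustrative assumption rather than an established method from any published pipeline.

```python
from collections import Counter

# Hypothetical records: (sample_id, family_cluster_id, y_haplogroup).
# Cluster IDs are assumed to come from an earlier kinship/IBD analysis.
SAMPLES = [
    ("s1", "famA", "R1b"),
    ("s2", "famA", "R1b"),
    ("s3", "famA", "R1b"),
    ("s4", "famA", "R1b"),
    ("s5", "famB", "I2a"),
    ("s6", "famC", "J2a"),
    ("s7", None, "I2a"),  # no detected relatives: treated as its own unit
]


def raw_frequencies(samples):
    """Frequency of each haplogroup when every individual counts once."""
    counts = Counter(hg for _, _, hg in samples)
    total = sum(counts.values())
    return {hg: n / total for hg, n in counts.items()}


def cluster_corrected_frequencies(samples):
    """Frequency when each (family unit, haplogroup) pair counts once."""
    seen = set()
    for sample_id, cluster, hg in samples:
        # Unrelated individuals form a single-member unit of their own.
        unit = ("cluster", cluster) if cluster is not None else ("single", sample_id)
        seen.add((unit, hg))
    counts = Counter(hg for _, hg in seen)
    total = sum(counts.values())
    return {hg: n / total for hg, n in counts.items()}


raw = raw_frequencies(SAMPLES)
corrected = cluster_corrected_frequencies(SAMPLES)
for hg in sorted(raw):
    print(f"{hg}: raw={raw[hg]:.2f} corrected={corrected[hg]:.2f}")
```

In this toy dataset one kindred of four men inflates the raw share of a single haplogroup; once each family unit counts once, that haplogroup is no longer the most frequent. Real data would need a more careful treatment, for instance of clusters that span several sites or periods.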

Genealogical archaeology: a working definition

By “genealogical archaeology” we mean an approach that starts from observable biological relationships and lineages, and only then builds outward towards cultural and historical interpretation. It is still deeply informed by archaeology and history, but its primary units are:

  • family clusters and extended kindreds reconstructed from IBD and kinship analysis

  • local lineages traced through time within well-sampled regions

  • networks of genealogical connections between sites, valleys and cultural units

  • episodes of lineage expansion, contraction and replacement that can be dated and mapped

This is deliberately more modest, and in a sense more biological, than older grand narratives of “peoples” moving across maps. It accepts that our access to the past is filtered through an uneven and often biased sampling of real communities, and it tries to make those biases explicit. Instead of asking “which population did this culture belong to?”, genealogical archaeology asks questions such as:

  • Which lineages are actually present at this site, and how are they related to each other?

  • Does this cemetery reflect one dominant kindred, or multiple overlapping families?

  • How do these local lineages connect to neighbouring regions in the same time slice?

  • When a new haplogroup or subclade appears, does it arrive as part of a single expanding family, or through multiple independent introductions?

These are questions that can be answered, at least partially, with the data we have now.

Bias, anonymity and the limits of interpretation

Large, curated resources of ancient DNA have to balance scientific openness with ethical and privacy concerns. One consequence is that metadata are often restricted or anonymised: precise find locations may be blurred, archaeological attributions simplified, and detailed context redacted. At the same time, the combination of dense sampling, regional focus and kinship analysis can make some individuals and sites quite recognisable to specialists.

This creates a paradox: at the level of the data table, samples look anonymous and evenly distributed; at the level of genealogical structure, we can clearly see over-represented families and dense local clusters. If we ignore this bias, we risk overinterpreting patterns that are in part generated by the way material was excavated, selected and sequenced. Genealogical archaeology therefore puts sampling and bias at the centre of the discussion rather than treating them as afterthoughts.

Practically, this means:

  • being cautious about translating frequency patterns into strong historical claims when they may reflect a few prolific kindreds

  • avoiding speculative links between lineages and historical ethno-political labels when the metadata are truncated

  • using family clusters as a tool to identify where anonymity may be more apparent than real, and arguing for careful, region-specific interpretation rather than universal stories

In short, an explicitly biological framing — grounded in kinship, lineages and local demography — allows us to stay rigorous even when historical and archaeological labels are uncertain or contested.

A biologically grounded vocabulary

One of the aims of this blog is to adopt a vocabulary that reflects this biological and genealogical shift. Instead of defaulting to terms like “migrations,” “tribes” or “cultures” as explanatory units, we will give priority to concepts that are closer to what the data directly show. Among them:

  • family cluster: a group of individuals with close IBD connections within a site or micro-region

  • local lineage: a subclade or set of related subclades traced across multiple time points in the same region

  • lineage expansion: a rapid local increase of a particular lineage, usually over a few centuries

  • partial replacement: the introduction and rise of new lineages in a region without complete displacement of older ones

  • genealogical over-representation: situations where a small number of extended families account for a disproportionate share of the sequenced individuals in a dataset

These terms are not meant to replace archaeological or historical language, but to discipline it. When we do use culture names or historical labels, we will do so explicitly as interpretative overlays on top of an underlying network of biological relationships.
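As a minimal sketch of how two of these terms could be operationalised, the example below groups individuals into family clusters by linking any pair whose total shared IBD exceeds a threshold, and then measures genealogical over-representation as the share of sequenced individuals that fall in the largest clusters. The pairwise values, the threshold and the function names are all illustrative assumptions; real kinship inference on low-coverage ancient genomes relies on dedicated tools and is considerably more involved.

```python
# Hypothetical total shared IBD (in centimorgans) between pairs of individuals.
PAIRWISE_IBD_CM = {
    ("a", "b"): 1700.0,
    ("b", "c"): 850.0,
    ("d", "e"): 900.0,
    ("a", "f"): 12.0,
}
INDIVIDUALS = ["a", "b", "c", "d", "e", "f", "g"]
CLOSE_KIN_CM = 400.0  # arbitrary illustrative threshold, not a recommended value


def family_clusters(individuals, pairwise_ibd, threshold):
    """Connected components of the graph linking pairs with IBD >= threshold."""
    parent = {i: i for i in individuals}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), cm in pairwise_ibd.items():
        if cm >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for i in individuals:
        groups.setdefault(find(i), []).append(i)
    # Largest clusters first, ties broken alphabetically for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


def over_representation(clusters, top_n=1):
    """Share of all individuals that belong to the top_n largest clusters."""
    total = sum(len(c) for c in clusters)
    largest = sorted((len(c) for c in clusters), reverse=True)[:top_n]
    return sum(largest) / total


clusters = family_clusters(INDIVIDUALS, PAIRWISE_IBD_CM, CLOSE_KIN_CM)
print(clusters)
print(f"share in largest cluster: {over_representation(clusters):.2f}")
```

Connected components are a deliberately crude choice: a single link is enough to merge two groups, so the threshold strongly shapes what counts as a "family cluster". That sensitivity is itself a reason to report the threshold alongside any cluster-based result.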

Where subclades still matter

None of this means that Y-chromosome and mitochondrial subclades suddenly lose their interpretative value. On the contrary, they become more informative once they are embedded in a genealogical context. Subclades can help us:

  • track the spatial and temporal trajectory of specific lineages within a region

  • identify links between distant sites that share rare or derived lineages

  • reconstruct hierarchies of lineages within cemeteries and communities

  • compare ancient lineage structure with present-day patterns from projects like FTDNA and YFull

However, in this framework a subclade is never a population by itself. It is a marker of biological descent that must always be interpreted in relation to the known family clusters, local lineages and regional genealogical networks. Present-day phylogenies derived from living males can still provide essential scaffolding for interpreting ancient lineages, but they do not define populations either; they trace the survival and reshaping of particular branches through recent history.

Towards a unified framework

The goal of this series is to build, step by step, a unified yet flexible framework for reading the Iberian and western European past from this genealogical vantage point. In upcoming posts, we will:

  • examine specific regional case studies where ancient data now reveal dense family clusters and long-lived local lineages

  • revisit some “classic” subclades associated with Iberia and ask how their trajectories look once we anchor them in genealogical structure rather than in static maps

  • explore how non-linear demographic dynamics — founder effects, bottlenecks, shifts in social organisation — can amplify or suppress lineages over short time scales

  • integrate information from ancient datasets with present-day Y-chromosome and mitochondrial phylogenies, always with an eye on where the correspondence is strong and where it breaks down

The working hypothesis is simple: ancient DNA no longer just refines population models; it forces us to take genealogies seriously. If we want to understand how characteristic Iberian haplogroups emerged, spread and interacted with other lineages, we must start from families, not from idealised populations. The rest of this blog will try to make that hypothesis concrete, test it against the data and explore its implications for how we think about the deep history of the peninsula.


Genealogical Archaeology in Iberia: Dynamic Populations in a Crowded Landscape

The Iberian Peninsula has long been a test case for how far we can push genetic narratives about population history. Over the last few years, archaeogenetic work has gradually dismantled the idea of a simple sequence of “peoples” replacing one another and has replaced it with a much more dynamic, entangled picture of movement, mixture and local persistence. Among the voices emphasising this complexity is Carles Lalueza-Fox, who has repeatedly argued that Bronze and Iron Age Iberian sites record a far more fluid reality than our culture labels suggest, with overlapping networks of kin and mobility operating at multiple scales.

Recent work on intramural infant burials in northeastern Iberia makes this tangible. Studies of Iron Age newborns buried within domestic spaces in Catalonia and neighbouring regions, integrating morphology, histology and genetics, have shown that many of these infants died of natural causes and were incorporated into household ritual rather than being victims of infanticide or sacrifice. Behind this finding lies a key point for genealogical archaeology: these burials anchor concrete family histories inside settlements, revealing an intimate, local dimension to Iron Age life that cannot be captured by broad labels like “Iberian” or “Celtiberian” alone.




Anonymised datasets and Iberian over-representation

Against this backdrop, the recent massive release of aligned ancient DNA data, including many samples with anonymised or truncated metadata, has created both an opportunity and a challenge for Iberian research. The opportunity is one of scale: the new datasets suggest that several hundred ancient Iberian individuals, potentially approaching or surpassing five hundred, may now be represented in anonymised series. The challenge is that only a subset can be securely identified as Iberian from genetic profiles alone.

Preliminary attempts to estimate the Iberian share of these anonymised samples, based on ancestry profiles, haplogroup composition and cluster structure, suggest that well over four hundred individuals could plausibly belong to the peninsula, with a posterior probability above eighty percent for a large subset of them. Most of these appear to cluster genetically with early medieval and medieval populations showing strong components characteristic of Islamic-period al-Andalus, while others could equally well belong to late Roman or post-Roman contexts, depending on how we define and group family clusters and haplogroups.
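To show what a statement like "posterior probability above eighty percent" involves, here is a minimal sketch of Bayes' rule for a two-way choice between an Iberian and a non-Iberian origin. It is not the procedure behind the estimates above, which is not described in detail here. The prior, the likelihoods, the sample IDs and the function name are all hypothetical.

```python
def posterior_iberian(prior, lik_iberian, lik_other):
    """Bayes' rule for two hypotheses: Iberian origin vs. any other origin."""
    numerator = prior * lik_iberian
    denominator = numerator + (1.0 - prior) * lik_other
    return numerator / denominator


# Hypothetical likelihoods of each observed genetic profile under the two
# hypotheses: (P(profile | Iberian), P(profile | not Iberian)).
PROFILES = {
    "anon_001": (0.60, 0.04),
    "anon_002": (0.30, 0.20),
    "anon_003": (0.10, 0.40),
}
PRIOR_IBERIAN = 0.25  # assumed share of Iberian samples before seeing any profile

posteriors = {
    sample_id: posterior_iberian(PRIOR_IBERIAN, lik_ib, lik_other)
    for sample_id, (lik_ib, lik_other) in PROFILES.items()
}

# Samples that clear the eighty percent threshold.
confident = [sid for sid, p in posteriors.items() if p > 0.80]

# Summing the posteriors gives the expected number of Iberian samples.
expected_count = sum(posteriors.values())

for sid, p in posteriors.items():
    print(f"{sid}: {p:.3f}")
print("above 0.80:", confident)
print(f"expected Iberian count: {expected_count:.2f}")
```

The sketch makes two dependencies visible: the result moves with the assumed prior, and the likelihoods depend on which reference profiles count as "Iberian" in the first place. Both are places where the sampling biases discussed earlier can enter.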

This is where genealogical archaeology becomes essential. Punic and Roman-period material from sites such as Villaricos, Almería and Málaga, analysed in recent work on Phoenician and Punic populations, suggests that late Roman southern Iberian populations may already have exhibited ancestry profiles and maternal/paternal haplogroup spectra resembling those of later Andalusi communities. If so, using “Andalusi” as a purely chronological label becomes misleading. The underlying genealogical structure may span what we think of as pre-Islamic and Islamic periods, and clusters of related individuals can bridge traditional period boundaries.

Family clusters as the backbone of population dynamics

One of the clearest ways to bring order into this complexity is to use family clusters as the basic units for reconstructing population dynamics. When we follow extended kindreds through time and space, several patterns emerge that are difficult to see with ancestry components alone:

  • Maternal haplogroups that once looked diagnostic of specific Iron Age groups, such as Etruscan or Hallstatt-associated lineages in central Europe, reappear in Roman and later contexts in eastern Iberia and the province of Baetica, often embedded in new social and cultural settings.

  • Some of these maternal lineages, especially those associated with Celtic La Tène expansions, seem to have formed part of long-lasting, possibly matrilineal or at least matrilineage-conscious clans in regions like Cantabria and southern Britain for centuries, only to become rare or undetectable in most of Iberia and Gaul during the Middle Ages.

  • Despite this apparent disappearance at the macro level, characteristic subclades of these maternal lineages persist at low frequencies across many Iberian regions from the Bronze Age onwards, leaving a faint but traceable genealogical footprint.

These observations point to a landscape where lineages do not simply “appear” and “vanish” with cultures. Instead, maternal kin networks weave through the peninsula over many centuries, changing their social roles and demographic weight but rarely being completely erased. For genealogical archaeology, this means that maternal haplogroups can serve as long-term markers of connectivity between regions—between Etruria and eastern Iberia, between Hallstatt/La Tène zones and the Cantabrian area—provided we read them through the lens of family clusters and local histories rather than as static ethnic signatures.

Paternal lineages: founders, survivors and surprises

On the paternal side, the situation is even more dramatic. Modern Y-chromosome phylogenies, such as the ones curated by FamilyTreeDNA and other projects, have long highlighted the extraordinary expansion of R1b-M269 and its downstream branches in Europe, while hinting at a deeper, more diverse Palaeolithic and Mesolithic reservoir of R1b lineages in eastern Europe and western Asia.

Ancient DNA continues to reinforce this picture. The discovery of previously unknown ancient R1b branches in regions like India and Central Asia, together with ancient Balkan and steppe samples, suggests that what we now think of as “the European R1b-M269 lineage” was once only one of many low-frequency branches of a much broader R1b radiation. A striking recent example is an anonymised sample in the Akbari dataset, now visible in public Y-tree tools, that appears to be a very old R-P297 lineage from the late Palaeolithic Balkans, older than most previously known European representatives of this umbrella clade.


From a genealogical perspective, this supports a founder-effect model: the main western European R1b-M269 branch that dominates present-day populations emerged from a background where R1b as a whole was present but not particularly common, and underwent a massive expansion during the Neolithic and especially the Chalcolithic and Early Bronze Age. Remarkably, despite the flood of new ancient R1b data in recent years, no unequivocal ancient representative of the exact clade that dominates many modern western Europeans has yet been identified, underscoring how narrow and contingent this successful branch may have been compared to the broader R1b diversity.

The anonymised paternal haplogroups in the Akbari dataset push against several mainstream assumptions that were built on sparser data. Among the most important for Iberia are:

  • The survival of multiple I2a lineages well into the Bronze Age in specific sites, showing that pre-steppe European male lines did not vanish abruptly but persisted within particular communities and networks.

  • Early dates for certain R1b-Z211+ lineages—potentially over 4,000 years ago in samples that are genetically consistent with eastern Iberian contexts—challenging current time-to-most-recent-common-ancestor estimates in some phylogenetic reconstructions.

  • A substantial presence of haplogroups such as E-M34, E-M81 and J2a already in Roman-period Iberia, which, when combined with Punic and other Mediterranean data, suggests a long and complex history of male-mediated gene flow before, during and after the Roman Empire.

  • A relatively low representation of R1b-U152 (often loosely associated with “Italic” or “Celtic” inputs) and of I1 (commonly linked to Germanic groups) in contexts where some models might have expected higher frequencies, at least in the currently available anonymised sample.

  • The presence of multiple R1b-L21 lineages among Basque and near-Basque individuals between roughly 100 and 1000 CE, which complicates any simple opposition between “Atlantic” and “Basque” male-line histories and illustrates how local genealogies can absorb and reshape incoming lineages over time.

Without precise site-level metadata, we cannot and should not attach these paternal lineages to specific excavations or named communities. Yet at the level of genealogical archaeology, we can still say a great deal about the kinds of demographic processes that must have operated in Iberia and its periphery: persistent survival pockets of I2a, repeated Mediterranean inputs of E and J lineages, deep-rooted but uneven expansions of R1b branches, and regionally structured integration of Atlantic lineages such as R1b-L21 into the Basque Country and neighbouring areas during the first millennium CE.

Between data and narrative: what we can responsibly say

The combination of anonymised large-scale datasets and detailed but slow-to-publish local projects, like those on Iron Age intramural infant burials in Catalonia and Aragón, puts Iberian genetic prehistory in a peculiar position. On the one hand, we know from public talks and preliminary reports that there is a wealth of site-specific genealogical information—kinship networks within households, sex-biased funerary practices, local continuity across cultural shifts—that has not yet fully entered the published literature. On the other hand, we can see anonymised patterns that almost certainly include some of these same individuals and lineages, but we cannot safely re-identify sites or families without breaking the intended anonymisation.

In this situation, genealogical archaeology offers a cautious path forward:

  • We treat family clusters, local lineage structure and haplogroup spectra as tools to describe the kinds of demographic dynamics that must have occurred (founder effects, survival pockets, recurrent inflows) without naming specific sites unless they are already published.

  • We accept that many Iberian samples are effectively “over-represented” in current datasets because of excavation and sequencing choices, and we incorporate this bias into our interpretations instead of ignoring it.

  • We emphasise how maternal and paternal lineages link Iberia to wider Mediterranean and European networks—Etruscan, Hallstatt, La Tène, Punic, Roman, Islamic—without forcing them into simplistic one-to-one mappings with historical identities.




In practical terms, this means that when we discuss, for example, the appearance of a particular R1b subclade in a late Iron Age or Roman context, we will frame it as evidence of the expansion of a concrete lineage, not as definitive proof of the arrival of a named “people.” When we highlight the disappearance of certain Celtic-associated maternal lineages from most of medieval Iberia and Gaul, we will ask whether this reflects true demographic replacement, changes in social structure, or sampling biases, rather than jumping straight to narratives of conquest and extinction.

What we gain in exchange is a more honest, biologically grounded picture of Iberian population history: one in which lineages have long, tangled lives across periods and labels, and in which a single extended family buried under the floor of an Iron Age house can tell us as much about the deep structure of the peninsula as a dozen arrows on a migration map. The challenge, and the ambition, of this blog is to keep our interpretations at that genealogical scale, even as new data tempt us with ever more spectacular—but potentially misleading—stories.
