Adrian Currie writes...
Here at Extinct, we’re interested in history. After all, palaeontologists want to uncover life’s deep past. But our interest in history shouldn’t stop with prehistoric events. Data and evidence themselves have a history, and a history which plays an essential role in their being data or evidence. Following on from a thought-provoking session I attended at the recent PSA meeting in Atlanta (Data in Time: The Epistemology of Historical Data*), I want to reflect a little on data’s history, and how it matters for paleontology.
Data Journeys
In the symposium, Sabina Leonelli distinguished between two kinds of time: phenomena-time and data-time. The former is what we at Extinct often focus on. Historical events and processes occur in time: the evolution of the enormous sauropod dinosaurs occupied the late Triassic right through to the great interruption at the K-Pg boundary. Paleontologists explain these critters by situating them within that temporal flow. The latter, data-time, is the focus of today’s post. Just as the sauropods evolved throughout the Mesozoic, the evidence we have concerning them has also changed.
Consider Argentinosaurus, a major contender for the terrestrial animals’ heavyweight title. There is some doubt about her length, but she’s certainly in baleen-whale territory. This uncertainty is due to the poverty of her physical remains: she’s known from a few vertebrae, scraps of rib-cage, and the fibula (lower leg bone). Although working out exact size from these remains is a challenge, you don’t need to be a dinosaur scientist to know that Argentinosaurus is, well, big:
These scrappy remains, and the data extracted from them, have a history. After excavation, the fossils were prepared – the bad rock was removed from the good rock – and then measured. Those measurements were published in papers and applied in various contexts. For instance, in combination with our knowledge of how vertebrates are put together, the measurements were used as the basis of various reconstructions. The reconstructions were conveyed in various representational media: language, pictures, and physical models.
And these reconstructions themselves form the basis of further work. The model above, for instance, was itself digitized, and that digital representation was used to ask new questions about sauropods. In 2013, Sellers et al. were interested in sauropod gait – how did these giants manage to walk? They painstakingly photographed the Argentinosaurus reconstruction and used it as the basis of a simulated sauropod:
Sellers et al. layered mass, musculature and joints onto their simulant and ran a series of studies aimed at discovering how a creature with those properties would walk. Their results might themselves form the basis of further studies.
Sabina Leonelli (see, for instance, her excellent new book) uses the metaphor of journeys when discussing the historical nature of data. And indeed, the journey of the data extracted from those scraps of Argentinosaurus has been rich and complex: switching context, modes of representation, as well as epistemic purpose.
Leonelli argues that data is intrinsically historical. That is, data can only be understood as data in relation to its journey – it is in virtue of provenance that data can be used to support a scientific claim. If I may expand on her metaphor, data journeys are never pilgrimages. There is no point where data arrives at the Cathedral of Santiago de Compostela and is forgiven its sins.
Data’s history – its source, the justification underwriting its status as data, and the journey it has taken – always matters. Forgetting that history can lead to the data’s misuse. The apparent completeness of the reconstruction which Sellers et al. use in their simulations could blind us to the fact that it is ultimately based on a vanishingly small number of initial pieces: the repackaging and transformations data undergoes can make it appear more stable – more trustworthy – than it is.
I read Leonelli as suggesting that data is an historical entity. In 1976, David Hull suggested the same of biological entities such as species. An historical entity or kind is such that its history is an essential property of it being that entity. I’m very attracted to Hull’s view: Homo sapiens aren’t such because they share a bunch of humany properties, but because they share in a particular evolutionary history. I’m even more attracted to this view about data, and want to draw on a classic thought experiment to explain why.
The Impossibility of Swamp-Data
Now, lots of philosophers – rightfully I think – are suspicious of the thought experiment I’m about to launch into (and not just because thought experiments are dodgy generally – I’ve heard it said that this thought experiment in particular is egregious), but I think my use of it here is okay, insofar as I’ll show that what might be a problem in other cases of historical entities is not a problem for data.
A few years ago, during a walk in New South Wales’ Budawangs with the Sons of the Desert, I got myself briefly lost – split from the group. In the few minutes it took everyone to find me, I was bitten by a particularly virulent and powerful red-bellied black snake, whose venom was of such power that I was eradicated down to my atomic structure. Simultaneously, a freak bolt of lightning hit that exact spot and, using an entirely different bunch of atoms, and due to some amazing cosmic accident, reconstructed an exact duplicate of Adrian.
The duplicate, swamp-adrian, and the original Adrian share the exact same memories, behaviors, mental states, and so forth. There is some reason to think that swamp-adrian has some right to being Adrian – and yet by the historical essentialist picture he simply can’t be. One of us was born to human parents and grew up; the other was spontaneously created by a lightning strike.
This species of thought experiment was introduced by Donald Davidson in 1987 and has enjoyed various alterations ever since (my favourite being Karen Neander’s). It is basically a template for generating objections to any account of a kind or entity which posits an essential history to that kind or entity. For instance, I might claim that I am Adrian – that I am this entity here (gesturing to myself) – essentially because I have a particular history. To briefly don, and then quickly remove, my Kripke hat, I might say that it is a necessary condition of being me that I was born of the particular parents that I had. Well, if that’s true, then swamp-adrian simply can’t be Adrian.
Whatever your intuitions are in this case (mine are split!) it strikes me that swamp data definitely isn’t data. This is because data is such in virtue of the background theory which justifies the data’s evidential relevance, and part of this justification appeals to the history of the data. The Argentinosaurus fossil can be used as data to reconstruct the Argentinosaurus not only because we have an understanding of how fossils form, but because we know the details of how the fossil was taken out of the ground and prepared. These details matter. A set of data with the exact same properties – the same set of values, say – but of a different provenance, simply isn’t the same data. (It would be interesting to combine these thoughts with Lukas Rieppel’s discussion of the authenticity of casts…)
Data and evidence, then, are essentially historical: they have a life. But the life of data is not like our life: not only can it undergo profound transformations, but it can die, be reconstructed, and regrow.
Zombie Data**
I’ve argued that understanding a trace’s history is required for it to count as evidence or data: how the Argentinosaurus limb became a fossil matters for what claims it can underwrite. Moreover, data drawn from a trace has a history of its own—the journey from discovered fossil to simulated sauropod was complex, and the details of that journey make an epistemic difference. I want to close by echoing a point made by Alison Wylie (especially in this wonderful paper): we must also consider the ways in which past scientists have themselves manipulated both representations of data and the material remains themselves. Moreover, palaeontologists can use new technology to reconstruct and utilize legacy data which has been compromised by its history: a kind of technological voodoo allows them to resurrect data from the dead.
There is a rich, and infamous, set of dinosaur trackways near the Paluxy River in Glen Rose, Texas. These have a rather problematic history, and not just due to silly claims that human footprints are contemporaneous with dino-tracks. Many of the tracks are from the Early Cretaceous, the most dramatic part containing a ‘chase sequence’, involving at least one theropod and a group of sauropods. The site was made famous in 1938 by Roland T. Bird. In 1940, Bird returned to the site determined to bring the chase sequence home for more systematic study. But moving that amount of rock is no small undertaking. The trackways were split into three sections, one of which ended up in the American Museum of Natural History in New York, another in the Texas Memorial Museum (where it has subsequently degraded significantly), and the last, well, has since disappeared (how do you lose something like that!?).
Given the rarity of such finds, and their value for our understanding of dinosaur behaviour (hunting and flocking, for instance), the loss of the complete trackways is disastrous. On the face of it, whatever mysteries they may have shed light on must remain mysteries. The journey of the data from these material remains is finished.
Or so you might think. In 2014, Falkingham et al. published a study which used photogrammetry to attempt to digitally reconstruct the whole sequence. They, in effect, used modern technology to reconstruct the trackways as they were in 1940, which in turn might serve as data to reconstruct the trackways as they were laid down in the Cretaceous. Photogrammetry is a way of reconstructing a three-dimensional topography from a series of two-dimensional photographs. To not get too caught up in the hows of it, photogrammetric techniques map spots on a two-dimensional plane and represent them in three dimensions. For instance, one might use this technique to represent a landscape in three dimensions on the basis of satellite photographs.
Why does this matter here? Well, Bird took photographs of the trackways in situ before the excavations, and although the material remains have disappeared, these photographs have not.
Using 17 such photographs, Falkingham et al. carefully applied photogrammetry to the trackways. Their results can be compared both to Bird’s original sketches and to the still-existing parts of the original trackways. As such, the apparently lost sections of the chase sequence can be returned to evidential life. Here’s a video (from Science News) which includes Falkingham et al.’s digital reconstruction:
Falkingham et al. emphasize this technology’s capacity to give new life to old data:
“It is an exciting prospect to think that many paleontological or archaeological specimens that have been lost to science, or suffered irreparable damage, may be digitally reconstructed in 3D using free software and a desktop computer. We envisage that historical photogrammetry will become a powerful, common, tool in the future” (5).
And I agree—the capacity to retrace and reconstruct the history of data is often as important to historical science (and science generally!) as our capacity to reconstruct the formation of the traces which the data are taken from. Past scientists sometimes had different standards, practices, and ways of understanding deep history to ours – and this led them to treat their materials in different ways. Understanding this history allows us to utilize, and reconstruct, legacy data for our own purposes.
The history of data always matters: it is only in virtue of us knowing where they came from, and how they’ve been manipulated, that we can take them as data, as informing us about the past.
So, not only does life have a deep, often complex, and often difficult history for palaeontologists, but so does the scientific data they rely upon to explore that history. Two kinds of time—phenomena-time and data-time—must be attended to in reconstructing the past.
*The speakers were Sabina Leonelli, Alison Wylie, Rachel Ankeny, and David Sepkoski - a treat, to say the least!
**Referring to such data as 'zombie data' is due to Sabina Leonelli.