Introduction
Paleontology-related social media was recently abuzz with new insight into T. rex, and in particular into the jaws of the Tyrant King. New research into T. rex skull morphology by Cost et al (2019) resolved a conflict between computer models (which showed that T. rex had a bite force capable of crushing a car; see Bates & Falkingham 2012) and skeletal reconstructions (which showed that the bones in T. rex’s skull were incapable of accommodating the strains of such a powerful bite). Cost et al’s research showed that the King’s skull could in fact accommodate such forces if its palatal bones were more rigidly assembled than previous reconstructions had allowed. In discussing this new research with me, some students asked: how did paleobiologists get a computer model that was inconsistent with the fossils, anyway? How are paleobiological models made?
Good questions! Similar concerns have been on my mind since I recently became (peripherally) involved in the Functional Trait Resource for Environmental Studies project, or FuTRES. The project’s goal is to establish a large-scale repository for specimen-level trait data across the life sciences. As a first step towards that goal, project members are working on a taxonomy of traits and measurements that can accommodate disparate life science literatures; given my interest in similar questions, I enthusiastically joined the project’s first workshop this past summer. The workshop included a vigorous discussion of how to correlate data from computational analyses of morphology with analog measurements from “legacy” literature. Doing so would at least standardize trait concept terms and measurements; ideally, the project might even naturalize them.
Philosophers as far back as Plato, who famously hoped to ‘carve’ concepts ‘according to the natural formation, where the joint is,’ have sought to naturalize concepts by finding appropriate natural correlates. Kripke (1980) and Putnam (1974) argued that a goal of science is to find the elements of nature to which we fix our names for concepts; Quine (1971) made that attempt for the concept of knowledge. Absent an appropriate natural extension for a concept term, we can nevertheless hope to standardize the term’s usage by stipulating a referent. Naturalization of a concept term may or may not imply standardization; even if the goal of FuTRES is only the latter, we should nevertheless consider the project’s potential value towards the former.
In the course of our workshop discussions, several participants cited a new paper by Bardua et al (2019), which offers suggestions for standardizing the terminology and methods used to collect data for computational analyses of morphology. That work may mark an important step towards concept analysis in the life sciences, but I hope that FuTRES may go even further.
Recognizing Landmarks
To construct computer models of fossils, we need some way of quantifying and modeling raw anatomical data. One way of doing this is by directly scanning fossils; however, this is rarely practical and always plagued with difficulty. O’Higgins et al (2017) argue that ‘it is not within the reach of present technology to produce an accurately realistic model’ from direct scanning. Two primary obstacles must be overcome. First and foremost, specimens are more often than not fragmentary. Second, even when a specimen is complete (Cost et al’s analysis, for example, modeled the T. rex skull from scans of BHI 3033, which is perhaps the best-preserved nonavian dinosaur skull known to science), fossil material is rarely easy to distinguish from the surrounding sedimentary matrix, and efforts to separate one from the other tend to result in loss of anatomical information. O’Higgins et al infer that ‘simplification of geometry is therefore useful and necessary’ (160).
Geometric morphometric analysis is perhaps the most common method in paleobiology for geometric simplification of anatomical data. By this method, a researcher reproduces a specimen in a 2- or 3-dimensional coordinate system by attaching coordinate values to points identified on the specimen’s surface. These points, called landmarks, combine to form a geometrically simplified digital model of the specimen’s anatomy.
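For readers who like to see the idea in code, here is a minimal sketch of what "attaching coordinate values to points" can look like in two dimensions. The coordinates below are hypothetical numbers invented for illustration, not measurements from any specimen, and the function names are my own. The sketch removes position, size, and orientation from two landmark configurations (an ordinary Procrustes superimposition of one shape onto another) and reports how different the remaining shapes are; it is not the workflow of any particular paper cited here.

```python
import math

def center_and_scale(landmarks):
    """Move a 2D landmark configuration to its centroid and scale it to unit centroid size."""
    n = len(landmarks)
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    centered = [(x - cx, y - cy) for x, y in landmarks]
    size = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / size, y / size) for x, y in centered]

def rotate_onto(reference, target):
    """Rotate target (already centered and scaled) to best match reference in the least-squares sense."""
    num = sum(tx * ry - ty * rx for (rx, ry), (tx, ty) in zip(reference, target))
    den = sum(tx * rx + ty * ry for (rx, ry), (tx, ty) in zip(reference, target))
    theta = math.atan2(num, den)  # closed-form optimal rotation angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    return [(c * tx - s * ty, s * tx + c * ty) for tx, ty in target]

def procrustes_distance(a, b):
    """Shape difference between two configurations with matching landmark order."""
    a0, b0 = center_and_scale(a), center_and_scale(b)
    b1 = rotate_onto(a0, b0)
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(a0, b1)))

# Hypothetical landmark coordinates (arbitrary units), same four landmarks on two specimens.
specimen_a = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (1.0, 2.0)]
specimen_b = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (1.0, 2.0)]
print(procrustes_distance(specimen_a, specimen_b))
```

The point of the superimposition step is that two scans of the same shape taken at different positions, scales, or orientations come out with a distance of essentially zero, so whatever difference remains can be read as a difference in shape.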
As you might imagine, the selection of appropriate landmarks is a fundamentally important issue in morphometric analysis. Following Bookstein (1991), Bardua et al distinguish three kinds of landmark that may be useful in modeling anatomical data: biological (“Type I”), geometrical (“Type II”), and relational (“Type III”). Biological landmarks are those that are homologous across specimens and taxa; these are the most informative for life science research, but also more difficult to assess at increasingly broad taxonomic scales. A landmark such as the ‘anterodorsal extreme of the maxillopalatine’ must remain ambiguously homologous between (say) T. rex and Homo sapiens given differences in morphology and ontogeny between the two taxa; the landmark is nevertheless diagnosable in all gnathostomes, being defined less by evolutionary history and more by geometrical position on a bone shared across the clade. Geometrical landmarks therefore carry less biological information, but may be more useful in cross-taxonomic comparisons. Finally, relational landmarks—also called semilandmarks—attempt to capture information that would be lost, such as the curvature along a bone’s surface, if the only landmarks to be diagnosed were biological or geometrical.
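To make the idea of a semilandmark a little more concrete, here is a rough sketch of one simple way to place points along a digitized curve between two fixed landmarks: spacing them equally by arc length. This is only the initial placement step; the sliding procedures that Bardua et al discuss are not shown. The function name and the example curve are hypothetical.

```python
import math

def resample_curve(points, k):
    """Return k points equally spaced by arc length along a 2D polyline, endpoints included."""
    if k < 2 or len(points) < 2:
        raise ValueError("need at least two input points and k >= 2")
    # Cumulative arc length at each digitized point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out = []
    seg = 0
    for i in range(k):
        target = total * i / (k - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

# Hypothetical digitized outline running from one fixed landmark to another.
outline = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
semilandmarks = resample_curve(outline, 5)
print(semilandmarks)
```

The first and last points returned coincide with the fixed landmarks at either end of the curve; the points in between carry the curvature information that fixed landmarks alone would miss.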
The diagnosis and measurement of semilandmarks are the explicit concerns addressed in Bardua et al 2019, but their recommendations necessarily suggest practices for the diagnosis and measurement of biological and geometrical landmarks as well. The authors’ primary concern is with manual input, which introduces opportunities for subjective judgment or error. In the first place, landmarks must be diagnosed manually in software from fossil scans or using landmark measurement hardware such as reflex measurement microscopes or MicroScribe tools; additionally, the model templates generated from landmark diagnosis must be manually manipulated and fitted to particular specimen anatomies for analysis. By standardizing the practices by which morphometric models are generated and manipulated, Bardua et al hope to minimize both error and the role of interpretation in morphometric analysis.
Nevertheless, interpretation seems an ineliminable element of morphometric analysis and in this sense the effort to standardize landmark diagnosis resembles efforts to standardize trait diagnosis. Particularly relevant here seems to be the distinction between biological and non-biological landmarks: even if model generation were entirely automated, the diagnosis of a landmark as biological is a theory-laden observation and therefore dependent on a researcher’s input. The role of the researcher in morphometric analysis therefore resembles the role of the preparator in fossil research: as friend of the blog Caitlin Wylie has argued so well, the distinction between fossil and matrix is a theory-laden observation that often reduces to the preparator’s judgment (2009). If the ‘ideal’ landmark is one that ‘represents a biologically homologous position on a structure,’ as Bardua et al assert (7), then landmark diagnosis is ideally theory-laden.
This is not a problem per se, but it does suggest that landmark diagnosis (and, by parity of reasoning, trait diagnosis) is more easily standardized than it is naturalized. As a step towards naturalization, projects like FuTRES may offer some tantalizing hope for the future.
Rise of the Machines
The practical impossibility of impartial observation has long plagued attempts to naturalize scientific concepts. Towards naturalization of species taxa, theorists in biology turned to cross-cultural analysis as a test of species concepts, reasoning that artificial species taxon diagnoses would vary with theoretical backgrounds (see, e.g., Mayr 1932 and Atran 1998). Reading “theory-laden” for “artificial,” we may articulate similar tests for other scientific concepts: different theory-laden diagnoses will vary with different practical standards, and so the constancy of concept diagnosis across contexts serves as evidence for the concept’s naturalness.
Around the same time that I attended the FuTRES workshop I became aware of an intriguing study by Tshitoyan et al, recently published in Nature. The authors used a machine learning algorithm to analyze word associations in abstracts from over 3 million materials science-related journal articles. Even though the algorithm was theory-agnostic, it was nevertheless able to extract sufficient information to reconstruct the entirety of the periodic table, to identify concepts in materials science that were not explicitly named in any abstract (e.g., ‘thermoelectric’), to correctly anticipate the timing of new discoveries in materials science, and to predict discoveries that are yet to come in the next five years. These impressive results likely herald a landmark in developing ‘a generalized approach to the mining of scientific literature’ (2019, 95).
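The basic intuition behind learning from word associations can be sketched in a few lines. What follows is a deliberately simplified stand-in, not the authors' method: they trained word embeddings on millions of abstracts, whereas this toy version just counts which words appear near which other words and compares the resulting count vectors. The four sentences are made up for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words that appear within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy sentences written for this example; not drawn from any real abstract.
toy_corpus = [
    "bi2te3 shows high thermoelectric figure of merit",
    "pbte shows high thermoelectric figure of merit",
    "licoo2 is a cathode for lithium batteries",
    "lifepo4 is a cathode for lithium batteries",
]

vectors = cooccurrence_vectors(toy_corpus)
print(cosine(vectors["bi2te3"], vectors["pbte"]))
print(cosine(vectors["bi2te3"], vectors["licoo2"]))
```

Nothing in the code knows any chemistry, yet materials that are described in similar terms end up with similar vectors. That is the sense in which such a program can group concepts without anyone labelling them, and also the sense in which its output depends entirely on how researchers chose to write.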
Indeed, Tshitoyan et al imply (conversationally, if not logically) that their machine learning algorithm exemplifies a sort of idealized impartial observer: they emphasize that the algorithm was programmed ‘without any explicit insertion of chemical knowledge’ and that the algorithm identified chemical concepts ‘without human labelling or supervision.’ To be sure, the algorithm’s output does not demonstrate the naturalness of the relevant concepts per se—especially since the data input were linguistic descriptions rather than raw data—but if the algorithm had failed to capture important chemical concepts then that would serve as evidence against the naturalness of those concepts. Even if the program isn’t truly impartial (spoiler alert: it isn’t!), it can at least provide a basis for comparison similar to those found in cross-cultural analyses.
This, then, is one of my hopes for the future of large-scale trait databases like FuTRES: that they may provide the data for tests of the naturalness of our concepts. Machine-learning algorithms similar to Tshitoyan et al’s may parse the database literature input, which includes diagnoses and measurements from a variety of practical standards, and identify measurements consistently correlated with particular descriptions or descriptions that remain invariant across practical contexts. Landmarks or traits that vary with research context, however standardized their measures may be within that context, may be recognized as artificial; those that are more constant would have powerful evidence in support of their naturalness.
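One crude way to picture such a test, assuming the database could give us the same trait measured under several research contexts: ask how much of the variation in the measurements is accounted for by context alone. The sketch below computes that share (the between-group portion of the total sum of squares). The context labels and numbers are placeholders invented for illustration, the function name is hypothetical, and a real analysis would need far more care about sampling and confounds.

```python
def between_context_share(groups):
    """Fraction of total variation in a measurement that is explained by research context.

    `groups` maps a context label to a list of measurements of the same trait.
    Values near 0 suggest the measurement is stable across contexts;
    values near 1 suggest it tracks the context rather than the specimens.
    """
    values = [v for vs in groups.values() for v in vs]
    grand_mean = sum(values) / len(values)
    total = sum((v - grand_mean) ** 2 for v in values)
    if total == 0:
        return 0.0
    between = sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
                  for vs in groups.values())
    return between / total

# Placeholder numbers only: the same trait recorded under two hypothetical protocols.
stable_trait = {"protocol_a": [1.0, 2.0, 3.0], "protocol_b": [1.0, 2.0, 3.0]}
context_bound_trait = {"protocol_a": [1.0, 1.0, 1.0], "protocol_b": [5.0, 5.0, 5.0]}
print(between_context_share(stable_trait))
print(between_context_share(context_bound_trait))
```

On this picture, a trait or landmark whose measurements shift with the protocol that produced them would score high and look artificial, while one that scores low across many contexts would have some evidence of naturalness in its favor.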
At this point, any such research remains speculative: the FuTRES project, at least, does not currently include anyone experienced enough in machine learning to program the sort of near-ideal observer created by Tshitoyan et al. As the creation of such programs becomes more familiar and accessible, however, their inevitable application to biological data promises exciting insight into the natures of our most important concepts.
References
Atran, S. (1998). Folk biology and the anthropology of science: cognitive universals and cultural particulars. Behavioral and Brain Sciences 21: 547-609.
Bardua, C., Felice, R.N., Watanabe, A., Fabre, A.C., and Goswami, A. (2019). A practical guide to sliding and surface semilandmarks in morphometric analyses. Integrative Organismal Biology 1(1): 1-34. DOI: 10.1093/iob/obz016
Bates, K.T. and Falkingham, P.L. (2012). Estimating maximum bite performance in Tyrannosaurus rex using multi-body dynamics. Biology Letters 8(4): 660-664. DOI: 10.1098/rsbl.2012.0056
Bookstein, F.L. (1991). Morphometric tools for landmark data: geometry and biology. Cambridge University Press, Cambridge.
Cost, I.N., Middleton, K.M., Sellers, K.B., Echols, M.S., Witmer, L.M., Davis, J.L., and Holliday, C.M. (2019). Palatal biomechanics and its significance for cranial kinesis in Tyrannosaurus rex. The Anatomical Record: 1-19. DOI: 10.1002/ar.24219
Kripke, S. (1980). Naming and Necessity. Oxford University Press, New York.
Mayr, E. (1932). A tenderfoot explorer in New Guinea: reminiscences of an expedition for birds in the primeval forests of the Arfak Mountains. Natural History.
O’Higgins, P., Fitton, L.C., and Godinho, R.M. (2017). Geometric morphometrics and finite element analysis: assessing the functional implications of difference in craniofacial form in the hominin fossil record. Journal of Archaeological Science 101: 159-168. DOI: 10.1016/j.jas.2017.09.011
Putnam, H. (1974). Meaning and reference. The Journal of Philosophy 70(19): 699-711.
Quine, W.V. (1971). Epistemology naturalized. Akten Des XIV. Internationalen Kongresses Für Philosophie 6: 87-103.
Tshitoyan, V., Dagdelen, J., Weston, L., Dunn, A., Rong, Z., Kononova, O., Persson, K.A., Ceder, G., and Jain, A. (2019). Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571(7763): 95-106. DOI: 10.1038/s41586-019-1335-8
Wylie, C. D. (2009). Preparation in action: paleontological skill and the role of the fossil preparator. In Methods in fossil preparation: Proceedings of the first annual fossil preparation and collections symposium (pp. 3-12).