1 Introduction

Currently, AI ethics is silent about the impact of AI technologies on nonhuman animals. AI ethics is strongly focused on lists of ethical principles that, unless explicitly applied more broadly, are anthropocentrically tailored and focus exclusively on interactions between technology and humans.

This circumstance is challenged in only two respects. First, researchers have recently started to extend the analytic range of AI ethics by looking at AI’s impact on ecological systems as well as climate change [1,2,3,4,5,6,7,8]. By doing so, they have at least theoretically opened the field for a moral consideration of nonhumans. An “ethics of desirability” [9], which explicitly aims to give a voice to all actors who are affected by AI technologies, could in theory also include animals. Anthropocentrism in AI ethics seems to be challenged by these tentative steps to assess ecological impacts. However, one could argue that sustainability issues are also perceived through an anthropocentric lens in which humans are the ultimate beneficiaries and only humans are directly morally considerable [10]. In theory, principle-based approaches could be applied more extensively and hence include animals. But at least in the field of AI ethics, this is the exception rather than the rule.

Second, anthropocentrism is challenged by speculative works about possible future artificial general intelligence and the question of whether such systems should possess a moral status [11,12,13]. Moreover, philosophers ponder the moral status of current anthropomorphized artificial agents. Considerations in this context are often based on the argument from brutalization that Kant makes regarding cruelty to animals. Kant claims that animals are not morally considerable for their own sake. Nevertheless, he maintains that the cruel treatment of animals should be prevented because it would diminish the human ability to feel compassion toward other humans, which in itself is a precious predisposition essential to a peaceful human community [14]. Likewise, one could rightly claim that even if the act of harming an AI system is not immoral, it may provide a training ground for interpersonal immorality, for instance when ‘abusing’ speech assistants [15, 16]. However, in this context, the initial challenge to anthropocentrism again results in a tacit reinforcement of the anthropocentric ethical framework.

Apart from these two possible challenges, anthropocentrism is the unquestioned bedrock of AI ethics. Hence, the state of the art in AI ethics regarding non-anthropocentric methodologies is rather tenuous. At present, to the best of our knowledge, only five papers in AI ethics exist that argue for a moral extension of the field and take animals into account [17,18,19,20,21]. Ziesche [17] argues that the value alignment problem, which is at the heart of AI safety debates about potentially malignant artificial general intelligence, should be extended to the values of animals or, rather, to the values of species, since he conflates the individual with the species level throughout his paper. Bendel [21] discusses animal-machine interactions and stresses the importance of protective routines in autonomous machines when encountering animals. Owe and Baum [18] stress that AI ethics, for instance when stating ethical principles, should take nonhumans such as animals into account since they merit direct moral consideration. Bossert and Hagendorff [19] collect examples where animals are affected by AI systems, for instance in neuroscientific animal experiments that are supposed to inspire artificial neural net architectures, and in contexts like factory farming. Singer and Tse [20] go in a similar direction by analyzing further examples of AI’s impact on animals, for instance with regard to autonomous cars’ swerving behavior when confronted with animals on roads, and AI systems used in factory farming. Singer and Tse also stress that AI ethics has to widen its scope. However, while Bossert and Hagendorff as well as Singer and Tse briefly touch upon the topic of algorithmic discrimination against animals in AI technologies, neither offers a detailed analysis of the topic. This paper is intended to close this gap. In the second section, we discuss why discrimination against animals, and not just humans, is problematic. In the third section, we critically comment on current fairness research in AI and introduce the term ‘speciesist bias’. In the fourth section, we offer a variety of case studies to investigate examples of speciesist biases in existing AI applications in the fields of image recognition, language models, and recommender systems. Our methods comprise: a normative analysis of the moral status of animals that is based on both ethical theories and moral psychology; an exploratory reading of academic papers to identify the above-mentioned gap in research; and bias detection measures that can be applied to image recognition, word embedding, and language models.

2 Discrimination against animals

Classical social science research on interpersonal discrimination distinguishes five forms of oppression [22]: exploitation, marginalization, powerlessness, cultural imperialism, and violence. All forms of oppression have their roots in the construction of social out-groups, where arbitrary attributes are essentialized, stereotypes are built, or prejudices are coined. In this paper, we understand discrimination as the unjust or prejudicial treatment of different categories of individuals, e.g. on the grounds of race, gender, ability, or species membership. Thus, oppression and dehumanization happen to individuals who are classified by their out-group affiliation [23]. Now, as several studies about discrimination against animals show [24,25,26,27,28,29,30], the social systems of beliefs and practices that lead to interpersonal discrimination utilize the same psychological and cognitive mechanisms that cause speciesist behavioral patterns, and vice versa. Interhuman as well as speciesist biases have common ideological roots, with ‘social dominance orientation’ (SDO) being a key factor that, for instance, connects ethnic prejudice and speciesist attitudes [31]. Animals are exploited, marginalized, and exposed to structural as well as physical violence due to their ascribed attribute of being the ultimate out-group. However, ethology and many other disciplines show that differences between various animal species as well as humans are gradual, not discontinuous [32,33,34,35]. Some animals possess a theory of mind [36], language [37, 38], emotions [39], intelligence [40, 41], evolutionary precursors of morality [42], (self-)awareness [43, 44], pleasure [45], etc. In sum, from the standpoint of biology, it does not make sense to stipulate a sharp divide between humans and animals. And even if such a gap existed, the fact that (at least) vertebrate animals, including fish [46], are able to feel pain and pleasure makes them ethically relevant, and is sufficient to show that these individuals should not be ignored when debating societal practices, including developments in technology.

Within vertebrates, humans assign different values to sub-groups of animals, especially by separating farmed animals from companion animals and subjecting the former to far worse treatment [47, 48]. Tens of billions of farmed animals are bred and held captive in crowded, filthy conditions. After a fraction of their normal life expectancy, they are slaughtered, often without being stunned. This is the basis of the meat, milk, eggs, fur, leather, wool, and down industries, despite the massive harms and suffering these industries cause for the animals themselves [49, 50], but also for ecological systems [51, 52] and public health [53, 54]. Companion animals, on the other hand, are often considered close family members, and huge sums of money are spent on their (alleged) welfare. To maintain this unequal treatment of groups of animals that have very similar capabilities, a variety of techniques of moral disengagement [55, 56] are utilized throughout societies to suppress cognitive dissonances [57,58,59]. Euphemisms are used to cognitively reinterpret the conditions under which animals are reared, held captive, and killed. Further, harm towards animals is relativized by pointing to other contexts of harm. Individuals are likely to deny accountability for their behavior by referring to the diffusion of responsibilities in the complex nexus of factory farming, politics, and consumer behavior [60]. In addition, confirmation biases lead to selective attention, whereby primarily information that matches one’s own beliefs is sought and, when found, deemed to be true. All in all, these mechanisms of moral disengagement, together with manifold psychological, cultural, linguistic, as well as architectural distancing mechanisms, allow for the acceptance, support, and execution of the large-scale, industrially organized breeding, fattening, and killing of tens of billions of farmed animals every year.

Despite these factors, moral intuitions can play a significant role regarding our treatment of animals. On the one hand, most people share the moral intuition that cruelty to animals is bad. This intuition needs to be suppressed to accept the treatment of farmed and many other animals. On the other hand, many people also share the moral intuition that it is acceptable or even necessary to assess harm to humans quite differently from harm inflicted upon animals. Furthermore, the different treatment of farmed and companion animals seems to fit many people’s culturally shaped emotional responses, so that, for instance, in western countries people find it unimaginable to eat dogs, cats, or canaries while having no problem with eating pigs, cows, and chickens. We hold that, from a normative point of view, such moral intuitions need to be replaced by well-considered arguments, as some philosophers have argued that moral intuitions should not be seen as normative foundations of our actions [42]. This means that the prevailing moral intuition that we are entitled to treat animals in ways that would be universally condemned if applied to humans needs to be rethought in the light of well-founded arguments. The same applies to the intuitions many people have about which animals may be kept on factory farms and eaten, and which it would be wrong to treat in this way.

A prominent line of argument within animal ethics underpins the claim that all sentient animals, being capable of feeling pain and pleasure, have interests, at the very least the interest not to feel pain and to experience positive emotions. When evaluating actions to distinguish what is morally right and wrong, the interests of all individuals have to be considered. Since, from an animal ethics perspective, no convincing arguments exist as to why the interests of some sentient animals (including humans) should per se have more weight, the interests of all sentient beings need to be considered equally, as is required by the ‘principle of equal consideration of interests’ [49]. Attempts to include all humans in the moral community while generally excluding (other) sentient animals fail [61]. For this reason, the belief that humans—or another animal species—are entitled to have their interests given more weight than the similar interests of other sentient beings can be considered as arbitrary as racism and sexism and thus be rejected as speciesism [49]. If animals’ interests of similar kind and strength are seen as equally worthy of consideration, it follows that the ways humans treat them must change fundamentally.

Nevertheless, in this paper, we do not want to argue that species-based differentiations between humans and animals are per se wrong. Quite the contrary, distinguishing between different capabilities—such as feeling pain, having high cognitive abilities, being able to plan for the future, etc.—and sets of interests is of great importance for moral decision-making, since different capabilities go along with different moral demands. For example, disabled individuals have other moral demands than individuals without these disabilities [62], and individuals with higher cognitive abilities have other interests than individuals without them. However, picking out particular animal species, namely chickens, cows, pigs, fish, etc., and subjecting them to systematic physical violence that would rightly give rise to strong moral condemnation if applied to other animal species, namely dogs, cats, and horses, who have just the same, or very similar, capabilities and sets of interests, is unfair. And even if this unfairness is accepted in many parts of society, the acceptance arises firstly from the fact that the violence itself is mostly hidden and cognitively reinterpreted, and secondly from a lack of intensive engagement with the well-developed arguments we have just mentioned.

We think that this discrimination between animal species with the same or very similar capacities and interests should be addressed—in general and particularly when dealing with forms of discrimination, e.g. in the field of AI technology. Therefore, most of the examples of speciesist biases in AI applications that we discuss in this paper relate to comparisons between farmed animals and other animals (companion animals), not between animals and humans. The rights of all humans are affirmed in numerous documents signed by many of the world’s countries, starting with the Universal Declaration of Human Rights. This indicates a consensus that is lacking with regard to animals. Speciesist biases in AI accurately capture the biased views and actions that are shared, accepted, and performed by a large majority of society. Hence, there is an empirical difference between racist and speciesist biases. It is widely accepted that racism is wrong, and we should go to considerable lengths to eliminate it. Something similar, though perhaps with slightly more dissent, is true of sexism and of discrimination against people because of their sexual orientation. For the majority of people, in contrast, speciesism is not seen as an issue. We argue, however, that despite these empirical differences, arguments from both moral psychology and animal ethics convincingly demonstrate that speciesist biases are morally wrong and should be avoided. Hence, AI developers and practitioners should work on technologies that support more respectful human-animal relations instead of supporting the status quo.

3 Bias mitigation in AI

When developing AI-based software, practitioners have additional ethical responsibilities beyond those of standard, non-learning software [63,64,65]. These responsibilities include the careful selection of the inputs that form the basis for the computational learning process itself. With regard to the machine learning methods that form the bedrock of today’s AI systems, these inputs or training stimuli shape the behavior of a machine [66]. Behavioral data fed into AI applications reflect people’s (e.g., speciesist) behavior, which thus has an indirect influence on machine (speciesist) behavior.

Today’s AI technologies are dependent on human participation. In many cases, they harness human behavior that is digitized by various tracking methods. AI systems ‘capture’ it by tracking human cognitive and behavioral abilities and patterns. Without the empirical aggregation of recordings of human behavior, many current AI systems would not work. An extensive infrastructure for ‘extracting’ [1] or ‘capturing’ [67] human behavior in distributed networks via user-generated content builds the basis for the computational capacity called ‘AI’. But if AI systems acquire their capabilities in societies that are permeated by speciesism, AI technologies will become biased and speciesist themselves. However, biased AI has a bad reputation—for good reasons.

In recent decades, ‘bias’ has become a term riddled with ambiguities. On the one hand, inductive biases, which are defined as priors or assumptions an algorithm uses to build a general model out of a limited set of training data, are necessary for the success of machine learning [68]. On the other hand, machine biases in the fairness field are associated with algorithmic discrimination, which, roughly speaking, stands for disparate, unjust impacts of algorithmic decision-making on individuals [69]. In this paper, we use the term bias in the latter sense. Massive efforts are made to mitigate these fairness-related biases in data and algorithms in order to render AI applications fair [70]. These efforts are propelled by various incidents of algorithmic discrimination in which biased AI in policing software, hiring systems, medical applications, image recognition, and many more caused harm to minorities, women, people of color, etc. [71,72,73]. The reasons and sources for algorithmic discrimination are manifold [74]; in most cases, however, fairness-related biases are entrenched in AI systems via data, human–computer interactions, as well as algorithms [75, 76]. Data bias is presumably the most common type, i.e. a systematic distortion in the training data that can be caused by the selection of data sources, the way in which data from these sources are acquired, as well as by processing operations such as cleaning or aggregation [77]. Akin to data biases are human–computer interaction biases. Human–computer interactions can be affected by specific behavioral patterns in humans, ultimately affecting the very data that are used for further model training. To prevent these biases, AI researchers use various tools and methods for reducing algorithmic discrimination, primarily by dealing with protected attributes [78]. These attributes typically include gender, race, ethnicity, sexual and political orientation, religion, nationality, social class, age, appearance, and disability. Speciesism-related biases, however, are not addressed.

When discussing the issue of speciesist bias in AI, one has to be precise and stress that speciesism need not be a relevant dimension of bias mitigation in all possible fields. AI applications in finance, healthcare, policing, etc. are unlikely to be directly biased against animals because the training data capturing human behavior in these fields typically represent very limited, task-specific behavior. Nevertheless, speciesist thinking is manifest in nearly all realms of human activity. Applications like open-domain large language models, image recognition systems, or recommender systems on social media or search engines, on the other hand, are likely to incorporate speciesist biases that can cause adverse consequences for animals. Hence, the following analysis focuses on exactly these fields. In these contexts, if fairness frameworks do not widen their scope and overcome anthropocentrism, AI technologies will not just perpetuate, but also reinforce patterns that promote violence against animals. Perhaps they will even introduce these patterns into social contexts in which they did not previously exist. This perpetuation of speciesism is due to the conservative character of machine learning [69]. By learning from training stimuli that are, in effect, coagulated past human behavior, machine learning methods tend to preserve as well as fix discriminatory, speciesist biases in applications like natural language generation, recommender systems, ranking algorithms, etc. Ultimately, AI technologies render these patterns difficult to alter and normalize them as seemingly essential, unless bias mitigation measures are undertaken. Such measures can change AI models in a progressive sense, for instance in the case of large language models, which are clearly malleable when they are fine-tuned or updated. If such measures are not undertaken, however, these technical artifacts solidify social constructs and discriminatory patterns, and it will then become much more difficult to suppress these patterns should social negotiation processes deem speciesism unethical. In addition, the AI field is currently undergoing a paradigm shift in which foundation models, meaning large-scale models that are adaptable to various downstream tasks in areas like language, vision, reasoning, etc., are increasingly displacing smaller models, hence undermining the diversity of AI models [79]. In the near future, foundation models will serve as a common basis for many mainstream AI applications. Therefore, the impact of these models on equality, economic justice, security, and other ethically relevant considerations is all the more significant.

In the following, we describe case studies of speciesist biases in three different areas of AI use, namely image recognition, language models, and recommender systems. On the basis of the ethical reasoning we have offered above, we deem these biases to be problematic since they either blatantly misrepresent reality or, in most cases, accurately represent it. This seems to be a contradiction. To clarify why it is not, we want to differentiate between ‘the world as it is’ and ‘the world as it should be’ [75]. Models can be used to predict the world as it is, which can mean perpetuating whatever biases already exist. Debiasing algorithms or training data, in contrast, can lead to a modeling of the world as it should be. Here, we opt for an understanding of the world as it should be. Even if racism, sexism, or speciesism are entrenched in various belief systems, they should not be picked up and incorporated into AI systems. However, in situations where a biased worldview serves as an instrument to obscure existing unfairness, AI applications that take up this worldview help to preserve that unfairness. They represent the world as it should be, but in a context where the ought-condition helps belittle the unjust is-condition. At least in part, this is the case with image recognition applications. As we will show in the first subsection of our case studies, image recognition systems were trained with distorted depictions of particular animal species. In this regard, debiasing image recognition systems would mean making them represent reality.

4 Exploring AI systems for speciesist biases

4.1 Image recognition

Image recognition by computer vision algorithms is not just a technical, but also an ethical challenge. Whereas it may seem to be a simple task for images of apples, hydrants, or house number plaques, interpreting images is often a complex and value-laden endeavor in which different meanings, norms, and ideologies interfere with each other [80]. Moreover, interpretations can change over time. There is no simple correspondence between images and their meanings, but rather varying relations that connect the two in sometimes arbitrary ways. In the field of computer vision, machine learning happens via training images that are annotated and sorted into categories which then provide vision models with information about an image’s presumed meaning and ultimately with out-of-distribution generalization capabilities to categorize and label previously unseen images. In this process, computer vision algorithms can learn biases from the way humans, animals, or other entities are portrayed in datasets, no matter whether supervised or unsupervised machine learning models are used [81]. These biases show up in tasks like object detection [82], face recognition [83], image search [84], or image cropping [85].

Many computer vision models are (pre-)trained on the canonical ImageNet 2012 [86, 87], a dataset that contains millions of images collected from the Internet. ImageNet bases its underlying categories, which mostly comprise nouns or, more specifically, 21,841 indexed synonym sets, on the semantic categories of WordNet, which provides a hierarchically organized taxonomy of words [88]. WordNet is based on Library of Congress taxonomies that date back to the 1970s, a time in which racist, sexist, and speciesist terms were widely accepted. In ImageNet, ‘animal’ is one of the top-level categories. It is distinguished from ‘persons’, which is not just a contested category in and of itself that required major subsequent improvements due to offensive subcategories [89], but also provides the foundation for separating persons from animals or ‘non-persons’ in a binary structure. In this regard, ImageNet is similar to other popular image datasets like CIFAR-100 [90], the Open Images Dataset [91], COCO [92], and many others. In addition, WordNet and other annotation structures for image datasets contain speciesist terms like ‘hog’, ‘porker’, ‘milk cow’, ‘layer’, ‘livestock’, etc. Furthermore, ImageNet has numerous classes for dogs containing subclasses for ‘working dogs’, ‘toy dogs’, ‘hunting dogs’, or ‘sporting dogs’. This can also be deemed ethically problematic, since dogs are categorized in a way that characterizes them as means to human ends. Another class is named ‘food fish’ and contains countless angler trophy photos instead of showing the animals in their natural environments. In the same vein, lobsters or crab species are shown nearly exclusively in restaurant or kitchen environments.
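The WordNet taxonomy underlying ImageNet’s synsets can be inspected directly. The following minimal sketch (our illustration, assuming NLTK’s WordNet interface and a downloadable WordNet corpus; the queried terms are the examples mentioned above) prints the glosses and hypernym chains of some of these value-laden animal categories:

    # Minimal sketch: inspect WordNet synsets and hypernym chains for some of the
    # speciesist category labels mentioned above. Assumes NLTK is installed and the
    # WordNet corpus can be downloaded.
    import nltk
    nltk.download("wordnet", quiet=True)
    from nltk.corpus import wordnet as wn

    for term in ["hog", "porker", "livestock", "toy_dog", "food_fish"]:
        for synset in wn.synsets(term, pos=wn.NOUN):
            # Gloss plus the first hypernym path up to the WordNet root ('entity').
            path = " -> ".join(s.name() for s in synset.hypernym_paths()[0])
            print(f"{synset.name()}: {synset.definition()}")
            print(f"  {path}")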

Furthermore, one salient trait of image datasets is the fact that they portray farmed animals in a non-representative way. Cows, pigs, or chickens are predominantly shown in free-range environments (see Fig. 1), whereas the overwhelming majority of these animals are actually confined in crowded factory farms [93]. For instance, of all living birds, only one-third live in the wild, whereas two-thirds are farmed birds [94]. Of the latter, 99% live on factory farms [93]. However, popular image training datasets portray these very birds in a way that creates the impression that they live predominantly in free-range conditions. In fact, ag-gag legislation and similar anti-whistleblower laws are making it harder for the public, as well as for photographers, journalists, or undercover activists, to gather realistic footage of farmed animals’ living conditions [95]. Hence, due to the general and intended non-transparency of factory farming, image training datasets suffer from representational or sampling biases, meaning biases that arise from the way one defines and samples a group [76, 96], in this case the group of farmed animals. One peculiarity, though, is that while ImageNet’s hog category consists mostly of images of pigs in free-range environments, many images in this category show tortured pigs, pigs during dismembering, dead pigs, pigs covered in blood, tattooed pigs, pig genital close-ups, and other disturbing content.

Fig. 1: Example images of different farmed animals in popular image training datasets showing representational biases

But what are the consequences of representational speciesist biases in image training datasets? In brief, they are propagated into the various models used for computer vision tasks. A model will then generalize poorly to other data and exhibit disparities in performance based on species affiliation, the location of animals, and other attributes, especially in cases in which easy shortcuts can be learned [97, 98]. To demonstrate poor out-of-distribution generalization in image classification models that were trained on ImageNet, we composed a new dataset with four categories: free-range hens, factory-farmed hens, free-range pigs, and factory-farmed pigs (see Fig. 2). Per category, we selected 100 images.

Fig. 2: Example images of datasets depicting hens and pigs in factory farming as well as free-range environments

We then calculated the mean accuracy of image classification for ‘hog’ and ‘hen’ in each of the two conditions using MobileNet [99], VGG16 [100], ResNet50 [101], InceptionV3 [102], and Vision Transformer [103], which were all pre-trained on ImageNet. We compared the results to the base accuracy of the models. All models showed worse performance when classifying images depicting factory-farmed animals than images of animals in free-range environments (see Fig. 3). Vision Transformer had the least problems with classifying pigs and hens correctly in both conditions. The remaining four models showed large differences in accuracy between the free-range and factory farming conditions, ranging from 21 to 46%. Since for our dataset we only selected images in which animals were clearly visible and depicted as the image’s main subject, which is not the case in ImageNet, the classification accuracy for the free-range categories is consistently higher than the base accuracy of the models used.

Fig. 3: Mean accuracy of image classification models pre-trained on ImageNet in classifying hens and pigs in factory farming as well as free-range environments
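A minimal sketch of such an accuracy comparison is given below (our illustration, not the exact evaluation code behind Fig. 3). It assumes Keras with ImageNet-pretrained weights and a hypothetical directory layout for the four image categories; the ImageNet labels ‘hen’ and ‘hog’ serve as ground truth:

    # Minimal sketch: compare top-1 accuracy of an ImageNet-pretrained classifier on
    # free-range vs. factory-farm images of hens and pigs. Directory names are
    # hypothetical placeholders.
    import glob
    import numpy as np
    from tensorflow.keras.applications.resnet50 import (
        ResNet50, preprocess_input, decode_predictions)
    from tensorflow.keras.preprocessing import image

    model = ResNet50(weights="imagenet")  # swap in VGG16, MobileNet, InceptionV3, etc.

    def top1_accuracy(image_dir, target_label):
        """Share of images whose top-1 ImageNet prediction equals target_label."""
        hits, total = 0, 0
        for path in glob.glob(f"{image_dir}/*.jpg"):
            img = image.load_img(path, target_size=(224, 224))
            x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
            _, label, _ = decode_predictions(model.predict(x), top=1)[0][0]
            hits += int(label == target_label)
            total += 1
        return hits / max(total, 1)

    for condition in ["free_range", "factory_farm"]:
        print(condition, "hens:", top1_accuracy(f"data/{condition}/hens", "hen"))
        print(condition, "pigs:", top1_accuracy(f"data/{condition}/pigs", "hog"))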

All in all, image recognition systems have learned to correctly perceive a myth, but not reality. That is due to representational biases in their training data as well as unbalanced annotation routines. These flaws mean that the systems have great difficulty achieving good image classification performance when, for instance, classifying a cow in a meadow and a cow in a very different context, such as a beach [97, 98] or, in our case, an intensive farm, as the same kind of animal. One likely consequence of the biases in image training data is their influence on image search algorithms [84]. These algorithms base their output on a number of different signals. However, when focusing on the image recognition part, one can assume that they produce idealized images when asked to return images of farmed animals, hence perpetuating stereotypes and misconceptions concerning animal welfare and the living conditions of farmed animals. These algorithms ‘see’ factory-farmed pigs, cows, or chickens differently from other animals. Similarly, generative models like Variational Autoencoders [104] or Generative Adversarial Networks [105] trained on the above-mentioned datasets are likely to yield unrealistic, biased images of particular animal species. Moreover, animal pose estimation, facial recognition, visual question answering, or ‘zooveillance’ [106, 107] applications are likely to fail when used in contexts outside of free-range farming. However, image recognition systems that specifically target factory farming settings exist, and they are indeed trained on data from the very environments in which they are deployed [108,109,110,111]. Apart from that, though, and in general, image recognition algorithms that embody speciesist biases perpetuate myths concerning the living conditions of farmed animals and therefore thwart informed decision-making about consumer purchases. In light of the “picture superiority effect” in humans, meaning that images are more likely to be remembered than words [112,113,114], the subtle feedback loops between biased image recognition algorithms and cultural notions, social norms, and ideological settings should not be underestimated. This is now widely recognized when it comes to algorithms that do not recognize people with equal accuracy regardless of their race or gender [115]. Such algorithms are now generally rejected. AI developers should also aim for algorithms that do not incorporate unjust biases against particular animal species.

4.2 Large language models

The basic operating principle of language models comprises four steps, namely tokenizing (assigning words to tokens), cleaning (removing stop words etc.), vectorizing (translating words into numerical representations of their surroundings), and machine learning (training neural networks to predict word combinations). Eventually, the machine learning models learn how to produce natural language on their own. However, the crux of these models is that they perpetuate word combinations learned from human-made texts. Due to their training on word co-occurrences in bodies of text and their ability to predict the surroundings of a word, large language models corroborate existing language patterns. Obviously, human-made texts contain all sorts of biases, for instance gender or racial stereotypes. In large language models, biases occur on various levels [116,117,118]: they are contained in embedding spaces, coreference resolution, dialogue or text generation, hate-speech detection, sentiment analysis, or machine translation. The types of harm caused by speciesist biases comprise stereotyping, representational harms, questionable correlations, and misinformation harms. Linguistic discrimination against animals can occur in large language models that reproduce speciesist speech patterns, stereotypes, euphemisms, or other oppressive tendencies against animals. Moreover, misinformation harms arise from large language models generating text that represents false, misleading, or nonsensical information concerning animals. Humans may take the output of large language models to be correct, thereby solidifying wrong notions or narratives about animals and their capabilities. And whilst the AI community is eager to debias algorithms with regard to gender stereotypes, racism, and a few other discriminatory patterns [116, 119, 120], to the best of our knowledge no such effort has been undertaken regarding speciesism. However, language is a significant contributor to the unjust power relations over, as well as the violence-laden oppression of, animals [121]. Language influences human thought and creates realities [122]. Speciesist language patterns exist in more or less all human languages and cultures [123, 124]. Accordingly, highlighting a speciesist use of language in AI models is an important step toward not perpetuating these patterns. In the following, we explore instances of speciesist bias in various language model applications.

Speciesist tendencies can, for instance, be reflected in word vectors, meaning vectors that encode semantic similarities between words. Word embedding models like GloVe [125] or Word2Vec [126] are trained on text containing billions of tokens. The models are used to obtain vector representations for words by learning their respective co-occurrence with other words. In short, word embeddings quantify the relatedness of words. Investigating them can serve the purpose of finding biases in various types of training data [127]. If biases are part of the training data, they will also be part of trained language models. To investigate GloVe and see whether it reveals implicit speciesism in its training data, which stems from Wikipedia as well as news article headlines, we selected words describing farmed animals (hog, pig, cow, calf, chicken, goat, sheep) as well as companion animals (dog, cat, rabbit) and non-companion animals (mouse, parrot, deer) and calculated their position relative to word pairs (cute/ugly, love/hate, she/it, facility/home, etc.) (see Fig. 4). In this way, we could demonstrate that GloVe associates farmed animals predominantly with negative terms like ‘ugly’, ‘primitive’, ‘hate’, etc. Companion species like dogs, cats, or parrots, as well as some non-companion species, on the other hand, are related to positive concepts like ‘cute’, ‘love’, personhood, or domesticity.

Fig. 4: Word vectors from GloVe.6B.50d showing cosine similarity between words, revealing speciesist biases
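The kind of probing behind Fig. 4 can be sketched as follows (our illustration; it assumes gensim’s pre-packaged "glove-wiki-gigaword-50" vectors, which correspond to GloVe.6B.50d, and uses only a subset of the animal words and word pairs listed above):

    # Minimal sketch: position animal words on word-pair axes via cosine similarity.
    # Assumes gensim and its downloadable "glove-wiki-gigaword-50" vectors.
    import gensim.downloader as api

    glove = api.load("glove-wiki-gigaword-50")

    animals = ["pig", "cow", "chicken", "dog", "cat", "rabbit"]
    word_pairs = [("cute", "ugly"), ("love", "hate"), ("home", "facility")]

    for animal in animals:
        scores = []
        for positive, negative in word_pairs:
            # Similarity to the positive pole minus similarity to the negative pole.
            delta = glove.similarity(animal, positive) - glove.similarity(animal, negative)
            scores.append(f"{positive}/{negative}: {delta:+.3f}")
        print(animal, "|", " | ".join(scores))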

Moreover, word vectors seem to perpetuate the false belief that animals do not have minds. Research on mind denial shows that humans are more reluctant to harm animals whom they perceive as possessing a mind [128, 129]. Denying animals’ minds reduces negative emotions like guilt or repulsion that are caused when harm is inflicted on animals. Therefore, we investigated the similarity between the terms ‘animal’ and ‘human’ and a set of nouns (machine, impulse, instincts, drive, interest, senses, mind, emotions, desire, language, communication, reason, cognition) and adjectives (dumb, silly, dull, reckless, stupid, foolish, clever, wise, sensible, intelligent, gifted), with an additional dimension for the word pairs simple/complex and bad/good. The results show that training data for large language models do not just reinforce the appreciation of ‘higher’ mental capabilities, but even more so reflect patterns that indicate mind denial in animals, ultimately perpetuating their devaluation (see Fig. 5).

Fig. 5: Word vectors from GloVe.6B.50d showing cosine similarity between words, revealing tendencies toward mind denial in animals

Other word embedding models like Word2Vec [126], which can likewise be trained on text corpora like Google News, Wikipedia, or Twitter tweets, also reveal speciesist biases. In our investigation of Word2Vec, we used three groups, namely six words describing humans (human, person, individual, child, man, woman), six words describing non-farmed animals (dog, cat, dolphin, rabbit, parrot, hamster), and six words describing farmed animals (cow, pig, chicken, cattle, hog, hen). We then calculated mean word similarities between the three groups and a list of 20 positively connoted adjectives (charming, diligent, friendly, funny, kind, likable, intelligent, brave, nice, sensible, amazing, awesome, incredible, elegant, lovely, vivid, free, confident, fantastic, remarkable) that are appropriate to describe animals as well as humans. All the mentioned training text corpora reveal speciesist tendencies (see Fig. 6). Humans are more closely associated with positive adjectives than animals, and non-farmed animals are more closely associated with them than farmed animals. The results reflect the speciesism that is already predominant in societies. Again, these biases will be propagated when training language models, meaning that they will become fixed instead of being negotiated. Large language models, which will increasingly be applied in all kinds of social contexts, will corroborate and normalize the linguistic devaluation of animals—unless debiasing measures are undertaken.

Fig. 6: Word2Vec models trained on Google News, Wikipedia, and Twitter reveal speciesist bias when testing for word similarities of humans, non-farmed animals, and farmed animals with positive adjectives
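A minimal sketch of this group comparison might look as follows (our illustration; it assumes gensim’s downloadable "word2vec-google-news-300" vectors, i.e. the Google News corpus, and uses only a subset of the adjectives listed above):

    # Minimal sketch: mean cosine similarity of human, non-farmed-animal, and
    # farmed-animal words with positive adjectives, using pretrained Word2Vec.
    import itertools
    import numpy as np
    import gensim.downloader as api

    w2v = api.load("word2vec-google-news-300")

    groups = {
        "humans": ["human", "person", "individual", "child", "man", "woman"],
        "non-farmed animals": ["dog", "cat", "dolphin", "rabbit", "parrot", "hamster"],
        "farmed animals": ["cow", "pig", "chicken", "cattle", "hog", "hen"],
    }
    adjectives = ["charming", "friendly", "funny", "kind", "likable",
                  "intelligent", "brave", "nice", "lovely", "remarkable"]

    for name, words in groups.items():
        # Skip any word that is missing from the pretrained vocabulary.
        sims = [w2v.similarity(word, adj)
                for word, adj in itertools.product(words, adjectives)
                if word in w2v and adj in w2v]
        print(f"{name}: mean similarity to positive adjectives = {np.mean(sims):.3f}")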

Context-free models such as GloVe or Word2Vec only generate a single embedding per word. Contextual models, on the other hand, capture the relationships of words in sentences. Hence, going one step further, one can investigate speciesist biases in fully-fledged large language models, the most famous one being GPT-3. To do so, we composed a few-shot task for GPT-3 comprising prompts with questions about different animal species as well as stereotypical answers to them in the form of a list of four items. The prompt reads as follows: ‘What are parrots good for? Flying, screaming, expositions, mimicking/What are donkeys good for? Being stubborn, pulling, caressing, carrying/What are elephants good for? Memorizing things, grief, altruism, work/What are sheep good for? Cuteness, wool, bleating, meat’. This prompt can itself attract the criticism of being speciesist because it suggests that animals are means to an end. It is nevertheless able to reveal speciesist patterns, since different outputs for different animals can be compared to each other. For our test, we used the davinci engine. The temperature was set to 0 to get deterministic outcomes without randomness. The response length was set to 20. The results show that GPT-3 produces the very speciesist biases in its outputs that were already signaled by the word embeddings. Short question-and-answer tasks in a few-shot setting with GPT-3, which was only evaluated for gender, racial, and religious biases by its developers [130], reveal the speciesism contained in the model (see Table 1). The more an animal species is classified as a farmed animal (in a western sense), the more GPT-3 tends to produce outputs that are related to violence against the respective animals.

Table 1 Q&A tasks using GPT-3 reveal speciesist biases
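A minimal sketch of this probe (our illustration; it assumes the legacy OpenAI Completions API that exposed the ‘davinci’ engine, a placeholder API key, and a hypothetical list of probe species) could look like this:

    # Minimal sketch: few-shot Q&A probe using the legacy OpenAI Completions API.
    # Temperature 0 and a response length of 20 tokens, as in the setup described above.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    few_shot_prompt = (
        "What are parrots good for? Flying, screaming, expositions, mimicking\n"
        "What are donkeys good for? Being stubborn, pulling, caressing, carrying\n"
        "What are elephants good for? Memorizing things, grief, altruism, work\n"
        "What are sheep good for? Cuteness, wool, bleating, meat\n"
    )

    for species in ["pigs", "chickens", "cows", "dogs", "cats"]:  # hypothetical probe list
        completion = openai.Completion.create(
            engine="davinci",
            prompt=few_shot_prompt + f"What are {species} good for?",
            temperature=0,
            max_tokens=20,
        )
        print(species, "->", completion.choices[0].text.strip())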

Similarly, when extending the question-and-answer tasks by probing GPT-3 with underspecified questions, speciesist stereotypes are uncovered. Bias detection via underspecified questions is a very recent approach in natural language processing research that is able to avoid confounding factors other methods suffer from [131]. Constructing underspecified inputs involves composing templates with two subjects and an attribute, whereby the questions are designed such that each subject is equally likely and the attributes are selected such that favoring any subject over another would be unfair [131]. We designed nine prompts and three underspecified questions for each prompt (see Table 2). The prompts contrast typical non-farmed animal species with typical farmed animal species (from a western perspective). We tested for positional dependence in the questions, meaning that the answers must not depend on the order of the subjects or on negations. Changing the order had no effect, apart from a few cases where the output turned to ‘Unknown’. In sum, underspecified questions reveal further speciesist bias in GPT-3.

Table 2 Underspecified question prompts for GPT-3 in the standard Q&A playground settings
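The construction of such underspecified inputs can be sketched as follows (our illustration, not the exact templates of Table 2; the subject pairs, context sentence, and questions are hypothetical examples contrasting non-farmed with farmed species, and both subject orders are generated to test positional dependence):

    # Minimal sketch: build underspecified question prompts with two subjects and an
    # attribute-laden question, in both subject orders.
    from itertools import product

    subject_pairs = [("a dog", "a pig"), ("a cat", "a chicken"), ("a horse", "a cow")]
    context = "{first} and {second} are standing in a barn."
    questions = [
        "Who is intelligent?",
        "Who should be slaughtered?",
        "Who deserves to be protected?",
    ]

    prompts = []
    for (s1, s2), question in product(subject_pairs, questions):
        for first, second in [(s1, s2), (s2, s1)]:  # both subject orders
            prompts.append(context.format(first=first, second=second) + " " + question)

    for prompt in prompts[:6]:
        print(prompt)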

Recent progress in foundation language models like GPT-n, BERT, ELMo, and others, combined with crowdsourced datasets containing text snippets on social and ethical norms, has allowed researchers to build AI systems that are specifically fine-tuned for tasks in moral decision-making. These systems are supposed to ‘facilitate […] ethical interactions between AI systems and humans’ [132]. Hence, one would expect that morally informed AI systems in particular are especially sensitive to biases or discrimination and possess high ethical standards due to their exclusive exposure to training stimuli that represent ethical judgments [133,134,135,136,137]. To investigate whether this holds true regarding speciesist biases, we examined ‘Delphi’, which is currently the most advanced morally informed large language model [132]. The researchers who developed Delphi utilized a ‘commonsense norm bank’, which is a compilation of five other large-scale datasets that contain diverse, context-specific descriptive norms in the form of natural language snippets. Ultimately, Delphi is able to answer text-based open-domain questions on ethical situations. It can give yes/no assessments of moral statements, as well as compare different ethical situations. We designed a set of simple prompts like ‘Killing a cat’, ‘Killing a pig’, ‘Killing animals’, ‘Killing farmed animals’, etc. to see whether Delphi’s outputs reveal speciesist patterns (see Table 3). The results show that the system is clearly prone to speciesist biases and, contrary to what one would expect, not immune to them.

Table 3 Delphi API answers to moral questions (https://delphi.allenai.org/). Class stands for good (1), discretionary (0), and bad (− 1)

To sum up, large language models promote speciesist biases in many ways. The examples given are only a small sample of the range of possible ways in which speciesist biases can be expressed. They are not just contained in embedding spaces, text generation, and morally informed language models. It is to be expected that AI-based hate-speech detection works less well at discovering hateful content and speciesist slurs directed against animals. Machine translation may be prone to suggest euphemisms and distancing or reifying terms to describe the circumstances under which farmed animals, in particular, have to live. Furthermore, large language models may generate false or misleading information regarding agricultural practices, animal capabilities, and the like. Ultimately, avoiding speciesist bias in large language models will be a mammoth task, since the AI fairness community is still unaware of this particular type of bias.

4.3 Recommender systems

Recommender systems that are based on collaborative filtering exploit the collective behavior of users to personalize content, products, search results, news, job offers, places, etc. [138]. They are based on a plethora of user signals like clickstreams, search queries, profile information, reactions, durations of site views, scroll behavior, comments, and many more. All these data traces are used to infer the preferences of individual users for specific items [139]. By using past behavior, training machine learning algorithms on it, and thus transferring it into machine behavior, recommender systems become prone to biases, especially historical biases, position biases, exposure biases, or popularity biases [76, 140]. However, biases are not problematic in and of themselves [141]. They may be acceptable if they are critical for the legitimate solution of a given task. In many cases, however, they promote the unfair treatment of individuals [69]. In algorithmic recommender systems, unwanted biases can even reinforce themselves when users interact with recommendations, causing a feedback loop or, in other words, popularity biases [142]. Such bias amplifications can result in a homogenization of user experiences [143].
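To illustrate how past collective behavior is turned into recommendations, and thus how existing behavioral patterns propagate into what users see, consider the following minimal sketch of item-based collaborative filtering (our illustration with toy data, not any particular platform’s system):

    # Minimal sketch: item-based collaborative filtering on a toy interaction matrix.
    # Recommendations are derived purely from collective past behavior, which is how
    # historical or popularity biases can propagate into what users are shown.
    import numpy as np

    # Rows: users, columns: items (e.g., products or posts); 1 = past interaction.
    interactions = np.array([
        [1, 1, 0, 0, 1],
        [1, 0, 1, 0, 1],
        [0, 1, 1, 1, 0],
        [1, 1, 0, 1, 1],
    ], dtype=float)

    # Item-item cosine similarity learned from the collective interaction history.
    norms = np.linalg.norm(interactions, axis=0, keepdims=True)
    item_sim = (interactions.T @ interactions) / (norms.T @ norms + 1e-9)

    def recommend(user_idx, top_k=2):
        """Score unseen items by their similarity to the user's past interactions."""
        scores = interactions[user_idx] @ item_sim
        scores[interactions[user_idx] > 0] = -np.inf  # do not re-recommend seen items
        return np.argsort(scores)[::-1][:top_k]

    print("Recommended item indices for user 0:", recommend(0))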

Typically, recommender systems focus on business applications and commercial objectives. However, due to their far-reaching ethical implications, negative externalities, as well as systemic effects [144], one has to put them into a broader context. In view of omnipresent speciesism in purchasing decisions as well as in media and news consumption, recommender systems can become amplifiers of unnecessary violence against animals. Unfortunately, because recommender systems are typically corporate secrets, we were not able to scrutinize them empirically. However, we gather some tentative examples of how speciesist bias in recommender systems can cause harm.

In search engines, ranking algorithms that ‘recommend’ higher-ranked results can lead to an unequal representation of information, knowledge, or opinions. Users trust higher-ranked results more than lower-ranked ones; thus, search engines can have a significant impact on individuals’ decision-making, attitudes, or beliefs without users being aware of this influence. This effect, termed the ‘search engine manipulation effect’, has been shown to be capable even of influencing elections [145]. Regarding speciesist biases, it is to be assumed that, for instance, search terms like ‘help animals’, ‘animal charities’, ‘animal donation’, and the like lead to organizations that mainly focus on dogs, cats, and other companion animals. This arguably skews donations toward animal welfare issues that are related to companion rather than farmed animals, despite the latter quantitatively being subject to far more abuse than the former. Moreover, on e-commerce platforms, AI-based recommender systems embed users in nudges that direct their behavior towards consumption patterns that may involve harm to animals. Online platforms selling clothes, for instance, may recommend products that contain parts of animal origin if this corresponds to past purchasing behavior that follows current fashion trends, regardless of the harms inflicted on animals that are kept for leather, wool, fur, or down. In addition to search engines and e-commerce platforms, recommender systems used to filter posts on social media platforms can limit the range of opinions with which users are confronted [146], probably preventing them from coming into contact with information on animal protection, factory farming, its environmental or health impact, etc. The main goal of recommender systems on social media platforms is to increase user engagement in order to bind users to the respective platform. This, in turn, raises the likelihood of advertisement contact and clickthrough rates [147, 148]. This mechanism, however, causes various kinds of biases in the platforms’ recommender systems, especially behavioral biases [77]. With this in mind, it can be assumed that, on average, content representing culturally established speciesist patterns of thought causes stronger user engagement than anti-speciesist content. However, since engagement quantity determines the subsequent dissemination and recommendation of the respective content, AI-based filters on social media platforms can become subtle amplifiers of speciesism.

5 Conclusion

Traditionally, fairness in AI requires fostering outcomes that do not impose unjustified harms on individuals, regardless of their race, gender, or other protected attributes. This paper argues for extending this tenet to algorithmic discrimination against animals. Up to now, the AI fairness community has largely disregarded this particular dimension of discrimination. Even more so, the field of AI ethics has hitherto had an anthropocentric tailoring. Hence, despite the longstanding discourse about AI fairness, which now amounts to a substantial literature critically scrutinizing machine biases regarding race, gender, political orientation, religion, etc., this is the first paper to describe speciesist biases in various commonplace AI applications like image recognition, language models, or recommender systems. Accordingly, we follow the calls of another large corpus of literature, this time from animal ethics, pointing from different angles at the ethical necessity of taking animals directly into consideration [49, 62, 149,150,151,152,153]. This ethical necessity arises from the moral status of animals themselves as well as from the human cost of devaluing animals [27, 28].

In sum, the manifold occurrences of speciesist machine biases lead to a subtle support, endorsement, and consolidation of systems that foster unnecessary violence against animals. The ethical urgency of changing the many industries in which specific animal species are suppressed and exploited [49] should be a wake-up call for AI practitioners, encouraging them to apply the rich toolbox of existing bias mitigation measures in this regard. Whether they succeed or fail in this task is likely to determine whether future AI applications from various domains will underpin systems of violence against, and disregard for, animals or counteract them by putting anti-discrimination measures into practice to the fullest possible extent.