Charting the AI perception gap: divergent views on risk, benefit, and value between experts and the public challenge the societal acceptance of AI


1 Introduction

Artificial intelligence (AI) dates back decades (McCarthy et al. 2006; Hopfield 1982; Rumelhart et al. 1986). Recent advances in algorithms, computing power, and training data have led to transformative breakthroughs and increased funding (Deng et al. 2009; Lecun et al. 2015; Statista 2022). AI is increasingly integrated across sectors such as education (Chen et al. 2020), healthcare (Amunts et al. 2023), journalism (Diakopoulos 2019), farming (Holzinger et al. 2024), innovation management (Bouschery et al. 2023), and production (Brauner et al. 2022). Beyond technical applications, AI could also reduce loneliness and enhance well-being: chatbots and social AI actors provide companionship (Ventura et al. 2025; Malfacini 2025), while embodied robots support healthcare, education, and social integration (Ahmed et al. 2024), though these applications carry risks of emotional dependency, erosion of human relationships, and other ethical consequences if not carefully designed and governed. Consequently, AI offers benefits but also raises significant societal concerns, including privacy infringements (Lim and Shim 2022), job displacement (Acemoglu and Restrepo 2017), algorithmic bias (Sadek et al. 2024; Brauner et al. 2019), and broader ethical challenges (Awad et al. 2018). Overall, expectations about AI remain divided: While some view it as a revolutionary tool that can improve our lives (Brynjolfsson and McAfee 2014; Makridakis 2017), others emphasize its potential risks (Cath 2018; Bostrom 2003) and note that large language models have already begun shaping human communication (Geng et al. 2025).

Algorithms, AI, and AI-based systems are products of human design and are shaped by developers’ values, assumptions, and biases. Hence, their output and the decisions they inform are not value-neutral and may lead to unintended or harmful consequences, such as perpetuating inequality or reinforcing existing biases (Friedman and Nissenbaum 1996; Nissenbaum 2001; Sadek et al. 2024). The “AI alignment problem” addresses these concerns by seeking to ensure that AI systems act in line with human values (Hristova et al. 2024; Gabriel 2020). This alignment involves designing systems that understand, interpret, and follow human-aligned objectives, even as AI becomes more advanced and autonomous.

Rather than addressing properties of AI systems or the “AI alignment problem” directly, this study examines how stakeholders interpret potential AI futures through distinct evaluative frameworks and mental models (Johnson-Laird 2010). Specifically, we analyze divergences in perceived likelihood, risk, benefit, and overall value between academic AI experts and members of the general public (hereafter: the public). AI experts are a particularly consequential group, as they educate practitioners, conduct foundational research, and shape governance, safety, and design paradigms that influence AI development and alignment. Accordingly, our focus is not whether AI systems can be aligned with human values in an engineering sense, but whether experts and the public form diverging views about AI.

In this article, we surveyed both academic AI experts and the public regarding 71 future AI scenarios, each presenting a different capability or impact as a vignette. As individual perceptions of risk and benefit shape attitudes, usage intentions, and actual behavior (Witte and Allen 2000; Hoffmann et al. 2015; Huang et al. 2020), we measured the expected likelihood of occurrence, perceived risks and benefits, and the overall value (or sentiment) associated with these topics. By analyzing similarities and differences in these responses and visualizing the findings as cognitive maps of AI perception, this study identifies both shared and divergent views. This approach contributes to the development of a research agenda for better, human-centric AI, highlights areas that may require stronger regulation, and suggests that accessible public education on AI could play a role in bridging the identified perception gap (Marx et al. 2022).

2 Related work

This section reviews the relevant literature. It begins with an overview of prior research on the perception of AI, followed by studies comparing how experts and the public perceive technology in general and AI in particular. The section concludes by identifying the research gaps and formulating the research questions that guide the present study.

2.1 Public perception of AI

While AI has existed for decades, the rapid adoption of generative tools like ChatGPT (Hu 2023) has intensified academic focus on how the public perceives AI’s implications across diverse domains. These perceptions are not a monolithic reaction to technology but a multifaceted social construction shaped by a tension between utopian narratives of benefit and deep-seated anxieties regarding control. Although the literature identifies various drivers, including media framing, cultural context, and individual literacy, it reveals a landscape of “polarized expectations” where public imagination often diverges from technical reality. Currently, a consolidated understanding of how these factors can be integrated into a holistic model remains absent. This section reviews these influences to highlight a critical research gap: the need for an empirical foundation that systematically maps public sentiment across a broad spectrum of societal domains, moving beyond the fragmented, application-specific insights that currently dominate the field.

Media plays a significant role in shaping public opinion about AI. Fast and Horvitz (2017) conducted an analysis of three decades of AI coverage in The New York Times, observing a rise in public interest after 2009. Coverage has generally been more positive than negative, balancing optimism with concerns over control and ethical issues. Recently, reporting has reflected heightened enthusiasm for AI’s potential, particularly for healthcare, mobility, and education. News coverage often emphasizes the benefits of AI while downplaying potential risks, contributing to a perception of AI as superior to human capabilities and fostering the anthropomorphization of technology (Puzanova et al. 2024). Cave et al. (2019) examined AI narratives in the UK and identified four optimistic and four pessimistic themes. These narratives frequently evoke anxiety, with only two highlighting benefits over risks, such as AI’s potential to make life easier.

Furthermore, sentiment analysis of WIRED articles reveals an increase in polarized views, with both positive and negative sentiments intensifying over time (Moriniello et al. 2024). Sanguinetti and Palomo (2024) investigated how news outlets portray AI and found that coverage often frames AI as something to be feared, depicting it as an autonomous and opaque entity beyond human control. Using an AI anxiety index, the study analyzed newspaper headlines before and after the launch of ChatGPT, reporting both increased coverage and heightened negative sentiment.

Perceived risks and benefits play a crucial role in shaping public attitudes toward AI. Surveys indicate that the public views AI as both a risk and an opportunity. Common concerns include privacy violations and cybersecurity threats (Brauner et al. 2023), while perceived benefits are often associated with applications in urban services and disaster management (Schwesig et al. 2023; Yigitcanlar et al. 2022). Both perceived risks and opportunities shape individuals’ behavioral intentions to use AI applications and a higher perceived opportunity-risk ratio is associated with greater willingness to adopt AI, though with notable variation depending on the application context (Schwesig et al. 2023).

Neri and Cozman (2019) argue that experts significantly influence public perceptions of AI risks. Their public statements can amplify awareness of certain threats, such as existential risks, which may be rooted more in expert discourse than in actual incidents. Lee et al. (2024) found that individuals with higher education levels, greater political interest, and more knowledge about ChatGPT tend to perceive AI as more risky. This finding challenges the conventional “knowledge deficit” model, suggesting that negative perceptions may stem from a critical mindset that engages with AI technology more cautiously.

Trust in AI varies across contexts, demographic groups, and individual attitudes. For instance, people tend to trust AI more in personal lifestyle applications but remain more skeptical about its use by companies and governments (Yigitcanlar et al. 2022). Willingness to engage with AI is shaped by the perceived balance of risks and benefits, which differs across domains such as healthcare, transportation, and media (Schwesig et al. 2023). In addition to variations in trust, there remains a significant gap in public understanding of AI, which can contribute to irrational fears and misinformed beliefs about control. Promoting AI literacy is therefore essential to enable informed decision-making and support responsible innovation (Ng et al. 2021; Marx et al. 2022).

Public perception of AI also varies significantly based on local context, political ideology, and exposure to science news. For example, people in the United States generally expect more benefits than harms from AI, with a substantial portion supporting regulation to mitigate potential risks (Elsey and Moss 2023). However, existential risks are not a primary concern for most; instead, the public tends to worry more about tangible issues such as job displacement.

Sindermann et al. (2022) examined cross-cultural differences in AI attitudes among Chinese and German participants, linking fear of AI to neuroticism in both groups, while also highlighting cultural variations in AI acceptance and concern. Kelley et al. (2021) surveyed over 10,000 participants across eight countries (Australia, Canada, the United States, South Korea, France, Brazil, India, and Nigeria), finding that respondents in developed nations predominantly expressed worry and futuristic expectations, whereas those in developing countries showed greater enthusiasm for AI’s potential. In particular, respondents in South Korea emphasized AI’s practical usefulness and future applications, though widespread uncertainty about its broader societal impact persisted across all regions. In Taiwan, science news consumption and respect for scientific authority positively influence AI perceptions (Wen et al. 2024). Interestingly, a recent large cross-cultural study building on Hofstede’s cultural dimensions found that AI perception is shaped more by individual differences than by cultural context (Wang 2025).

Public discourse on AI often oscillates between fear and inflated expectations—particularly regarding artificial general intelligence (AGI), which remains largely speculative and fictional at present (Jungherr 2023). A survey by Ipsos (2022) found that the public frequently lacks a nuanced understanding of AI’s technical capabilities and limitations. Similarly, the Pew Research Center (2023) reported that only a small percentage of Americans could accurately identify AI in everyday scenarios, highlighting widespread confusion about its scope and functionality. The Alan Turing Institute (2023) likewise observed that public understanding of AI varies considerably depending on education level and context. Common concerns tend to focus on automation and robotics, especially in relation to employment and security. This limited awareness contributes to persistent misconceptions and overly simplistic views of AI’s societal impact, ultimately hindering informed public discourse.

In summary, the individual drivers of public sentiment are well documented: perceptions are multifaceted, shaped by media narratives, perceived risks and benefits, levels of trust, and cultural and contextual factors. Yet the resulting picture remains fragmented across specific use cases, which underscores the need for a systematic, cross-domain mapping to identify the stable mental models that underpin societal acceptance of AI.

2.2 Similarities and differences in risk perception between experts and the public

A robust finding across scientific disciplines is the systematic divergence between expert and lay perceptions of risk, a phenomenon that is particularly pronounced in the context of emerging technologies like AI. This “perception gap” is not merely a product of varying knowledge levels but reflects a difference in evaluative frameworks: while experts typically adopt a probabilistic and technical approach to risk, the public prioritizes qualitative dimensions such as trust, ethical implications, and dread. By comparing findings from general risk research with AI-specific studies, this section illustrates that experts often report higher trust and lower perceived risk than the general public. Identifying this perception gap is essential for governance, yet direct comparisons using identical psychometric frameworks remain remarkably scarce, a research gap that the present study seeks to fill.

First, research has shown that health experts and the public often differ in their assessments of health risks (Krewski et al. 2012). Experts typically perceive behavioral health risks (such as smoking and obesity) as more significant, while the public may prioritize other concerns. This discrepancy underscores the importance of effective risk communication strategies to better align public perception with expert evaluations.

Similarly, public perception of risks related to industrial production facilities tends to be more subjective and emotionally driven, in contrast to the more objective evaluations made by safety professionals (Botheju and Abeysinghe 2015). Such misalignments call for two-way communication approaches to address concerns proactively and prevent unnecessary escalation.

In their study of expert and lay perceptions of nanotechnology, Siegrist et al. (2007) found that while experts’ judgments are primarily driven by technical evidence and probabilistic risk, the public relies heavily on trust and the ‘affect heuristic’ (the inverse relationship between perceived risk and benefit). The findings revealed that differences in risk perception extended beyond knowledge gaps, reflecting underlying value-based judgments. We will extend this inquiry to the domain of AI to determine whether this systematic perception gap persists across diverse AI applications, or whether the unique societal integration of AI alters the traditional expert–layperson divergence identified in other studies.

In the case of environmental hazards, such as nuclear waste, experts and laypeople also differ markedly in their perceived risks. Here, risk perception is shaped more by attitudes and moral values than by cognitive factors (Sjöberg 1998). Laypeople, who generally have less technical knowledge, tend to rely on intuitive and emotional reasoning, whereas experts base their assessments more on technical evidence and probabilistic analysis (Sjöberg 1998). This difference often leads to divergent views on policy and regulation, with members of the public perceiving higher levels of risk than experts (Siegrist et al. 2007). The public prioritizes ethical and societal implications, while experts focus primarily on scientific and technical risks.

In contrast, when it comes to natural hazards such as hurricanes and cyclones, there is often a considerable degree of agreement between expert risk assessments and public perceptions, particularly in high-risk areas (Peacock et al. 2005; Md. Abdus and Cheung 2019). However, public risk perception in these contexts can still be shaped by factors such as trust in authorities and prior experience with disasters. In the domain of autonomous vehicles (AVs), public risk perception is strongly influenced by trust in both the technology and the institutions that regulate it. Greater knowledge about AVs can enhance trust, which in turn reduces perceived risk, highlighting the importance of targeted trust-building initiatives (Robinson-Tay and Peng 2024). In aviation, by contrast, experts typically possess a more accurate understanding of relative risks, while novices’ perceptions may be distorted by overconfidence or limited experience (Thomson et al. 2004).

Elena and Johnson (2015) examined differences in expert and public perceptions of cloud computing services. The findings indicate that experts tend to have a more nuanced understanding of risks, particularly regarding data security and integrity, while members of the public are more likely to experience a generalized dread risk in response to unfamiliar or abstract technological threats. Perceptions were also influenced by factors such as trust in regulatory bodies and the perceived benefits of the technology.

A decade ago, Müller and Bostrom (2016) surveyed AI experts on their expectations for the future capabilities of artificial intelligence, finding that most anticipated the emergence of superintelligence between 2040 and 2050. Notably, one-third of these experts considered this development bad or extremely bad, highlighting substantial concerns even within the expert community.

Crockett et al. (2020) compared trust and risk perceptions of AI between the public and computer science students as individuals with—supposedly—above-average expertise in AI. The findings revealed clear differences, suggesting that education plays a crucial role in increasing trust and reducing perceived risks. Similarly, Novozhilova et al. (2024) found that greater technological competence and familiarity with AI are associated with higher levels of trust in AI systems.

Still, comprehensive comparisons between experts and the public remain relatively rare. Recently, Jensen et al. (2024) conducted interviews with 25 members of the public and 20 AI experts in the United States to examine their perceptions of AI. Both groups emphasized that AI systems reflect the values and biases of their creators, acknowledging inherent limitations in the technology. Ethical concerns cited by participants included AI’s lack of transparency, its profit-driven development, and the risk of exacerbating existing social inequalities. Human oversight was widely supported, particularly in high-stakes contexts such as healthcare. Although AI is perceived as efficient, its inability to replicate human empathy emerged as a central barrier to trust. Across both groups, reflections on humanness and ethics played a critical role in shaping attitudes toward AI.

A recent study surveyed 111 AI experts to assess their beliefs about catastrophic AI risk, familiarity with AI safety concepts, and responses to alignment arguments (Field 2025). It finds that experts divide into two broad perspectives, viewing AI either as a controllable tool or as a potentially uncontrollable agent, and that lower concern about AI risk is closely linked to limited familiarity with core AI safety concepts.

Recently, in a survey of 2,778 AI researchers, Grace et al. (2025) report that experts have accelerated their forecasts for high-level machine intelligence, now predicting a 50% probability of its arrival by 2047 (thirteen years earlier than estimated in 2022). However, while a majority of experts foresee positive outcomes, nearly half assigned at least a 10% probability to extremely catastrophic scenarios, including human extinction.

In summary, while the literature highlights a consistent divergence between expert and public risk assessments, the underlying reasons for this “perception gap” remain under-explored. This study addresses this gap by employing a unified psychometric framework to compare how both groups weigh risks and benefits across a broad spectrum of AI scenarios.

2.3 Risk perception and the psychometric model

To address the fragmented nature of AI perception research, this study adopts the psychometric paradigm as its primary theoretical lens. Originally developed by Slovic et al. (1986), this framework posits that risk perception is not a technical calculation of probability but a subjective construct influenced by factors like “dread” and “knowability”. It focuses on individuals’ subjective evaluations of risk, often measured through rating scales (Fischhoff et al. 1978; Slovic et al. 1986). The psychometric paradigm has been successfully applied across diverse contexts, including nuclear energy (Slovic et al. 2000), gene technology (Connor and Siegrist 2010), genetically modified food (Verdurme and Viaene 2003), climate change (Pidgeon and Fischhoff 2011), and carbon capture (Arning et al. 2020). Consequently, it is well suited for studying risk perception in emerging technologies, as it offers a structured framework for understanding how individuals evaluate and navigate the complex balance between perceived risks and benefits.

Crucially, research within this paradigm has revealed a consistent inverse relationship between perceived risk and benefit, known as the affect heuristic (Alhakami and Slovic 1994; Slovic et al. 2007; Efendić et al. 2021). This heuristic suggests that evaluative judgments are often guided by an overall affective state: technologies perceived as highly beneficial are frequently judged as lower risk, while those viewed as risky are rated as less beneficial. By applying this framework to AI, we can determine whether the perception gap results from experts and the public utilizing different weighting schemes—that is, whether one group’s overall value judgments are driven more by perceived benefits while the other is more sensitive to perceived risks.

2.4 Research questions

The preceding review reveals two critical limitations in current AI perception research: (1) a lack of direct comparisons between academic experts and the general public using consistent evaluative criteria, and (2) a focus on narrow domains that obscures the broader mental models underlying perception. This study addresses these gaps by applying the psychometric paradigm to 71 diverse scenarios, providing a comprehensive cognitive map of the “AI perception gap”. Identifying these divergences is essential, as misaligned risk–benefit perceptions may impede technology design, individual and societal acceptance, and regulatory effectiveness. To this end, we investigate the following research questions:

  1. Perception and impact: How do academic AI experts (who educate and influence practitioners, contribute to foundational research and develop methodological frameworks, and provide governance-relevant expertise that informs AI system development) and members of the public (who use or are affected by AI-based systems) differ in their perceptions of AI’s capabilities and impacts?

  2. Value attribution: What values do academic AI experts and the public assign to potential AI futures? In which domains is AI perceived positively or negatively, and how do these perceptions differ between the two groups?

  3. Risks and benefits: What risks and benefits are perceived in AI’s potential futures? How do these perceptions differ between experts and the public?

  4. Risk–benefit trade-offs: Is the overall value attributed to AI more strongly influenced by perceived risks or by perceived benefits? How do these trade-offs vary between experts and the general public?

3 Methods

This section details the empirical approach used to investigate the perception gap between academic experts and the general public. To ensure a comprehensive mapping of AI perceptions, our methodology transitions from theoretical constructs to a comparative analysis across 71 distinct scenarios. The following subsections describe the systematic development of the scenario catalog and the operationalization of variables (3.1), the recruitment of the two participant samples (3.2), the statistical procedures employed for data filtering and the analysis of group differences and weighting schemes (3.3), the composition and characteristics of both samples (3.4 and 3.5), and a robustness check to verify that the international expert sample is comparable to the German public sample (3.6).

3.1 Survey design

To address our research questions, we designed two surveys: one for the public and another for academic AI experts. Both surveys centered on a shared core, in which participants evaluated a randomized subset of 15 out of 71 micro-scenarios (to reduce participant fatigue), each describing a potential AI development or imaginary AI future (Brauner et al. 2022). Participants rated each scenario using five assessment items (see below). This design supports two complementary perspectives in interpreting the data: (1) individual-level interpretation, in which responses serve as reflexive indicators of participants’ underlying dispositions or differences; and (2) scenario-level interpretation, in which responses reflect participants’ attributions to specific technologies or topics, enabling analysis and visual mapping across groups.
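To make the scenario assignment concrete, the following minimal R sketch illustrates how such a per-participant random subset of 15 out of the 71 micro-scenarios could be drawn; the participant count, seed, and object names are illustrative and not taken from the original survey software:

```r
# Illustrative sketch: assign each participant a random subset of 15 of the 71 projections.
set.seed(42)                 # arbitrary seed, for reproducibility of this example
n_topics       <- 71
n_shown        <- 15
n_participants <- 5          # hypothetical number of respondents

assignments <- do.call(rbind, lapply(seq_len(n_participants), function(id) {
  data.frame(
    participant = id,
    topic_id    = sort(sample.int(n_topics, n_shown))  # 15 topics drawn without replacement
  )
}))
head(assignments)
```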

To develop the list of scenarios, we first identified potential capabilities and impacts that AI might have in the near future. Drawing on prior research (Brauner et al. 2023) and three expert workshops with partners from our interdisciplinary collaborations (including specialists in Computer Science, Ethics, and Sociology), we compiled an initial set of potential survey topics. For topic selection, we built on Luhmann’s system theory (“Systemtheorie”), which conceptualizes modern society as comprising six core subsystems: economy, law, science, politics, religion, and education (Luhmann 1989). Each of these subsystems fulfills a unique and non-substitutable function within society (Peterson et al. 2004). Through multiple iterative rounds, we ensured that each subsystem was represented, removed redundant items, and refined the phrasing of topics for clarity and conciseness (Brauner (2024) details strategies for developing the topic list, discusses potential biases, and outlines mitigation strategies). The final set includes both plausible and speculative AI projections, such as “AI creates many jobs”, “AI promotes innovation”, “AI acts according to moral concepts”, and “AI considers humans as a threat”. Table 4 in the Appendix lists all 71 projections.

We used five dependent variables to evaluate each topic on 6-point single-item semantic differential scales. Single-item measures enable efficient data collection and are considered appropriate for capturing well-defined constructs such as risk, benefit, and value judgments (Rammstedt and Beierlein 2014; Fuchs and Diamantopoulos 2009; Wolfers and Baumgartner 2025). The use of 6-point scales avoids midpoint bias by eliminating a neutral option, thereby encouraging more decisive responses (Tourangeau et al. 2000).

  • Expected likelihood: How likely is it that this development will occur within the next 10 years (“will not happen–will happen”)?

  • Personal risk: How do you assess the risk of this development for yourself personally (“low risk–high risk”)?

  • Societal risk: How do you assess the risk of this development for society as a whole (“socially harmless–socially harmful”)?

  • Perceived benefit: How beneficial or useful do you consider this development (“useless–useful”)?

  • Attributed value: Assuming this development comes true, would you consider it positive or negative (“negative–positive”)?

To assess participants’ overall evaluation of each topic, the final item drew on the value-based adoption model (VAM), which conceptualizes attributed value (ranging from positive to negative) as a suitable criterion for technology evaluation (Kim et al. 2007).

In addition to these topic-specific evaluations, we collected demographic information. All participants were asked to report their age (in years) and gender (following Spiel et al. (2019): male, female, diverse, or prefer not to say).

In the AI expert survey, participants also self-reported their level of expertise in AI using a five-point scale (no expertise in the field of AI, basic knowledge, well informed, expert, recognized authority in the field of AI). Additional items assessed years of experience, number of scientific publications, current country of residence, and scientific discipline.

In the survey for the public, we collected data on education level, employment status, and several individual difference measures potentially influencing AI perception. These included Technology Readiness (Neyer et al. 2016), an AI Readiness Scale (Karaca et al. 2021), and interpersonal trust (Niessen et al. 2021). A detailed analysis of the impact of these individual factors is presented in a separate publication (Brauner et al. 2025). Figure 1 illustrates both surveys’ design.

Both surveys began with an informed consent form stating that participation was voluntary, that no personally identifiable information would be collected, and that the resulting data would be made available as open data. The expert questionnaire was administered in English, whereas the survey for the public was conducted in German. Our university’s Institutional Review Board (IRB) granted ethical approval (protocol ID 2023_02b_FB7_RWTH Aachen).

Fig. 1 Design of both surveys. In each, participants evaluated 15 randomly selected AI projections (or vignettes) from a pool of 71. The measured explanatory user factors differed between academic AI experts and the general public

3.2 Sample acquisition

The general public sample was acquired using the independent fieldwork agency Consumerfieldwork GmbH, Germany. A total of 1354 participants were recruited to represent the German population in terms of age, gender, socio-economic status, and regional distribution. Participants received an incentive of approximately 1 Euro for completing the survey. While this rate is lower than the standard hourly wage, it was calibrated to acknowledge effort without creating undue inducement that might compromise voluntary consent. Data for the public were collected between 2023-07-04 and 2023-07-09.

The academic AI expert sample was obtained via convenience and snowball sampling. To initiate this process, we contacted individuals within our academic networks, including collaborators from joint projects and funding proposals. Additionally, we compiled a list of publicly available email addresses of authors of recent publications in leading AI-related venues and contacted them. Between 2023-03-06 and 2023-05-10, both groups were invited via email and encouraged to forward the invitation to colleagues. In total, we contacted approximately 450 AI researchers worldwide, of whom 139 proceeded beyond the landing page of the survey. To encourage participation, the expert invitation emphasized the academic relevance of the study and the open availability of its findings.

3.3 Data cleaning and analysis

We used both parametric and non-parametric statistical methods, including Bravais–Pearson (\(r\)) and Kendall’s Tau (\(\tau\)) correlation coefficients, Chi-square (\(\chi ^2\)), ordinary least squares (OLS) multiple linear regressions with robust standard errors, and multivariate analyses of variance (MANOVA) using Pillai’s trace (\(V\)) as the multivariate test statistic and effect size metric (as it represents the proportion of total variance in the dependent variables explained by the group factor (Stevens 2012)). Due to substantially different group sizes between the experts and the public, we ensured that all methods used are robust to sample imbalance. OLS point estimates are generally reliable under unequal sample sizes (Fox 2016); as their standard errors may be biased, we used robust standard errors. We further cross-validated key findings through random subsampling with equal group sizes (reducing the public’s sample to the size of the expert sample by random selection), which yielded similar results. Consistent with social science standards, we set the Type I error rate to \(\alpha =.05\) to determine statistical significance (Field 2009). Prior to conducting the analyses, we verified that all necessary statistical assumptions (including homoscedasticity and the absence of multicollinearity) were met and detected no violations. Beyond the core analyses, we conducted an exploratory cluster analysis, reported in Sect. A.2 of the Appendix, to examine whether the survey topics exhibit higher-level structure and whether this structure is shared between expert and public evaluations.
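As an illustration of two of these safeguards, the sketch below shows how an OLS model with heteroscedasticity-robust standard errors and a cross-validation on an equal-sized random subsample could be run in R; the data frame `d` and its columns (`value`, `risk`, `benefit`, `group`) are hypothetical stand-ins, not the published variable names:

```r
library(sandwich)  # heteroscedasticity-consistent covariance estimators
library(lmtest)    # coeftest() for inference with robust standard errors

# OLS regression with robust (HC3) standard errors
fit <- lm(value ~ risk + benefit, data = d)
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))

# Cross-validation by random subsampling: reduce the public sample to the size
# of the expert sample and re-estimate the model on the balanced data.
n_expert   <- sum(d$group == "expert")
public_sub <- d[sample(which(d$group == "public"), n_expert), ]
d_balanced <- rbind(d[d$group == "expert", ], public_sub)

fit_bal <- lm(value ~ risk + benefit, data = d_balanced)
coeftest(fit_bal, vcov = vcovHC(fit_bal, type = "HC3"))
```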

We filtered the data from both samples for incomplete or low-quality responses using both reactive and non-reactive criteria (Leiner and Johannes 2019). Participants from the public were included if they completed the survey and passed an attention check (“Please select ‘rather agree’”). For the expert sample, we included only those participants who reported at least basic knowledge of AI and had more than 1 year of experience working in the field. In both samples, we excluded participants who completed the survey in less than one-third of the median completion time (“speeders”). This threshold is generally considered sufficient to identify meaningful responses (Leiner and Johannes 2019). The median completion time was 9.8 min for the laypeople and 9.5 min for the experts. Based on these criteria, we excluded 254 of the 1354 participants in the public sample (exclusion rate of 18.8%) and 20 of the 139 participants in the expert sample (exclusion rate of 14.4%). All the participants fully completed the survey.
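A compact sketch of these exclusion rules in R; the data frames and column names (`duration_sec`, `attention_check`, `expertise`, `years_ai`) are hypothetical, chosen only to illustrate the criteria described above:

```r
# Drop "speeders": responses faster than one-third of the sample's median completion time.
filter_speeders <- function(d) {
  d[d$duration_sec >= median(d$duration_sec) / 3, ]
}

# Public sample: must have passed the attention check.
public_clean <- filter_speeders(subset(raw_public, attention_check == "rather agree"))

# Expert sample: at least basic knowledge (assumed level 2 of 5) and more than 1 year of AI experience.
expert_clean <- filter_speeders(subset(raw_expert, expertise >= 2 & years_ai > 1))
```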

All reported statistical analysis, values, and figures in this manuscript are generated using RMarkdown and R version 4.5.1 (2025-06-13) directly from the underlying data, supporting transparent and reproducible research. Materials, including the full survey, (unfiltered) data and the filtering procedures, the manuscript with all analyses, and additional data tables and figures, are available as open data on OSF (https://osf.io/gt9un/).

3.4 Sample 1: general public

The sample of the public consists of 1100 participants, with 524 identifying as male and 570 as female. The participants’ ages ranged from 18 to 85 years, with a median age of 51 years (SD = 14.2 years). There was no significant correlation between age and gender in the sample (\(\tau =-0.031\), \(p=0.210>.05\)). A complementary article provides an in-depth analysis of the sample, focusing on the effects of individual differences (Brauner et al. 2025).

3.5 Sample 2: academic artificial intelligence experts

The expert sample consists of 119 participants, with 93 identifying as male and 25 as female. Participants’ ages ranged from 23 to 75 years, with a median age of 36.5 years (SD = 13.4 years). Again, there was no significant association between age and gender (\(\tau =-0.113\), \(p=0.143>.05\)). The majority of the experts were from Germany (N = 77), followed by the Netherlands (N = 10). All other countries received fewer than 10 mentions.

Regarding the scientific background, most experts worked in general computer science (43%), artificial intelligence (24%), engineering (14%), or other disciplines, such as business administration, innovation management, and the social sciences (19%), reflecting both AI as a distinct disciplinary identity and AI-focused work emerging from adjacent fields. AI experience ranged from 1 to 40 years with an arithmetic mean of 10.3 and a median of 5 years. In total, the participants reported 2713 academic publications, with an arithmetic mean of 22.8 and a median of 5 publications per expert. As indicated by the high Gini coefficient (\(G=0.771\)), the number of publications is unevenly distributed among the experts—many have few, while a few have many—suggesting that both prominent figures and junior researchers have participated. When asked for their level of expertise, 11 participants reported having basic knowledge, 51 reported being well informed, 48 reported being experts, and 9 described themselves as a “recognized authority in the field of AI”.
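For reference, the publication-count Gini coefficient can be computed with a few lines of R; the vector `pubs` below is purely illustrative and not the reported data:

```r
# Gini coefficient from the standard formula on sorted values.
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  (2 * sum(seq_len(n) * x)) / (n * sum(x)) - (n + 1) / n
}

pubs <- c(0, 1, 2, 3, 5, 5, 8, 20, 60, 150)  # hypothetical, highly skewed publication counts
gini(pubs)   # values close to 1 indicate that publications concentrate on a few experts
```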

3.6 Robustness check of the country composition of the expert sample

To assess the robustness of our findings with respect to the country composition of the expert sample, we conducted two additional multivariate analyses. Given that both the public sample and the majority of experts were drawn from Germany, these analyses were designed to examine potential bias arising from the expansion of the sample by including experts from outside Germany. Specifically, we tested (a) whether experts and the public from Germany differed in their evaluations, and (b) whether German experts differed from experts from other countries in their evaluations of expected likelihood, perceived risks and benefits, and overall attributed value. To this end, we estimated two MANOVAs, using sample (experts vs. public) as the independent variable for the German subsample and country (Germany vs. other countries) as the independent variable for the expert sample. (a) The MANOVA comparing German experts and members of the German public revealed a significant multivariate effect and significant differences in expected likelihood, perceived risks and benefits, and overall attributed value (Pillai’s Trace \(V= 0.060\), \(F(5, 1171) = 15.07\), \(p <.001\), \(\eta ^2 = 0.060\)). (b) In contrast, the MANOVA comparing German experts with experts from other countries did not yield a significant multivariate effect (Pillai’s Trace \(V= 0.058\), \(F(5, 112) = 1.37\), \(p = 0.241\), \(\eta ^2 = 0.058\)). Taken together, these results indicate that the contrast between the public and experts is comparable when restricted to the German subsample, and that extending the expert sample to include participants from other countries is admissible in the context of this study.
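The structure of these two checks can be sketched in R as follows; the data frame `d`, the factor names (`sample`, `country`), and the outcome column names are illustrative assumptions:

```r
# (a) German experts vs. members of the German public
german <- subset(d, country == "Germany")
fit_a  <- manova(cbind(expectancy, risk_individual, risk_societal, benefit, value) ~ sample,
                 data = german)
summary(fit_a, test = "Pillai")

# (b) German experts vs. experts from other countries
experts <- subset(d, sample == "expert")
experts$is_german <- experts$country == "Germany"
fit_b <- manova(cbind(expectancy, risk_individual, risk_societal, benefit, value) ~ is_german,
                data = experts)
summary(fit_b, test = "Pillai")
```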

4 Results

We begin by analyzing differences between the public and academic AI experts in terms of overall perceived societal and individual risk, benefit, attributed value, and expected likelihood, that is, how likely each of the AI projections is to come true within the next decade. Next, we interpret participants’ responses as reflexive measurements of latent constructs: Expectancy, risk, benefit, and value of AI are treated as individual-level personality differences, allowing us to examine if individuals of both groups trade-off perceived risks and benefits differently. Finally, we shift the perspective and interpret the responses at the topic level, focusing on how specific AI-related statements were evaluated. We assess how experts and the public differ in their expectations and valuations of these projections, including how they attribute risks and benefits to individual AI scenarios. Table 4 in the Appendix provides a complete overview of all ratings across all statements for both groups (a searchable and sortable spreadsheet is available on https://osf.io/gt9un/files/p7uet).

4.1 Do overall evaluation scores differ between academic AI experts and the public?

In this section, we examine how both groups evaluated the AI projections across the queried dependent variables: perceived individual and societal risk, perceived benefit, attributed value, and expectancy (i.e., the expected likelihood that the respective AI development will occur within the next decade). As illustrated by the box plots in Fig. 2 and the complementary Table 1, notable differences emerge between both groups.

A one-way MANOVA with group (public vs. experts) as independent variable revealed a medium-sized, significant difference between both samples on the combined dependent variables (V = 0.085, F(5, 1213) = 22.572, p < 0.001). Follow-up univariate analyses showed significant group differences on four dependent variables: expectancy (F(1, 1217) = 18.572, p < 0.001), individual risk (F(1, 1217) = 26.716, p < 0.001), benefit (F(1, 1217) = 43.443, p < 0.001), and value (F(1, 1217) = 21.929, p < 0.001). However, perceived societal risk did not differ between the groups (F(1, 1217) = 1.775, p = 0.183). These findings indicate that academic AI expertise significantly influences not only the overall multivariate response but also each individual evaluation dimension except the perception of societal risk.
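In R, this comparison can be expressed compactly, mirroring the robustness sketch in the Methods section (the data frame `d` and its column names remain hypothetical):

```r
# One-way MANOVA with group (public vs. experts) on the five evaluation dimensions
fit <- manova(cbind(expectancy, risk_individual, risk_societal, benefit, value) ~ group,
              data = d)
summary(fit, test = "Pillai")   # multivariate test statistic (Pillai's trace)
summary.aov(fit)                # follow-up univariate ANOVAs, one per dimension
```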

In particular, AI experts, on average, rated the queried projections as more likely to happen within the next decade (25.2), personally safer (19.3), more useful (15.3), and more positive overall (−4.0) compared to the members of the public (expectancy: 12.7, individual risk: 34.7, benefit: −5.2, value: −19.7).

Table 1 Comparison of average assessments across the four evaluation dimensions expectancy, risk, benefit, and value for AI experts and the public


Fig. 2 Box plots showing the overall average evaluations across all 71 topics for each assessment dimension, separately by sample (experts vs. public). The shaded areas represent the distribution density of the data. Asterisks indicate statistically significant differences between the two groups for each dimension (all at \(p<.001\))

4.2 Do academic AI experts and the general public differ in weighing AI risks and benefits?

Table 2 Correlation analysis of the assessment dimensions across the 71 topics for academic AI experts and the public


Beyond differences in absolute evaluations, Table 2 reveals systematic differences in how academic AI experts and the public weigh risks and benefits across AI topics. For the public, perceived risks and benefits are strongly negatively correlated, both at the societal level (\(r=-0.711\)) and the individual level (\(r=-0.639\)), indicating a pronounced trade-off structure in how AI applications are evaluated.

Among experts, these associations are substantially weaker (societal risk: \(r=-0.368\); individual risk: \(r=-0.368\)), suggesting a more differentiated assessment in which risks and benefits are less tightly coupled.

Similar differences emerge for the relationship between expectancy and risk. While higher expected likelihood is positively associated with individual risk perceptions among the public (\(r=0.212\)), no such association is observed among experts. This pattern indicates that members of the public are more likely to interpret likely AI applications as personally risky, whereas experts do not systematically conflate likelihood with harm. In addition, individual and societal risk perceptions are more strongly correlated among the public (\(r=0.781\)) than among experts (\(r=0.528\)), suggesting that lay evaluations more often collapse personal and collective risk into a single dimension.

Together, these findings suggest that expert–public disagreements are driven less by opposing views on AI’s societal consequences and more by differences in how risks and benefits are cognitively structured and weighed. While both groups associate the overall value of AI negatively with perceived risks and positively with benefits, the public’s judgements appear more trade-off driven, whereas expert assessments reflect a more nuanced separation between these dimensions. Crucially, because perceived risks and benefits are themselves inversely related, their individual effects on AI evaluation are intertwined. To disentangle overlapping influences and isolate the unique contribution of each predictor within this risk–benefit trade-off, we conducted three multiple linear regression analyses. In these, we focused on variables reflecting individual-level cognitive attributions (specifically individual risks and benefits) while omitting the societal risk variable to maintain a consistent level of analysis. First, we ran separate regressions for each sample, using personal risks and benefits as independent variables and the attributed value of AI as the dependent variable. Second, we conducted a combined-sample regression, including the sample identifier as a third predictor to assess group-level effects (Chow 1960). All VIFs were \(\le 1.691\), indicating no multicollinearity between the predictors.
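The structure of these three regressions and the multicollinearity check can be sketched in R as follows; the variable names are illustrative, and `vif()` comes from the car package:

```r
library(car)   # vif() for variance inflation factors

# Separate models per sample: attributed value regressed on personal risk and benefit
m_experts <- lm(value ~ risk_individual + benefit, data = subset(d, group == "expert"))
m_public  <- lm(value ~ risk_individual + benefit, data = subset(d, group == "public"))

# Combined model with the sample identifier as a third predictor (group-level effect)
m_combined <- lm(value ~ risk_individual + benefit + group, data = d)

vif(m_combined)      # values near 1 indicate little collinearity among predictors
summary(m_combined)
```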

Table 3 summarizes the results of the three regression models, all of which were significant. The third regression model (right side of the table), which includes the factor distinguishing between the two samples, was significant and explained 80% of the variance in overall value (F(3,1215)=1773.106, \(p<0.001\), \(R^2 = 0.807\)). The grouping factor was significant, indicating a small but meaningful difference in the risk–benefit trade-offs between academic AI experts and the public (\(\beta =0.044\), \(p<0.006\)). Given these significantly different trade-offs, we proceed by analyzing the two samples separately in the following sections. Figure 3 illustrates both models.

Fig. 3 Visualization of the two significantly different multiple linear regression models for academic AI experts (left) and the public (right), including variable distributions shown as violin and boxplots. In both models, perceived risk and benefit significantly predict attributed AI value. However, the explained variance and the influence of risk are greater in the public. Line thickness reflects the strength of each predictor’s standardized effect (\(\beta\)). All predictors are significant at \(p <.001\)

For the sample of academic AI experts (left side of the table), the regression model was significant, explaining over 54% of the variance in value attributed to AI (\(F(2, 116) = 48.5\), \(p<0.001\), \(R^2 = 0.547\)). The intercept was significant and negative (\(I=-0.101\)), indicating a slightly negative baseline evaluation, i.e., if an expert held neutral views on both AI risks and benefits, their overall perceived value of AI would tend to be slightly negative. Perceived AI risk had a moderate negative effect on value (\(\beta =-0.195\)), whereas AI benefit had a strong positive effect (\(\beta =0.623\)). Importantly, the influence of perceived benefits was approximately three times stronger than the influence of perceived risk (\(\times 3.19\)).

For the public (middle part of the table), the regression model was likewise significant and showed an even stronger fit, explaining 82% of the variance in overall AI value (\(F(2, 1097) = 2634.6\), \(p<0.001\), \(R^2 = 0.819\)). In contrast to the experts, the intercept was not significantly different from zero, indicating no consistent baseline tendency in AI evaluations when perceived risks and benefits are neutral. Overall value judgments were significantly influenced by both perceived risk (\(\beta =-0.361\)) and perceived benefit (\(\beta =0.703\)) with benefits having about twice the impact of risks (\(\times 1.95\)). Notably, the negative effect of perceived risk was significantly and substantially larger for the public compared to the experts, highlighting a greater sensitivity to potential downsides of AI among laypeople.

Table 3 Results from three multiple linear regressions examining the relationship between perceived AI risks and benefits (as rated by academic AI experts and the public) and overall attributed value


4.3 Do academic AI experts and the public share the same vision for AI’s future?

As the previous analysis shows, academic AI experts and the public differ in their expectations of what AI is likely to achieve over the next 10 years (hereafter referred to as expectancy). To illustrate these differences at the topic level, Fig. 4 presents a scatter plot of the average expectancy ratings for each AI projection.

Each point in the figure represents an AI projection, with its position determined by the mean expectancy rating from academic AI experts (on the \(x\)-axis) and from the public (on the \(y\)-axis). This serves as a comparative map of perceived likelihoods for both groups: points farther to the right indicate higher expectations among experts, while points higher on the plot reflect higher expectations among members of the public. Items that lie on or near the dashed diagonal indicate agreement between the two groups: experts and the public rated the likelihood of these scenarios similarly. In contrast, points that substantially deviate from the diagonal indicate disagreement. Items positioned above the diagonal are considered more likely by the public than by experts, while those below are judged more likely by the experts. The blue regression line indicates the best linear fit (maximum likelihood estimation), and the shaded gray band around it shows its 95% confidence interval, highlighting the uncertainty range for the estimated relationship.
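A comparative map of this kind can be reproduced from per-topic group means with a short ggplot2 sketch; the data frame `topic_means` and its columns are hypothetical placeholders for the published per-topic averages:

```r
library(ggplot2)

ggplot(topic_means, aes(x = expectancy_expert, y = expectancy_public)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +   # diagonal = agreement between groups
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +                        # linear fit with 95% confidence band
  labs(x = "Expected likelihood (experts)", y = "Expected likelihood (public)")
```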

The figure illustrates both notable similarities and distinct differences in expectancy assessments between both groups. First, there is an overall strong and positive correlation between the two groups’ expectations, indicating substantial agreement on the perceived likelihood for many of the AI projections (\(r=0.734\), \(p<0.001\)). Second, the distribution of expectancy ratings among the public is markedly narrower than that of the experts, suggesting that experts hold a more nuanced and differentiated view of AI’s future potential. This difference in variance is also reflected in the flatter slope of the regression line. Third, despite this general agreement, several topics exhibit strong divergences between the two groups. Specifically, the public expressed higher expectations than experts for statements such as “AI will destroy humanity”, “AI will lead to personal loneliness”, and “AI can no longer be controlled by humans”. Conversely, experts reported substantially higher expectations for statements like “AI will improve our health”, “AI will prefer certain groups of people”, and “AI will become humorous”. Table 4 in the Appendix presents the overview of all items and their expectancy evaluations for both samples.

Fig. 4 Plot showing the expected likelihood ratings of the 71 AI projections (with experts’ mean ratings on the x-axis and the public’s mean ratings on the y-axis). The blue line represents the linear regression fit, and the gray shaded area shows its 95% confidence interval. While many topics exhibit similar evaluations across both groups, several topics reveal notable divergences (high resolution image on OSF)

4.4 Do value attributions differ between academic AI experts and the public?

Next, we examine the similarities and differences in the overall value attributed to AI (or AI sentiment) between academic AI experts and the public. In Fig. 5, we show a scatter plot of average value ratings for each AI projection, with expert ratings on the \(x\)-axis and public ratings on the \(y\)-axis. It shows a strong positive correlation between both groups’ value assessments (\(r=0.855\), \(p<0.001\)), notably higher than the correlation in expectancy ratings. This suggests a greater degree of agreement between the two groups regarding the value attributed to AI.

Despite this overall agreement, important differences remain. On average, experts tend to hold a more positive outlook toward AI than the public. However, the sentiment diverges substantially for specific topics. The public, for instance, expresses more negative sentiment toward statements involving existential or societal threats, such as AI destroying humanity, increasing social division, or perceiving humans as a threat. In contrast, experts are more optimistic about scenarios involving AI acting in accordance with moral values, contributing to sustainability, or supporting medical decision-making. These differences reflect not only diverging attitudes toward specific applications but also broader differences in how each group weighs potential risks and benefits.

Fig. 5 Scatter plot showing the value attributed to the 71 AI projections by academic experts (horizontal axis) and the public (vertical axis). The blue line represents the linear regression line; the shaded gray area denotes the 95% confidence interval. Most topics show high agreement between the groups, with only a few exhibiting notable differences in attributed value (high resolution image on OSF)

4.5 Do experts and the public differ in risk and benefit attributions?

As the overall sentiment towards the AI projections differs between academic AI experts and the public, we analyze the attributed risks and benefits for each group independently. The left side of Fig. 6 shows the risk–benefit attributions for AI experts, and the right shows these for the public.

The horizontal axis represents the perceived risk attributed to each topic, and the vertical axis represents the perceived benefit. These plots only illustrate the relationship between risks and benefits, without integrating the value attributed to the projections. A regression analysis examining the relationships among perceived risks, benefits, and overall value is presented in Sect. A.3 of the Appendix.

The points in the plots illustrate where the AI projections lie in terms of risks and benefits. Points below the horizontal axis are seen as useless, while items above are seen as useful. Similarly, items on the left are perceived as more risky, whereas items to the right are viewed as safer. Items in the top-right quadrant are perceived as both risky and useful, while those in the bottom-left quadrant are considered neither risky nor useful. Items in the top-left quadrant are seen as risky but not useful, and items in the bottom-right quadrant are evaluated as not risky but useful.

The assessments of the AI experts display greater disparity compared to those of the public, suggesting more nuanced evaluations. Experts’ risk and benefit attributions span a broader range, with a substantial proportion of topics deemed useful and a smaller proportion assessed as useless. Also, experts perceive topics as both risky and safe across the spectrum. In contrast, the public’s evaluations are more concentrated, with most topics viewed as rather risky.

Fig. 6 Perceived risk and utility between the experts (left) and public (right). The blue lines show the regression line and the gray area signifies the 95% CI of the regression lines

5 Discussion

AI is reshaping our world. In fact, studies suggest that AI and LLMs already alter both communication practices and professional workflows (Geng et al. 2025; Brachman et al. 2024). In this study, we examined how perceptions of AI differ between those who are affected by its societal implementation—the general public—and those who educate practitioners, inform policymaking, conduct foundational research, and advance the technology—academic AI experts. Overall, both groups acknowledged that AI is here to stay, as the majority of queried AI projections received above-average scores for their expected likelihood of occurrence. Across the wide range of statements, participants from both groups reported individual risk scores above the scale’s midpoint. However, AI experts consistently reported greater benefits than the public. Notably, the public’s overall evaluations were more negative than those of the experts. Overall, this finding aligns with previous studies comparing expert and lay perceptions in other domains (for example, Siegrist et al. (2007) on nanotechnology), which similarly found differences in technology perception and that experts tend to perceive lower individual risks than laypeople. Given the broad range of topics in the statements—spanning from plausible to speculative—the absolute scores are less informative than the relative differences between both groups.

5.1 Diverging expectations for the future of AI

First and foremost, experts and the public diverged in their expectations regarding the likelihood of AI developments. Compared to the public, experts generally assessed most AI projections as more probable. However, variance among expert responses was also substantially higher, suggesting more differentiated and topic-specific evaluations: That is, while experts rate many topics as much more likely, they also judge some projections as markedly less probable than the public does.

It is important, though, not to overinterpret the absolute expectancy ratings, as both experts and laypeople are known to struggle with forecasting future developments (Recchia et al. 2021). Although expert predictions tend to be slightly more accurate, the differences are typically small, reflecting the difficulty of making reliable forecasts about complex and uncertain technological trajectories, even with domain expertise (Recchia et al. 2021). Nonetheless, the relative differences are informative. Expert expectations are likely to influence the direction of research, funding, and policy development, while public expectations may shape political discourse, regulatory demands, and, potentially, the erosion of societal acceptance. A misalignment between these perspectives could result in mismatched expectations, premature or excessive regulation, or public resistance to otherwise beneficial AI technologies.

5.2 Differences in value perception

In terms of the overall value attributed to the AI scenarios, experts generally perceive AI as more positive compared to the public. However, their evaluations are not uniformly optimistic: Again, experts exhibit greater variance, with more extreme ratings in both positive and negative directions. This suggests that experts may approach these evaluations more critically and deliberately, likely due to their knowledge and capacity to assess the nuanced trade-offs inherent in different AI applications.

While experts acknowledge significant opportunities, they also recognize risks and limitations that may not be as apparent to the public (Russell et al. 2015). In contrast, public opinion is often shaped by simplified narratives and popular media portrayals, which tend to emphasize dystopian or sensationalist aspects of AI (Cave et al. 2019; Puzanova et al. 2024). This tendency is arguably reinforced by a “negativity crisis” within the field of AI ethics itself (and probably beyond), where institutional incentives favor alarmist critiques over balanced scientific analysis (Königs 2025). This discrepancy highlights a (science) communication gap: for policy-makers and science communicators, it underscores the importance of addressing both the optimism of experts and the skepticism of the public. Promoting a more informed and balanced public dialogue will be essential for aligning societal expectations with technological realities and fostering trust in the development and deployment of AI systems.

5.3 Perception of risks and benefits and the formation of value judgments

For both groups, perceived risk and perceived benefit were inversely correlated, consistent with previous findings in risk perception research (Alhakami and Slovic 1994; Efendić et al. 2021). While this relationship may appear intuitive, it is not inevitable. In the context of emerging technologies, including AI, it is entirely plausible for individuals to perceive certain developments as both highly beneficial and highly risky. For example, powerful AI systems may be seen as offering substantial societal or economic benefits while simultaneously raising ethical or safety concerns. We assume that the observed inverse correlation reflects a psychological tendency toward cognitive consistency. As suggested by earlier work (Alhakami and Slovic 1994; Efendić et al. 2021), individuals may align their perceptions of risk and benefit to avoid dissonant evaluations, simplifying their judgments by resolving ambivalence.
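A minimal, hedged illustration of how this inverse relation could be checked, reusing the hypothetical `df` from the sketch above (not the published analysis):

```python
# Illustration only: the inverse risk-benefit relation should appear as a
# negative Pearson correlation within each group of the hypothetical `df`.
from scipy import stats

for name, sub in df.groupby("group"):
    r, p = stats.pearsonr(sub["risk"], sub["benefit"])
    print(f"{name}: r = {r:+.2f}, p = {p:.3f}")
```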

Beyond examining absolute evaluations, we analyzed the relative weight participants assigned to perceived risks and benefits in their overall value assessment of AI. In both groups, perceived benefits exerted a stronger influence on the overall valuation of AI than perceived risks. This pattern is consistent with previous findings in other technological domains, such as perceptions of nanotechnology hazards (Siegrist et al. 2007).

In our study, the strength of the influence of perceived benefits on overall AI valuation was comparable across both groups. However, we found a notable difference for perceived risk: among experts, risk had a significantly weaker impact on the overall evaluation than among the public. One possible explanation is that experts’ greater familiarity with AI mitigates their perception of risk, reducing its influence. Conversely, the public may focus more on potential downsides, possibly due to limited knowledge, uncertainty, or exposure to more negative portrayals of AI in public discourse.
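One plausible way to formalize this comparison (our assumption about a reasonable specification, not the authors’ exact model) is to regress the overall value rating on standardized risk and benefit within each group and then test the group difference with an interaction term; the hypothetical `df` used above would additionally need a `value` column:

```python
# Sketch under stated assumptions: per-group weights of standardized risk and
# benefit on overall value, plus an interaction test for group differences.
import pandas as pd
import statsmodels.formula.api as smf

def zscore(s: pd.Series) -> pd.Series:
    return (s - s.mean()) / s.std(ddof=0)

df = df.assign(
    risk_z=df.groupby("group")["risk"].transform(zscore),
    benefit_z=df.groupby("group")["benefit"].transform(zscore),
)

# Relative weights of risk and benefit within each group
for name, sub in df.groupby("group"):
    fit = smf.ols("value ~ risk_z + benefit_z", data=sub).fit()
    print(name, fit.params[["risk_z", "benefit_z"]].round(2).to_dict())

# Does the weight of risk differ between experts and the public?
interaction = smf.ols("value ~ (risk_z + benefit_z) * C(group)", data=df).fit()
print(interaction.summary())
```

In such a specification, an attenuated risk-by-group coefficient would correspond to the weaker influence of risk among experts described above.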

Although our analysis primarily focused on personal risk–benefit trade-offs, the strong correlations observed between societal and individual risk in both groups suggest that collective concerns also play a significant role in AI valuation. As our study did not include a corresponding measure for societal benefits, we focused our regression models on the individual dimension to maintain conceptual symmetry. Future research should incorporate both personal and collective dimensions of benefits and value to determine whether the “societal” vs. “individual” distinction creates a secondary trade-off axis. Investigating whether people are willing to accept personal risks for the sake of perceived societal gains—or vice versa—would provide a more granular understanding of the ethical and social weighing processes that drive AI acceptance.

5.4 Conclusion

Taken together, these findings reveal two levels of evaluative asymmetry between experts and the public: (1) in their absolute evaluations of AI, and (2) in the relative weight they assign to risks and benefits when forming overall judgments. While experts tend to emphasize benefits and downplay risks, the public places relatively more weight on the potential harms. Such misalignment may pose risks for the societal uptake and governance of AI.

First, it may lead to public distrust, policy overreach, or underutilization of beneficial technologies. This underscores the need for fostering communication and negotiation across the key stakeholders: AI researchers, developers, policy-makers, and the public as users of AI. This is consistent with the EU AI Act’s emphasis on public engagement and risk-based regulation (European Union 2024; Calero Valdez et al. 2024) and the OECD AI Principles (Organisation for Economic Co-operation and Development 2019), which call for human-centered, fair, and accountable AI systems.

Second, McLuhan’s media theory states that “we shape media and then media shapes us” (McLuhan 1964; Culkin 1967). As AI becomes increasingly ubiquitous, it is likely to reshape social, cultural, and communicative norms. If AI and its applications are primarily designed by academic AI experts and practitioners trained within these benefit-oriented mental models, the resulting systems may unintentionally tilt toward technical affordances rather than reflecting broader societal values.

We characterize this risk as “procrustean AI”, drawing on the Greek myth of Procrustes, who forced his guests to fit a standard bed by stretching or cutting their limbs. In a modern socio-technical context, this refers to a potential bias in the prioritization of research and the implementation of applications that implicitly privilege expert-centric perspectives. Without deliberate efforts toward participatory alignment, we risk creating systems that require the public to conform to rigid technological frameworks and expert assumptions, rather than developing technologies that are just, inclusive, and responsive to diverse human needs.

Our findings suggest the importance of transparent and participatory processes that integrate diverse perspectives into AI development, implementation, and governance. As recently argued (Shneiderman 2021; Decker et al. 2024; Calero Valdez et al. 2024), achieving responsible and trusted AI requires centering values such as equity, safety, inclusivity, and democratic responsiveness. To that end, the visual maps of differing AI perceptions can serve as an actionable tool to identify “tension points”, such as justice and political decision-making, where public and expert perspectives align and diverge. Such cognitive maps can support participatory governance, guide research agendas, and highlight areas requiring critical reflection and public discourse. Ultimately, aligning these perspectives is necessary for ensuring the legitimacy and trust required for AI systems.

6 Limitations

Several conceptual and methodological aspects warrant consideration and provide directions for future research. These include the characteristics of our measurement and sampling, the scope of the selected scenarios, the relatively small group-level effects, and the absence of qualitative insights on underlying motives. Each is discussed in the following.

First, participants evaluated brief, hypothetical future scenarios involving AI. Thus, we captured affective evaluations of mental models rather than rational assessments of potential outcomes. Nevertheless, mental models and affective evaluations serve as key human decision heuristics that shape how both experts and the public perceive technological risks and benefits (Johnson-Laird 2010; Brossard and Scheufele 2013; Gigerenzer and Brighton 2009). Affect influences not only initial judgements but also long-term trust, acceptance, and resistance, making it central to understanding adoption patterns (Slovic et al. 2007). Crucially, affect often operates below conscious awareness, subtly biasing decisions that appear rational but are driven by intuitive emotional appraisals (Lerner et al. 2015).

Second, although our AI scenarios were informed by prior research, the set of topics may still be subject to selection bias. Such bias introduces a potential risk of spurious correlations, such as Berkson’s paradox (Berkson 1946). Future research may adopt more comprehensive sampling strategies and explore under-represented or emerging domains within AI. Nevertheless, the consistent associations between topic-level and individual-level evaluations suggest that the scenarios were appropriate for our analytical objectives. Furthermore, because topic selection bias does not necessarily translate into bias in individual differences, our core findings remain robust.
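For readers unfamiliar with Berkson’s paradox, the following toy simulation (purely illustrative; the variable names are invented for the example) shows how selecting topics on a combination of two independent properties can induce a spurious negative correlation between them:

```python
# Toy demonstration of Berkson's paradox: two independent topic properties
# become negatively correlated once topics are selected for being
# "interesting enough", i.e., conditioned on their sum exceeding a threshold.
import numpy as np

rng = np.random.default_rng(0)
salience = rng.normal(size=100_000)
feasibility = rng.normal(size=100_000)

selected = (salience + feasibility) > 1.0        # selection on the sum
r_all = np.corrcoef(salience, feasibility)[0, 1]
r_sel = np.corrcoef(salience[selected], feasibility[selected])[0, 1]
print(f"correlation before selection: {r_all:+.2f}")   # close to zero
print(f"correlation after selection:  {r_sel:+.2f}")   # clearly negative
```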

Third, this study is geographically and culturally quite homogeneous, as our samples were predominantly German; the identified perception gap may therefore reflect specific European values rather than a universal phenomenon. Future research should therefore broaden both samples to include AI experts and laypeople from non-European, particularly non-Western, countries to investigate how diverse cultural, political, and economic contexts influence the socio-technical perception and alignment of AI (Wang 2025). For example, using the same research design on academic convenience samples from China and Germany, a pilot study of ours found that Chinese participants were generally more optimistic and weighted risks and benefits equally. Conversely, German participants were generally more cautious, though value judgements were driven primarily by perceived benefits rather than risks (Brauner et al. 2015). Replicating this globally with representative samples would enable a nuanced understanding of AI perceptions and risk–benefit trade-offs across diverse cultural contexts. Beyond this, future research could extend this comparison to contrast academic experts, industry practitioners, and policy-makers, as these groups operate under distinct incentive structures that may further shape AI risk–benefit perceptions.

Fourth, the surveys were conducted in 2023, shortly after the public release of ChatGPT, during a period of rapidly increasing public attention to generative AI. Since then, both AI capabilities and public discourse surrounding AI have evolved quickly (e.g., with AI agents and vibe coding). Consequently, the present findings should be interpreted as reflecting perceptions during a specific phase of the recent AI development cycle rather than as a permanent assessment. Future research may replicate this design longitudinally to examine how perceptions shift over time (cf. Protzko and Schooler (2023)) and whether the expert–public perception gap changes.

Fifth, despite statistically significant differences between academic AI experts and the public across all evaluation dimensions, the corresponding effect sizes were generally small. This likely reflects the broad range of topics in the survey, which may have introduced individual variance and thereby attenuated group-level effects. We posit that employing more narrowly defined or domain-specific scenarios could yield more pronounced perceptual differences between both groups.
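As a hedged illustration of how such group differences are commonly quantified (our example, not the authors’ code; the column names are assumptions), Cohen’s d between expert and public ratings could be computed per evaluation dimension:

```python
# Illustrative effect-size computation: Cohen's d with pooled standard deviation.
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized mean difference between two independent samples."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Usage with the hypothetical respondent-level data frame `df`:
# for dim in ["likelihood", "risk", "benefit", "value"]:
#     experts = df.loc[df["group"] == "expert", dim].to_numpy()
#     public = df.loc[df["group"] == "public", dim].to_numpy()
#     print(dim, round(cohens_d(experts, public), 2))
```

By convention, absolute d values around 0.2 are considered small, which is consistent with the pattern reported above.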

Sixth, traditional survey responses may reflect not only participants’ genuine attitudes but also biases arising from social desirability or linguistic characteristics of questionnaire items (Gefen and Larsen 2017). In contrast, our micro-scenario approach employs reflexive measurement across a broad range of topics rather than slightly reworded items (Brauner 2024). Although this method primarily captures affective responses rather than deliberative reasoning, the consistency of the observed patterns indicates that it offers a reliable and informative lens on technology perception.

Lastly, while our study quantitatively examined perceptions of risk, benefit, and value, it did not capture the underlying motives shaping individual evaluations of AI-related issues. As AI increasingly penetrates both personal and societal domains, future research should integrate qualitative and quantitative methods to identify the factors informing these judgments. Recent work, for instance, extends the conventional risk–benefit framework by incorporating moral considerations such as perceived (dis)honesty, (un)naturalness, and (dis)accountability of AI (Eriksson and Karlsson 2025). Further, our data suggest that while personal and societal risk perceptions are correlated and influence value judgements, they are not interchangeable and may reflect different evaluative perspectives. Advancing this would support closer alignment between AI development, human needs, and societal values.
