When AI Sets Wages: Pricing Logic, Embedded Bias, and Strategic Implications
The Rise of Generative AI in Labor Markets
Generative AI is no longer a sideshow in labor markets. It has become increasingly central to how freelance platforms recommend hourly rates, match workers to tasks, and shape pricing expectations. In this context, a recent study titled When AI Sets Wages: Biases and Labor Discrimination in Generative Pricing offers a detailed examination of how large language models (LLMs) behave when tasked with wage-setting. Importantly, the findings are empirical, not ideological. They reveal a system that, while often recommending higher wages than human benchmarks, also introduces significant disparities based on geography and age. Some of these issues can be corrected through prompt design, while others appear structurally embedded.
Experimental Design and Scope
The researchers, Maxime C. Cohen, Eddy Hage-Youssef, and Warut Khern-am-nuai of McGill University, conducted a series of controlled experiments using LLMs to generate wage recommendations for standardized freelance tasks. These tasks were presented with varying demographic and geographic attributes, allowing the team to isolate how the models responded to different inputs. The results were then compared to human-generated benchmarks, offering a clear view of how AI pricing logic diverges from human judgment.
Findings: Wage Inflation and Systematic Disparities
On average, LLMs recommended hourly rates between $30.72 and $45.77, compared to a human benchmark of $23.60. This inflation effect was consistent across tasks, suggesting that generative models may be reshaping perceived market value. However, the models also introduced substantial geographic and age-based disparities. Geographic wage gaps ranged from 19.5 percent to 130.4 percent, meaning that workers in some regions were assigned rates more than double those in others for identical tasks. Age premiums reached up to 45.97 percent, with older workers consistently receiving higher wage recommendations than younger ones. These disparities were not random. They were systematic.
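The reported gap percentages translate directly into wage ratios. Assuming the conventional formula gap = (high − low) / low × 100 (the study's exact convention is not stated in this summary), a minimal sketch:

```python
# Convert a percentage wage gap into a high-to-low wage ratio, assuming the
# common convention gap = (high - low) / low * 100. The formula choice is our
# assumption; the gap figures themselves come from the study.

def ratio_from_gap(gap_pct: float) -> float:
    """Return the high wage as a multiple of the low wage for a given gap."""
    return 1 + gap_pct / 100

# Under this convention, the study's top geographic gap of 130.4 percent
# implies the high-region rate is about 2.3 times the low-region rate for
# identical tasks, while the 19.5 percent floor implies roughly 1.2 times.
top_ratio = ratio_from_gap(130.4)
floor_ratio = ratio_from_gap(19.5)
```

On this reading, "more than double" in the text corresponds to any gap above 100 percent.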
Prompt Engineering and Bias Correction
The team then tested whether these disparities could be mitigated through prompt engineering. By rephrasing the input prompts to remove location indicators, emphasize fairness, or explicitly instruct the model to ignore demographic variables, they were able to reduce geographic bias. This suggests that the model's pricing logic is highly sensitive to framing. However, age-related disparities persisted even under strong corrective instructions. This implies that age bias is not merely a function of prompt design but is deeply embedded in the model's training data.
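The mitigation strategies described above can be sketched as a prompt template. The field names and wording here are illustrative assumptions, not the study's actual prompts:

```python
# Hypothetical sketch of two prompt-engineering mitigations: (1) stripping
# demographic attributes from the prompt, (2) adding an explicit fairness
# instruction. Prompt wording and profile fields are illustrative only.

def build_wage_prompt(task, profile, strip_demographics=False, fairness_note=False):
    """Assemble a wage-recommendation prompt from a task and worker profile."""
    lines = [f"Recommend a fair hourly rate in USD for this freelance task: {task}."]
    for key, value in profile.items():
        # Omit attributes that could cue demographic bias when requested.
        if strip_demographics and key in {"location", "age"}:
            continue
        lines.append(f"{key.capitalize()}: {value}")
    if fairness_note:
        lines.append("Ignore location and age; base the rate only on the task and skills.")
    return "\n".join(lines)

profile = {"skills": "Python, data cleaning", "location": "Islamabad", "age": 24}
baseline = build_wage_prompt("clean a 10k-row CSV", profile)
debiased = build_wage_prompt("clean a 10k-row CSV", profile,
                             strip_demographics=True, fairness_note=True)
```

Per the study's findings, templates like the debiased variant reduced geographic gaps, while age effects survived even explicit ignore instructions.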
There are some important caveats to the study, however.
With respect to geography, the researchers tested wage recommendations across a wide range of global locations, including cities in North America, Europe, Asia, and Africa. The disparities were substantial. For identical tasks, LLMs recommended significantly different wages depending on the location of the worker. These differences were not explained by local cost-of-living adjustments or market norms; they were generated by the model’s internal logic, which appears to reflect patterns in its training data rather than explicit economic reasoning. The study did not attempt to validate whether these recommendations aligned with actual wage differentials between, say, New York and Islamabad; it focused on the model’s behavior, not external labor market data.
On age, the researchers introduced age as a variable in the prompts and found that older workers consistently received higher wage recommendations than younger ones. However, the model did not justify these premiums based on experience or tenure. In fact, even when the prompt explicitly instructed the model to ignore age or assume equal experience, the disparity persisted. This suggests that the model associates age with higher value by default, likely due to correlations in its training data. The study did not test for actual experience levels or skill profiles; it isolated age as a standalone input and observed the model's response.
In short, the study revealed that LLMs encode and reproduce wage disparities based on geography and age, but they do so without automatically grounding those differences in economic rationale or individual qualifications. The disparities are therefore statistical artifacts, not calibrated assessments.
Strategic Implications for AI Deployment
These findings have direct implications for how generative AI is deployed in labor markets. First, they challenge the assumption that AI pricing is neutral or objectively correct for a given context. The models reflect statistical patterns learned from training data, which may include historical inaccuracies, regional stereotypes, or demographic assumptions. Second, they highlight the importance of prompt design as a governance tool. If geographic bias can be reduced through input framing, then platforms can use structured templates to guide model behavior. Third, they underscore the limitations of prompt engineering. Some biases and assumptions, particularly those related to age, may require post-training interventions, such as reinforcement learning or fine-tuning, to correct.
Balancing Efficiency and Risk
The broader strategic question is whether generative AI can streamline labor markets without introducing new forms of distortion. On the efficiency side, AI-driven pricing offers speed, scalability, and consistency. It can reduce friction, help workers set competitive rates, and allow platforms to match supply and demand more effectively. On the risk side, it can encode and amplify disparities rooted in inaccurate or skewed training data, creating feedback loops that disadvantage certain groups or regions. The challenge is not to eliminate AI from labor pricing, but to govern it intelligently.
Operational Recommendations
One approach is to maintain human oversight. While AI can generate initial recommendations, human reviewers can audit outputs, flag anomalies, and apply contextual judgment. Another approach is to increase transparency. Freelancers should be informed when wage recommendations are AI-generated and given tools to challenge or adjust them. This preserves agency and reduces reliance on opaque systems. A third approach is to audit models regularly. Developers of AI systems should test outputs across diverse profiles, scrutinize training data for demographic skew, and implement correction mechanisms where necessary.
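The third approach, auditing outputs across diverse profiles, can be sketched as a simple grouped-gap check. The data, attribute names, and tolerance threshold below are made up for illustration:

```python
# Illustrative audit: given wage recommendations collected for identical tasks
# across varied worker profiles, group by one attribute and flag values whose
# mean recommendation diverges beyond a tolerance. Sample data and the 10%
# threshold are assumptions for this sketch, not figures from the study.
from collections import defaultdict

def disparity_gap(recs):
    """Return the max-vs-min gap in mean recommendation, as a percent of min."""
    means = {k: sum(v) / len(v) for k, v in recs.items()}
    lo, hi = min(means.values()), max(means.values())
    return (hi - lo) / lo * 100

def audit(samples, attribute, threshold_pct=10.0):
    """Group (profile, wage) samples by one attribute and flag large gaps."""
    groups = defaultdict(list)
    for profile, wage in samples:
        groups[profile[attribute]].append(wage)
    gap = disparity_gap(groups)
    return gap, gap > threshold_pct

samples = [
    ({"location": "New York", "age": 40}, 45.0),
    ({"location": "New York", "age": 25}, 42.0),
    ({"location": "Islamabad", "age": 40}, 24.0),
    ({"location": "Islamabad", "age": 25}, 22.0),
]
gap, flagged = audit(samples, "location")
```

In practice, an audit like this would run over many task types and attributes, with flagged gaps routed to human reviewers rather than corrected automatically.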
Market-Level Consequences
The real-world impact of these dynamics is immediate. Wage recommendations influence how freelancers position themselves, which jobs they pursue, and how clients perceive their value. If models systematically and inaccurately undervalue younger workers or those in certain regions, it can distort competition and reduce access to opportunity. For platforms, these dynamics affect user trust, retention, and market efficiency. For policymakers, they raise questions about economic competitiveness, algorithmic fairness, disclosure requirements, and audit standards.
The study also opens several avenues for future research. One is to extend the analysis to other demographic variables, such as gender or race. Another is to test across multiple LLM architectures, to determine whether the observed disparities are model-specific or systemic. A third is to explore longitudinal effects. If AI pricing becomes dominant, how will it shape labor market outcomes over time? Will disparities compound, stabilize, or self-correct?
Conclusion: Governing the System, Not Rejecting It
What emerges from this research is not a moral indictment, but a strategic insight. Generative AI is a powerful tool, but its outputs are shaped by the data it was trained on and the prompts it receives. This means that bias is not inevitable, but vigilance should be. By designing better prompts, auditing model behavior, and keeping humans in the loop, stakeholders can guide AI toward more consistent and rational outcomes.
The researchers behind When AI Sets Wages have provided a framework for understanding how generative pricing works, where it fails, and how it can be improved. Their work is not a call for ideological reform, but for operational discipline. As AI continues to shape the future of work, the path forward is not to reject it, but to govern it: strategically, transparently, and with a clear understanding of its mechanisms. A plain-language summary of their findings is available in Harvard Business Review under the title "What Happens When AI Sets Wages".