Embedded & Irreversible: Why AI Safety Requires Pre-Training Intervention

AI safety debates often focus on existential risks and deployment guardrails. But there's a more immediate threat already unfolding: the incremental erosion of human agency through unexamined data extraction practices that transfer power to algorithmic control. My participation in the BlueDot Impact AGI Strategy course and my interdisciplinary practice in human-centered systems design have convinced me that we're missing the highest-leverage intervention point.

AI policy debates focus heavily on model deployment: how algorithms make decisions, who they impact, and whether outcomes are fair. But by the time a model is deployed, the most consequential decisions have already been made. The training data has been collected, often through workplace surveillance and user tracking. The patterns have been learned and embedded into billions of parameters. And here's the critical problem: once harmful patterns are embedded into foundation models, they become nearly impossible to audit or remove. Policymakers are trying to regulate the outputs while ignoring the inputs. If we're serious about preventing AI from systematically eroding human agency, we need to intervene upstream at the point where data is curated for pre-training, before the damage becomes irreversible.

Every day, billions of people generate data through searches, purchases, health app usage, social media interactions, and 'smart' device behaviors. This data is collected, aggregated, and used to train AI models, often without meaningful consent or understanding of downstream use. These models then shape what we see, what we buy, how we're assessed for loans or insurance, and, increasingly, what opportunities we're offered. Meanwhile, the same data productization logic applied internally has driven unprecedented waves of mass layoffs in the last few years, as companies train AI on workplace surveillance data to automate the very workers who generated it.

This dual strategy of productizing consumer data for personalization and worker data for automation serves return-on-investment targets that maintain stock prices and satisfy shareholders, while systematically transferring human agency to algorithmic control. The question isn't whether AI will reshape society. It's whether that reshaping will preserve human dignity or optimize it away. And critically, this transfer happens gradually enough that most people don't recognize it as disempowerment until the patterns are already locked in, both in the foundation models that shape their options and in the societal expectations of how technology mediates their lives. The result is a growing sense of inhumanity, disconnection, disillusionment, and disempowerment: a diffuse awareness that something fundamental has shifted, even if the mechanism remains invisible.

This transfer of agency doesn't happen overnight; it is a 'boiling frog' scenario. First came helpful recommendations on shopping sites: harmless, even delightful. Then social media feeds optimized for engagement, clicks, and endless scrolling. Then health apps tracking our sleep, steps, and heart rates. Then 'smart' assistants in our connected homes. Each step felt incremental, often beneficial, making our lives more predictable and managed. Combine this with workplace surveillance, from productivity monitoring of door entries and keystrokes to AI-optimized scheduling and automated performance reviews, and these practices create a comprehensive data extraction infrastructure. You are simultaneously the data source and the target of optimization, whether you like it or not. Your consumer behavior trains models that shape your purchasing decisions and curate the information you see, all in the name of personalization. Your work behavior trains models that automate your role. In both cases, data extraction occurs without meaningful consent, with your behavior used as training material for systems designed to reduce your autonomy.

We need to consider data curation as part of the architectural foundation. Once patterns are deeply embedded in foundation models, they're nearly impossible to audit or remove. This irreversibility is not a bug; it's a feature of how deep learning works. Harmful patterns don't sit in a single location that can be identified and extracted. They're distributed across the model's architecture, embedded in the relationships between billions of parameters.

When a model learns from consumer data that certain demographics correlate with 'risky' financial behavior, or from workplace data that productivity correlates with constant availability, those associations become part of the model's fundamental logic. Post-deployment audits can identify discriminatory outputs, but they cannot trace those outputs back to specific training examples or excise the underlying patterns. The model has learned. The damage is structural. This makes pre-training the highest-leverage intervention point, the last moment when we can prevent harm from becoming permanent.

As a human-centered systems practitioner and educator, I see an opportunity to intervene at precisely this chokepoint: the pre-training data curation stage. This intervention requires collaboration between model engineers and HCD practitioners to engage stakeholders (consumers, workers, patients, community members) in decisions about how their data is collected, contextualized, and used as training material.

Human-centered data curation requires a fundamental shift in how training data is collected and documented. This means participatory data audits in which community representatives (consumers, workers, patients) review how their data is being used before models are trained. It means contextual metadata that preserves information about consent conditions, power dynamics, and use limitations. It means red-teaming with affected communities before deploying models in high-stakes domains like healthcare, finance, employment, and education. And it requires ongoing reassessment to ensure training data continues to align with human dignity and societal values as contexts evolve.
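To make "contextual metadata" concrete, here is a minimal sketch of what a provenance record and a curation gate could look like in code. This is an illustrative assumption, not an existing standard: the schema fields, the `ProvenanceRecord` name, and the `"explicit-opt-in"` / `"no-model-training"` labels are all hypothetical choices for this example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProvenanceRecord:
    """Contextual metadata for one training-data source (illustrative schema)."""
    source: str                  # where the data came from, e.g. "keystroke-logs"
    consent_basis: str           # e.g. "explicit-opt-in", "terms-of-service", "none"
    power_dynamic: str           # e.g. "employer-employee", "platform-user"
    use_limitations: list = field(default_factory=list)   # declared restrictions
    last_reviewed: date = field(default_factory=date.today)  # for reassessment


def eligible_for_training(record: ProvenanceRecord) -> bool:
    """Curation gate: only sources with explicit consent and no
    'no-model-training' restriction pass into the pre-training corpus."""
    return (record.consent_basis == "explicit-opt-in"
            and "no-model-training" not in record.use_limitations)


# Workplace surveillance data collected under terms of service is excluded,
# because consent was not explicit and the power dynamic is asymmetric.
workplace = ProvenanceRecord(
    source="keystroke-logs",
    consent_basis="terms-of-service",
    power_dynamic="employer-employee",
)
print(eligible_for_training(workplace))  # False
```

The point of the sketch is that consent conditions and use limitations travel with the data as structured fields, so a curation decision can be made (and audited) before training, rather than inferred from model outputs afterward.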

Implementing this at scale requires policy support. Companies should be required to provide data-provenance documentation showing stakeholder consultation. Independent data curation review boards, similar to Institutional Review Boards in research, could evaluate training datasets before deployment. Liability frameworks should hold companies accountable for harms traceable to biased or exploitative training data. While this approach requires upfront investment in stakeholder engagement and transparency, the cost is trivial compared to the societal consequences of deploying AI systems that systematically erode human agency at scale.

The AI policy community must expand its focus beyond deployment governance. Pre-training data curation deserves equal attention in regulatory frameworks, industry standards, and academic research. This intervention requires collaboration across disciplines and sectors. If your organization is deploying AI systems trained on human behavioral data, this is the moment to intervene before the patterns are locked in. The foundation is still being built.
