AI Productivity Is Not Output

Andrea Viliotti
7 days ago
Reading time: 13 min

It Is Verified Value

In many companies, generative AI has already increased the amount of material produced. A marketing team can generate twice as many campaign drafts. A sales office can prepare more emails. A technical department can produce more documentation. A customer service team can respond faster.

Then someone has to check.

Someone has to verify whether the claim is true, whether the tone fits the brand, whether the promise can be defended, whether the data are correct, whether the answer helps the customer, whether the output can be signed, sent, integrated, sold, audited, or trusted.

The output has increased. The value has not necessarily followed.

The debate on generative AI has a measurement problem. We keep asking whether AI helps people produce “more”: more text, more code, more slides, more summaries, more answers, more variations. But companies do not compete on the amount of output they can generate. They compete on what survives verification.

A better decision. A better-served customer. An avoided error. A transferable practice. A shorter operating cycle. A human capability released from repetitive work.

This is the managerial paradox of AI: the technology improves many tasks before it improves the firm.

At the task level, the effects can be rapid. At the enterprise level, value emerges only when the organization can absorb the technology: data, workflows, training, validation roles, metrics, governance, and accountability.

AI does not reward the company that produces more. It rewards the company that builds the system capable of turning production into value.

1. Why AI Adoption Numbers Seem to Contradict One Another

Part of the confusion starts with adoption numbers. Some sources suggest that AI is almost everywhere. Others show much lower percentages. These figures are not necessarily inconsistent; they often measure different things.

The Stanford AI Index 2026 reports that organizational AI adoption reached 88%, and that generative AI is used in at least one business function by 70% of organizations. That is a broad measure of organizational diffusion, not proof that work has already been deeply transformed. [7]

The OECD measures AI use at the firm level across covered countries. In 2025, 20.2% of firms reported using AI; the figure was 52.0% among large firms and 17.4% among small firms. That is a different denominator: the whole population of firms, in which small firms weigh heavily. It sits closer to firm-level adoption capacity. [8]

McKinsey’s 2025 Global Survey adds another layer. It reports that 88% of respondents say their organizations use AI in at least one business function, but nearly two-thirds have not yet begun scaling AI across the enterprise, and only 39% report EBIT impact at the enterprise level. [9]

An NBER working paper based on almost 6,000 CFOs, CEOs, and executives across the United States, the United Kingdom, Germany, and Australia finds that around 70% of firms actively use AI, but more than 80% report no impact on either employment or productivity over the past three years. Executives expect larger effects over the next three years, which makes the gap between use and realized value even more important. [10]

The point is crucial: usage, adoption, scaling, and economic value are not the same thing.

A company can buy licenses without changing workflows. An employee can use AI without the organization learning from it. A department can improve a task without that value reaching margins, revenue, customer experience, or quality.

The question is no longer: “Who is using AI?” The question is: which AI use becomes verified value?

2. The AI Productivity J-Curve

The lag between AI use and measurable enterprise value is not an anomaly. It is often the normal shape of a general-purpose technology entering organizations.

Brynjolfsson, Rock, and Syverson describe a “Productivity J-Curve”: in the early phase of a general-purpose technology, organizations invest in intangible complements—software, processes, skills, redesign, coordination—before the benefits appear in measured productivity. In other words, inputs rise before outputs become visible. [11]

This is a critical managerial lens. Many boards still treat AI as an immediate cost compressor. But in the first phase, AI often increases invisible work: deciding where to embed it, what data it needs, which workflows must change, who validates outputs, which errors are tolerable, and which risks are not.

The paradox is not that AI does not work. The paradox is that AI works only when the organization is redesigned enough to let it work.

At first, the firm may see more activity before it sees more value: more tools, more pilots, more meetings, more policies, more training, more reviews. If these inputs remain fragmented, they become cost. If they converge into new workflows, they become organizational capital.

AI does not create transformation by itself. It requires co-invention: technology and organization changing together.

3. From Task Gains to Enterprise Value

The best evidence does not tell one simple story. It tells several stories at different levels.

A longitudinal study by Tomaz and colleagues observed three agile software teams for roughly thirteen months inside a large technology consulting firm. After the introduction of tools such as GitHub Copilot and internal GPT systems, team performance and perceived efficiency increased while raw activity—commits and lines of code—remained broadly flat. The authors describe this as an increase in “value density”: more value per unit of activity, not simply more work volume. [1]

The boundary matters: three teams, software development, agile environment, specific company context. This is not a universal law. But it is a powerful lens. If companies measure AI only through activity counters, they may miss where value is actually being created.

In customer support, Brynjolfsson, Li, and Raymond study the rollout of a conversational AI assistant across 5,172 agents. AI access increased average productivity by 15%, measured as issues resolved per hour. The most important finding, however, is heterogeneity: less experienced and lower-skilled workers benefited more, while the most experienced workers saw smaller speed gains and some quality risks. [4]

This shifts the discussion. AI is not only automation. It is also practice transfer. It can make patterns, formulations, procedures, and micro-decisions once embedded in top performers more available to everyone else.

A randomized experiment by Cruces and co-authors reinforces this point. In a general business problem-solving task involving 1,174 adults, AI increased performance for both education groups, but more for participants with lower formal education. The gap between higher- and lower-education participants fell from 0.548 to 0.139 standard deviations, closing roughly three quarters of the initial gap in the AI-assisted task. Yet the authors also stress that human capital does not disappear: differences remain in prompting quality and in follow-up performance without AI assistance. [3]
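The “roughly three quarters” figure follows directly from the two reported gaps. A minimal check of the arithmetic, using only the two numbers quoted above (the variable names are illustrative, not the authors'):

```python
# Standardized performance gap between higher- and lower-education participants,
# in standard deviations, as quoted in the paragraph above.
gap_without_ai = 0.548
gap_with_ai = 0.139

# Share of the initial gap that disappears in the AI-assisted task.
share_closed = (gap_without_ai - gap_with_ai) / gap_without_ai

print(f"Share of gap closed: {share_closed:.1%}")  # Share of gap closed: 74.6%
```

The remaining quarter is the part of the gap that AI assistance did not close in this task.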

The managerial lesson is clear: AI can raise the floor of execution, but it does not remove the ceiling of judgment.

It can help less experienced workers produce better work. It cannot replace the ability to frame the problem, select the right criterion, interpret context, validate the output, and assume responsibility.

The right metric therefore does not evaluate the machine alone, nor the human alone. It evaluates the human-machine system. An AI assistant is not valuable because it generates a plausible answer. It is valuable if it improves the performance of the team that uses it: final quality, speed, control, accountability, and the ability to correct direction.

Weak companies use AI to multiply drafts. Strong companies use AI to transfer excellent practices.

4. The Invisible Labor of AI

Every AI output carries a hidden cost. Someone must give it context, correct it, verify it, integrate it, and take responsibility for it.

Glean’s Work AI Index 2026 calls this work “botsitting”: the labor required to make AI usable, including feeding it missing context, checking outputs, debugging mistakes, rerunning prompts, and cleaning up confident but wrong answers. The report estimates that workers spend an average of 6.4 hours a week on botsitting, and that 36% of AI sessions fail outright, requiring restart or substantial rework. [12]

This is not a side issue. Validation cost is the new managerial work of AI.

If a draft produced in thirty seconds requires twenty minutes of review, the value is not in the speed of the draft. It is in the net balance between acceleration, quality, risk, and accountability. If a tool generates ten alternatives and a manager must read them all, output has increased but attention may have decreased. If AI reduces writing time but increases rework, apparent productivity becomes an illusion.
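The net balance can be made concrete with a small sketch. The thirty-second draft and the twenty-minute review come from the paragraph above; the 25-minute manual baseline and the 10 minutes of rework are hypothetical values chosen only for illustration, and the function name is my own.

```python
def net_minutes_saved(manual_minutes, draft_minutes, review_minutes, rework_minutes=0.0):
    """Minutes saved per item once review and rework are counted.

    A positive result means the AI-assisted path is faster overall;
    a negative result means the apparent speed-up is an illusion.
    """
    assisted_total = draft_minutes + review_minutes + rework_minutes
    return manual_minutes - assisted_total


# Hypothetical baseline: writing the piece by hand takes 25 minutes (assumption).
# Draft time (0.5 min) and review time (20 min) are the figures from the text.
with_review_only = net_minutes_saved(manual_minutes=25, draft_minutes=0.5, review_minutes=20)
print(with_review_only)  # 4.5

# Adding 10 minutes of rework (assumption) turns the gain negative.
with_rework = net_minutes_saved(25, 0.5, 20, rework_minutes=10)
print(with_rework)  # -5.5
```

Time is only one term of the balance. Quality, risk, and accountability would need their own measures.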

AI does not eliminate work. It relocates work.

From production to validation. From execution to supervision. From speed to trust.

5. Adoption Is Not Transformation

The word “adoption” is dangerous because it sounds like an event: first we did not use AI, then we did. But transformation is not an event. It is absorption.

In “From Exposure to Adoption,” Golo Henseke analyzes approximately 36,600 workers across 35 European countries. Occupational exposure to generative AI predicts adoption, but it does not determine it. Skills, abstract task content, employee influence within the organization, digitalization, and training all shape the exposure-to-adoption gradient. The study also finds no clearly detectable effect yet of early adoption on worker-reported task restructuring, consistent with an initial integration phase. [2]

A BIS/EIB working paper on more than 12,000 European firms brings the same issue to the firm level. AI adoption is associated with a 4% increase in labor productivity and no adverse firm-level employment effects in the short run. But the benefits are concentrated in medium and large firms and depend on complementary investments in software, data, and training. [5]

This is the sentence many AI strategies do not want to hear: the AI budget cannot be only a license budget.

Licenses without data create fragmentation. Prompts without workflows create exceptions. Outputs without review create risk. Training without metrics creates unmeasured enthusiasm. Governance without process creates bureaucracy.

A mature AI habitat is not made only of tools. It is made of strategy, human capital, operating model, technology, data, and scaling capability. If one dimension accelerates while the others remain still, AI does not scale. It fragments.

AI becomes productive when it enters a system capable of metabolizing it.

6. Work Does Not First Disappear. It Is Re-Coded into Tasks

The opposite of productivity hype is employment panic. Here too, the evidence requires discipline.

Baslandze and co-authors survey nearly 750 corporate executives and document positive productivity effects and expectations of stronger gains, but also a productivity paradox: perceived productivity gains are larger than the revenue-based gains that already show up in measured outcomes. On labor, they find little evidence of near-term aggregate employment declines, but they do find compositional reallocation: routine clerical roles decline, while skilled technical roles gain relative importance. [6]

Humlum and Vestergaard, linking adoption surveys to administrative labor records in Denmark, estimate precise null effects on earnings and recorded hours two years after chatbot adoption, while documenting task restructuring and occupational switching. [13]

Acemoglu brings the same caution to macroeconomics. In a task-based model of AI, aggregate effects depend on how many tasks are actually transformed and how large the average savings are. His benchmark calculation implies that total factor productivity effects over the next ten years should be no more than 0.66% in total, with smaller estimates when harder-to-automate tasks are considered. [14]
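A simplified reading of that task-based logic is a product of two numbers: the share of tasks that AI actually transforms and the average saving on those tasks. The sketch below is my own illustration of that dependence, not the paper's model. The function name is hypothetical, and the inputs are placeholders, not figures from the paper.

```python
def aggregate_gain(task_share_affected: float, average_cost_saving: float) -> float:
    """Approximate aggregate gain as (share of tasks affected) x (average saving on them).

    Both inputs are fractions between 0 and 1.
    """
    if not 0 <= task_share_affected <= 1:
        raise ValueError("task_share_affected must be between 0 and 1")
    if not 0 <= average_cost_saving <= 1:
        raise ValueError("average_cost_saving must be between 0 and 1")
    return task_share_affected * average_cost_saving


# Illustrative placeholders only (assumptions, not the paper's estimates):
# 5% of tasks transformed, 10% average saving on those tasks.
gain = aggregate_gain(0.05, 0.10)
print(f"{gain:.2%}")  # 0.50%
```

Even generous task-level savings produce a small aggregate number when the share of transformed tasks is small. That is why the macro estimate stays modest.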

The conclusion is not that AI does not matter. It is more precise: AI changes the content of work before it changes the aggregate employment statistics.

Work is not first destroyed or saved. It is first re-coded into tasks.

Some tasks lose value. Others gain value. Some roles become more technical. Some skills move from production to validation, from execution to supervision, from speed to judgment.

The managerial issue is not to predict one destiny for labor. It is to redesign where work creates value.

7. AI Value Density: The Missing Metric

We need a more demanding metric than apparent productivity. I call it AI Value Density.

AI Value Density = verified value generated by AI-assisted work ÷ the total cost of production, validation, coordination, and risk

This is not a universal accounting formula. It is a managerial discipline. Its purpose is to prevent companies from confusing output with value.

An AI use case increases value density when the improvement it generates exceeds its hidden costs: review, rework, control, integration, coordination, operational risk, reputational risk, legal risk, loss of expertise, or dependence on outputs people do not understand.
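As a minimal sketch, the ratio can be written down in a few lines. The class name, field names, and sample amounts are hypothetical, and the unit (for example, euros per month) is whatever the firm already uses to value outcomes. Estimating the verified value in the numerator is the hard managerial part, and the code does not solve it.

```python
from dataclasses import dataclass


@dataclass
class UseCaseCosts:
    """Total cost side of the ratio, all in the same unit (assumed: euros per month)."""
    production: float
    validation: float
    coordination: float
    risk: float

    def total(self) -> float:
        return self.production + self.validation + self.coordination + self.risk


def ai_value_density(verified_value: float, costs: UseCaseCosts) -> float:
    """Verified value divided by the total cost of production, validation, coordination, and risk."""
    total_cost = costs.total()
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return verified_value / total_cost


# Hypothetical use case: 12,000 of verified value against 8,000 of total cost.
costs = UseCaseCosts(production=2000, validation=3000, coordination=1000, risk=2000)
density = ai_value_density(12000, costs)
print(density)  # 1.5
```

A density above 1 means the verified value exceeds the full cost, hidden costs included. A density below 1 means the use case consumes more than it returns, however much output it generates.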

A minimum scorecard should include eight dimensions:

Dimension	Managerial question	Practical indicator
Result value	What actually improves?	Quality, margin, service, decision quality, avoided error
Validation cost	How much human work is required to trust the output?	Review time, corrections, escalations
Rework	Does AI reduce or increase work that must be redone?	Revisions, reopened tickets, bugs, disputes
Cycle time	Does the process really become shorter?	Lead time, handoffs, bottlenecks
Practice transfer	Does individual use become collective capability?	Playbooks, validated templates, learning curve
Human-AI performance	Does the assisted system outperform the unassisted workflow?	Final quality, avoided errors, better decisions
Organizational absorption	Is the firm ready to scale?	Data, training, workflow, security, governance
Economic realization	Does the improvement reach business results?	Revenue, margin, retention, avoided cost, revenue-based productivity where measurable

This scorecard is not designed to give AI a cosmetic score. It is designed to decide what to scale and what to stop.
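One way to keep the scorecard honest is to refuse a scaling decision while any dimension is unmeasured. The sketch below assumes a simple dictionary that maps each of the eight dimensions to the indicator the team has actually recorded. The keys and the helper name are my own illustration.

```python
# The eight dimensions of the scorecard, in the order of the table above.
SCORECARD_DIMENSIONS = (
    "result_value",
    "validation_cost",
    "rework",
    "cycle_time",
    "practice_transfer",
    "human_ai_performance",
    "organizational_absorption",
    "economic_realization",
)


def missing_dimensions(scorecard: dict) -> list:
    """Return the dimensions that have no recorded indicator yet."""
    return [d for d in SCORECARD_DIMENSIONS if not str(scorecard.get(d, "")).strip()]


# Hypothetical pilot that has measured only three dimensions so far.
pilot = {
    "result_value": "reopened tickets down",
    "validation_cost": "review time per answer",
    "cycle_time": "lead time per ticket",
}
gaps = missing_dimensions(pilot)
print(gaps)
```

In this example the pilot still has five blind spots, including rework and economic realization, so a decision to scale would rest on output rather than on verified value.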

The Board Question

Do not ask: how much AI are we using?

Ask: what share of AI-assisted work becomes verified value after validation, rework, coordination, and risk?

The Stop Rule

An AI use case is not validated when it works in a demo. It is validated when it survives the workflow: real data, real users, real errors, real accountability, real control cost.

A use case should scale only if it improves a relevant outcome, reduces or does not increase rework, has a clear owner, has a validation procedure, and converts saved time into measurable value.

The rule is simple: do not scale what produces more output. Scale what produces more value after validation.
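The five conditions of the stop rule can be expressed as a single check. This is a sketch under stated assumptions: the field names are hypothetical, and each flag stands for a judgment that someone in the firm must make and own. The code only enforces that all five are answered.

```python
from dataclasses import dataclass


@dataclass
class UseCaseReview:
    """Findings for one AI use case after it has run in the real workflow."""
    outcome_improved: bool          # a relevant outcome got better
    rework_change: float            # change in rework vs. baseline; 0 or less means no increase
    has_owner: bool                 # a named person is accountable
    has_validation_procedure: bool  # outputs are checked in a defined way
    saved_time_converted: bool      # saved time turned into measurable value


def should_scale(review: UseCaseReview) -> bool:
    """Apply the stop rule: every condition must hold, not just most of them."""
    return (
        review.outcome_improved
        and review.rework_change <= 0
        and review.has_owner
        and review.has_validation_procedure
        and review.saved_time_converted
    )


# Hypothetical case: more output, but rework rose by 12%.
more_output_more_rework = UseCaseReview(True, 0.12, True, True, True)
print(should_scale(more_output_more_rework))  # False

# Hypothetical case: rework fell by 5% and all other conditions hold.
validated = UseCaseReview(True, -0.05, True, True, True)
print(should_scale(validated))  # True
```

The check is deliberately conjunctive. A use case that fails on one condition goes back to redesign rather than forward to rollout.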

8. What a Board Should Do

A serious AI strategy does not start with the model. It starts with the places where value can be converted.

Take a hypothetical case: customer operations. A company introduces an AI assistant to help agents respond faster to customers.

The weak metric says: more responses generated, more tickets handled, less time to first draft.

The right metric asks different questions: did reopened tickets fall? Did escalations decrease? Did customer satisfaction improve? Did new agents learn faster? Are senior experts spending less time correcting basic errors and more time improving the knowledge base? Are the best answers becoming reusable playbooks, or remaining individual tricks?

AI can create value here in three different ways.

The first is assisted drafting: AI proposes, the agent validates. Value increases only if the time saved exceeds the cost of review and does not raise the ticket reopening rate.

The second is practice transfer: the best responses become reusable standards. Value increases if new agents learn faster and quality converges toward the level of the best performers.

The third is process redesign: AI helps identify recurring problems, knowledge-base gaps, and steps that generate escalation. Here the value is not only answering faster; it is reducing the cause of tickets.

That is the difference between using AI and transforming work.

In the first 30 days, the CEO should not ask for a list of tools. The CEO should ask for a map of three high-friction processes, chosen from areas such as customer operations, software delivery, finance planning, procurement, HR onboarding, or compliance documentation. For each process, the baseline should measure time, quality, rework, handoffs, accountability, exceptions, review burden, and value for the internal or external customer.

Between days 30 and 60, each process should become an AI playbook: standard prompts, validated examples, acceptance criteria, limits, escalation paths, responsibilities, error logs, reusable templates, and review roles.

Between days 60 and 90, the company should scale only what increases value density: less rework, better quality, shorter cycles, time redirected toward higher-value work, transferable learning, and better decisions.

After 90 days, AI should no longer be treated only as software spend. It should be budgeted across four chapters: tools, data, process, and human capital. The CFO should see not only license cost, but validation cost, integration cost, training cost, and risk cost.

AI does not really enter an org chart. It enters a habitat.

The worker sees saved time. The expert sees quality and exceptions. The manager sees coordination. The CFO sees cost and return. HR sees skills and roles. The CIO and CTO see data and integration. Legal and risk see accountability. The customer sees only one thing: whether the value is real.

If these observers do not speak to one another, AI increases noise. If they are coordinated, AI increases value density.

What We Still Do Not Know

We still do not know how much of today’s task-level gains will transfer stably into revenue. We do not know how long competitive advantages will last once everyone has similar tools. We do not know how much invisible supervision will emerge when AI agents and assistants become embedded in critical workflows. We do not know who will capture the value: workers, firms, customers, capital, or platforms.

But we know enough to stop measuring AI the wrong way.

The evidence says that AI can improve performance in specific contexts. It can help those who begin farther from the frontier in certain tasks. It requires skills, training, autonomy, data, and organization. It rewards firms that invest in complements. It changes the content of work before it changes aggregate employment numbers.

For entrepreneurs, CEOs, and boards, this is not paralyzing caution. It is strategy.

Do not chase AI as a fashion. Do not use it as a shortcut to produce more material. Do not measure it only through output and hours saved. Do not delegate to technology the work that belongs to management.

The company that wins will not be the one that generates more content.

It will be the one that turns more content into verified value.

AI does not transform the firm when it answers faster.

It transforms the firm when it forces management to decide what deserves trust.

Notes and Sources

Evidence maturity note: This article intentionally separates published field evidence, working papers, institutional statistics, executive surveys, and operator/vendor signals. Source types are listed to preserve the boundary between evidence and managerial interpretation.

[1] Tomaz et al., “Impacts of Generative AI on Agile Teams’ Productivity: A Multi-Case Longitudinal Study.” Preprint / longitudinal multi-case study on three agile teams; used for value density, the SPACE framework, and the distinction between raw activity and delivered value.

[2] Golo Henseke, “From Exposure to Adoption: Generative AI in European Workplaces.” Preprint / EWCS 2024 analysis of approximately 36,600 workers across 35 European countries; used for exposure, adoption, skill, training, and absorptive capacity.

[3] Guillermo Cruces et al., “Does Generative AI Narrow Education-Based Productivity Gaps? Evidence from a Randomized Experiment.” NBER Working Paper 34851; used for partial compression of execution gaps and the persistence of human capital.

[4] Erik Brynjolfsson, Danielle Li, and Lindsey Raymond, “Generative AI at Work.” Quarterly Journal of Economics, 2025; field study on 5,172 customer-support agents; used for productivity, heterogeneity, and practice transfer.

[5] Iñaki Aldasoro et al., “AI adoption, productivity and employment: evidence from European firms.” BIS Working Paper No. 1325, 2026; used for firm-level productivity, short-run employment, and the role of software, data, and training.

[6] Salomé Baslandze et al., “Artificial Intelligence, Productivity, and the Workforce: Evidence from Corporate Executives.” NBER Working Paper 34984; used for the productivity paradox, revenue-based productivity, and workforce composition.

[7] Stanford HAI, AI Index Report 2026. Institutional context on organizational AI adoption and use of generative AI in business functions.

[8] OECD, “AI use by individuals surges across the OECD as adoption by firms continues to expand.” Institutional data on firm-level AI use, differences by firm size, and sector patterns.

[9] McKinsey & Company, “The state of AI in 2025: Agents, innovation, and transformation.” Managerial survey context on AI usage, scaling, EBIT impact, and workflow redesign.

[10] Yotzov et al., “Firm Data on AI.” NBER Working Paper 34836; used for the gap between active AI use and reported impact on productivity and employment.

[11] Erik Brynjolfsson, Daniel Rock, and Chad Syverson, “Artificial Intelligence and the Modern Productivity Paradox: A Clash of Expectations and Statistics.” NBER Working Paper 25148; used for the productivity J-curve.

[12] Glean Work AI Index 2026. Operator/vendor signal used for the concept of botsitting and the invisible labor of validation.

[13] Anders Humlum and Emilie Vestergaard, “Large Language Models, Small Labor Market Effects.” NBER Working Paper 33777; used for caution on short-run earnings, hours, and task restructuring.

[14] Daron Acemoglu, “The Simple Macroeconomics of AI.” NBER Working Paper 32487; used as a task-based macroeconomic counterweight to automatic productivity-boom narratives.