In 2026 the data platform market is dividing into two camps. The first group, which includes Fivetran, AWS Glue, Microsoft Fabric, and Weld, is positioning as the trusted data foundation that AI runs on top of. Their pitch is about reliability, governance, and being the plumbing that makes AI possible. The second group, which includes Matillion, Boomi, and Databricks, has moved AI agents to the front of the product and the homepage. Their pitch is about outcomes: virtual data engineers, 75,000 agents in production, decisions made automatically. The two camps appeal to different buyers, require different proof points, and create very different competitive dynamics for mid-market companies trying to figure out which vendor to trust with their data stack.
AI arrived in the data industry fast, and within about six months of each other every product team in the market had the same conversation: our product moves and stores data, AI needs data, so we are now essential to AI.
That logic is not wrong, but the interesting part is what each vendor decided to do with it. Some chose to stay close to what they were already good at and frame it as AI infrastructure. Others decided to rebuild their product narrative from scratch around agents and autonomous work.
The result is a market where the messaging has diverged significantly even among companies that are functionally still doing a lot of the same things. Understanding which camp a vendor is in matters more than reading their feature list, because the camp tells you what they are optimizing for, where their roadmap is pointed, and whether they are selling to your data team or your executive team.

The vendors in the first camp are making a trust and reliability argument. The pitch is that AI is only as good as the data underneath it, and they are that layer. It is a defensible position because it is true, and it maps cleanly onto the concerns of data engineers and CTOs who have lived through enough hype cycles to be skeptical of anything that promises to replace their team.
Fivetran has rebranded its core message from "automated data movement for analytics" to "the data foundation for AI, automated data for autonomous agents." The product itself has not changed dramatically. It is still 700+ pre-built connectors moving data from SaaS tools, databases, and ERPs into warehouses and data lakes. But every proof point on the homepage is now framed around AI enablement, positioned as what becomes possible when your data foundation is solid. The message is: we are what makes your AI investment not fail.
AWS Glue has made a smaller version of the same move. The service is increasingly presented as part of Amazon SageMaker's unified studio rather than a standalone ETL product, which is a meaningful reframe. Glue becomes the serverless data integration layer inside a broader AI development environment rather than a tool you configure separately. Generative AI assistance for pipeline authoring and code generation is getting more prominent placement, but the headline has stayed stable: discover, prepare, and integrate all your data at any scale. AI is a feature layered on top of a reliable infrastructure story.
Microsoft Fabric has moved from "unified analytics platform" to "AI transformation platform" over the past several months, with Fabric IQ and Copilot now the dominant narrative. The underlying product is OneLake, a unified storage layer that lets every Fabric workload (data engineering, science, warehousing, real-time analytics, Power BI) work from the same data without duplication. The AI story sits on top of that: Copilot is woven through every workload, and Fabric IQ integrates with Microsoft's Foundry to share unified context for AI-driven decisions. The data estate is still the hero. AI is positioned as what the data estate unlocks.
Weld has started layering agentic workflow language and LLM tooling around dbt onto an otherwise classic ELT, modeling, and reverse-ETL product. AI is appearing in developer workflow content rather than as a repositioning of the platform itself. Weld is the most honest member of this camp in the sense that it is not overclaiming: agentic workflows are mentioned as a capability, not as the identity of the product.
What all four have in common is the buyer they are primarily talking to: someone who cares about reliability, governance, and not having their pipelines break too often. In most cases, the AI language is a response to budget and boardroom pressure more than a genuine product transformation.
“Most AI vendors are supposedly scaling by moving away from human engagement. We are doing the opposite. The technical work of getting data clean, building a model, and visualizing the answer is commoditizing fast. What can’t commoditize is knowing the context for the questions, whether the answer is true, and what to do when it contradicts what the team expected. Trust sits with people, and to that end our analysts sit alongside our clients, making AI useful and delivering real ROI, not just impressive in a demo.” - Paul Coggins, CEO at Kleene.ai
The second camp has done something structurally different. These vendors have moved AI agents to the front and demoted the underlying platform to supporting cast. The pitch is "AI is already working here, and you should want that."
Matillion now leads with Maia, described as an agentic AI "data engineering buddy" that builds pipelines from plain-language prompts. The messaging shift over the last six months has been significant: from "cloud-native ETL for Snowflake" to "AI-built-in data integration" with virtual data engineers as the selling concept. The underlying product is a mature, well-regarded ETL platform with pushdown architecture into Snowflake, Databricks, and AWS. The target buyer has broadened: Matillion is now explicitly talking to data leaders worried about ROI and scaling.
Boomi has gone furthest. It has renamed itself "The Data Activation Company" and opens its homepage with "Activate AI. 75,000+ agents already in production." Agentstudio, Boomi's no-code platform for building, managing, governing, and orchestrating AI agents with observability built in, is now the product lead. The original iPaaS (integration platform as a service) capability, which is what most people still actually buy Boomi for, is reframed as the substrate that makes data activation possible.
Databricks has been on the most visible journey of any vendor in this space. The positioning has moved from "lakehouse for data and AI" to "Data Intelligence Platform" and most recently to "the database your AI agents deserve." Lakebase, a serverless Postgres layer for production apps and agents, and Genie, a natural-language analytics product positioned as "BI built on AI from the start," are the new lead concepts. The warehouse and lakehouse story that Databricks spent years building is now described as supporting infrastructure. The headline product is the thing your AI agents run on and the interface through which anyone in the business can ask questions without writing SQL. Genie specifically is a direct challenge to every standalone BI vendor in the market, and it is notable that Databricks is pursuing that challenge from the data platform layer rather than from the visualization layer.
Kleene.ai started as a data platform focused on ELT: getting data from source systems into a clean warehouse without heavy engineering overhead. That was the story for several years. KAI Assistant now handles SQL generation, pipeline debugging, and schema inspection conversationally. The KAI Analytics Suite sits above it: AI models covering media mix modeling, segmentation, demand forecasting, attribution, and inventory management, running directly on warehouse data and producing outputs a business can act on without a data scientist in the loop. The more meaningful shift is the underlying bet. Where most platforms are racing toward self-serve scale, Kleene.ai has moved the other way, pairing the platform with analytics consultants embedded in each client engagement. The technology fuels the advisory. The advisory makes the technology worth having.
Expanding a product with AI is genuinely fast now. The technical barriers that used to protect incumbents have come down. A well-resourced team can build a reasonably capable analytics model in weeks. Most of the vendors in both camps have done exactly that.
What is harder to replicate is knowing whether the model is telling you something true and useful about a specific business, and what to do with the output when it disagrees with what the team expected to see. This is why Kleene.ai pairs the platform with an expert consultancy team embedded in each client engagement.
"The intelligence underneath is Kleene's: proprietary models, built and refined over years, tuned to each client's data, their domain, and with the human understanding of context that no general-purpose model has access to. The client's data stays the client's, and the value stays inside their business.” - Paul Coggins, CEO at Kleene.ai
The vendors moving fast into the agent camp are, in most cases, moving away from this kind of engagement model. For businesses that need someone who understands both the model and the business context to make the output trustworthy, that retreat leaves a space Kleene.ai is looking to fill.
What the split actually reveals is that modern data infrastructure has two distinct jobs that need to work together. The first is a reliable data foundation: clean pipelines, governed data, a single source of truth that does not break when a source system changes its schema. The second is a conversational and agentic layer on top: the interface through which people across the business can actually access that data, ask questions, build reports, and surface insights without writing SQL or waiting for the data team.
Both layers matter. Neither one works without the other. A reliable warehouse with no usable interface is an expensive engineering project. An AI agent built on inconsistent data produces confident answers that happen to be wrong. The problem is that most businesses evaluating data vendors in 2026 are being sold one layer when they need both, and the marketing from both camps makes it genuinely difficult to tell what you are actually buying.
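To make the dependency between the two layers concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: sqlite3 stands in for the warehouse, translate_to_sql is a stub where a real agentic layer would call an LLM with the schema in the prompt, and the GOVERNED_TABLES allowlist is a hypothetical governance mechanism, not any vendor's implementation.

```python
# A minimal sketch of the two-layer pattern described above. All names
# are illustrative, not any vendor's API.
import sqlite3

# Layer 1: the data foundation. Only tables the data team has certified
# count as a source of truth (hypothetical allowlist).
GOVERNED_TABLES = {"orders", "customers"}

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (2, 80.0)])

# Layer 2: the conversational/agentic interface. In production this
# would prompt a model with the warehouse schema; stubbed here.
def translate_to_sql(question: str) -> str:
    if "revenue" in question.lower():
        return "SELECT SUM(amount) FROM orders"
    raise ValueError(f"cannot translate: {question!r}")

def table_of(sql: str) -> str:
    """Naive: grab the table named after FROM. Real systems parse SQL."""
    tokens = sql.lower().split()
    return tokens[tokens.index("from") + 1].strip(";")

def ask(question: str) -> float:
    sql = translate_to_sql(question)
    # The guardrail that ties the layers together: generated SQL may only
    # touch governed tables, so a confident answer is never built on
    # uncertified data.
    if table_of(sql) not in GOVERNED_TABLES:
        raise PermissionError(f"{table_of(sql)} is not a governed table")
    (answer,) = conn.execute(sql).fetchone()
    return answer

print(ask("What was our revenue?"))  # -> 200.0
```

The point of the sketch is the division of responsibility: the foundation layer decides what data can be trusted, and the agentic layer, however good its language model, is only as reliable as that decision.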
The more useful question before evaluating any vendor is what your business actually needs from its data stack right now. That starts with an honest audit of where things currently stand: whether your data is trustworthy enough to build on, whether the people who need insights are actually getting them, and where the real gaps are. Check out our guide on how to audit your data stack if you want to start this process.
If you are a growing business evaluating data platforms in 2026, the two-camp framing is a useful filter but it does not make the decision for you.
The questions worth asking of any vendor are: does the AI capability in their product change what my team can do today, or is it on the roadmap? Is the "agent" a virtual data engineer who handles pipeline work, or is it something that generates business decisions from my data? And critically: what happens after the data is ready? Which vendors stop at data readiness, and which ones take you all the way to a decision?
The market is moving fast enough that vendor positioning in early 2026 may look quite different by Q4. What is clear is that the companies that win the next buying cycle will be the ones that can answer "what did your AI actually do for the business last quarter?" with something more specific than a connector count.
Gartner has predicted that a quarter of CDAO vision statements will shift from "data-driven" to "decision-centric" by 2028. The same research consistently finds that decision intelligence is where the marginal return on analytics investment is highest.
Subscribe to the Kleene.ai newsletter and be the first to hear about new guides, data trends, and product updates.