
How to choose a data stack for your business: a five-stage framework

June 18, 2026
Henry Owen
Product Marketing Manager

Almost every company chooses its data stack backwards. They start with the tools, usually because someone used one at a previous job or a vendor demo landed well, and then they spend the next year finding out whether those tools fit the business. Sometimes it works. More often it produces a stack that's technically impressive and commercially useless, or a reporting tool the company outgrew before the contract ran out.

The fix is almost embarrassingly simple: make the decisions first, pick the tools last. Call it the questions-first stack, as opposed to the demo-first stack most companies accidentally build. This guide sets out the five stages we walk companies through before any vendor conversation happens. It's written for SMB and mid-market leaders who need a data platform for analytics rather than a science project, and by the end you'll have a framework you can hold any tool against, ours included.

An overview of the five-stage framework for choosing a data stack for SMBs

Stage 1: write down the questions you need answered

Not data questions. Business questions. Before anyone says the words warehouse or pipeline, get the leadership team to list the ten decisions they'd make differently if they had reliable answers. What's our customer acquisition cost by channel, really? Which products should we discontinue? How much stock do we need for Q4? Which customers are about to churn, and what's that worth?

This list does more work than anything else in the process. It defines scope, because every source you connect and every model you build should trace back to a question on it. It exposes the gap between what people say they want (a dashboard) and what they actually want (to stop arguing about whose numbers are right). And it hands you the acceptance test for the whole project: twelve months from now, can the business answer these questions without a spreadsheet excavation?

One discipline makes this list far more useful. Mark each question as descriptive, diagnostic, or predictive. "What did we sell last month" is descriptive, and almost any tool answers it. "Why did margin drop in March" is diagnostic, and it needs joined-up data across systems. "What will demand look like in November" is predictive, and that's where AI analytics earns its place, because answering it well means models, not just charts. The mix on your list tells you what class of platform you're actually shopping for, and most companies find their list skews more predictive than their current tooling can handle.
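One way to keep the tagging honest is to hold the list as data and count the mix. The sketch below is a minimal Python illustration, not part of any product: the `BusinessQuestion` class and `question_mix` helper are hypothetical names, and the sample questions come from the list above.

```python
from collections import Counter
from dataclasses import dataclass

# The three tags from Stage 1; anything else is rejected as a typo.
VALID_TAGS = {"descriptive", "diagnostic", "predictive"}

@dataclass
class BusinessQuestion:
    text: str
    tag: str  # one of VALID_TAGS

def question_mix(questions):
    """Return the share of questions under each tag, as fractions summing to 1."""
    if not questions:
        return {}
    for q in questions:
        if q.tag not in VALID_TAGS:
            raise ValueError(f"Unknown tag {q.tag!r} on question: {q.text}")
    counts = Counter(q.tag for q in questions)
    total = len(questions)
    return {tag: counts[tag] / total for tag in sorted(VALID_TAGS)}

questions = [
    BusinessQuestion("What did we sell last month?", "descriptive"),
    BusinessQuestion("Why did margin drop in March?", "diagnostic"),
    BusinessQuestion("What will demand look like in November?", "predictive"),
    BusinessQuestion("Which customers are about to churn?", "predictive"),
]

mix = question_mix(questions)
for tag, share in mix.items():
    print(f"{tag:<12} {share:.0%}")
```

Half of this short sample is predictive, which is the kind of skew that points toward a platform with a real AI layer.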

Stage 2: audit what you have and who you have

Two inventories, both short. The first is your data sources. List every system holding data someone uses to make decisions: the ecommerce platform, the CRM, the finance system, the marketing channels, the operational tools. For each, note roughly how much data it holds, how fast it changes, and who owns the login. Most SMBs land between five and fifteen sources, and the count matters, because connector numbers drive pricing on nearly every platform you'll evaluate.
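The source inventory fits in a few lines of structured data, which makes the connector count and any unowned logins obvious. This is a minimal sketch under stated assumptions: the `DataSource` fields mirror the three notes above, the `summarise` helper is a hypothetical name, and every figure is illustrative rather than drawn from a real company.

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    name: str
    approx_rows: int   # rough volume; an order of magnitude is enough at this stage
    change_rate: str   # how fast it changes, e.g. "hourly", "daily", "weekly"
    login_owner: str   # who holds the login; "" means nobody has claimed it yet

def summarise(sources):
    """Return (connector count, total approximate rows, sources with no owner)."""
    connectors = len(sources)  # assumes one connector per source
    total_rows = sum(s.approx_rows for s in sources)
    unowned = [s.name for s in sources if not s.login_owner]
    return connectors, total_rows, unowned

# Illustrative figures only.
inventory = [
    DataSource("Ecommerce platform", 2_000_000, "hourly", "Head of Ecommerce"),
    DataSource("CRM", 150_000, "daily", "Sales Ops"),
    DataSource("Finance system", 50_000, "daily", "Finance Director"),
    DataSource("Marketing channels", 500_000, "daily", ""),
    DataSource("Operational tools", 300_000, "weekly", "Operations Manager"),
]

connectors, total_rows, unowned = summarise(inventory)
print(f"{connectors} connectors, ~{total_rows:,} rows, no owner: {unowned}")
```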

The second inventory is your people, and this one decides more than the technology does, so be honest. Do you have anyone who writes SQL? Anyone who's run a data pipeline in production? Anyone whose actual job, not their stretch project, will be keeping this alive? For most companies under a few hundred people the honest answer is "no, not really," and that's fine. It just rules out a whole category of tooling that assumes a data team exists, and ruling things out early is the cheapest filtering you'll ever do.

This is also where the ETL versus ELT question comes up, and it's worth demystifying, because vendors make it sound more consequential than it is for a buyer. ETL (extract, transform, load) transforms data before loading it into your warehouse; ELT (extract, load, transform) loads raw data first and transforms it inside the warehouse. Modern analytics stacks have mostly settled on ELT because warehouses got cheap and powerful. As a buyer, the question isn't which acronym a vendor uses. It's who writes and maintains the transformations, because that's either your team's time or a service you're paying for. (If you're choosing an ingestion tool at this stage, our Airbyte alternatives guide runs this same who-owns-it test across the main options.)
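A short example shows where the transformation lives. This is a minimal ELT sketch that uses Python's built-in `sqlite3` module as a stand-in warehouse. The table names and sample orders are hypothetical. Raw rows are loaded untouched, and the `daily_revenue` model is SQL that runs inside the warehouse. That SQL is the code someone on your side, or a service you pay for, has to maintain.

```python
import sqlite3

# Hypothetical raw export from a source system: amounts arrive as text,
# exactly as the source sends them.
raw_orders = [
    ("o-1", "2026-03-02", "GBP", "120.50"),
    ("o-2", "2026-03-02", "GBP", "79.50"),
    ("o-3", "2026-03-03", "GBP", "200.00"),
]

conn = sqlite3.connect(":memory:")  # in-memory database standing in for the warehouse

# Load: raw data goes in as-is, with no cleaning on the way.
conn.execute(
    "CREATE TABLE raw_orders (order_id TEXT, order_date TEXT, currency TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_orders)

# Transform: runs inside the warehouse as SQL, after loading.
conn.execute("""
    CREATE TABLE daily_revenue AS
    SELECT order_date, SUM(CAST(amount AS REAL)) AS revenue
    FROM raw_orders
    GROUP BY order_date
""")

rows = conn.execute(
    "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
).fetchall()
print(rows)
```

In an ETL setup, the casting and summing would happen in a separate tool before anything reached the warehouse. The ownership question is the same either way.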

Stage 3: budget for the stack, not the tool

This is where most evaluations come apart, because the number on a vendor's pricing page is rarely the number that hits your P&L.

A working data stack has five layers: ingestion (getting data out of your sources), transformation (making it usable), a warehouse (storing and computing it), BI (dashboards and reporting), and increasingly an AI layer (forecasting, segmentation, anything predictive from Stage 1). Some tools cover one layer, some cover two or three, and very few cover all five. Whatever a vendor quotes, the budgeting question is always the same: what does the complete stack cost, and how does that cost behave when my data grows?

That second clause is the one that bites, because of how this market now charges. Most platforms have moved to consumption pricing (metering rows, credits, compute, or capacity), which means your cost is a function of your data volume rather than your contract. We priced every major platform's full stack line by line for a standardized mid-market configuration, and the spread is the headline: identical requirements land anywhere from about £41,000 to £95,000 a year depending purely on architecture. Same data, same five users, and the most expensive stack costs more than double the cheapest. So budget a range, not a number, unless you're looking at a flat-fee model where the number is the number.

One line nobody puts in the budget: operating cost. A multi-tool stack needs a human to keep it running, and whether that's a hire, a consultant, or a slice of an existing engineer's week, it's real money that belongs in the comparison. It's also the line that tends to decide build versus buy, which is Stage 4.
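Putting the five layers and the operating cost into one sum gives you a range you can compare across vendors. The sketch below is a hypothetical helper, not a pricing tool. `stack_budget` and every figure in it are placeholder assumptions, not vendor quotes or the results of our pricing exercise. A tool that covers several layers is entered once, with zero under the layers it bundles.

```python
# The five layers from Stage 3.
LAYERS = ("ingestion", "transformation", "warehouse", "bi", "ai")

def stack_budget(layer_costs, operating_cost):
    """Return a (low, high) annual range for the whole stack.

    layer_costs maps each layer to a (low, high) annual cost. If one tool covers
    several layers, enter its cost under one layer and (0, 0) under the others.
    operating_cost is the (low, high) cost of the people who keep it running.
    """
    missing = [layer for layer in LAYERS if layer not in layer_costs]
    if missing:
        raise ValueError(f"No cost entered for: {', '.join(missing)}")
    low = sum(layer_costs[layer][0] for layer in LAYERS) + operating_cost[0]
    high = sum(layer_costs[layer][1] for layer in LAYERS) + operating_cost[1]
    return low, high

# Placeholder figures in GBP per year.
example_costs = {
    "ingestion": (5_000, 15_000),
    "transformation": (0, 8_000),
    "warehouse": (4_000, 12_000),
    "bi": (6_000, 10_000),
    "ai": (0, 20_000),
}
example_operating = (10_000, 40_000)  # a hire, a consultant, or a slice of someone's week

low, high = stack_budget(example_costs, example_operating)
print(f"£{low:,} to £{high:,} a year")
```

Leaving out the operating line would make this range look far narrower and cheaper than it really is, which is the point of including it.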

Stage 4: decide how much of this you want to own

With the questions, the inventories, and the budget in hand, you reach the actual strategic decision, and it isn't which tool. It's how much of the data function you want to build versus buy.

Owning it means assembling best-of-breed tools, hiring or assigning people to run them, and keeping full control of every layer. That's the right call for companies where data is the product, or where an experienced data team already exists and wants specific tooling. It's the wrong call when the plan secretly depends on hiring your first data engineer to babysit five vendor relationships, because you'll spend year one building plumbing instead of answering the Stage 1 questions.

Buying it means a managed or bundled platform where someone else runs the layers and your team consumes the outputs. You trade some control for speed and predictability, and the evaluation question becomes whether the platform's opinions match your needs. The AI data management platforms in this category deserve a hard test against Stage 1: can they actually answer your predictive questions, or do they stop at dashboards and leave the forecasting to a tool you haven't bought yet?

Most SMBs are best served somewhere on the buy side of that line, not because building is bad, but because the questions on their list are worth more than the control they'd be paying to keep. Here's the test we give people: ask what the business would notice if your data function vanished for two weeks. If the answer is missing insights, that's worth building toward. If the answer is missing maintenance, you're about to build a cost center and call it a capability.

Stage 5: evaluate tools against the framework, not against each other

Only now do vendors enter the room, and the work you've done changes every conversation. Instead of watching demos and comparing feature grids, you're holding each tool against your own criteria: can it connect my actual sources, can it answer my actual questions including the predictive ones, does its pricing behave acceptably at my growth rate, and does it fit the build-versus-buy call I've already made?

A few evaluation habits earn their keep here. Ask every vendor what your specific configuration costs at your current volume and at ten times your volume, in writing, because the gap between those two numbers is the pricing model showing its true character. Ask who does the implementation, and what happens when a pipeline breaks at month seven. And ask to see your own data in the platform before signing, not sample data, because connector quality varies wildly and you want to find the rough edges during evaluation, not after kickoff.
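You can rough out the ten-times question yourself before any vendor answers it. This sketch compares two hypothetical pricing shapes. `consumption_price` and `flat_price` are illustrative functions with made-up numbers, and `stress_test` is a hypothetical helper. Real consumption pricing is often tiered or metered on credits rather than rows, so the linear formula is an assumption. The ratio is what matters. It stays at 1.0 for a flat fee and rises with volume for anything metered.

```python
from typing import Callable

def stress_test(price: Callable[[float], float], current_volume: float, multiplier: float = 10.0):
    """Return (annual cost now, annual cost at multiplier x volume, growth ratio)."""
    now = price(current_volume)
    later = price(current_volume * multiplier)
    return now, later, later / now

# Hypothetical consumption model: a platform fee plus a charge per million rows a month.
def consumption_price(million_rows_per_month: float) -> float:
    return 12_000 + 1_500 * million_rows_per_month

# Hypothetical flat-fee model: one annual price; volume is deliberately ignored.
def flat_price(million_rows_per_month: float) -> float:
    return 40_000

for name, model in [("consumption", consumption_price), ("flat fee", flat_price)]:
    now, later, ratio = stress_test(model, current_volume=10)
    print(f"{name:<12} now £{now:,.0f}  at 10x £{later:,.0f}  ratio {ratio:.1f}x")
```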

The month-seven question matters more than it sounds. When Huel needed a custom PayPal connector, the difference that mattered wasn't a feature on a grid. What mattered was getting the connector built in about two weeks instead of the months they'd been quoted. It fixed a reconciliation problem that was eating 58 FTE-days a month, and once solved, that saved over £100k a year. That's the kind of thing you only surface by asking what implementation and support actually look like when something specific to your business breaks.

When you're ready to shortlist, our guide to the best AI data platforms in 2026 reviews the leading options, covering the full field from ingestion specialists to end-to-end platforms. Kleene's in there, and so are the tools we lose deals to, because a shortlist built on someone else's blind spots isn't one worth trusting.

The framework in one place

Write down the business questions and tag each one descriptive, diagnostic, or predictive. Inventory your sources and your people, honestly. Budget the full five-layer stack and stress-test how each pricing model behaves as your data grows. Decide whether you're building a data function or buying one. Then, and only then, evaluate tools against your framework, with your own data.

Companies that run this sequence choose differently from companies that start with demos, and they tend to still like their choice two years later, which is the only review that counts.

And because we'd rather be useful than just be picked: if you want to talk through where your business sits in this framework, talk to our team. We'll give you an honest read, including the cases where the right answer is that you don't need us yet.
