Migrating from a transactional database to a cloud data warehouse for analysis

At kleene.ai we see a lot of companies using a transactional database for reporting purposes. This can work for some companies, for a certain amount of time, but in reality, it just isn’t scalable. The migration from a transactional database to a cloud data warehouse will transform your reporting and analysis, fast tracking growth. 

Starting out with a transactional database 

Whether MySQL, PostgreSQL or SQL Server, it’s common for the data team to query the production database directly for reporting and analysis. However, as data volumes and query complexity increases, data teams tend to move to a read only replica to take load off the main production database, and provide some controls on access and accidental writes. 

We believe in moving fast at kleene. But we don’t believe in over-engineering. A transactional database for reporting is great if:

  1. You have a very small data set.
  2. You’re not worried about merging other data sets (e.g CRM, Google Analytics, ERP, Finance etc.) with your production data.

Taking it a step further

Occasionally, teams go further and stand up a separate database, loading data to this from various sources. This creates a data warehouse with data sources loaded in and materialised transforms happening. Typically, this is only embarked on by data teams that are comfortable managing the database, or have a DBA in the tech team who is able to support the extra system. Hosted options like RDS from Amazon make this less of a concern. It’s also a much better option than trying to join data sets in your reporting platform.

However, if you have a setup like this you may find that you’re:

  1. Struggling with query times.
  2. Considering upgrading your database to cope with the reporting query load.
  3. Spending time on optimisation to deal with the fundamental limitations of the database.

We often see confusion at this point. If the database is good enough for big production workloads, why is it struggling with our reporting requirements?

Time for a change

It’s at this point that teams often start to look at other options. Cloud data warehouses like Redshift from Amazon, BigQuery from Google, Synapse from Microsoft and non-cloud specific options like Snowflake have been around for a while and generated a lot of buzz, but can seem daunting (but they’re not!). Migrating to a completely different database technology is fraught with unknowns (but it shouldn’t be!). 

Surely this will take a long time and require new technology skills? Plus, if these systems are so good at processing massive data sets, they must be super expensive?

Overcoming the hurdles with a cloud based data warehouse

While all of these solutions vary, there are some major commonalities between them, addressing the user key concerns.

Firstly, cost. With most of these warehouses charging by compute, it can be hard to figure out how much you will use. Fears of accidentally using a large amount, resulting in high bills are common. However, with a column store architecture, cloud warehouses are optimised to process less data during analytical workloads. Combined with high levels of compression enabled by columnar storage, users see huge speed enhancements on big queries and find that they use far less compute and storage than expected. 

Secondly, the technical capabilities required. Whatever flavour of SQL you are used to, PLSQL, MSSQL, mySQL, warehouses are set up to be tolerant of it. Often your queries work without any changes, or the differences are so slight that within a few weeks you feel comfortable. These are managed to a large extent, so worries about backing up data, or downtime and recovering are gone, allowing you to focus on adding value.

The last major concern teams have is how to get it all set up. Moving data from many different data sources into the warehouse is perceived as a huge task. Ensuring your data is modelled into a clean, useful form and reliably collaborating on updates and figuring out when source data has changed are key to setting up a robust data ecosystem.

While this used to involve a lot of engineering – bolting together multiple systems and requiring custom built tech – new cloud based software solutions make it simple.

Cloud data warehouse wins

Teams that make the shift from a transactional database to a cloud data warehouse experience the power of the modern cloud warehouse. Analysis at the speed of thought, rapid iteration on data modelling, collaborative building and exploring. Plus stakeholders are no longer bottlenecked by technology.

kleene.ai has been helping customers move to a modern data stack using the latest technology to ensure it’s fast and easy to get answers to business problems from your data. We provide a software layer which can get you up and running in a cloud data warehouse at the pace that startups need to move at – weeks not months. Talk to us to find out how.

Book a demo today to see the impact we could have for your scale-up.