Introduction to Data Analytics Terminology

The world of data analytics can be confusing, with so many terms, acronyms and technical processes to understand.

Whether you’re familiar with transactional databases but new to cloud data warehousing, or just beginning your data journey, kleene can help.

The kleene introduction to data analytics provides a dictionary of all the key terms you need to know in order to understand the data analytics process and infrastructure.

Data analytics infrastructure

Source – Any system used by a business, for example a CRM, finance system, marketing tool, spreadsheet, advertising platform or customer service tool.

Data Pipelines – Technology or code used to move data from one place to another and ensure it’s usable, either operationally or analytically.

ETL or ELT tool – Software that copies data from one or more sources into a destination system, where the data is represented differently from the source by applying context and business logic.

Connector – Connectors are the components of an ETL/ELT tool that connect to data sources such as databases and applications, forming the data pipelines that enable extraction and loading.
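
To make the idea of a connector more concrete, here is a minimal Python sketch. The `CsvConnector` class and its `extract` method are hypothetical names chosen for illustration; they are not kleene's API or the interface of any particular ETL/ELT tool. The in-memory CSV text stands in for a real source system such as a CRM export.

```python
import csv
import io
from typing import Iterator


class CsvConnector:
    """Hypothetical connector: wraps one source and yields its records."""

    def __init__(self, source_text: str) -> None:
        # A real connector would hold credentials or a connection string;
        # here the "source" is simply CSV text held in memory.
        self.source_text = source_text

    def extract(self) -> Iterator[dict]:
        # Read each row of the source as a dictionary keyed by column name.
        reader = csv.DictReader(io.StringIO(self.source_text))
        for row in reader:
            yield row


crm_export = "id,email\n1,ana@example.com\n2,ben@example.com\n"
connector = CsvConnector(crm_export)
records = list(connector.extract())
print(records)
```

A real tool would ship one connector per source (database, CRM, ad platform and so on), each hiding that source's details behind a common extraction step like this one.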

Data Lake – Central repository for all data types, including structured, semi-structured and unstructured data. A typical example used by businesses is AWS S3.

Data Warehouse – The secure, electronic housing of data by a business or organisation, integral to the data infrastructure, providing a central repository for structured data. The warehouse is the library of historical data. All data in the warehouse can be retrieved and analysed to inform decision-making in the business.

Cloud Data Warehouse – Specifically designed for larger data volumes and analytics, cloud data warehouses provide on-demand scalability and cost efficiency. A cloud data warehouse requires no physical hardware to be owned or managed by the business and is typically ‘column store’. Popular examples include Snowflake and Redshift.

Visualisation Tool – The home of graphs, charts, dashboards and analysis. Also known as a BI tool; some of the most popular include Tableau, Looker and Power BI.

Transactional Database – Traditional row store database technology such as MySQL and PostgreSQL. Transactional databases are designed for lower data volumes that are updated frequently and at high speed. They usually form the backend of websites and eCommerce tools.
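
The difference between ‘row store’ (transactional databases) and ‘column store’ (cloud data warehouses) is about how records are laid out. The plain-Python sketch below illustrates the layout only, using made-up order data; real systems such as PostgreSQL, Snowflake or Redshift add indexing, compression and much more.

```python
# The same three (hypothetical) orders, stored two ways.
orders = [
    {"order_id": 1, "customer": "ana", "amount": 20.0},
    {"order_id": 2, "customer": "ben", "amount": 35.5},
    {"order_id": 3, "customer": "ana", "amount": 12.5},
]

# Row store: each record is kept together, which suits reading or
# updating one order at a time (typical of transactional databases).
row_store = [tuple(o.values()) for o in orders]

# Column store: each column is kept together, which suits scanning one
# column across many records (typical of cloud data warehouses).
column_store = {key: [o[key] for o in orders] for key in orders[0]}

# Fetching one complete order reads a single entry in the row store...
second_order = row_store[1]

# ...while a total over "amount" only needs one column in the column store.
total_amount = sum(column_store["amount"])
print(second_order, total_amount)
```

This is why analytical queries that aggregate a few columns over millions of rows tend to favour column stores, while applications that read and write individual records favour row stores.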

Data analytics process

Extract – Taking relevant data out of source systems such as applications and databases.

Load – Loading extracted data into a central repository – either a data lake or data warehouse.

Transform – Applying code, usually SQL or Python, to data sets to make them usable. This can include removing duplicates, cleaning data and connecting data tables and data sets.
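
As an illustration of a transformation, the sketch below removes duplicates, cleans a column and connects two tables in one SQL query. It assumes Python's built-in `sqlite3` module as a stand-in for a warehouse, and the `customers` and `orders` tables are hypothetical examples, not a real schema.

```python
import sqlite3

# sqlite3 stands in for a warehouse; the tables below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, email TEXT);
    INSERT INTO customers VALUES
        (1, ' Ana@Example.com'), (1, ' Ana@Example.com'), (2, 'ben@example.com');
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 20.0), (1, 12.5), (2, 35.5);
""")

# Transform: remove duplicate customers, clean the email column
# (trim spaces, lower-case) and connect (join) customers to their orders.
query = """
    WITH clean_customers AS (
        SELECT DISTINCT id, LOWER(TRIM(email)) AS email
        FROM customers
    )
    SELECT c.email, SUM(o.amount) AS total_spend
    FROM clean_customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.email
    ORDER BY c.email
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Without the `DISTINCT` step, the duplicated customer row would double-count that customer's spend, which is exactly the kind of error transformation is meant to prevent.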

ELT – The acronym for the extract, load, transform process, which is the modern paradigm for data pipelining. The ELT process brings all data into a single repository, where it is transformed using the power of the warehouse. This is kleene’s method.
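
A minimal sketch of the ELT order of operations follows. It assumes Python's `sqlite3` module as a stand-in for a cloud warehouse; the sales records, table names and the rule for skipping `'n/a'` amounts are all hypothetical. It is not kleene's implementation, only an illustration of loading raw data first and transforming it inside the warehouse.

```python
import sqlite3

# Extract: hypothetical raw records taken from a source system.
extracted = [
    ("2024-01-01", "ana", "20.00"),
    ("2024-01-01", "ben", "n/a"),
    ("2024-01-02", "ana", "12.50"),
]

warehouse = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse

# Load: copy the raw data as-is, without changing it.
warehouse.execute("CREATE TABLE raw_sales (day TEXT, customer TEXT, amount TEXT)")
warehouse.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", extracted)

# Transform: run SQL inside the warehouse to build a clean, usable table.
warehouse.execute("""
    CREATE TABLE daily_sales AS
    SELECT day, SUM(CAST(amount AS REAL)) AS revenue
    FROM raw_sales
    WHERE amount <> 'n/a'
    GROUP BY day
""")

raw_count = warehouse.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
daily = warehouse.execute("SELECT day, revenue FROM daily_sales ORDER BY day").fetchall()
print(raw_count, daily)
```

Because all three raw rows stay in `raw_sales`, the transformation can be rewritten and rerun later without going back to the source system.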

ETL – The traditional and more common paradigm for data pipelining, in which data is extracted, transformed and then loaded. This method transforms the data before loading it into the warehouse, which can result in the loss of data: anything discarded during transformation never reaches the warehouse.
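
For contrast, here is the same hypothetical data pushed through an ETL-style sketch, again using `sqlite3` as a stand-in warehouse. The parsing rule and table name are assumptions for illustration. The transformation happens in Python before loading, so the unparseable row and the customer column never reach the warehouse.

```python
import sqlite3

# Extract: hypothetical raw records taken from a source system.
extracted = [
    ("2024-01-01", "ana", "20.00"),
    ("2024-01-01", "ben", "n/a"),
    ("2024-01-02", "ana", "12.50"),
]

# Transform before loading: parse amounts and drop rows that fail.
revenue_by_day: dict[str, float] = {}
for day, _customer, amount in extracted:
    try:
        value = float(amount)
    except ValueError:
        continue  # this row is discarded and never reaches the warehouse
    revenue_by_day[day] = revenue_by_day.get(day, 0.0) + value

# Load: only the transformed result goes into the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE daily_sales (day TEXT, revenue REAL)")
warehouse.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)", sorted(revenue_by_day.items())
)

daily = warehouse.execute("SELECT day, revenue FROM daily_sales ORDER BY day").fetchall()
print(daily)
```

The final table matches the ELT result, but if a later question needs per-customer figures or the rejected row, the data has to be extracted from the source again, if it is still available.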

The basics of data

Data Silos – Data kept in multiple source systems that don’t communicate with each other. Silos can cause multiple problems for a business and prevent a true view of it. A data warehouse brings all business data together in order to remove data silos.
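
The sketch below shows, in plain Python, what bringing two silos together can look like: contacts from a CRM are combined with tickets from a customer service tool using email address as the shared key. The records and field names are hypothetical and not taken from HubSpot, Zendesk or any real tool; in practice this join would typically run as SQL in the warehouse.

```python
# Hypothetical records held in two separate tools.
crm_contacts = [
    {"email": "ana@example.com", "plan": "pro"},
    {"email": "ben@example.com", "plan": "free"},
]
support_tickets = [
    {"email": "ana@example.com", "ticket_id": 101},
    {"email": "ana@example.com", "ticket_id": 102},
    {"email": "cara@example.com", "ticket_id": 103},
]

# Count tickets per customer email.
tickets_per_email: dict[str, int] = {}
for ticket in support_tickets:
    email = ticket["email"]
    tickets_per_email[email] = tickets_per_email.get(email, 0) + 1

# One combined view: every CRM contact alongside their ticket count.
single_view = {
    c["email"]: {"plan": c["plan"], "tickets": tickets_per_email.get(c["email"], 0)}
    for c in crm_contacts
}
print(single_view)
```

This is a left join on the CRM: the ticket from `cara@example.com` has no matching contact, which is itself a useful signal that the two systems disagree.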

Batch Load – Moving large volumes of data in one go, so it can be used for analytical purposes, for example moving a whole 24 hours of data in one data load. This type of data movement is kleene’s speciality.
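
A batch load of one day's data can be sketched as below. It assumes `sqlite3` as a stand-in warehouse, and the events and the `batch_for_day` helper are hypothetical. Every event from a 24-hour window is selected and written in a single transaction, rather than one at a time as it happens.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical events recorded in a source system.
events = [
    ("2024-03-01T09:15:00", "signup"),
    ("2024-03-01T17:40:00", "purchase"),
    ("2024-03-02T08:05:00", "purchase"),
]


def batch_for_day(rows, day_start: datetime):
    """Return every row whose timestamp falls in the 24 hours from day_start."""
    day_end = day_start + timedelta(hours=24)
    return [r for r in rows if day_start <= datetime.fromisoformat(r[0]) < day_end]


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE events (ts TEXT, event TEXT)")

batch = batch_for_day(events, datetime(2024, 3, 1))
# Load the whole day's batch in one go; the `with` block commits it
# as a single transaction.
with warehouse:
    warehouse.executemany("INSERT INTO events VALUES (?, ?)", batch)

print(len(batch))
```

The event from 2 March falls outside the window and would be picked up by the next day's batch.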

Real-time or Event Data – Data created when an action is taken in the CRM or any other business tool and immediately passed to another system, for example to trigger client communications or to help ensure product stability.
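
By contrast with a batch load, real-time handling acts on each event as soon as it occurs. The sketch below uses a hypothetical in-process publish/subscribe setup; the function names and event fields are assumptions. Production systems would more likely use webhooks or a message queue, but the principle is the same.

```python
from typing import Callable

# Hypothetical in-process event bus: handlers registered here are called
# as soon as an event is published.
subscribers: list[Callable[[dict], None]] = []
sent_messages: list[str] = []


def subscribe(handler: Callable[[dict], None]) -> None:
    subscribers.append(handler)


def publish(event: dict) -> None:
    # Pass the event to every subscriber immediately, one event at a time.
    for handler in subscribers:
        handler(event)


def send_welcome_email(event: dict) -> None:
    # Stand-in for a client communication triggered by the event.
    if event["type"] == "contact_created":
        sent_messages.append(f"Welcome email to {event['email']}")


subscribe(send_welcome_email)
publish({"type": "contact_created", "email": "ana@example.com"})
print(sent_messages)
```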

Operational Data Usage – Typical, business-as-usual uses of data. For example, passing information between HubSpot (CRM) and Zendesk (customer service) so that it can be used operationally. This often takes place in real time.

Analytical Data Usage – Data used for reporting, dashboarding and BI, driving value from data back into the business. Usually delivered by ‘batch load’.

Codes & languages

SQL – Programming language specifically designed for querying data.

Python – General-purpose programming language widely used for querying and transforming data as well as for general coding.

Query – Code written to explore, combine and cleanse data.

Data professionals

Analyst – Data professional tasked with answering business questions through data, primarily using SQL.

Engineer – Data professional tasked with building and maintaining data pipelines, managing data integrity and security, primarily using SQL, Python, Spark, Go.

Scientist – Data professional specialising in building machine learning and AI models, primarily using Python.

Want to find out more? Get in touch to start your data journey.