Business Data Explorer with infinite layers of description, bound to source data without ETL.

Category Theory-based Business Data Explorer to democratise access to data locked in databases, with the freedom to define, extend, compose, and prove data descriptions, by and for business users.

Sep 29, 2023

Summary: If the title sounds like magic, it is: mathematical magic! Confident access to data, without going through the tech team, is a pressing business need that is long overdue for want of apt tools. Sure, there are tools that need tech intervention to surface metadata, but what if there were a tool that simply maps to the metadata of any database or data source and allows any user to EXTEND that metadata in any dimension, without disintegrating the source data? What if this tool could capture business metadata continuously, as business events unfold, while staying closely tied to the hard truths of the technical data? This document describes the problem, the gaps in prior art and existing solutions, and the promise of a solution that leverages the strengths of a Category Theoretic approach.

Category Theoretical Business Data Explorer – a Universal, Tech-Free Business Data Explorer for enterprises: a platform that gives business users a ‘business view of data’ that is complete, accurate, faithful, and connected to the source data.

Problem:

“Increasingly, business users are starving for data, but are forced to make important decisions without access to data that could guide them in the right direction” (HBR, Forrester).

The need for data-driven, or at least data-influenced, decision making is more pressing now than ever. But for all the talk about data-driven decision making at all levels of the enterprise hierarchy, it has never taken off because of the impediments listed below. No proven technology has so far addressed these impediments by giving decision makers confidence in systemic data sanctity.

Impediments to data-driven decision making:

- Business Users (Business Unit or Profit Centre users) and Functional Users (e.g. Marketing, Procurement, Supply Chain) do not, and cannot, know what data exists within their organizations, not even in their own domain (unless they go to the Data/IT teams, of course).
- Data sources are not structured for business consumption; they are too technical and lack business context.
- Hence these users' exposure is limited to curated, pre-purposed data meant for specific use cases.
- These users, being the Subject Matter Experts, know best how to structure data so that it truly represents business operations, but cannot do so because the tooling is heavily technical.
- Instead, they must go through intermediaries (IT / Data teams), which adds significant delays to accessing data and is not practical for data-driven decisions and actions at scale.
- Such IT-driven views of data sources are purpose-built for specific uses and users (BI, reports, etc.). They are not extensible, reusable, or composable.
- Business operations are alive, continuous, and changing, and need a robust model of the business that represents these dynamics. Current rigid, purpose-built data models are not equipped to capture the true dynamics of business operations.
- The challenge is not data modelling, data labelling, or data cataloguing, but enabling business users to see what data and relationships exist and to explore what is possible, “strictly within constraints of source data and relationships”.

The intensity of this challenge is summed up by HBR (Data Driven Culture, 2020): “By far the most common complaint we hear is that people in different parts of a business struggle to obtain even the most basic data.”

Now, with the CQL (Categorical Query Language) product built on a robust mathematical (Category Theoretic) base, it is possible to address these challenges for the first time, and even to catalyse the movement towards data-driven decision making at organization-wide scale.

Current State:

Existing systems address the need inadequately.

Data Mesh: Like everything that starts with and depends on ETL/ELT, Data Meshes, if they ever become practical at scale, are disconnected from source data. They are not reusable or composable, cannot provide data lineage, and cannot offer a mathematical guarantee of data fidelity, that is, a guarantee that the Mesh faithfully represents source data and metadata when changes occur, at the pace of change. This means they cannot really be used for time-sensitive business decisions.

Enterprise Knowledge Graph: Several tools offer this, but the graphs are built on extracted data materialized into specific (graph) data models. This is definitely an improvement over traditional, rigid relational data models, but it does not connect to the source data models, is not faithful to the data relationships, and cannot be a ‘live’ model of business operations, especially in legacy enterprise environments that are heavy on relational databases. EKGs are also not composable and cannot mathematically guarantee data sanctity.

Ad Hoc Reporting/Self-Service BI tools: These applications are limited to specific, purpose-designed data models. The ‘ad hoc’ nature is confined to the boundaries of the data model created by the IT/Data teams, leaving out other data sources. The data models created in such tools are not ‘composable’, which restricts reusability. They are also locked in underlying proprietary languages and hence cannot be translated, ported, or mapped without elaborate conversions.

Data Cataloguing Tools: These tools are meant for labelling and cataloguing, that is, for enhancing descriptive metadata, mostly from a business perspective. But they do not touch the data model itself and hence do not represent a business operational view of data sources at all.

ETL / EAI Systems: Development is designed to be individualistic rather than collaborative, so the purpose-specific models developed by analysts, and the knowledge needed to perform the ETL, are locked in the heads of individual data engineers. The model itself is not reusable, composable, or portable, and the tooling is too technical for business users.

There does not exist an approach or a system that can guarantee true representation of data and relation sanctity mathematically, across every record in every data source.

Solution:

Businesses have all the critical data they need but lack the technology necessary to explore – i.e. draw and pay attention to the right data, at the right time, for the right reasons. A structured view of organizational data that is complete, faithful, and connected to sources is the missing catalyst for democratising data usage. This by itself will not deliver data-driven decisions but is the most important and enabling step towards the same.

The solution is an enterprise-scale SaaS platform that allows creation of a true and robust representation of organizations’ data that:

- is Faithful to the data sources, relationships, and constraints.

- is Complete, represents all business-critical data sources.

- Guarantees sanctity of relationships mathematically.

- is a live reflection of the underlying data structure changes, connected to data sources, as live as the business dynamics.

- is reusable, extensible, shareable, and composable (incrementally extensible).

The product has 3 components. The Schema Translator translates source schemas into Categorical Query Language (CQL) schemas. The Universal Integrated Schema is a unified data model, the most optimal and robust model for a given organization, that is complete, faithful, and connected to the data sources. It is arrived at collaboratively: Subject Matter Experts align the model so that it represents business operations accurately, while technical users capture the data-related constraints hidden in the data sources. The process is iterative, converging on a Universal Integrated Schema that represents the underlying data structures and business operations with a high degree of confidence, leveraging Categorical Databases’ native strengths such as theorem proving and constraint integrity.
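To make the schema translation step concrete, here is a minimal sketch in Python. It is not the CQL product's API; the names `Table`, `ForeignKey`, `CategoricalSchema`, and `translate` are hypothetical and chosen only for illustration. It assumes the common categorical reading of a relational schema: each table becomes an entity, each foreign key becomes an arrow between entities, and every other column becomes an attribute of its entity.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ForeignKey:
    column: str        # referencing column in the source table
    target_table: str  # table that the column points to


@dataclass
class Table:
    name: str
    columns: dict[str, str]  # column name -> SQL type
    foreign_keys: list[ForeignKey] = field(default_factory=list)


@dataclass
class CategoricalSchema:
    entities: set[str] = field(default_factory=set)
    # arrow name -> (source entity, target entity)
    arrows: dict[str, tuple[str, str]] = field(default_factory=dict)
    # attribute name -> (owning entity, value type)
    attributes: dict[str, tuple[str, str]] = field(default_factory=dict)


def translate(tables: list[Table]) -> CategoricalSchema:
    """Hypothetical translator: tables -> entities, FKs -> arrows, columns -> attributes."""
    schema = CategoricalSchema()
    for table in tables:
        schema.entities.add(table.name)
    for table in tables:
        fk_columns = {fk.column for fk in table.foreign_keys}
        for fk in table.foreign_keys:
            # A foreign key to an unknown table is a constraint the source cannot honour.
            if fk.target_table not in schema.entities:
                raise ValueError(
                    f"{table.name}.{fk.column} references unknown table {fk.target_table}"
                )
            schema.arrows[f"{table.name}.{fk.column}"] = (table.name, fk.target_table)
        for column, sql_type in table.columns.items():
            if column not in fk_columns:
                schema.attributes[f"{table.name}.{column}"] = (table.name, sql_type)
    return schema


# Illustrative source schema (made-up tables, not taken from any real system).
tables = [
    Table("Customer", {"id": "INTEGER", "name": "VARCHAR"}),
    Table(
        "Order",
        {"id": "INTEGER", "customer_id": "INTEGER", "total": "DECIMAL"},
        [ForeignKey("customer_id", "Customer")],
    ),
]
schema = translate(tables)
print(sorted(schema.entities))
print(schema.arrows)
```

Only metadata is read in this sketch; no rows are copied out of the source, which is the sense in which the description stays bound to the source data without ETL.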

Once this level of confidence in data sanctity is established, business users can see all data and relationships that exist in their organizations, create their business/domain specific models leveraging the inherent flexibility but “strictly within constraints of source data and relationships established in the Universal Integrated Schema”, with the assurance of robust data and constraint sanctity, never possible before.
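One way to read “strictly within constraints” is as a structure-preserving mapping from a business model into the Universal Integrated Schema: every business relationship must correspond to a chain of relationships that really exists in the source. The sketch below illustrates that check under stated assumptions. The function names (`follow_path`, `validate_view`) and the sample entities are hypothetical, and a real categorical database would verify much more, for example path equations.

```python
from __future__ import annotations


def follow_path(arrows: dict[str, tuple[str, str]], start: str, path: list[str]) -> str:
    """Walk a chain of arrows from `start`; return the entity where the chain ends."""
    current = start
    for name in path:
        if name not in arrows:
            raise ValueError(f"unknown relationship {name}")
        source, target = arrows[name]
        if source != current:
            raise ValueError(f"{name} starts at {source}, not at {current}")
        current = target
    return current


def validate_view(
    arrows: dict[str, tuple[str, str]],
    entity_map: dict[str, str],
    view_relationships: dict[str, tuple[str, str]],
    relationship_map: dict[str, list[str]],
) -> list[str]:
    """Return a list of violations; an empty list means the view respects the schema."""
    errors = []
    for rel, (biz_source, biz_target) in view_relationships.items():
        try:
            end = follow_path(arrows, entity_map[biz_source], relationship_map[rel])
        except ValueError as exc:
            errors.append(f"{rel}: {exc}")
            continue
        if end != entity_map[biz_target]:
            errors.append(f"{rel}: ends at {end}, expected {entity_map[biz_target]}")
    return errors


# Relationships assumed to exist in the Universal Integrated Schema (illustrative only).
universal_arrows = {
    "Order.customer_id": ("Order", "Customer"),
    "Customer.region_id": ("Customer", "Region"),
}

# A business user's vocabulary, mapped onto the universal schema.
entity_map = {"Sale": "Order", "Territory": "Region"}
view_relationships = {"sold_in": ("Sale", "Territory")}

good = validate_view(
    universal_arrows, entity_map, view_relationships,
    {"sold_in": ["Order.customer_id", "Customer.region_id"]},
)
bad = validate_view(
    universal_arrows, entity_map, view_relationships,
    {"sold_in": ["Customer.region_id"]},  # skips a step, so it does not start at Order
)
print(good)
print(bad)
```

The design choice here is that a business user can rename and recombine freely, but a relationship that has no backing path in the source schema is rejected rather than silently accepted.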

This opens up a different magnitude of business usage of data. Each business unit and each functional unit is unique and holds a lot of context, and capturing that context is the first step towards data-driven decisions at scale. Such context can be modelled uniquely for each Business Unit / Functional Unit (BU/FU) as a live model that stays valid and current, unlike existing approaches that create separate, overlapping models for each project or BU/FU, none of them reusable or portable. The Business User / Functional User UIs enable data exploration within a unit's own modelled context or the universal (organizational) context, and allow users to create new models or compose over existing ones without fear of data and relationship infidelity.

Potential Impact:

Several studies show the extent of this problem and the potential impact of a solution that can enable data democratization for decision making:

- Companies analyze only 12% of the available data. On average, between 60% and 73% of all data within an enterprise goes unused for analytics. (Forrester)
- Only 3% of business professionals were able to act on all of the customer data they collect. (Harvard Business Review Study)
- Highly data-driven organizations are 3X more likely to report significant improvements in decision-making compared to those who rely less on data. (Global Data and Analytics Survey, “Big Decisions™,” PwC, 2016; Think With Google; HBS, 2020, Data Driven Decision Making)
- “The most critical practice is to have employees consistently use data as a basis for their decision making... nearly 1.5 times more likely to report revenue growth of at least 10 percent... points to the importance of getting data into the hands of decision makers” (Why Data Culture Matters - The Democratization of Data, McKinsey, 2018)
- Various reports peg the business value of better data utilization in a broad range, from $190 Bn to $2 Tn, across industry segments.

Conclusion and Extensions Possible:

Democratizing data is the first step. It enables a lot more, including the use cases below. Current brute-force approaches, which ignore the elegant, proven, and more effective mathematical approaches, will take ages to achieve these, if they achieve them at all.

- Setting up internal / external data marketplaces and data portals.
- Ad-hoc addition of new types of information sources to dashboards / KPIs without a line of code.
- A custom business questions (GenAI :-) ) platform that aids discovery of hidden relationships.
- Accelerated value from data lake investments across multiple business units: find where data lives in your data lake without code or complex queries.
- Collaborative mapping with canonical meaning of business data, a catalyst for network-dependent technologies such as Blockchain.

Note: The above concepts, applications, user experience examples are protected by Intellectual Property Rights owned by Surendra Kancherla.

Surendra Kancherla’s Substack
