Pros and Cons of Multi-Step Data Platforms

In the modern world, it's rare for data to keep the same shape and stay on the same platform from the beginning to the end of its journey. Yes, some technologies cover quite a wide range of functionality, but sometimes at the expense of precision, developer experience, or performance. Therefore, to achieve better or faster results, people might select a new tool for a specific task and start an implementation and integration process to move the data around.

This blog post highlights the pros and cons of a "one shoe fits all" approach, where one platform is used for all the use cases, vs. the "best tool for the job" approach, where various tools and integrations are used to fulfill the requirements.

The Initial Data Touchpoint

To discuss data evolution, it's necessary to define its origin. Most of the time, data originates from an application that faces an internal or external audience and stores its transactions in a backend datastore. The backend is commonly a database (relational or not) that can immediately store and validate the data, or a streaming service (like Apache Kafka) that can forward the transaction to the relevant parties.

No matter which solution is chosen, at this point of the data journey the backend datastore must provide excellent performance when writing and reading a single transaction, since the application will mainly work at the level of a single click, purchase, or order.

Evolving With Time

But, as noted in the intro, data rarely stays in place. On top of the transactional needs, new data stakeholders appear in the company: people in charge of the inventory are not interested in the single order movement but more in the overall daily trends; executives need a week-by-week view of defined KPIs; sales agents want an immediate "maybe you should offer this" kind of functionality when talking to prospects.

All the above needs are very different from the original "transactional" task. To perform best, each requires data storage techniques, functionalities, and APIs adapted to its specific use case.
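To make the difference concrete, here is a minimal sketch of the two access patterns against the same data. It uses Python's built-in sqlite3 module purely for illustration; the `orders` table, its columns, and the sample rows are hypothetical and not taken from any real system.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory store with a handful of hypothetical orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " order_date TEXT NOT NULL,"
        " product TEXT NOT NULL,"
        " quantity INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "2024-01-01", "shoes", 2),
            (2, "2024-01-01", "socks", 5),
            (3, "2024-01-02", "shoes", 1),
        ],
    )
    conn.commit()
    return conn


def get_order(conn, order_id):
    """Transactional access: fetch exactly one order by its key."""
    return conn.execute(
        "SELECT order_id, order_date, product, quantity"
        " FROM orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()


def daily_quantities(conn):
    """Analytical access: read every order and aggregate per day."""
    return conn.execute(
        "SELECT order_date, SUM(quantity)"
        " FROM orders GROUP BY order_date ORDER BY order_date"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(get_order(conn, 2))       # what the application needs
    print(daily_quantities(conn))   # what the inventory team needs
```

The first query touches one row through the primary key, while the second has to read the whole table. On three rows the difference is invisible, but it is the kind of gap that pushes teams toward a separate tool for each workload as volumes grow.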

Yes, there are tools that can perform a wide range of functions, but is it always advisable to stick with them?

The "Downloading to Excel" Problem

I've been in the Data and Analytics space for 15 years, and for 10 of them, I've been building centralized BI systems to provide a "unique source of truth" to companies.

For the same number of years, I've been battling the "download to Excel" practice. Whenever a certain functionality, option, or calculation wasn't available in the tool of choice, people downloaded the entire dataset and recreated the analytical process in Excel. This, on top of the security concerns, produced a plethora of different ways to calculate the same KPI, therefore defeating the "unique source of truth" mission.

How to avoid that? By having frequent feedback loops about desired functionalities or outcomes and trying to build them into the main product. But, sometimes, the centralized BI efforts were not fast, accurate, or flexible enough for the end user; therefore, the leakage of data continued.

Later in my career, I realized that the main problem was trying to force everyone to use THE BI tool I was overseeing, which didn't fit all the stakeholders' needs. Therefore, it was important that the feedback process covered not only what to build next but also what other tooling and integrations were needed to best achieve the goal. In other words, one centralized team on one platform can't scale, but a centralized team empowering others to use their favorite tool via trusted integrations can!

The Modern Era of Business Units as IT Purchasers

Nowadays, a lot of the spending is not governed by central IT. Every division in a company is able to purchase, test, deploy, and manage its tool of choice to achieve its goals. When data is in the mix, this is the "Excel" problem again, only with nicer UIs. Therefore, it's very important that data teams act as enablers, allowing business units to act, improve, analyze, and evolve over time by providing accurate and up-to-date data.

I feel that the times of "one tech fits all" are over… but what are we going to gain or miss?

Pros of a Single "One Tech Fits All" Solution

The main pros of a single "one tech fits all" solution were:

Cons of the "One Tech Fits All" Solution

On the other side, the "one tech fits all" solution also had the following limitations:

Pros of the "Best Tool for the Job" Solution

When choosing the best tool for the job after a proper evaluation phase, you have the following pros:

Cons of the "Best Tool for the Job" Solution

On the other side, now that multiple tools are part of the data chain, you could face the following problems:

What Lessons Can We Take From the "One Tech Fits All" to the "Best Tool for the Job" Solutions?

We have more or less established that the "best tool for the job" solution is the unavoidable choice nowadays, but it has its own limits. How can we mitigate them with the lessons learned from the "one tech fits all" solution?

Summary

As a data team, embrace the "best tool for the job." Your role is to provide best-in-class, fast, monitorable, and observable integrations between systems.
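As a rough illustration of what "monitorable and observable" could mean in practice, the sketch below wraps a generic copy step so that every run reports how many rows it read, wrote, and failed on, and how long it took. The names `run_sync`, `SyncReport`, `read_source`, and `write_target` are hypothetical and do not belong to any specific product; a real integration would likely push these numbers to a metrics system instead of only logging them.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Iterable

logger = logging.getLogger("integration")


@dataclass
class SyncReport:
    """Numbers a data team would want to see for every run."""
    rows_read: int
    rows_written: int
    failed: int
    seconds: float


def run_sync(
    read_source: Callable[[], Iterable[dict]],
    write_target: Callable[[dict], None],
) -> SyncReport:
    """Copy rows from a source to a target and report what happened."""
    started = time.perf_counter()
    rows_read = rows_written = failed = 0
    for row in read_source():
        rows_read += 1
        try:
            write_target(row)
            rows_written += 1
        except Exception:
            # Keep going, but make the failure visible instead of hiding it.
            failed += 1
            logger.exception("failed to write row %r", row)
    report = SyncReport(
        rows_read, rows_written, failed, time.perf_counter() - started
    )
    logger.info("sync finished: %s", report)
    return report


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    target = []
    run_sync(lambda: [{"id": 1}, {"id": 2}], target.append)
```

Returning the report as a value, not only as log lines, lets the caller decide what to do with it: alert when `failed` is non-zero, or compare `rows_read` with the source system's own count to catch silent data loss.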

 

 

 

 
