Food For Tech | Setting the platform scene

Author: Sjors Otten

In our previous blogs “Food For Tech | Robust and Scalable Data Solution / Part 2” and “Food For Thought | Stream the Dream” we talked about how we do data with the FFA-platform and the benefits of having an Event Streaming Platform (ESP). However, we are strong believers in the “show and tell” principle. As we have already done the tell-part, we are going to do the show-part through a series of blogs built around a use case: showing you how we hook up a widely used ERP-application to both a streaming and a batch-pipeline, resulting in a near-real-time dashboard whilst preserving all data in a delta-lake.

In this blog we will set the scene regarding our use case, the data capabilities in focus, and the corresponding data capability tooling.

Our use case is as follows: a successful company X uses an open-source ERP-application and finds its reporting and analysis capabilities not up to par. The CEO has requested the CIO to come up with a proposal for a near-real-time analytics environment that provides him with the right insights at the right time and in the right format for the sales domain. To be more specific: when the sales department enters a sales-order in the ERP-system (left side of the screen), you will see the sales-order reflected in the near-real-time dashboard (right side of the screen) within a matter of seconds.
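To make this concrete: conceptually, each sales-order entry produces a change event that travels through the pipeline towards the dashboard. A minimal sketch of what such an event could look like on the wire (the field names here are illustrative assumptions, not Odoo's actual schema):

```python
import json
from datetime import datetime, timezone

# Hypothetical shape of a sales-order change event as it could be
# published to a Kafka topic; field names are illustrative only.
sales_order_event = {
    "event_type": "sale_order_created",
    "order_id": 1042,
    "customer": "ACME Corp",
    "amount_total": 1299.50,
    "currency": "EUR",
    "created_at": datetime.now(timezone.utc).isoformat(),
}

# Events are serialized to JSON before being produced to Kafka.
payload = json.dumps(sales_order_event)
print(payload)
```

Every downstream component (streaming pipeline, delta-lake, dashboard) then works with the same event, which is what makes the end-to-end latency of seconds achievable.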

Keeping in mind the use case and the schematic overview of the FFA-platform from our previous blog, we will utilize a subset of its components as depicted below. For this use case we extended the FFA-platform with the data capabilities data serving and data consumption.

We will focus on the data capabilities data in motion, data at rest, data processing, data serving, and data consumption to get the data ready for registration, serving, consumption, and presentation in a dashboard. As you can see in the overview above, we have selected a number of open-source tools for these data capabilities. A more detailed elaboration of the open-source tooling is presented in the table below.

Odoo ERP is the go-to place for the sales department of X. It is their bread and butter. All master-data-management and sales processes are facilitated in this application. Each sales-order is managed throughout the whole sales process, from quote entry to sales-order up until invoicing.

Apache Kafka together with Apache Spark forms the backbone of our near-real-time data pipeline, whilst AWS S3 facilitates our data storage. Apache Druid and Apache Superset provide us with the data serving and data consumption capabilities on which we will deploy our near-real-time dashboard.
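As a sketch of how these pieces could fit together (the topic name, S3 paths, and schema below are assumptions for illustration, not our actual configuration), a Spark Structured Streaming job can read sales-order events from Kafka and append them to a delta-lake table on S3:

```python
# Sketch of a Spark Structured Streaming job: Kafka -> delta-lake on S3.
# Topic name, bucket paths, and schema are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (DoubleType, IntegerType, StringType,
                               StructField, StructType)

spark = SparkSession.builder.appName("sales-orders-ingest").getOrCreate()

# Expected shape of the JSON payload on the Kafka topic.
schema = StructType([
    StructField("order_id", IntegerType()),
    StructField("customer", StringType()),
    StructField("amount_total", DoubleType()),
    StructField("created_at", StringType()),
])

orders = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "sales-orders")
    .load()
    # Kafka delivers the payload as bytes; parse the JSON value.
    .select(from_json(col("value").cast("string"), schema).alias("order"))
    .select("order.*")
)

# Continuously append new events to the delta-lake table on S3.
query = (
    orders.writeStream.format("delta")
    .option("checkpointLocation", "s3a://ffa-checkpoints/sales-orders/")
    .outputMode("append")
    .start("s3a://ffa-delta-lake/sales-orders/")
)
```

The checkpoint location is what lets the job restart without losing or duplicating events; Druid and Superset then sit on top of the served data for the dashboard.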

Combining and integrating this unique set of tools enables company X to get near-real-time insights into their sales data whilst, in parallel, gradually building up a delta-lake that is ACID-compliant out-of-the-box. ACID-compliance (Atomicity, Consistency, Isolation, Durability) is a must, whether you're using a data-lake or a database system. ACID is a set of properties of a transaction intended to guarantee the validity of the transaction and the data belonging to it. It helps us in audit processes, data recovery, and time travelling through our data.
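Delta-lake brings these transaction guarantees to plain files on object storage, but the principle itself is easy to see in miniature with any ACID-compliant database. A small illustration using SQLite from Python's standard library: when a transaction fails halfway, it is rolled back atomically and the data is left exactly as it was.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)
conn.execute("INSERT INTO sales_orders VALUES (1, 100.0)")
conn.commit()

# Atomicity: if anything inside the transaction fails, the whole
# transaction is rolled back, undoing its earlier writes as well.
try:
    with conn:  # 'with' commits on success, rolls back on exception
        conn.execute("INSERT INTO sales_orders VALUES (2, 250.0)")
        conn.execute("INSERT INTO sales_orders VALUES (2, 999.0)")  # duplicate key
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM sales_orders").fetchone()[0]
print(count)  # only the first, committed row remains
```

In the delta-lake the same guarantee holds per commit to the transaction log, which is also what makes time travelling (reading the table as of an earlier version) possible.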

In the following blogs we will deep-dive, from both a business and a tech perspective, into the components presented here. We will proceed step-by-step, starting with hooking up the Odoo ERP system to Kafka and populating the delta-lake in near-real-time with raw data. Later in the blog series we will focus on Apache Kafka and Apache Spark combined with the delta-lake to show how we deliver a near-real-time pipeline and create a data serving layer for analytics, which in its turn will be used as the foundation for company X's dashboard. Hopefully we have triggered your interest to see how we use the FFA-platform to deliver a scalable, flexible, and robust solution that supports the requirements of the CEO.

Feel free to reach out in case you have any questions! Stay tuned for more and follow us on LinkedIn to be updated automatically.