Food For Tech | Hooking up Odoo ERP to Apache Kafka
Author: Sjors Otten
In our previous blog, “Food for Tech | Setting the platform scene”, we presented the use case of Company X and showed how the FFA-DEMO-PLATFORM can fulfill this specific use case and many others.
In this blog we take the first step in our data-journey: hooking up the source-application ‘Odoo ERP’ to the data-in-motion-component ‘Apache Kafka’ and writing the raw data in (near) real time to the data-storage-component ‘AWS S3’. Check out the final result in the short clip below.
The remainder of this blog describes in detail how we arrived at the final result shown in the short clip above. To do so, we expand on three data capabilities: Data Register, Data In Motion and Data Storage. In this blog we use the subset of the FFA-DATA-PLATFORM depicted in the figure below.
As you can see in the overview above, the following open-source data capability tooling will be leveraged to deliver the first part of Company X’s use case:
Whereas the Data Register and Data Storage components have not changed since our previous blog, the Data In Motion component has. A closer look at the schematic overview reveals the following Data In Motion-components.
Odoo ERP is the go-to place for the sales-department of Company X. It holds all of the company’s master- and sales-data in several database-tables. The Kafka-connect-SOURCE-connector listens for change events in the database’s transaction log and writes each event to the Kafka-topic of the corresponding table. Subsequently, the Kafka-connect-SINK-connector, which is subscribed to each Kafka-topic, writes each event to AWS S3 as a JSON-file in near real time. Apache Kafka will be hooked up to the following Odoo ERP-database-tables.
Based on the above we can distinguish between two main dataflows:
- Data Register -> Data In Motion
- Data In Motion -> Data Storage
The remainder of this blog covers (1) getting the FFA-DEMO-PLATFORM up and running, (2) realizing the dataflow ‘Data Register -> Data In Motion’ and finally (3) realizing the dataflow ‘Data In Motion -> Data Storage’.
Spinning up the FFA-DEMO-PLATFORM
The FFA-DEMO-PLATFORM runs entirely in docker-containers. For demonstration purposes we use standard docker-images provided by Odoo ERP, Apache Kafka, and LocalStack. Please find an excerpt of the FFA-DEMO-PLATFORM docker-compose definition file below:
To start the FFA-DEMO-PLATFORM we execute the following command:
docker-compose -f docker-compose-demo-step1.yml up -d
Resulting in the following output:
This means that the following docker-containers are brought up:
The table above shows that each docker-container has its own, single role. Simply put, the platform follows a microservice-architecture.
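Because `up -d` starts the containers in detached mode, the Kafka Connect container may need some time before its REST API accepts requests. A minimal sketch of a wait-until-ready helper, assuming the Kafka Connect REST API is reachable on `http://localhost:8083` (8083 is Kafka Connect's default REST port; whether the docker-compose file publishes it on localhost is an assumption):

```python
import http.client
import time
import urllib.request
from typing import Callable

# Assumption: Kafka Connect's REST API is published on localhost:8083 (its default port).
CONNECT_URL = "http://localhost:8083"


def http_probe(url: str) -> Callable[[], bool]:
    """Return a probe that reports True once `url` answers with HTTP 200."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts and HTTP errors (URLError/HTTPError are OSError subclasses).
            return False
    return probe


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 120.0,
                     interval_s: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses; return whether it succeeded."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    ready = wait_until_ready(http_probe(CONNECT_URL + "/"))
    print("Kafka Connect is ready" if ready else "Kafka Connect did not come up in time")
```

The `sleep` and `clock` parameters exist only so the polling loop can be exercised without real waiting.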
Data Register -> Data In Motion
So, we’ve brought up the FFA-DEMO-PLATFORM and it’s running. Great! However, data is not yet flowing from platform-component ‘Data Register’ into platform-component ‘Data In Motion’ as depicted in the figure below.
To get data flowing, we need to take the following actions:
Create Kafka connect-SOURCE-connector
Having the data source (Odoo ERP) and the data destination (s3://local-raw) set up is great. However, nothing is happening yet. To get things going, we need to create the Kafka-connect-SOURCE-connector. Let’s create a ‘Kafka connect-connector-definition’ in the following way.
For reference-documentation on the parameters used, please see http://debezium.io
The Kafka-connect-connector-definition is stored as ‘odoodemo-src-register-postgres-schemaregistry-avro.json’ in docker-container ‘ffa-demo-platform_kafkaconnect-server_1’, which we will use to deploy the connector.
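A Debezium PostgreSQL source-connector definition of this kind could look roughly like the sketch below, which builds it as a Python dict and writes it to the file name mentioned above. The connector name ‘postgres-connector’ and the ‘dbserver1’ prefix follow from the names used in this blog. The hostname, credentials, database name, table list and Schema Registry URL are hypothetical placeholders. The property names follow Debezium 1.x; newer Debezium versions replace `database.server.name` with `topic.prefix`.

```python
import json

# Sketch of a Debezium PostgreSQL source-connector definition.
# All hosts, credentials and the database name are hypothetical placeholders.
SOURCE_CONNECTOR = {
    "name": "postgres-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",                 # PostgreSQL's built-in logical decoding plugin
        "database.hostname": "odoo-db",            # hypothetical docker-compose service name
        "database.port": "5432",
        "database.user": "odoo",                   # hypothetical credentials
        "database.password": "odoo",
        "database.dbname": "odoodemo",             # hypothetical database name
        "database.server.name": "dbserver1",       # topic prefix: dbserver1.<schema>.<table>
        "table.include.list": "public.product_product",  # extend with the other Odoo tables
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://schema-registry:8081",    # hypothetical
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://schema-registry:8081",  # hypothetical
    },
}

if __name__ == "__main__":
    # Write the definition so it can be submitted to Kafka Connect's REST API.
    with open("odoodemo-src-register-postgres-schemaregistry-avro.json", "w") as f:
        json.dump(SOURCE_CONNECTOR, f, indent=2)
```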
Deploy Kafka connect-SOURCE-connector
Although the Kafka-connect-SOURCE-connector has been defined, there is still no data flowing between platform-components ‘Data Register’ and ‘Data In Motion’. Let’s make that happen by executing the following command:
The output shows a successfully deployed Kafka connect-SOURCE-connector named ‘postgres-connector’ in docker-container ‘ffa-demo-platform_kafkaconnect-server_1’.
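Deploying a connector amounts to a `POST /connectors` request against Kafka Connect's REST API, with a JSON body of the form `{"name": ..., "config": {...}}`. A minimal stdlib sketch of that request, assuming the REST API is reachable on `http://localhost:8083` (the port mapping is an assumption):

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumption: Kafka Connect's default REST port


def build_create_request(definition: dict, connect_url: str = CONNECT_URL) -> urllib.request.Request:
    """Build a POST /connectors request registering `definition` ({"name": ..., "config": {...}})."""
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def deploy(path: str, connect_url: str = CONNECT_URL) -> int:
    """Read a connector definition from `path`, submit it and return the HTTP status (201 on success)."""
    with open(path) as f:
        definition = json.load(f)
    with urllib.request.urlopen(build_create_request(definition, connect_url), timeout=30) as resp:
        return resp.status


if __name__ == "__main__":
    print(deploy("odoodemo-src-register-postgres-schemaregistry-avro.json"))
```

If a connector with the same name already exists, Kafka Connect answers with 409 Conflict, which `urllib` raises as an `HTTPError`.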
Validate that the data is landing in the Kafka-topics
So, the Kafka-connect-SOURCE-connector has been successfully deployed. This implies that the Kafka-connect-SOURCE-connector has successfully connected to platform-component ‘Data Register’ and created 6 Kafka-topics in platform-component ‘Data In Motion’, one topic per Odoo table. Above all, it is populating these topics with data from the corresponding source-tables.
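By default, Debezium's PostgreSQL connector names each topic `<server name>.<schema>.<table>`, which is how a topic such as ‘dbserver1.public.product_product’ arises. A small sketch of that mapping; the helper names are hypothetical:

```python
def debezium_topic(server_name: str, schema: str, table: str) -> str:
    """Default Debezium PostgreSQL topic name: <server name>.<schema>.<table>."""
    return f"{server_name}.{schema}.{table}"


def topics_for(server_name: str, tables: list[str]) -> dict[str, str]:
    """Map fully qualified tables ("schema.table") to their expected topic names."""
    result = {}
    for qualified in tables:
        schema, table = qualified.split(".", 1)
        result[qualified] = debezium_topic(server_name, schema, table)
    return result


print(debezium_topic("dbserver1", "public", "product_product"))
# prints dbserver1.public.product_product
```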
To check whether the newly created topics contain actual events from the tables, we verify that Kafka-topic ‘dbserver1.public.product_product’ has data. Therefore, we need to execute the following command:
The output shows that we have 34 events in Kafka-topic ‘dbserver1.public.product_product’, which corresponds to the number of records in source-table ‘public.product_product’ of platform-component ‘Data Register’.
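The count can also be cross-checked programmatically. For a topic that has not been compacted and from which no records have been deleted, the number of events is the sum over partitions of the high watermark minus the low watermark. A hedged sketch: fetching the watermarks relies on the third-party confluent-kafka client (`Consumer.list_topics`, `Consumer.get_watermark_offsets`), and the broker address and consumer group id are hypothetical.

```python
from typing import Dict, Tuple


def count_events(watermarks: Dict[int, Tuple[int, int]]) -> int:
    """Sum (high - low) over partitions; `watermarks` maps partition -> (low, high) offsets."""
    return sum(high - low for low, high in watermarks.values())


def fetch_watermarks(topic: str, bootstrap: str = "localhost:9092") -> Dict[int, Tuple[int, int]]:
    """Query low/high offsets per partition (requires the confluent-kafka package)."""
    from confluent_kafka import Consumer, TopicPartition  # optional dependency, imported lazily

    # Hypothetical broker address and group id.
    consumer = Consumer({"bootstrap.servers": bootstrap, "group.id": "ffa-count-check"})
    try:
        partitions = consumer.list_topics(topic, timeout=10).topics[topic].partitions
        return {
            p: consumer.get_watermark_offsets(TopicPartition(topic, p), timeout=10)
            for p in partitions
        }
    finally:
        consumer.close()


if __name__ == "__main__":
    print(count_events(fetch_watermarks("dbserver1.public.product_product")))
```

The one-to-one match with the row count holds right after Debezium's initial snapshot, which emits one event per existing row. Later inserts, updates and deletes in Odoo add further events (by default, a delete also produces a tombstone).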
Great! We have brought up the FFA-DEMO-PLATFORM, configured it, and validated that data is flowing in (near) real time between platform-components ‘Data Register’ and ‘Data In Motion’. Now let’s store the events in S3.
Data In Motion -> Data Storage
Per the use case it is required to push the data from platform-component ‘Data In Motion’ to platform-component ‘Data Storage’ in (near) real-time, as depicted in the figure below.
To get this running we need to take the following actions:
Create Kafka connect-SINK-connector
To enable the dataflow between platform-components ‘Data In Motion’ and ‘Data Storage’, we leverage the power of Kafka connect once again. To get this done, let’s create a ‘Kafka-connect-SINK-connector-definition’.
For reference-documentation on the parameters used, please see https://docs.confluent.io/kafka-connect-s3-sink/current/overview.html
The Kafka-connect-SINK-connector-definition is stored as ‘odoodemo-dst-s3sink-schemaregistry-avro.json’ in docker-container ‘ffa-demo-platform_kafkaconnect-server_1’.
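For illustration, a Confluent S3 sink-connector definition along these lines might look like the sketch below. The name ‘odoodemo-dst-s3-sink’ and the bucket ‘local-raw’ come from this blog. The topic regex, region, LocalStack endpoint (`store.url`) and Schema Registry URL are hypothetical. `flush.size` = 1 is an assumption, consistent with one JSON-file per event. S3 credentials are assumed to come from the AWS default credentials chain inside the Kafka Connect container.

```python
import json

# Sketch of a Confluent S3 sink-connector definition; values marked hypothetical are placeholders.
SINK_CONNECTOR = {
    "name": "odoodemo-dst-s3-sink",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics.regex": "dbserver1\\.public\\..*",   # hypothetical: all Odoo topics from Debezium
        "s3.bucket.name": "local-raw",
        "s3.region": "us-east-1",                     # hypothetical
        "store.url": "http://localstack:4566",        # hypothetical LocalStack endpoint
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
        "flush.size": "1",                            # assumption: one S3 object per event
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://schema-registry:8081",    # hypothetical
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://schema-registry:8081",  # hypothetical
    },
}

if __name__ == "__main__":
    with open("odoodemo-dst-s3sink-schemaregistry-avro.json", "w") as f:
        json.dump(SINK_CONNECTOR, f, indent=2)
```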
Deploy Kafka connect-SINK-connector
The Kafka-connect-SINK-connector is deployed by executing the following command:
As the output shows, the Kafka-connect-SINK-connector named ‘odoodemo-dst-s3-sink’ has been successfully deployed in docker-container ‘ffa-demo-platform_kafkaconnect-server_1’.
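A successful deployment does not guarantee that the connector keeps running. Kafka Connect's `GET /connectors/<name>/status` endpoint reports the state of the connector and of each of its tasks. A minimal sketch that checks both connectors from this blog, again assuming the REST API on `http://localhost:8083`:

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumption: Kafka Connect's default REST port


def is_running(status: dict) -> bool:
    """True if the connector and all of its (at least one) tasks report state RUNNING."""
    tasks = status.get("tasks", [])
    return (
        status.get("connector", {}).get("state") == "RUNNING"
        and len(tasks) > 0
        and all(t.get("state") == "RUNNING" for t in tasks)
    )


def fetch_status(name: str, connect_url: str = CONNECT_URL) -> dict:
    """GET /connectors/<name>/status from the Kafka Connect REST API."""
    with urllib.request.urlopen(f"{connect_url}/connectors/{name}/status", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for name in ("postgres-connector", "odoodemo-dst-s3-sink"):
        print(name, "RUNNING" if is_running(fetch_status(name)) else "NOT RUNNING")
```

For a FAILED task, the same response includes a `trace` field with the stack trace, which is usually the quickest pointer to a misconfigured property.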
Validate that the Kafka-topic-events are written to AWS S3
To validate that the S3-bucket contains the Kafka-events (as JSON-files), the following command is executed:
The output shows that the Kafka-events are present in AWS S3-bucket ‘s3://local-raw/’ as JSON-files, which means the events are flowing correctly between platform-components ‘Data In Motion’ and ‘Data Storage’. Moreover, there are 34 JSON-files (0 through 33), which corresponds to the number of events in the respective Kafka-topic and the number of records in the source-table!
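The 0-through-33 numbering follows from the S3 sink connector's object naming with the default partitioner: `<topics.dir>/<topic>/partition=<n>/<topic>+<n>+<start offset>.json`, where `topics.dir` defaults to `topics` and the start offset is zero-padded to 10 digits by default. With `flush.size` = 1 (an assumption), every offset gets its own object. A sketch that derives the expected keys, assuming offsets start at 0 in a single partition. It also includes an optional listing helper that uses the third-party boto3 package against a hypothetical LocalStack endpoint:

```python
def expected_keys(topic: str, partition: int, num_events: int, flush_size: int = 1,
                  topics_dir: str = "topics", pad: int = 10) -> list[str]:
    """Object keys written by the S3 sink (DefaultPartitioner, JsonFormat) for offsets 0..num_events-1.

    Only complete batches of `flush_size` records are committed; each object is named
    after the offset of its first record.
    """
    keys = []
    for batch in range(num_events // flush_size):
        start = batch * flush_size
        keys.append(f"{topics_dir}/{topic}/partition={partition}/"
                    f"{topic}+{partition}+{start:0{pad}d}.json")
    return keys


def list_keys(bucket: str, prefix: str, endpoint_url: str = "http://localhost:4566") -> list[str]:
    """List object keys under `prefix` (requires boto3; the endpoint is a hypothetical LocalStack URL)."""
    import boto3  # optional dependency, imported lazily

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    keys = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


if __name__ == "__main__":
    topic = "dbserver1.public.product_product"
    expected = set(expected_keys(topic, 0, 34))
    actual = set(list_keys("local-raw", f"topics/{topic}/"))
    print("missing:", sorted(expected - actual))
```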
This concludes the 2nd blog in our blog-series ‘Stream the Dream’. To summarize, this blog demonstrates how to set up a (near) real-time dataflow in the FFA-DEMO-PLATFORM between platform-components ‘Data Register’ and ‘Data Storage’ by leveraging the power of platform-component ‘Data In Motion’.
In the next blog of this blog-series the focus will be on transforming the events in the respective Kafka-topics to functional and reusable datasets by processing them with data-processing-application ‘Apache Spark’ and storage-format ‘Delta-lake’.
I hope you enjoyed this blog. Feel free to reach out in case you have any questions!
Stay tuned for more and follow us on LinkedIn to be updated automatically!
For additional reference and codebase: https://github.com/foodforanalytics/ffa-demo-platform.