Food For Tech | Using Weather Data
Author: Benito van Breugel
Nowadays, a data product is never finished. There is a never-ending desire and possibility to add more features and data to it. Besides your own data, there are numerous external data sources that can be incorporated into your data product. In many industries, the weather has a very high impact on sales and operations. In essence, weather can influence the sales figures of your company: when the weather is better, more people go outside, get a drink at a café, or plan a BBQ. This requires you, whether you are a food manufacturer, a retailer, or a supermarket, to adjust your production capacity or business planning accordingly to meet demand. What if we could help you integrate historical, actual, and forecasted weather data into your data product?
In this blog we will show you how to unlock, store, and integrate weather data into a data product that is built within Microsoft Azure with just a few lines of code. Afterwards, the data is ready to be consumed by your reporting tool of choice. To conclude, we will expose the data within the data product through a standard REST API interface, allowing for easy integration with third-party systems.
Weather data is broadly available on the internet. There are multiple API endpoints available to interact with. For example, the Dutch KNMI has its own data platform to share weather data. However, there are many more websites with similar and even better API endpoints out there. For the purpose of this blog, we will use the www.visualcrossing.com weather API endpoints. Before starting to get the weather data, it is important to determine your context, your location, and, most importantly, what type and frequency of data you need. For example, if you simply download data for many locations over a wide range of dates, you will end up with huge response data sets containing hourly values, which might not even be required.
Starting with the API endpoint, we need to have the basics right to be able to interact with it. Within this blog, we use the Python language to interact with the API endpoints. To perform an API call, we need the API URL and the API key, together with the parameters ‘location’ and ‘date’, in order to receive the data. Once these are defined, choose your output format, which in our case is JSON, and the API URL is constructed. The corresponding Python package for this is the requests package.
In Python this results in the following code:
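A minimal sketch of such a call is given here. The base URL and the query parameter names (`key`, `unitGroup`, `include`, `contentType`) are assumptions based on the Visual Crossing Timeline API and should be checked against the provider's documentation; the function names are hypothetical.

```python
import requests

# Assumed base URL of the Visual Crossing Timeline API (verify before use).
BASE_URL = (
    "https://weather.visualcrossing.com/VisualCrossingWebServices"
    "/rest/services/timeline"
)


def build_url(location: str, date: str) -> str:
    """Compose the request URL for one location and one date (YYYY-MM-DD)."""
    return f"{BASE_URL}/{location}/{date}"


def get_weather_data(location: str, date: str, api_key: str) -> dict:
    """Call the API and return the parsed JSON response as a dict."""
    params = {
        "key": api_key,         # your personal API key
        "unitGroup": "metric",  # assumed parameter: metric units
        "include": "days",      # assumed parameter: daily values only, no hourly data
        "contentType": "json",  # requested output format
    }
    response = requests.get(build_url(location, date), params=params, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()
```

Calling `get_weather_data("Amsterdam", "2022-06-01", api_key)` would then return the response for that single day, assuming a valid key.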
Once executed, we transform the JSON response provided by the API endpoint into a pandas DataFrame (import pandas) that contains all information in one row, like below:
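As an illustration, the conversion could look like the sketch below. The sample response is a hypothetical, heavily trimmed stand-in for a real API response, and `pd.json_normalize` is one way to do the conversion, not necessarily the one used in the original code.

```python
import pandas as pd

# Hypothetical, trimmed response; a real response contains many more fields.
sample_response = {
    "address": "Amsterdam",
    "timezone": "Europe/Amsterdam",
    "days": [
        {"datetime": "2022-06-01", "temp": 15.2, "dew": 9.1, "winddir": 250.0},
    ],
}

# One row for the whole response; the daily values stay packed in the 'days' column.
dataframe = pd.json_normalize(sample_response)
print(dataframe.shape)
print(dataframe.columns.tolist())
```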
Unfortunately, the pandas DataFrame is not structured to our liking right away. All daily information about sun, rain, humidity, etc. is grouped in one array in the DataFrame column ‘days’, as can be seen above. Therefore, we need to apply some transformations to make the data more structured. For this, there is the option to convert a JSON subpath into a structured pandas DataFrame. Applying the JSON conversion statement with a dedicated record_path converts the specified subpath within the JSON string into a separate pandas DataFrame. In Python code this looks as follows:
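A minimal sketch of this step, assuming `pd.json_normalize` is the conversion statement and using a hypothetical, trimmed sample response:

```python
import pandas as pd

# Hypothetical, trimmed response with two days of data.
sample_response = {
    "address": "Amsterdam",
    "days": [
        {"datetime": "2022-06-01", "temp": 15.2, "dew": 9.1, "winddir": 250.0},
        {"datetime": "2022-06-02", "temp": 17.8, "dew": 10.4, "winddir": 190.0},
    ],
}

# record_path points at the nested list; each element becomes its own row.
dataframe_days = pd.json_normalize(sample_response, record_path="days")
print(dataframe_days)
```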
Joining the standard dataframe (in code block 1) with the dataframe_days (above) will make sure the information is together in one Pandas-dataframe and structured properly. As can be seen below, the daily information is now split up into corresponding columns, like ‘temp’, ‘dew’, & ‘winddir’.
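One possible way to do this join is sketched below. Using the location (‘address’) as the shared key is an assumption; the original code may have joined the two DataFrames differently.

```python
import pandas as pd

# Hypothetical, trimmed response with two days of data.
sample_response = {
    "address": "Amsterdam",
    "timezone": "Europe/Amsterdam",
    "days": [
        {"datetime": "2022-06-01", "temp": 15.2, "dew": 9.1, "winddir": 250.0},
        {"datetime": "2022-06-02", "temp": 17.8, "dew": 10.4, "winddir": 190.0},
    ],
}

# Top-level information, without the packed 'days' column.
dataframe = pd.json_normalize(sample_response).drop(columns="days")

# Daily information, one row per day.
dataframe_days = pd.json_normalize(sample_response, record_path="days")

# Add the shared key (assumed here: the location) so both frames can be merged.
dataframe_days["address"] = sample_response["address"]
weather = dataframe.merge(dataframe_days, on="address", how="inner")
print(weather.columns.tolist())
```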
Now the data can be changed, stored, or consumed, depending on your solution requirements. In this demo we have chosen to first store the information on a daily basis in Azure Blob Storage. From an architecture point of view, it is important to think about the landing location, folder structure, and partitioning of the data. To be able to store multiple locations and dates, a folder structure has been defined in which each combination of location and date has its own key. Writing to this structure can easily be achieved by using the azure.storage.blob package to upload the DataFrame to the specified blob under the corresponding name.
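A minimal sketch of such an upload is shown here. The blob naming scheme in `build_blob_name` is purely hypothetical and is not the structure used in the original solution; the function names, container name, and connection string are placeholders as well.

```python
import pandas as pd


def build_blob_name(location: str, date: str) -> str:
    """Hypothetical layout: one folder per location, one file per date."""
    return f"{location.lower()}/{date}.csv"


def upload_dataframe(
    df: pd.DataFrame, connection_string: str, container: str, blob_name: str
) -> None:
    """Upload the DataFrame as a CSV file to the given container and blob name."""
    # Imported here so build_blob_name can be used without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    blob = service.get_blob_client(container=container, blob=blob_name)
    blob.upload_blob(df.to_csv(index=False), overwrite=True)
```

For example, `upload_dataframe(weather, conn_str, "weather", build_blob_name("Amsterdam", "2022-06-01"))` would write one daily file, assuming the container already exists.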
At this moment, the information is stored, and all history is available in the blob storage. Next, it is possible to combine and group all daily files per location into one dataset. In Python, you can loop through the files, combine them with pd.concat (the DataFrame.append method was removed in pandas 2.0), and store the final composed pandas DataFrame as a dataset in a new blob location.
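The combining step could be sketched as follows. The in-memory CSV strings are hypothetical stand-ins for the daily files that would be downloaded from blob storage.

```python
import io

import pandas as pd

# Stand-ins for the daily CSV files of one location (hypothetical names and values).
daily_files = {
    "amsterdam/2022-06-01.csv": "datetime,temp\n2022-06-01,15.2\n",
    "amsterdam/2022-06-02.csv": "datetime,temp\n2022-06-02,17.8\n",
}

# Read every file into its own DataFrame, in name (and therefore date) order.
frames = [
    pd.read_csv(io.StringIO(content))
    for _, content in sorted(daily_files.items())
]

# Stack all daily frames into one dataset with a fresh index.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```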
Retrieving the weather data via the API endpoint is great, but during development we are manually copying and pasting the Python code into a terminal to execute it and see it working (or not). For daily consumption without manual intervention, it is interesting to explore automation possibilities. In the current data landscape, many applications and cloud services can execute data pipelines. Scheduling and automated execution of these pipelines can, for example, be done by applications such as Azure Data Factory, Apache Airflow, or AWS Step Functions. Besides, one can also run Python code inside a Docker container, orchestrated by Kubernetes. Although these are all valid options, we will use Azure Functions for this demo as the serverless execution component for the Python code. The advantages of Azure Functions are that it integrates easily with the existing Python code, can be triggered by a schedule, and, above all, is cost-efficient and easy to implement.
Azure Functions can be created within the Azure portal or directly with Visual Studio Code (please make sure to install the extension). First, we will set the function specifics, like trigger mechanism, schedule and core language. The following JSON string will define the type of trigger and source file for the execution (__init__.py).
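As an illustration, the content of such a function.json file for a timer trigger is built and printed below from Python. The binding name `mytimer` and the schedule value are assumptions, the latter derived from the daily 08:00:31 UTC schedule used in this demo.

```python
import json

# Sketch of a function.json for a timer-triggered Azure Function.
function_config = {
    "scriptFile": "__init__.py",  # source file that contains the main function
    "bindings": [
        {
            "name": "mytimer",          # assumed name of the timer argument in main()
            "type": "timerTrigger",     # run on a schedule
            "direction": "in",
            "schedule": "31 0 8 * * *",  # assumed: daily at 08:00:31
        }
    ],
}

print(json.dumps(function_config, indent=2))
```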
The above schedule is defined through a six-field cron expression, in the form “second minute hour day month day-of-week”. More information about cron expressions can be found here: https://www.freeformatter.com/cron-expression-generator-quartz.html
In our case, we are scheduling the execution daily at 08:00:31 UTC. By specifying the seconds (31), we guarantee that it will only execute once per day. If the seconds field were left as a wildcard, the function would fire every second during that minute, which is not something we require.
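To make the field order concrete, the small sketch below splits a schedule into named fields. It assumes the six-field format used by Azure Functions timer triggers, where the last field is the day of the week; the expression itself is derived from the 08:00:31 schedule and the helper name is hypothetical.

```python
# Field order of a six-field timer schedule (assumed: Azure Functions format).
FIELD_NAMES = ("second", "minute", "hour", "day", "month", "day_of_week")


def describe_schedule(expression: str) -> dict:
    """Map each field of a six-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


schedule = describe_schedule("31 0 8 * * *")
print(schedule)
```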
In the __init__.py script, the main function calls the get_weather_data function, which executes the earlier described logic of retrieving the data via the API. The timer is the input for the Azure Function and triggers an execution on the defined schedule.
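A minimal sketch of such an __init__.py is shown here. The body of `get_weather_data` is a placeholder for the retrieval and storage logic described earlier, and the argument name `mytimer` is an assumption that has to match the binding name in function.json.

```python
import logging
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for the type hint, so the sketch also runs
    # without the azure-functions package installed.
    import azure.functions as func


def get_weather_data() -> None:
    """Placeholder for the API call, transformation, and upload steps."""
    logging.info("Retrieving weather data ...")


def main(mytimer: "func.TimerRequest") -> None:
    """Entry point that the timer trigger invokes on the defined schedule."""
    if mytimer.past_due:
        logging.warning("The timer is past due.")
    get_weather_data()
```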
The above function can be deployed to the Azure environment and will execute the Python logic.
Note that you will also need to configure the Azure Blob Storage connection within the Azure Function to make sure the data can be stored in the respective storage account. For local development this is done in the file ‘local.settings.json’, within the property ‘AzureWebJobsStorage’; once deployed, the same property is set in the application settings of the Function App.
Putting it all together in a schematic overview, the solution now looks as follows, where we have a time trigger schedule on a daily basis, which executes an Azure function that integrates weather data, developed in the Python language.
At this point we have set up an automated flow to retrieve weather data into our data product. The last step in the end-to-end pipeline is to consume this data in any format. It is possible to convert the data into a CSV file and download it locally, but it is more interesting to present the data to the end user through an API endpoint. Let’s see this in practice.
Once again, we are doing this in Python, for which we install the Flask package. Flask is a small web framework that enables you to build API endpoints and corresponding web routes in an easy way. For example, you can set up a route for local HTTP requests to ‘/’ and return a text string as a response, like below:
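A minimal sketch of such an app is given here; the function name and the returned text are placeholders.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index() -> str:
    """Return a plain text response for requests to '/'."""
    return "Hello, weather!"
```

Assuming the file is saved as app.py, the app can be started locally with `flask --app app run` (recent Flask versions) and opened in the browser.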
If you start your Flask app, you will see the following response in the browser:
In order to return the weather data through an API endpoint within the web-portal, we need to do the following:
- Specify the http request with ‘location’ as an input parameter.
- Retrieve the specific data from the data product (i.e. Azure blob storage).
- Return the data in a JSON format to the flask API web route ‘location’.
These three steps then retrieve the data and present the response at the web route http://xxx.x.x.x:yyyy/weer/location.
As a result the data from Amsterdam will be retrieved and presented in a JSON string.
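The three steps above can be sketched as follows. The in-memory dictionary is a hypothetical stand-in for the dataset in Azure Blob Storage, and the helper name `load_weather` is an assumption; a real implementation would download and parse the stored dataset instead.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for the data product; real data would be read from Azure Blob Storage.
WEATHER_DATA = {
    "amsterdam": [{"datetime": "2022-06-01", "temp": 15.2}],
}


def load_weather(location: str):
    """Return the stored records for a location, or None if nothing is stored."""
    return WEATHER_DATA.get(location.lower())


@app.route("/weer/<location>")
def weather(location: str):
    """Serve the weather data of the requested location as JSON."""
    records = load_weather(location)
    if records is None:
        return jsonify({"error": f"no data for {location}"}), 404
    return jsonify(records)
```

A request to /weer/Amsterdam then returns the stored records as JSON, while an unknown location yields a 404 response.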
To conclude, in this blog we have shown you how to retrieve weather data from an API endpoint, how to automate the process with Azure Functions, combine small datasets into a larger comprehensive dataset, and serve the data, stored on azure blob storage, via a REST API endpoint in JSON format.
Note that this is not a production-ready solution; it just shows the end-to-end pipeline and its possibilities. Of course, one should improve the code and solution with conventions, standardization, unit tests, compliance, and security, just to name a few.
I hope you enjoyed this blog. Feel free to reach out in case you have any questions!
Stay tuned for more and follow us on LinkedIn to be updated automatically!