As is the case in many domains today, data in the agricultural domain is fragmented, stored in disparate silos that have private standards and closed architectures. The lack of interoperability and trust resulting from such a setup hinders the sharing of data, which is necessary to unlock its full potential. One promising approach to addressing these challenges is data spaces.
Within the scope of the X-KIT project, we explore initiatives aimed at realizing data spaces, such as Eclipse Dataspace Components (EDC) and Gaia-X, investigate the existing technologies for realizing data spaces, and analyse their benefits for the agricultural domain. To gain a proper understanding of these concepts, we selected a simple use case from the agricultural domain and implemented it using some of the existing technologies.
In this blog post, we will explain our use case, use it as a step-by-step guide to demonstrate the concepts and technologies we explored, and share some insights based on our experience.
Technologies for building data spaces
A data space is “both a multi-organizational agreement and a supporting technical infrastructure that enables data sharing between two or more participants” [1]. Another definition reads: “A data space is a federated, open infrastructure for sovereign data sharing, based on common policies, rules and standards” [2]. Decentralization, openness, transparency, self-determination, security, portability, and interoperability are key objectives of data spaces. In the following, we list some of the existing technologies for realizing data spaces.
Eclipse Dataspace Components (EDC)
EDC is an open-source project concerned with developing a framework to enable the realization of data spaces. The EDC framework provides several software components, each of which consists of a set of software libraries and modules (referred to as “extensions”), which are designed to be used, combined, and extended by developers to build components for realizing data spaces or enabling companies to join existing ones. According to the EDC project, their framework is powered by the specifications of the Gaia-X AISBL Trust Framework and the IDSA Dataspace Protocol [1].
Data Space Connectors
The connector is the main software component in data spaces. Each participant in a data space has to integrate this component into their infrastructure to act as its interface to the data space. Participants of a data space can use their connectors to share and consume data assets in a secure way, discover each other’s data offerings, establish automated contract negotiations, enforce policies, transfer data, and audit processes.
The EDC framework offers a variety of extensions for building data space connectors of different capabilities. Developers can reuse a subset of these extensions that meets their requirements, modify extensions, or extend them to build their own connector distribution.
Notes
- The EDC framework offers a tool kit, i.e., a collection of extensions, that can be used to accelerate the implementation of data space components. EDC does not offer off-the-shelf or ready-to-use components.
- Developers have to select, combine, and possibly modify existing EDC extensions or create new ones to build a connector distribution that meets their requirements.
- Some EDC extensions complement each other, while others are mutually exclusive.
- The EDC framework does not offer a ready-to-use connector, so the term “EDC Connector” can be misleading. Connector distributions can vary, for example, in terms of the extensions they include. Such differences between connector distributions can affect their compatibility. To eliminate misunderstandings and acknowledge the heterogeneity of connector implementations, we prefer using the term “EDC-based Connector” to describe connector distributions that are developed using the EDC framework.
Sovity Community Edition EDC
During our investigation in the X-KIT project, we came across sovity, a Fraunhofer spin-off that went the extra mile to enable data spaces. sovity used the EDC framework, developed additional extensions, and combined those extensions to offer a ready-to-use connector distribution, the “sovity Community Edition EDC”, in the form of a Docker Image.
Furthermore, to manage and interact with the sovity Community Edition EDC, sovity offers the “sovity EDC UI”. The sovity EDC UI is an open-source user interface (UI) for their connector that is based on EDC’s DataDashboard. sovity offers the sovity EDC UI in the form of a ready-to-use Docker Image.
Attention!
- In the remainder of this blog post, we will focus on the sovity Community Edition EDC, a ready-to-use connector implementation based on EDC, which is intended for rapid self-deployment and testing of EDC features.
- The information provided in this blog post is based on our investigation of sovity Community Edition EDC version 4.2.1, which is based on EDC version 0.0.1-20230220.patch1 (based on 0.0.1-milestone-8).
- We control the sovity Community Edition EDC manually using sovity EDC UI version 0.0.1-milestone-8-sovity14.
Agricultural data spaces – a use case
To get a deeper understanding of the concept and benefits of data spaces and the usage of EDC-based connectors, we describe in the following the agricultural demo use case that we realized in the scope of the X-KIT project.
Use Case Description
In the scenario of our demo use case, two companies are involved, as described below:
- Basic FMIS Provider: A service provider that operates a Farm Management Information System, Basic FMIS, and offers it as Software-as-a-Service to its customers, mainly farmers who use the service to manage their fields. The Basic FMIS provider is looking for ways to extend the capabilities of Basic FMIS and introduce new features to its customers. One interesting idea is to display soil moisture measurements of farmers’ fields on a heatmap. Such a feature can help farmers use water more efficiently and thus ensure better production. However, the Basic FMIS provider does not have such data, nor does it have the means to collect the data itself.
- SMS Provider: A soil measurement service provider that has recently conducted research involving the use of satellite imagery to collect soil moisture measurements. The company sees a good opportunity in the agricultural domain and is considering selling soil moisture measurements to agricultural service providers. However, the SMS provider is having a hard time finding customers and implementing data exchange in an interoperable manner. It is also unsure how to ensure its sovereignty over its data, fearing that its data could be used for unwanted purposes or that recipients could share it with others without consent or adequate compensation.
In this scenario, the Basic FMIS provider is interested in getting soil moisture measurements, while the SMS provider is looking for customers to sell soil moisture measurements to under its own conditions. This is where data spaces come into play: Joining a common data space would allow both the Basic FMIS provider and the SMS provider to find each other and reduce the effort needed to collaborate and exchange data and services in a secure and trusted way (see Figure 1).
Use Case Realization
To realize this use case, we set up an environment for the Agricultural Data Space, our playground data space with two participants: the Basic FMIS provider and the SMS provider. For the Basic FMIS provider, we implemented a simplified FMIS called Basic FMIS that is based on our results from the COGNAC project and can display field-related data as well as soil moisture measurements. Basic FMIS will serve as a data sink in our use case. For the SMS provider, we implemented a Soil Measurements Service (SMS) that provides randomly generated soil moisture measurements. SMS will serve as a data source in our use case. Each participant runs its system as well as its own instances of the sovity Community Edition EDC connector and the sovity EDC UI. sovity Community Edition EDC relies on a Dynamic Attribute Provisioning Service (DAPS) for authentication. The DAPS serves as an identity provider for the connectors in the data space, providing them with a distinct identity and credentials to be used when communicating with each other. We decided to use the DAPS instance operated by Fraunhofer AISEC and configured the connectors accordingly. Figure 2 illustrates our setup.
Note
- To participate in a data space, an organization would need to run its own connector instance or connect its system to a Connector-as-a-Service solution that acts as its gateway to the data space.
Use Case Walkthrough
To illustrate how data can be offered and consumed in a data space, in the following we provide a step-by-step walkthrough of our demonstrator that implements the aforementioned use case.
1. Offering data in the Agricultural Data Space
The process for offering the soil moisture measurements in the Agricultural Data Space is threefold: data asset creation, policy creation, contract definition creation. In the following, we will explain these three steps in detail.
A. Data Asset Creation
To create a data asset, an SMS provider employee can use the UI of their connector, navigate to the “Assets” page, and provide metadata about the data (e.g., name, version, description, keywords, etc.) and technical details about the data source (e.g., a REST-API endpoint from which the connector can retrieve the data). See Figures 3 and 4.
Upon successful creation of a data asset, the data asset is stored in the connector’s storage, but is not yet visible to other participants in the Agricultural Data Space.
Note
- sovity Community Edition EDC currently only supports HTTP endpoints as data sources. Therefore, the SMS backend must expose an endpoint for the SMS connector to retrieve the soil moisture measurements when required.
- What is stored in the connector’s storage is the metadata and the technical information needed to retrieve the actual data when needed, not the data itself.
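To make the second note concrete, here is a minimal Python sketch of what a connector might keep for a data asset: descriptive metadata plus a pointer to the data source, but no measurements. The class names, field names, and the example URL are our own illustrative assumptions and do not reproduce the actual EDC or sovity data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAddress:
    """Technical details the connector needs to fetch the data (hypothetical shape)."""
    type: str      # transfer type, e.g. "HTTPData"
    base_url: str  # endpoint exposed by the provider's backend


@dataclass(frozen=True)
class DataAsset:
    """Descriptive metadata plus a pointer to the source; no payload is stored."""
    asset_id: str
    name: str
    version: str
    description: str
    keywords: tuple[str, ...]
    data_address: DataAddress


class AssetStore:
    """In-memory stand-in for the connector's storage."""

    def __init__(self) -> None:
        self._assets: dict[str, DataAsset] = {}

    def create(self, asset: DataAsset) -> None:
        if asset.asset_id in self._assets:
            raise ValueError(f"asset {asset.asset_id!r} already exists")
        self._assets[asset.asset_id] = asset

    def get(self, asset_id: str) -> DataAsset:
        return self._assets[asset_id]


# Hypothetical asset for the soil moisture measurements of the SMS provider.
store = AssetStore()
store.create(
    DataAsset(
        asset_id="soil-moisture-1",
        name="Soil moisture measurements",
        version="1.0",
        description="Satellite-derived soil moisture measurements",
        keywords=("soil", "moisture"),
        data_address=DataAddress(
            type="HTTPData",
            base_url="https://sms.example.org/api/soil-moisture",  # assumed URL
        ),
    )
)
```

Because the stored record only contains the address, the connector has to call the backend endpoint each time the data is actually requested.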
B. Policy Creation
The second step is to create a policy that describes the conditions under which the SMS provider agrees to share the soil moisture measurements with other participants in the Agricultural Data Space.
sovity Community Edition EDC supports three classes of policies:
- “Always-True”: To allow any use without limitation.
- “Connector-Restricted-Usage”: To restrict the use to specific connector instances.
- “Time-Period-Restricted”: To restrict the use to a certain period of time.
To create a new policy, an SMS provider employee can use the connector’s UI, navigate to the “Policies” page, and click on “Create policy” (see Figure 5). However, for our use case, since the SMS provider employee wants to ensure that the offer of the soil moisture measurements is discoverable by any participant in the Agricultural Data Space, the employee uses the default “Always-True” policy.
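The three policy classes can be thought of as simple predicates over a request. The following Python sketch illustrates that idea only; the class names, the `RequestContext` type, and the connector IDs are hypothetical and are not taken from the sovity or EDC code base.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RequestContext:
    """Facts about a request that a policy may inspect (hypothetical)."""
    connector_id: str
    at: datetime


class AlwaysTrue:
    """Allows any use without limitation."""

    def permits(self, ctx: RequestContext) -> bool:
        return True


@dataclass(frozen=True)
class ConnectorRestricted:
    """Allows use only by specific connector instances."""
    allowed_connectors: frozenset[str]

    def permits(self, ctx: RequestContext) -> bool:
        return ctx.connector_id in self.allowed_connectors


@dataclass(frozen=True)
class TimePeriodRestricted:
    """Allows use only within a given period of time."""
    start: datetime
    end: datetime

    def permits(self, ctx: RequestContext) -> bool:
        return self.start <= ctx.at <= self.end


# Example request from a hypothetical Basic FMIS connector.
ctx = RequestContext(
    connector_id="basic-fmis-connector",
    at=datetime(2023, 6, 1, tzinfo=timezone.utc),
)
summer_2023 = TimePeriodRestricted(
    start=datetime(2023, 5, 1, tzinfo=timezone.utc),
    end=datetime(2023, 9, 30, tzinfo=timezone.utc),
)
```

In this picture, choosing “Always-True” simply means that the predicate never rejects a request, which is why any participant can discover the offer.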
Notes
- The classes of policies supported in sovity Community Edition EDC are quite limited. Currently, there is no support for controlling the purpose of data usage, the number of uses, the location of consuming connectors, payments, restrictions on sharing with other parties, or obligations such as deleting the data after a certain time. Supporting these classes of policies requires additional implementation and customization of the connector.
- Most of the technical policy enforcement takes place before the data is transferred. Once the data is transferred to the consumer’s data sink (which is outside the consumer’s connector), there is no technical enforcement provided by the connectors. Compliance of the consumer’s data use with the policies defined by the data provider relies solely on organizational measures (e.g., legal contracts).
- The International Data Spaces have concepts for technically maintaining control over the data after it has been shared with other parties, but to the best of our knowledge, neither EDC nor sovity Community Edition EDC provides such a level of control out of the box.
C. Contract Definition Creation
The last step is to create a contract definition on the “Contract Definitions” page that attaches policies to the data asset (see Figure 6).
When creating a contract definition, the SMS provider employee must specify two types of policies (using the policies defined in the previous steps):
- Access Policy: The policy under which the contract definition is discoverable by the connectors of other participants in the data space.
- Contract Policy: The policy for consuming the actual data (the soil moisture measurements in our use case).
Once the contract definition has been created, it is published as an offer in the catalog of the SMS provider’s connector and becomes discoverable by other participants of the Agricultural Data Space based on the attached access policy. In our use case, since the access policy is set to “Always-True”, the contract definition can be discovered by any connector in the Agricultural Data Space.
Note
- EDC-based connectors, such as sovity Community Edition EDC, publish their offers (i.e., contract definitions) in their catalog, which can be accessed by the connectors of the other participants in the data space.
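The roles of the two policies can be sketched as follows: the access policy decides who sees an offer in the catalog, while the contract policy applies later, when the data is consumed. This is a simplified Python illustration under our own assumptions; the `ContractDefinition` fields and the helper functions are hypothetical and do not mirror the connector's implementation.

```python
from dataclasses import dataclass
from typing import Callable

# A policy is modelled as a predicate over the requesting connector's ID.
Policy = Callable[[str], bool]


def always_true(connector_id: str) -> bool:
    """Stand-in for the "Always-True" policy."""
    return True


def only(*allowed: str) -> Policy:
    """Stand-in for a connector-restricted policy."""
    allowed_set = frozenset(allowed)

    def check(connector_id: str) -> bool:
        return connector_id in allowed_set

    return check


@dataclass(frozen=True)
class ContractDefinition:
    definition_id: str
    asset_id: str
    access_policy: Policy    # who may discover the offer
    contract_policy: Policy  # who may consume the data


def visible_offers(definitions: list[ContractDefinition], requester: str) -> list[str]:
    """Return the IDs of the offers that the requester is allowed to discover."""
    return [d.definition_id for d in definitions if d.access_policy(requester)]


definitions = [
    ContractDefinition("open-offer", "soil-moisture-1", always_true, always_true),
    ContractDefinition("partner-offer", "soil-moisture-1", only("partner-connector"), always_true),
]
```

With these hypothetical definitions, `visible_offers(definitions, "basic-fmis-connector")` contains only `"open-offer"`, whereas a connector named `"partner-connector"` would see both entries.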
2. Consuming data in the Agricultural Data Space
As explained in the use case description, the Basic FMIS provider is interested in obtaining soil moisture measurements from the fields in the County of Kaiserslautern, Germany. The process for consuming data in the Agricultural Data Space is threefold: data offer discovery; contract negotiation and establishment; data transfer. In the following, we will explain these three steps in detail.
A. Data Offer Discovery
To discover relevant data offers, an employee of the Basic FMIS provider can use their connector UI to navigate to the “Catalog Browser” page, which lists the offers published in the catalogs of the connectors of the other participants in the Agricultural Data Space (in our use case, the SMS provider is the only participant).
The employee will discover the contract definition for soil moisture measurements offered by the SMS provider (see Figure 7). The employee can click on the entry to see more details about the offer (e.g., the metadata of the asset and the attached policies).
Notes
- When configuring the connector UI of the Basic FMIS provider in our environment, we had to specify the address of the connector of the SMS provider, otherwise the Basic FMIS provider’s connector would not have been able to discover and query the catalog of the SMS provider’s connector.
- In sovity Community Edition EDC, in order for a participant to be able to discover offers from other participants in a data space, the participant has to provide its connector with the endpoints of the connectors of participants whose offers they want to retrieve. Such a restriction limits the value of data spaces as participants do not have a means to know all the other participants in a data space. In the EDC framework, this is solved by the Federated Catalog component, a component that automatically discovers and provides the offers published by the participants of a data space.
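The consequence of configuring connector endpoints manually can be illustrated with a small Python sketch: only the catalogs of explicitly listed connectors are queried, so offers from unknown participants stay invisible. The function names, the endpoints, and the second (weather) participant are purely hypothetical; the network call is replaced by a dictionary lookup.

```python
from typing import Callable, Iterable

# Fetches the list of offer IDs from one connector endpoint.
CatalogFetcher = Callable[[str], list[str]]


def browse_catalogs(
    known_endpoints: Iterable[str], fetch_catalog: CatalogFetcher
) -> dict[str, list[str]]:
    """Query only the connectors whose endpoints were configured by hand."""
    return {endpoint: fetch_catalog(endpoint) for endpoint in known_endpoints}


# Hypothetical data space with two providers; a dict replaces the network.
DATA_SPACE = {
    "https://sms.example.org/connector": ["soil-moisture-offer"],
    "https://weather.example.org/connector": ["rainfall-offer"],
}


def fake_fetch(endpoint: str) -> list[str]:
    return list(DATA_SPACE.get(endpoint, []))


# The consumer only knows the SMS connector, so the rainfall offer is never found.
found = browse_catalogs(["https://sms.example.org/connector"], fake_fetch)
```

A federated catalog would, conceptually, replace the hand-maintained list of endpoints with automatic discovery of all participants.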
B. Contract Negotiation and Establishment
After discovering the offer, the Basic FMIS provider’s employee can initiate a contract negotiation by clicking a button in the detailed view of the offer in their connector’s UI (see Figure 8). The FMIS connector then sends a negotiation request to the SMS connector and the automatic contract negotiation starts (no need for human interaction on either side). Once the contract negotiation process has been concluded successfully and an agreement has been reached, a binding contract is created and persisted in the connectors of both SMS and Basic FMIS. Basic FMIS and SMS employees can find the contract on the “Contracts” page of their connector UI and click on the entry to view more details about it (e.g., the attached policies).
Notes
- The negotiation capabilities are quite limited in sovity Community Edition EDC. A consumer can only agree to all the conditions listed in the offer (a “take it or leave it” offer) but cannot suggest a counter offer.
- The automatic negotiation eliminates extra effort needed for contract establishment between two parties; however, offering no possibility to manually check contract negotiation requests before accepting them might be perceived as a limitation from the data provider point of view.
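The “take it or leave it” behaviour can be sketched in a few lines of Python: the provider side confirms a request automatically only if the consumer accepts exactly the terms of the offer, and declines anything else. The types, the state names, and the example terms are hypothetical and serve only to illustrate the idea.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class NegotiationState(Enum):
    CONFIRMED = "CONFIRMED"
    DECLINED = "DECLINED"


@dataclass(frozen=True)
class Offer:
    offer_id: str
    asset_id: str
    terms: frozenset[str]  # conditions the consumer must accept as a whole


@dataclass(frozen=True)
class Agreement:
    agreement_id: str
    offer_id: str
    consumer_id: str


def negotiate(
    offer: Offer, requested_terms: Iterable[str], consumer_id: str
) -> tuple[NegotiationState, Optional[Agreement]]:
    """Confirm automatically if the terms match exactly; counter offers are declined."""
    if frozenset(requested_terms) != offer.terms:
        return NegotiationState.DECLINED, None
    agreement = Agreement(
        agreement_id=str(uuid.uuid4()),
        offer_id=offer.offer_id,
        consumer_id=consumer_id,
    )
    return NegotiationState.CONFIRMED, agreement


offer = Offer("soil-moisture-offer", "soil-moisture-1", frozenset({"always-true"}))
state, agreement = negotiate(offer, {"always-true"}, "basic-fmis-connector")
```

Note that nothing in this flow pauses for a human decision, which corresponds to the limitation described in the second note.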
C. Data Transfer
After establishing a binding contract between the Basic FMIS provider and the SMS provider, a Basic FMIS employee can request the transfer of the data (the soil moisture measurements) via a button click in the details view of the contract (created in the previous step) from their connector’s UI (see Figure 9).
The employee is then prompted to provide details about the data sink, namely, an HTTP endpoint address and an API key header, to authorize the request. The employee must then initiate the transfer by clicking a button (see Figure 10). When the button is clicked, the Basic FMIS connector sends a data transfer request to the SMS connector. In the request, the Basic FMIS connector refers to the established contract (agreement), and thus the SMS connector verifies the existence and validity of this contract. If the requested data transfer is allowed (a valid contract exists and the policies are met or can be met, depending on the attached policy), the SMS connector retrieves the data from the data source specified by the SMS provider employee in 1.A (i.e., issues an HTTP GET request to the SMS backend) and pushes it to the data sink (i.e., issues an HTTP POST request to the specified Basic FMIS backend endpoint).
Notes
- sovity Community Edition EDC supports only one of the data transfer methods supported by the EDC framework: HTTPData.
- The HTTPData transfer method is a provider-push method. In other words, for the data transfer, the provider connector pushes the data to a data sink specified by the consumer. The data sink should be an HTTP endpoint that can be accessed by the provider connector. This transfer method forces the consumer to expose their data sink to the Internet and share credentials with the provider connector to allow access. Therefore, to ensure security, it is the responsibility of the consumer to limit the access permissions granted by these credentials (e.g., one-time use API keys).
- In our use case, the Basic FMIS backend exposes a public HTTP endpoint to the Internet that accepts data deliveries from foreign connectors. As a simple solution for protecting this HTTP endpoint from unauthorized access, we implemented API keys.
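The provider-push flow and the idea of one-time API keys can be sketched as follows. This is a simplified Python simulation without real HTTP traffic: the `DataSink` class, the `push_transfer` function, and the status codes used as return values are our own illustrative assumptions, not the implementation of our demonstrator or of the connector.

```python
import hmac
import secrets
from typing import Callable


class DataSink:
    """Stand-in for the consumer's endpoint; each API key is valid for one delivery."""

    def __init__(self) -> None:
        self._valid_keys: set[str] = set()
        self.received: list[bytes] = []

    def issue_key(self) -> str:
        key = secrets.token_urlsafe(32)
        self._valid_keys.add(key)
        return key

    def post(self, api_key: str, body: bytes) -> int:
        # Constant-time comparison against every key that is still valid.
        match = next(
            (k for k in self._valid_keys
             if hmac.compare_digest(k.encode(), api_key.encode())),
            None,
        )
        if match is None:
            return 401  # unknown or already used key
        self._valid_keys.discard(match)  # one-time use
        self.received.append(body)
        return 200


def push_transfer(
    has_valid_agreement: bool,
    fetch_source: Callable[[], bytes],
    sink: DataSink,
    api_key: str,
) -> int:
    """Provider side: check the contract, fetch from the source, push to the sink."""
    if not has_valid_agreement:
        return 403
    body = fetch_source()            # corresponds to the GET on the provider backend
    return sink.post(api_key, body)  # corresponds to the POST to the consumer's sink


sink = DataSink()
key = sink.issue_key()
first = push_transfer(True, lambda: b'{"moisture": 0.31}', sink, key)
second = push_transfer(True, lambda: b'{"moisture": 0.29}', sink, key)
```

In this sketch the first delivery is accepted and the second one is rejected because the key has already been consumed, so a leaked key cannot be reused for further deliveries.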
Finally, now that Basic FMIS has the data, it visualizes the soil moisture measurements of its users’ fields (farmers’ fields) as a heatmap layer. See Figure 11.
Notes
- In order to get up-to-date soil moisture measurements, the Basic FMIS employee must manually trigger a new data transfer; new data sets are not pushed automatically.
- Based on the contract established in 2.B, the Basic FMIS employee may initiate multiple data transfers as long as the contract is still valid and the policies are met.
Conclusion
Data Spaces are a domain-independent, promising solution to enable secure data exchange while ensuring data sovereignty, and are a step towards realizing interoperability in the digital world. When the effort required to share data is reduced by defining rules and providing secure and interoperable mechanisms for data sharing, data spaces can increase the willingness to share data, thereby increasing the value of data. They also lower the barriers to collaboration, allowing new business models to emerge and providing opportunities for small players.
In this blog post, we presented a simple use case from the agricultural domain. While the Agricultural Data Space involves only one data provider (the SMS provider) and one data consumer (the Basic FMIS provider), a real data space, regardless of the domain, would consist of multiple consumers and multiple data providers with different data offers of different quality or under different conditions. However, the process described still applies: Publishing data offers in the data space, discovering them, negotiating and establishing contracts, and consuming data will be as simple as described in this blog post, regardless of which offer and which provider the consumer interacts with, because they are all participating in the same data space and following the same rules and principles.
As our use case illustrates, current technologies still have limitations (e.g., limited support for policies and enforcement mechanisms, limited negotiation capabilities, no interoperability at the data level) and improvements are needed before they can be widely adopted.
Initiatives such as Gaia-X and IDSA are continuously fine-tuning the concepts and specifications of data spaces. Furthermore, technologies for realizing data spaces such as EDC and sovity EDC are undergoing rapid continuous development, so the features are being incrementally improved and the limitations are being addressed.
Note
- We published a walkthrough video for the realization of our use case. You can find our video here.
If you have any questions or would like to know technical details about our demo, please do not hesitate to contact us.
References
[1] https://projects.eclipse.org/projects/technology.edc
[2] https://gaia-x-hub.de/wp-content/uploads/2022/10/White_Paper_Definition_Dataspace_EN.pdf