Graph-Oriented Solutions Enhancing Flexibility Over Mutant Requirements

Relational Database Management Systems (RDBMS) represent the state of the art, thanks in part to their well-established ecosystem of surrounding technologies and tools and to widespread professional skills.

In this era of technological revolution spanning both Information Technology (IT) and Operational Technology (OT), it is widely recognized that significant performance challenges arise, particularly in specific use cases where NoSQL solutions outperform traditional approaches. Indeed, the market offers many NoSQL DBMS solutions built on a variety of data models:

"Errors using inadequate data are much less than those using no data at all."

CHARLES BABBAGE

A less-explored concern is the ability of software architectures relying on relational solutions to adapt flexibly to rapid and frequent changes in the software domain and functional requirements. This challenge is exacerbated by Agile-like development methodologies, which aim to satisfy the customer by responding to the continuously emerging demands of its business market.

In particular, RDBMS, by their very nature, may suffer when software requirements change over time: such changes quickly propagate to the tabular database schema, introducing new association tables (which may also replace pre-existing foreign keys) and new JOIN clauses in SQL queries, and thus resulting in more complex and less maintainable solutions.

In our enterprise experience, we have successfully implemented and tested a graph-oriented DBMS solution based on the Neo4j Graph Database to mitigate the architectural consequences of requirement changes, within an operational context typical of a digital social community with different users and roles.

In this article, we:

The Neo4j Graph Database

The idea behind graph-oriented data models is to handle entities (i.e., nodes) and the relationships between them (i.e., edges) natively, so that the knowledge base (namely, the knowledge graph) can be queried by navigating the relationships between entities.

The Neo4j Graph Database works on directed property graphs, in which both nodes and edges can carry their own properties (key-value attributes).
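To make the property-graph idea concrete, the following minimal Java sketch models nodes that carry a label and key-value properties, and directed edges that carry a type and properties of their own; querying then amounts to navigating outgoing edges. It is an illustration under our own assumptions, not Neo4j's storage model or API: `GraphNode`, `GraphEdge`, and `PropertyGraph` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal model of a directed property graph (not Neo4j's API or storage format).
// A node has a label and its own key-value properties.
record GraphNode(String label, Map<String, Object> properties) {}

// A directed edge has a type, a start node, an end node, and its own key-value properties.
record GraphEdge(String type, GraphNode start, GraphNode end, Map<String, Object> properties) {}

class PropertyGraph {
    private final List<GraphNode> nodes = new ArrayList<>();
    private final List<GraphEdge> edges = new ArrayList<>();

    GraphNode addNode(String label, Map<String, Object> properties) {
        GraphNode node = new GraphNode(label, properties);
        nodes.add(node);
        return node;
    }

    GraphEdge addEdge(String type, GraphNode start, GraphNode end, Map<String, Object> properties) {
        GraphEdge edge = new GraphEdge(type, start, end, properties);
        edges.add(edge);
        return edge;
    }

    // Navigates the outgoing edges of the given type; node identity (==) is used, not equals().
    List<GraphNode> neighbors(GraphNode start, String type) {
        List<GraphNode> result = new ArrayList<>();
        for (GraphEdge edge : edges) {
            if (edge.start() == start && edge.type().equals(type)) {
                result.add(edge.end());
            }
        }
        return result;
    }

    int nodeCount() {
        return nodes.size();
    }
}
```

For example, on a graph holding a single `HAS_AUTH` edge from a User node to an Auth node, `neighbors(user, "HAS_AUTH")` returns that Auth node whatever properties are stored on the edge.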

We chose it as our DBMS primarily for:

Furthermore, the Neo4j Graph Database also offers Java libraries for Object Graph Mapping (OGM), which help developers automate the mapping, persistence, and management of model entities, nodes, and relationships. In practice, OGM plays the same role for graph-oriented DBMS that Object Relational Mapping (ORM) plays for relational persistence layers.

Like the ORM pattern for RDBMS, the OGM pattern streamlines the implementation of Data Access Objects (DAOs).
Its primary function is to semi-automate the persistence of domain model entities that are properly configured and annotated in the source code.
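As a rough illustration of what an OGM automates, the sketch below uses reflection to turn an annotated Java object into a node label plus a property map, which is the information a persistence layer would then write to the graph. `@NodeLabel`, `MappedNode`, `ToyObjectGraphMapper`, and `UserEntity` are hypothetical names invented for this example; Neo4j's OGM relies on its own annotations (such as `@NodeEntity`) and handles much more, including relationships and entity identity.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation (not part of Neo4j OGM) marking a class as a node with the given label.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface NodeLabel {
    String value();
}

// Outcome of the mapping: the node label plus the property map that would be persisted.
record MappedNode(String label, Map<String, Object> properties) {}

final class ToyObjectGraphMapper {
    private ToyObjectGraphMapper() {}

    // Reads the label from the annotation and copies each instance field into the property map.
    static MappedNode toNode(Object entity) {
        Class<?> type = entity.getClass();
        NodeLabel label = type.getAnnotation(NodeLabel.class);
        if (label == null) {
            throw new IllegalArgumentException(type.getName() + " is not annotated with @NodeLabel");
        }
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // skip class-level and compiler-generated fields
            }
            try {
                field.setAccessible(true);
                properties.put(field.getName(), field.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return new MappedNode(label.value(), properties);
    }
}

// Example entity: its declared fields become node properties (inherited fields are ignored here).
@NodeLabel("User")
class UserEntity {
    private final String firstName;
    private final String lastName;

    UserEntity(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}
```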

Compared with the Java Persistence API (JPA)/Hibernate, widely recognized as a leading ORM technology, Neo4j's OGM library operates in a distinctive manner:

Write Operations

Read Operations

Solution Benefits of an Exemplary Case Study

To illustrate our analysis, we introduce a simple operational scenario: the UML Class Diagram of Fig. 1.1 depicts an entity User that has a 1-to-N relationship with the entity Auth (short for Authorization), which defines permissions and grants within the application.
This Domain Model may be represented in an RDBMS by a schema like the one in Tab. 1.1 and Tab. 1.2 or, in a graph-oriented DBMS, by the knowledge graph of Fig. 1.2.



Fig. 1.1: UML Class Diagram of the Domain Model.


users table
id | firstName | lastName
... | ... | ...

Tab. 1.1: Table mapped in the RDBMS schema for the User entity.

auths table
id | name | level | user_fk
... | ... | ... | ...

Tab. 1.2: Table mapped in the RDBMS schema for the Auth entity.



Fig. 1.2: Knowledge graph related to the Domain Model of Fig. 1.1.
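On the Java side, the 1-to-N model of Fig. 1.1 could be sketched as a `User` class holding a list of `Auth` objects, with field names mirroring Tab. 1.1 and Tab. 1.2. With Neo4j OGM, such classes would typically be annotated with `@NodeEntity` and the collection with `@Relationship(type = "HAS_AUTH")`; in this sketch those annotations appear only as comments so that it compiles without the library, and the `grant` and `hasAuth` helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Auth node of Fig. 1.1 / Tab. 1.2. With Neo4j OGM it would be a regular annotated class
// (e.g. @NodeEntity) rather than a record; the record only keeps the sketch short.
record Auth(String name, int level) {}

// User node of Fig. 1.1 / Tab. 1.1, owning a 1-to-N collection of Auth nodes.
// With Neo4j OGM: @NodeEntity (omitted so the sketch compiles without the library).
class User {
    private final String firstName;
    private final String lastName;
    // With Neo4j OGM: @Relationship(type = "HAS_AUTH")
    private final List<Auth> auths = new ArrayList<>();

    User(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    void grant(Auth auth) {
        auths.add(auth);
    }

    // Same filter as the queries in the text: an Auth with the given name and at least minLevel.
    boolean hasAuth(String name, int minLevel) {
        return auths.stream().anyMatch(a -> a.name().equals(name) && a.level() >= minLevel);
    }

    String firstName() {
        return firstName;
    }

    String lastName() {
        return lastName;
    }

    List<Auth> auths() {
        return List.copyOf(auths);
    }
}
```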

Now, imagine that a new requirement emerges during the production lifecycle of the application: for administrative reasons, the customer needs to restrict authorizations to specific time periods (i.e., from and until the dates of validity), as in Fig. 2.1, turning the relationship between User and Auth into an N-to-N relationship.
This Domain Model may be represented in an RDBMS by a schema like the one in Tab. 2.1, Tab. 2.2, and Tab. 2.3 or, in a graph-oriented DBMS, by the knowledge graph of Fig. 2.2.


Fig. 2.1: UML Class Diagram of the Domain Model after the definition of new requirements.


users table
id | firstName | lastName
... | ... | ...

Tab. 2.1: Table mapped in the RDBMS schema for the User entity.

users_auths table
user_fk | auth_fk | from | until
... | ... | ... | ...

Tab. 2.2: Table mapped in the RDBMS schema for storing associations between the User and Auth entities.

auths table
id | name | level
... | ... | ...

Tab. 2.3: Table mapped in the RDBMS schema for the Auth entity.



Fig. 2.2: Knowledge graph related to the Domain Model of Fig. 2.1.

The advantage is already clear at the schema level: the graph-oriented approach did not change the schema and only required defining two new properties (from and until) on the edge modeling the relationship, whereas the RDBMS approach introduced the new association table users_auths, replacing the user_fk foreign key in the auths table that referenced the users table.
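In an OGM-based Java model, the new requirement could be absorbed by promoting the `HAS_AUTH` edge to a relationship class that carries the two new properties. In Neo4j OGM such a class is typically annotated with `@RelationshipEntity(type = "HAS_AUTH")`, with `@StartNode` and `@EndNode` on its endpoints; the sketch below keeps those annotations as comments so it compiles standalone, and `grant`, `hasAuth`, and `isValidOn` are hypothetical helpers. The name/level filter itself does not change.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Auth node: unchanged with respect to the 1-to-N model (a record only to keep the sketch short).
record Auth(String name, int level) {}

// The HAS_AUTH edge promoted to a class carrying the new 'from' and 'until' properties.
// With Neo4j OGM: @RelationshipEntity(type = "HAS_AUTH"), @StartNode on user, @EndNode on auth.
record HasAuth(User user, Auth auth, LocalDate from, LocalDate until) {
    // Hypothetical helper: true if the grant is valid on the given day (both bounds inclusive).
    boolean isValidOn(LocalDate day) {
        return !day.isBefore(from) && !day.isAfter(until);
    }
}

class User {
    private final String firstName;
    // With Neo4j OGM: @Relationship(type = "HAS_AUTH") on a collection of relationship entities.
    private final List<HasAuth> auths = new ArrayList<>();

    User(String firstName) {
        this.firstName = firstName;
    }

    // Creates the edge between this user and a (possibly shared) Auth node.
    HasAuth grant(Auth auth, LocalDate from, LocalDate until) {
        HasAuth edge = new HasAuth(this, auth, from, until);
        auths.add(edge);
        return edge;
    }

    // The name/level filter is unchanged; it now reaches the Auth node through the edge.
    boolean hasAuth(String name, int minLevel) {
        return auths.stream()
                .map(HasAuth::auth)
                .anyMatch(a -> a.name().equals(name) && a.level() >= minLevel);
    }

    String firstName() {
        return firstName;
    }
}
```

Because the edge now holds the validity period, the same `Auth` node can be granted to several users for different periods, which is what makes the relationship N-to-N.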

Going further, we can compare an SQL query with a query written in the Cypher query language under the two approaches: suppose we want to identify users with the first name “Paul” who have an Auth named “admin” with a level greater than or equal to 3.

In SQL, the required queries (the first for retrieving data from Tab. 1.1 and Tab. 1.2, the second for Tab. 2.1, Tab. 2.2, and Tab. 2.3) are:

SQL

-- 1-to-N schema (Tab. 1.1 and Tab. 1.2): each auth row references its user via user_fk
SELECT users.*
FROM users
INNER JOIN auths ON users.id = auths.user_fk
WHERE users.firstName = 'Paul' AND auths.name = 'admin' AND auths.level >= 3;

SQL

-- N-to-N schema (Tab. 2.1, Tab. 2.2, and Tab. 2.3): one extra JOIN through users_auths
SELECT users.*
FROM users
INNER JOIN users_auths ON users.id = users_auths.user_fk
INNER JOIN auths ON auths.id = users_auths.auth_fk
WHERE users.firstName = 'Paul' AND auths.name = 'admin' AND auths.level >= 3;


In Cypher, by contrast, the required query (for both cases) is:

Cypher

// Same pattern for the 1-to-N and N-to-N graphs: from/until on HAS_AUTH are not referenced
MATCH (u:User)-[:HAS_AUTH]->(auth:Auth)
WHERE u.firstName = 'Paul' AND auth.name = 'admin' AND auth.level >= 3
RETURN u


While the SQL query needs one more JOIN clause, in this specific case the Cypher query requires neither an additional clause nor a change to the MATCH path: it remains identical. No changes were necessary in the backend's "query system"!
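The same point can be reproduced without a database. The Java sketch below evaluates, over a hypothetical in-memory edge list, the pattern and filter of the Cypher query above; since it never reads edge properties, adding `from` and `until` to the `HAS_AUTH` edges leaves the result unchanged. `Node`, `Edge`, and `PaulAdminQuery` are illustrative names, and the code mimics only the semantics of this one query, not how Neo4j executes Cypher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-ins for graph elements (not the Neo4j driver API).
record Node(String label, Map<String, Object> props) {}
record Edge(String type, Node start, Node end, Map<String, Object> props) {}

final class PaulAdminQuery {
    private PaulAdminQuery() {}

    // Mimics, over a plain edge list:
    //   MATCH (u:User)-[:HAS_AUTH]->(auth:Auth)
    //   WHERE u.firstName = 'Paul' AND auth.name = 'admin' AND auth.level >= 3
    //   RETURN u
    // Like the Cypher query, it returns one row per matching path and never reads edge
    // properties, so adding 'from'/'until' to the edges cannot change the result.
    static List<Node> run(List<Edge> edges) {
        List<Node> rows = new ArrayList<>();
        for (Edge e : edges) {
            Node u = e.start();
            Node auth = e.end();
            boolean pathMatches = e.type().equals("HAS_AUTH")
                    && u.label().equals("User")
                    && auth.label().equals("Auth");
            boolean filterMatches = "Paul".equals(u.props().get("firstName"))
                    && "admin".equals(auth.props().get("name"))
                    && auth.props().get("level") instanceof Integer level
                    && level >= 3;
            if (pathMatches && filterMatches) {
                rows.add(u);
            }
        }
        return rows;
    }
}
```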

Conclusions

Wedge Engineering contributed as the technology partner in an international project in which a collaborative social platform was designed as a decoupled Web Application with a 3-tier architecture composed of:

  1. A backend module, a layered RESTful architecture built on the Jakarta EE framework;
  2. A knowledge graph, the NoSQL store provided by the Neo4j Graph Database;
  3. A frontend module, a single-page app based on HTML, CSS, and JavaScript and built with the Angular framework.

The most challenging design choice we faced was whether to use a driver that natively exploits the Cypher query language or to rely on the OGM library to simplify DAO implementations: we found that building an entire application with custom Cypher queries is neither feasible nor scalable, while OGM may not be efficient enough when dealing with large data hierarchies involving a significant number of relationships to referenced external entities.

We finally opted for a custom approach: OGM as the reference solution for mapping nodes and edges in an ORM-like fashion and for supporting the implementation of ad hoc DAOs, complemented by targeted optimizations through custom query methods wherever the OGM-based methods did not perform well.
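The hybrid strategy can be sketched as a DAO that delegates routine persistence to a generic mapping layer and switches to a hand-written, parameterized Cypher query on a hot path. `EntityStore`, `CypherRunner`, `UserRow`, and `UserDao` are hypothetical names standing in for an OGM session and a native driver respectively, not Neo4j APIs; we also assume that User nodes carry an application-level `id` property. Passing values as parameters rather than concatenating them into the query string keeps the query text constant, which helps both safety and query-plan reuse.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the OGM layer: generic load/save of mapped entities.
interface EntityStore<T, ID> {
    Optional<T> findById(ID id);

    void save(T entity);
}

// Hypothetical stand-in for a native driver: runs Cypher with parameters, returns rows as maps.
interface CypherRunner {
    List<Map<String, Object>> run(String cypher, Map<String, Object> params);
}

record UserRow(Long id, String firstName) {}

class UserDao {
    // Hand-written query for a hot path; values are passed as parameters, not concatenated.
    // 'id' is assumed to be an application-level property on User nodes.
    static final String FIND_BY_AUTH = """
            MATCH (u:User)-[:HAS_AUTH]->(auth:Auth)
            WHERE u.firstName = $firstName AND auth.name = $authName AND auth.level >= $minLevel
            RETURN DISTINCT u.id AS id, u.firstName AS firstName
            """;

    private final EntityStore<UserRow, Long> store; // routine CRUD through the mapping layer
    private final CypherRunner cypher;               // custom queries where mapping is too costly

    UserDao(EntityStore<UserRow, Long> store, CypherRunner cypher) {
        this.store = store;
        this.cypher = cypher;
    }

    Optional<UserRow> findById(Long id) {
        return store.findById(id);
    }

    void save(UserRow user) {
        store.save(user);
    }

    // Custom query method: projects only the needed fields instead of loading whole subgraphs.
    List<UserRow> findByFirstNameAndAuth(String firstName, String authName, int minLevel) {
        Map<String, Object> params =
                Map.of("firstName", firstName, "authName", authName, "minLevel", minLevel);
        return cypher.run(FIND_BY_AUTH, params).stream()
                .map(row -> new UserRow((Long) row.get("id"), (String) row.get("firstName")))
                .toList();
    }
}
```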

In conclusion, the adopted software architecture responded well to changes in the knowledge graph schema and fully met the customer's needs, while easing the effort of the Wedge Engineering development team.

Nevertheless, some threats have to be considered before adopting this architecture: