Building A Recommendation Engine With The GraphBase Graph DatabasePosted on July 1st, 2013 by Stephen Young.
In my previous entry we looked at some simple Bounds Queries. You can do a great deal in a well-structured data graph with simple queries, but sometimes more complex traversal patterns can help us to uncover new and useful information.
There's a problem, however - complex traversal patterns usually mean complex query code. We've seen that the data structures in a graph database can be less complex, and more expressive, than those used in a Relational Database. But until now, the query semantics for getting meaningful data back from a graph like this were more complex and less expressive. This process of "getting data back" has often been frustrating. So frustrating, in fact, that graph databases have to date been used successfully for only the simplest of graphs.
The GraphBase solution to this problem has been to provide query semantics that support multiple traversal patterns, and interaction between those patterns, within a single terse query. Let's take a look at how they work.
In this post, we're going to use the rating data we introduced into the "Sakila" data set to build a Recommendation System. But rather than the traditional approaches used by such sites as Amazon and Spotify, we'll use a movel method made possible by the traversal speed of a Graph Database.
The algorithm is pretty simple - we're going to get recommendations from people who like the same films that we like.
Simple to say, but not so simple to execute - particularly in a Relational Database. For the Sakila database this is the cartesian join that's required ...
(16044 x 4581) x 1000 x (16044 x 4581) x 599 x (16044 x 4581) x 1000
.. i.e. 237,818,512,102,606,773,812,744,256,000,000 rows need to be sliced, diced and filtered to return our result. Sure, some clever indexing, execution plans and intermediate views will make this number much less intimidating - but it's obvious that this strategy is just not viable in an RDBMS. Certainly not if we want to do it in real-time.
Below is a Bounds Query traversal pattern that works with the Sakila graph-structure we loaded into GraphBase using RapidGrapher in an earlier exercise.
During our traversal, we've collected; Fred's 5-star rental events, the films from those rentals, other 5-star rental events for those films, the customers who made those rentals, the 5-star rentals those customers have made, and the films associated with those rentals.
You'll find a comprehensive description of the terms used at Bounds Language - but a couple of features of this query are worth noting here.
Firstly, we've used XAND rather than AND in the latter parts of the traversal pattern. This allows the returned result graph to retain some useful vertices. The Navigator "tree-view" above and to the right are of the same "FilmsForFred" result graph and the above view of that graph lets us see the Customers whose recommendations we're using.
Secondly, the #1-20 cursor range expression lets us limit the result.
GraphBase Agility Edition does not impliment a Property Graph, and those of you who've worked with earlier Graph Databases will note that the structures and traversal patterns used here are dramatically different from those you would use in a Property Graph. A property graph implimentation for this problem would require you to fill the arcs of your graph with large numbers of redundant properties - with a concomitant performance hit. And you'd lose the power, flexibility and structural simplicity of the Framework.