This document will work through a simple example using the Graql shell to show how to get started with GRAKN.AI.

Summary

This example takes a simple genealogy dataset and briefly reviews its ontology, then illustrates how to query, extend and visualise the graph, before demonstrating reasoning and analytics with Graql.

Introduction

If you have not yet set up GRAKN.AI, please see the Setup guide. In this tutorial, we will load a simple ontology and some data from a file, basic-genealogy.gql, and use the Graql shell and Grakn visualiser to illustrate some key features of GRAKN.AI.

The Graql Shell

The first few steps mirror those in the Setup Guide, and you can skip to The Ontology if you have already run through that example. Start Grakn and load the example graph:

./bin/grakn.sh start
./bin/graql.sh -f ./examples/basic-genealogy.gql

Then start the Graql shell in its interactive (REPL) mode:

./bin/graql.sh

You will see a >>> prompt. Type in a query to check that everything is working:

match $x isa person, has identifier $n;

You should see a printout of a number of lines of text, each of which includes a name, such as “William Sanford Titus” or “Elizabeth Niesz”.

The Ontology

You can find out much more about the Grakn ontology in our documentation about the Grakn knowledge model, which states that

“The ontology is a formal specification of all the relevant concepts and their meaningful associations in a given application domain. It allows objects and relationships to be categorised into distinct types, and for generic properties about those types to be expressed”.

For the purposes of this guide, you can think of the ontology as a schema that describes items of data and defines how they relate to one another. You need to have a basic understanding of the ontology to be able to make useful queries on the data, so let’s review the chunks of it that are important for our initial demonstration:

insert

# Entities

person sub entity
  plays parent
  plays child
  plays spouse1
  plays spouse2

  has identifier
  has firstname
  has surname
  has middlename
  has picture
  has age
  has birth-date
  has death-date
  has gender;

# Resources

identifier sub resource datatype string;
firstname sub resource datatype string;
surname sub resource datatype string;
middlename sub resource datatype string;
picture sub resource datatype string;
age sub resource datatype long;
birth-date sub resource datatype date;
death-date sub resource datatype date;
gender sub resource datatype string;

# Roles and Relations

marriage sub relation
  relates spouse1
  relates spouse2
  has picture;

spouse1 sub role;
spouse2 sub role;

parentship sub relation
  relates parent
  relates child;

parent sub role;
child sub role;

There are a number of things we can say about the ontology shown above:

  • there is one entity, person, which represents a person in the family whose genealogy data we are studying.
  • the person entity has a number of resources to describe aspects of them, such as their name, age, dates of birth and death, gender and a URL to a picture of them (if one exists). Most of those resources are strings; the exceptions are age, which is of datatype long, and birth-date and death-date, which are of datatype date.
  • there are two relations that a person can participate in: marriage and parentship.
  • the person can play different roles in those relations: as a spouse (spouse1 or spouse2; we aren’t assigning them by gender to be husband or wife) and as a parent or child (again, we are not assigning a gender such as mother or father).
  • the marriage relation has a resource, which is a URL to a wedding picture, if one exists.
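If it helps to see the ontology in code, it can be mirrored as plain Python data structures. This is a hypothetical sketch for illustration only: the classes Person and Relation and the RELATION_ROLES table are our own names, not part of Grakn. It records which resources a person carries (with their datatypes) and which roles each relation type relates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical in-memory mirror of the genealogy ontology; this is NOT a Grakn API.
# Resource datatypes follow the ontology: string -> str, long -> int, date -> date.
@dataclass
class Person:
    identifier: str
    firstname: Optional[str] = None
    surname: Optional[str] = None
    middlename: Optional[str] = None
    picture: Optional[str] = None      # URL to a picture, if one exists
    age: Optional[int] = None          # datatype long
    birth_date: Optional[date] = None  # datatype date
    death_date: Optional[date] = None  # datatype date
    gender: Optional[str] = None

# The roles each relation type relates, as declared in the ontology.
RELATION_ROLES = {
    "marriage": {"spouse1", "spouse2"},
    "parentship": {"parent", "child"},
}

@dataclass
class Relation:
    kind: str                      # "marriage" or "parentship"
    players: dict                  # role name -> Person playing that role
    picture: Optional[str] = None  # only marriage has a picture resource

    def __post_init__(self):
        # Reject roles that this relation type does not relate.
        unknown = set(self.players) - RELATION_ROLES[self.kind]
        if unknown:
            raise ValueError(f"{self.kind} does not relate {sorted(unknown)}")
        if self.picture is not None and self.kind != "marriage":
            raise ValueError("only marriage has a picture resource")
```

For example, Person("Titus Groan", firstname="Titus", surname="Groan", gender="male") mirrors the fictional person inserted later in this guide, while Relation("parentship", {"spouse1": ...}) raises a ValueError because parentship does not relate spouse1.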

The Data

The data is rather cumbersome, so we will not reproduce it all here. It is part of our genealogy-graph project, and you can find out much more about the Niesz family in our CSV migration and Graql reasoning example documentation. Here is a snippet of some of the data that you added to the graph when you loaded the basic-genealogy.gql file:

$57472 isa person has firstname "Mary" has identifier "Mary Guthrie" has surname "Guthrie" has gender "female";
$86144 has surname "Dudley" isa person has identifier "Susan Josephine Dudley" has gender "female" has firstname "Susan" has middlename "Josephine";
$118912 has age 74 isa person has firstname "Margaret" has surname "Newman" has gender "female" has identifier "Margaret Newman";
...
$8304 (parent: $57472, child: $41324624) isa parentship;
$24816 (parent: $81976, child: $41096) isa parentship;
$37104 isa parentship (parent: $49344, child: $41127960);
...
$122884216 (spouse2: $57472, spouse1: $41406488) isa marriage;
$40972456 (spouse2: $40964120, spouse1: $8248) isa marriage;
$81940536 (spouse2: $233568, spouse1: $41361488) has picture "http:\/\/1.bp.blogspot.com\/-Ty9Ox8v7LUw\/VKoGzIlsMII\/AAAAAAAAAZw\/UtkUvrujvBQ\/s1600\/johnandmary.jpg" isa marriage;

Don’t worry about the numbers such as $57472. These are variables in Graql, and happen to have randomly assigned numbers to make them unique. Each statement is adding either a person, a parentship or a marriage to the graph. We will show how to add more data to the graph shortly in the Extending The Graph section. First, however, it is time to query the graph in the Graql shell.
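Because every statement in the snippet names its type after the isa keyword, you can get a rough tally of what a .gql data file adds by scanning for that keyword. The sketch below is a naive, hypothetical helper (count_types is our own name): it uses a regular expression rather than a Graql parser, so an isa appearing inside a quoted string would be miscounted.

```python
import re
from collections import Counter

# Matches "isa <type>" and captures the type label (letters, digits, _ and -).
ISA = re.compile(r"\bisa\s+([\w-]+)")

def count_types(statements):
    """Count how many statements add each type, judged by the word after `isa`."""
    counts = Counter()
    for stmt in statements:
        match = ISA.search(stmt)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Run over the three kinds of statement shown above, it reports one person, one parentship and one marriage per matching line.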

Querying the Graph

Having started the Grakn engine and the Graql shell in its interactive mode, we are ready to make a number of queries. First, we will make a couple of match queries.

Find all the people in the graph, and list their identifier resources (a string that represents their full name):

match $p isa person, has identifier $i;

Find all the people who are married:

match (spouse1: $x, spouse2: $y) isa marriage; $x has identifier $xi; $y has identifier $yi;  

List parent-child relations with the names of each person:

match (parent: $p, child: $c) isa parentship; $p has identifier $pi; $c has identifier $ci; 

Find all the people who are named ‘Elizabeth’:

match $x isa person, has identifier $y; $y val contains "Elizabeth";
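Conceptually, each match query finds every assignment of its variables that satisfies all of its patterns at once. The Python sketch below imitates three of the queries above over a tiny hypothetical dataset (the ids, the placeholder names and the functions are our own), to show how a relation pattern and the has identifier patterns combine into a join.

```python
# Hypothetical miniature dataset; names are placeholders, not real records.
people = {
    "p1": {"identifier": "Elizabeth Example"},
    "p2": {"identifier": "Alice Example"},
    "p3": {"identifier": "Bob Example"},
}
parentships = [{"parent": "p2", "child": "p1"}]
marriages = [{"spouse1": "p3", "spouse2": "p2"}]

def match_married():
    """Mirrors: match (spouse1: $x, spouse2: $y) isa marriage;
    $x has identifier $xi; $y has identifier $yi;"""
    return [(people[m["spouse1"]]["identifier"], people[m["spouse2"]]["identifier"])
            for m in marriages]

def match_parent_child():
    """Mirrors: match (parent: $p, child: $c) isa parentship;
    $p has identifier $pi; $c has identifier $ci;"""
    return [(people[r["parent"]]["identifier"], people[r["child"]]["identifier"])
            for r in parentships]

def match_identifier_contains(text):
    """Mirrors: match $x isa person, has identifier $y; $y val contains "<text>";"""
    return [pid for pid, attrs in people.items() if text in attrs["identifier"]]
```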

Querying the graph is more fully described in the Graql documentation.

Extending the Graph

Besides making match queries, it is also possible to insert items (see further documentation) and delete items (see further documentation) through the Graql shell. To illustrate inserting a fictional person:

insert $g isa person has firstname "Titus" has identifier "Titus Groan" has surname "Groan" has gender "male";
commit

To find your inserted person:

match $x isa person has identifier "Titus Groan"; 

To delete the person again:

match $x isa person has identifier "Titus Groan"; delete $x;
commit

Alternatively, we can use match...insert syntax to insert additional data associated with something already in the graph. For example, to add some fictional information (middle name, birth date, death date and age at death) for one member of the family, Mary Guthrie:

match $p has identifier "Mary Guthrie"; insert $p has middlename "Mathilda"; $p has birth-date 1902-01-01; $p has death-date 1952-01-01; $p has age 50;
commit
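One way to read match...insert is as two phases: first find every concept that satisfies the match, then add the new resources to each of them. A minimal Python sketch under that reading (the dictionary layout and the match_insert function are hypothetical, and commit is not modelled):

```python
from datetime import date

# Hypothetical graph: concept id -> resource name -> list of values.
# Assumption: resources are lists, so an insert adds a value alongside any
# existing ones rather than replacing it.
graph = {"p1": {"identifier": ["Mary Guthrie"], "firstname": ["Mary"]}}

def match_insert(graph, identifier, new_resources):
    """Sketch of: match $p has identifier "<identifier>"; insert $p has ...;
    Returns the matched ids; nothing is inserted when the match is empty."""
    matched = [cid for cid, res in graph.items()
               if identifier in res.get("identifier", [])]
    for cid in matched:
        for name, value in new_resources.items():
            graph[cid].setdefault(name, []).append(value)
    return matched
```

For example, match_insert(graph, "Mary Guthrie", {"middlename": "Mathilda", "birth-date": date(1902, 1, 1), "death-date": date(1952, 1, 1), "age": 50}) returns ["p1"] and attaches all four resources to that concept.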

Using the Grakn Visualiser

You can open the Grakn visualiser by navigating to localhost:4567 in your web browser. The visualiser allows you to make queries or simply browse the knowledge ontology within the graph. The screenshot below shows a basic query (match $x isa person; offset 0; limit 100;) typed into the form at the top of the main pane, and visualised by pressing “>”:

[Screenshot: Person query]

You can zoom the display in and out, and move the nodes around for better visibility. Please see our Grakn visualiser documentation for further details.

Using Inference

We will move on to discuss the use of GRAKN.AI to infer new information about a dataset. In the ontology, so far, we have dealt only with a person, not a man or woman, and the parentship relations were simply between parent and child roles. We did not directly add information about the nature of the parent and child in each relation - they could be father and son, father and daughter, mother and son or mother and daughter.

However, the person entity does have a gender resource, and we can use Grakn to infer more information about each relationship by using that property. The ontology accommodates the more specific roles of mother, father, daughter and son:

person
  plays son
  plays daughter
  plays mother
  plays father;

parentship sub relation
  relates mother
  relates father
  relates son
  relates daughter;

mother sub parent;
father sub parent;
son sub child;
daughter sub child;

Included in basic-genealogy.gql is a set of Graql rules that instruct Grakn’s reasoner how to label each parentship relation:

$genderizeParentships1 isa inference-rule
lhs
{(parent: $p, child: $c) isa parentship;
$p has gender "male";
$c has gender "male";
}
rhs
{(father: $p, son: $c) isa parentship;};

$genderizeParentships2 isa inference-rule
lhs
{(parent: $p, child: $c) isa parentship;
$p has gender "male";
$c has gender "female";
}
rhs
{(father: $p, daughter: $c) isa parentship;};

$genderizeParentships3 isa inference-rule
lhs
{(parent: $p, child: $c) isa parentship;
$p has gender "female";
$c has gender "male";
}
rhs
{(mother: $p, son: $c) isa parentship;};

$genderizeParentships4 isa inference-rule
lhs
{(parent: $p, child: $c) isa parentship;
$p has gender "female";
$c has gender "female";
}
rhs
{(mother: $p, daughter: $c) isa parentship;};

If you’re unfamiliar with the syntax of rules, don’t worry about it too much just now. It is sufficient to know that, for each parentship relation, Grakn’s reasoner checks whether the pattern in the first block (left hand side, or lhs) can be verified and, if it can, infers the statement in the second block (right hand side, or rhs) to be true, giving a parentship relation between the gendered parent and child.
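The four rules differ only in which gendered roles they assign, so their combined conclusions can be summarised as a lookup from (parent gender, child gender) to a pair of roles. The Python sketch below applies that lookup eagerly to a list of parentships. It only illustrates what the rules conclude; it is not a description of how Grakn’s reasoner evaluates rules, and the table and function names are hypothetical.

```python
# (parent gender, child gender) -> (parent role, child role), one entry per rule.
GENDERED_ROLES = {
    ("male", "male"): ("father", "son"),          # genderizeParentships1
    ("male", "female"): ("father", "daughter"),   # genderizeParentships2
    ("female", "male"): ("mother", "son"),        # genderizeParentships3
    ("female", "female"): ("mother", "daughter"), # genderizeParentships4
}

def genderize_parentships(parentships, gender):
    """parentships: list of (parent_id, child_id); gender: id -> "male"/"female".

    Returns inferred (parent_role, parent_id, child_role, child_id) tuples.
    A pair with a missing or unrecognised gender matches no rule's lhs,
    so nothing is inferred for it.
    """
    inferred = []
    for parent, child in parentships:
        roles = GENDERED_ROLES.get((gender.get(parent), gender.get(child)))
        if roles is not None:
            inferred.append((roles[0], parent, roles[1], child))
    return inferred
```

For instance, a male parent with a female child yields a (father, daughter) pair, just as $genderizeParentships2 would.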

Let’s test it out!

First, try making a match query to find parentship relations between fathers and sons in the Graql shell:

match (father: $p, son: $c) isa parentship; $p has identifier $n1; $c has identifier $n2;

Did you get any results? Probably not, because reasoning is not enabled by default at present, although we expect that to change as Grakn develops. If you didn’t see any results, exit the Graql shell and restart it, passing the -n and -m flags to switch on reasoning (see our documentation for more information about the flags supported by the Graql shell).

./bin/graql.sh -n -m

Try the query again:

match (father: $p, son: $c) isa parentship; $p has identifier $n1; $c has identifier $n2;

There may be a pause, and then you should see a stream of results as Grakn infers the parentships between male parent and child entities. It is, in effect, building new information about the family which was not explicit in the dataset.

You may want to take a look at the results of this query in the Grakn visualiser and, as for the shell, you will need to activate inference before you see any results.

  1. Browse to the visualiser at localhost:4567.
  2. Open the Query settings under the cog button, which is on the far right hand side of the horizontal icon menu (at the top of the screen).
  3. You will see the “Activate inference” checkbox. Ensure that it is checked.

Now try submitting the query above, or variations of it for mothers and sons, fathers and daughters, and so on. You can even go one step further and find fathers who have the same first name as their sons:

match (father: $p, son: $c) isa parentship; $p has firstname $n; $c has firstname $n;

[Screenshot: Father-Son Shared Names query]
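Reusing the variable $n in both has firstname patterns is what forces the two first names to be equal. The same join, written as a small hypothetical Python function over (father, son) pairs such as the earlier father-son query returns (the ids and names are placeholders):

```python
def fathers_sharing_firstname(father_son_pairs, firstname):
    """Mirrors: match (father: $p, son: $c) isa parentship;
    $p has firstname $n; $c has firstname $n;

    father_son_pairs: list of (father_id, son_id); firstname: id -> first name.
    Binding $n twice means both people must have a first name, and it must be equal.
    """
    return [(p, c, firstname[p]) for p, c in father_son_pairs
            if p in firstname and firstname.get(c) == firstname[p]]
```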

If you want to find out more about the Graql reasoner, we have a detailed example. An additional discussion on the same topic can be found in our “Family Matters” blog post.

Using Analytics

Turning to Graql analytics, we can illustrate some basic queries in the Grakn visualiser.

Statistics

The mean age at death can be calculated using compute mean as follows, entering it into the visualiser’s query form:

compute mean of age in person; # returns 78.23 (rounded to 2 decimal places)

Other statistics can be calculated similarly, for example the number of people:

compute count in person; # 60

A full list of statistics that can be explored is documented in the Compute Queries documentation.
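Conceptually, compute mean of age in person averages the age values attached to people, and compute count in person counts the person instances. A minimal Python sketch of that arithmetic, assuming (our assumption) that people without an age resource are simply left out of the mean. The data is hypothetical, so it does not reproduce the figures above.

```python
from statistics import mean
from typing import Optional

def compute_mean_age(ages: dict) -> Optional[float]:
    """Mean of the recorded ages, rounded to 2 decimal places; None if nobody has an age."""
    recorded = [a for a in ages.values() if a is not None]
    return round(mean(recorded), 2) if recorded else None

def compute_count(ages: dict) -> int:
    """Number of people, whether or not they have an age resource."""
    return len(ages)

# Hypothetical data: person id -> age (None where no age resource exists).
ages = {"p1": 74, "p2": 83, "p3": None}
```

Here the mean is taken over two people while the count includes all three, which is why the two statistics can describe different subsets of the same type.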

Shortest Path

It is also possible to find the shortest path between two nodes in the graph. The documentation for the Grakn visualiser describes how to use the query builder tool, and includes a video.

In brief, let’s select two people from the genealogy dataset:

match $x has identifier "Barbara Shafner"; $y has identifier "Jacob J. Niesz";

and then search for the relationships joining the two of them using:

compute path from "id1" to "id2"; # Use the concept IDs returned for each person by the match query above
# e.g. compute path from "114848" to "348264";

You can see below that the two people selected are married.

The path query uses a scalable shortest path algorithm to determine the smallest number of relations required to get from one concept to the other.

[Screenshot: Shortest path between people]

To narrow the path to specific relations between specific entities:

compute path from "id1" to "id2" in person, parentship;

The above limits the path to blood relations (parent/child relations) and so excludes marriage. As a result, the shortest path between the two people is now longer: Barbara Shafner and Jacob J. Niesz are cousins (their mothers, Mary Young and Catherine Young, are sisters, whose father is Jacob Young).

[Screenshot: Shortest path between people, restricted to person and parentship]
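In an unweighted graph, a shortest path is classically found with breadth-first search (BFS). The Python sketch below uses BFS as a stand-in; we do not claim it is the algorithm Grakn uses. It treats each relation instance as a node between the people it connects, as the screenshots suggest, and models `in person, parentship` as a set of node types the path may pass through. All ids and types are hypothetical.

```python
from collections import deque

def shortest_path(edges, start, goal, allowed=None):
    """Breadth-first search over an undirected graph.

    edges: iterable of (a, b) pairs; allowed: optional set of node ids the path
    may use (a stand-in for `in person, parentship`). Returns a node list or None.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    if allowed is not None and (start not in allowed or goal not in allowed):
        return None
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous and (allowed is None or neighbour in allowed):
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical node types: two married cousins whose mothers share a father.
types = {
    "person_a": "person", "person_b": "person", "mother_a": "person",
    "mother_b": "person", "grandfather": "person", "m1": "marriage",
    "pp1": "parentship", "pp2": "parentship", "pp3": "parentship", "pp4": "parentship",
}
edges = [
    ("person_a", "m1"), ("m1", "person_b"),
    ("mother_a", "pp1"), ("pp1", "person_a"),
    ("mother_b", "pp2"), ("pp2", "person_b"),
    ("grandfather", "pp3"), ("pp3", "mother_a"),
    ("grandfather", "pp4"), ("pp4", "mother_b"),
]
```

Without a restriction, the path from person_a to person_b goes through the marriage m1. Passing allowed={n for n, t in types.items() if t in {"person", "parentship"}} removes m1, so the search has to go up through both mothers to the shared grandfather and back down.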

Data Migration

In this example we loaded data from basic-genealogy.gql directly into a graph. However, data is rarely stored so conveniently in .gql files; indeed, the data that we used was originally in CSV format. Our CSV migration example explains in detail the steps we took to migrate the CSV data into Grakn.

Migrating data in formats such as CSV, SQL, OWL and JSON into Grakn is a key use case. More information about each of these can be found in the migration documentation.
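At its simplest, migrating CSV means turning each row into a Graql insert statement like the one used earlier for Titus Groan. The sketch below does that for person rows using Python’s csv module. It is a hypothetical illustration, not Grakn’s migration tool, and the column names are assumptions. It does not attempt to escape quotes inside values and rejects them instead.

```python
import csv
import io

# Assumed CSV columns, each mapped to the person resource of the same name.
COLUMNS = ("identifier", "firstname", "surname", "gender")

def rows_to_inserts(csv_text: str) -> list:
    """Build one Graql insert statement per CSV row, skipping empty cells."""
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        parts = ["insert $p isa person"]
        for column in COLUMNS:
            value = row.get(column)
            if not value:
                continue  # missing or empty cell: no resource for this column
            if '"' in value:
                raise ValueError(f"quote in {column!r} is not handled by this sketch")
            parts.append(f'has {column} "{value}"')
        statements.append(" ".join(parts) + ";")
    return statements
```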

Where Next?

This page was a very high-level overview of some of the key use cases for Grakn, and has barely scratched the surface. The rest of our developer documentation and examples go into more depth and should answer any questions that you may have, but if you need extra information, please get in touch.

A good place to start is to explore our additional example code and the documentation for:
