Frequently asked questions about Grakn.

About GRAKN.AI

Why did you develop a new ontology and query language?

We are often asked why we have developed a new ontology and query language rather than use existing standards like RDF, OWL and SPARQL.

We have written a detailed answer to this question on our blog. In summary, our underlying data model is a property graph, so in principle we’re able to import from and export to RDF if needed. However, our ontology language is designed to strike a different and better balance between expressiveness and complexity than the existing OWL profiles offer, especially in the context of knowledge graph structures. Consequently, our query language, Graql, is aligned with our ontology formalism to enable higher-level query capabilities than SPARQL supports over an RDF data model.

OWL is not well suited to graph structures. Because of its formal foundations and computational limitations, it is in fact a more natural language for managing tree-shaped data. OWL also makes it hard to validate the consistency of data and to ensure that it is well structured, which is what knowledge graph applications require.

Bugs and strange behaviour

Why does Grakn hang when I try to start it?

I am running grakn.sh start but it hangs on Starting Cassandra. Why?

This may be because you have cloned the Grakn repo into a directory that has a space in its name (e.g. /grakn test). You can build our code successfully, but when you run grakn.sh start, it hangs because Cassandra requires pathnames without spaces. Remove the spaces (e.g. /grakn_test) and try again.
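The check described above can be scripted before starting Grakn. The following is a minimal sketch, not part of the Grakn distribution: the function name path_has_space is hypothetical, and it only looks for literal space characters in the path it is given.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given path contains a space.
path_has_space() {
  case "$1" in
    *" "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use: warn if the current directory would trip up Cassandra.
if path_has_space "$PWD"; then
  echo "Warning: '$PWD' contains a space; grakn.sh start may hang." >&2
fi
```

Run it from the directory where Grakn is installed; if it prints the warning, rename the directory before trying grakn.sh start again.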

There are other possible reasons why Grakn hangs when starting Cassandra. One is that another application is using port 7199, which Cassandra needs. To find out what is using port 7199, run:

lsof -i tcp:7199

From there, you’ll see the PID of the application using that port. Check whether you can safely kill it or change its port. It may be that another instance of Cassandra is blocking it, in which case you can kill it with the command below. Note that this kills every process whose name matches java, not just Cassandra:

pkill -9 java

Then try grakn.sh start again.

Failing that, you can often find out more information by looking in the /logs directory under your Grakn installation.

Please see the answer to the question below “Can I run Grakn on an existing Cassandra Platform?” if you are already using Cassandra and want to run Grakn on a different instance of Cassandra to our default.

Why am I getting ghost vertices?

In a transaction-based environment it is possible for one transaction to remove a concept while another concurrently modifies the same concept. Both transactions may commit successfully if the backend is eventually consistent, e.g. [Titan Cassandra](http://s3.thinkaurelius.com/docs/titan/1.0.0/common-questions.html).

The concept is then likely to still exist, with only the modified properties. When using the Titan Cassandra backend, you can safeguard against this by setting the checkInternalVertexExistence property to true. However, this will result in slower transactions, as more reads will be necessary.

Working with Grakn

Which OS can I use with Grakn?

You can use Mac OS X or Linux right now. We plan to support Windows at a later date.

How do I load data into Grakn?

There are several ways to load data into Grakn. For small amounts of data (<1000 lines), you can load it directly via the Graql shell. For example, the following loads an example file called family-data.gql:

bin/graql.sh -f examples/family-data.gql

If you have a larger file, you will need to batch load it. The file will be divided into batches that are committed concurrently. This differs from a regular load, where the whole file is committed in a single chunk when you call commit. The example below loads the Graql file FILENAME.gql from PATH.

bin/graql.sh -b PATH/FILENAME.gql

In order to check the status of the loading, you can open a new terminal window, navigate to the logs directory of your Grakn installation and run the command:

tail -f grakn.log
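The choice between the -f and -b flags described above could be automated in a wrapper script. This is only a sketch: choose_load_flag is a hypothetical helper, and the 1000-line threshold is simply the rule of thumb quoted in this answer, not a limit enforced by Grakn.

```shell
# Hypothetical helper: print "-f" for files under 1000 lines, "-b" otherwise.
choose_load_flag() {
  # The arithmetic expansion strips the padding some wc implementations add.
  lines=$(($(wc -l < "$1")))
  if [ "$lines" -lt 1000 ]; then
    printf '%s\n' '-f'
  else
    printf '%s\n' '-b'
  fi
}

# Example use (requires a Grakn installation, so it is left commented out):
# bin/graql.sh "$(choose_load_flag PATH/FILENAME.gql)" PATH/FILENAME.gql
```

Counting lines is a crude proxy for data size; adjust the threshold to suit your own files.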

I want to load a large amount of data into a graph - how do I do it?

A regular Graql load is single-threaded and doesn’t batch the data. For large amounts of data, you may want to use the Java loader client, which provides multi-threaded batch loading, or the -b flag if you are using the Graql shell.

What are the differences between a batch graph load and a normal graph load?

The batch load is faster for larger datasets because it ignores some consistency checks, on the assumption that you have pre-filtered your data. Checks ignored include:

  • When looking up concepts, any duplicates that are found are ignored and an arbitrary one is returned.
  • When creating a relation, it is possible for an entity to be doubly associated with a role. This is later cleaned up by Grakn engine.
  • Concepts with duplicate ids can be inserted.
  • Duplicate relations can also be inserted.

Ignoring these checks allows data to be processed much faster at the risk of breaking consistency.

What is post-processing?

The distributed and concurrent nature of the Grakn system means that, sometimes, post-processing is required to ensure the data remains consistent.

**Role Player Optimisation**

When allocating entities as role players to multiple relations for the first time, it is possible to create duplicate associations. These associations do not affect the results of any queries or computations. For example, suppose that in a new system we simultaneously process the following three statements in different transactions:

1. insert $x has name 'Brad Pitt' isa person; $y has name 'Fury'; (actor: $x, movie: $y) isa acted-in;
2. insert $x has name 'Brad Pitt' isa person; $y has name 'Troy'; (actor: $x, movie: $y) isa acted-in;
3. insert $x has name 'Brad Pitt' isa person; $y has name 'Seven'; (actor: $x, movie: $y) isa acted-in;

It is possible for the system to record that Brad Pitt is an actor multiple times. The duplications will later be resolved and merged by Grakn engine.

**Merging Resources**

When using a batch graph, many safety checks are skipped in favour of speed. One such check is whether a resource already exists before creating it. Suppose the following transactions are executed simultaneously while batch loading:

1. insert $a has unique-id '1';
2. insert $b has unique-id '1';
3. insert $c has unique-id '1';

It would be possible to create multiple resources of the type unique-id with the value 1. These duplicate resources are similarly merged and resolved by Grakn engine.
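To make the idea of merging duplicates concrete, here is a toy model in shell. It is not Grakn engine's actual algorithm: the input format (one "resource-id value" pair per line) and the function name merge_duplicate_resources are assumptions made purely for illustration. For each value, the first resource seen is kept and every later duplicate is reported as merged into it.

```shell
# Toy model only: reads "<resource-id> <value>" lines on standard input and
# prints "<duplicate-id> -> <kept-id>" for every resource whose value was already seen.
merge_duplicate_resources() {
  awk '{
    if ($2 in keep) {
      print $1 " -> " keep[$2]   # duplicate value: merge into the first resource
    } else {
      keep[$2] = $1              # first resource seen with this value
    }
  }'
}

# Example: three resources share the value 1, as in the transactions above.
printf 'a 1\nb 1\nc 1\n' | merge_duplicate_resources
```

In this model, resources b and c are folded into a, leaving a single resource for the value 1.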

Can I run Grakn on an existing Cassandra Platform?

By default, Grakn is shipped with TitanDB, which in turn relies on Cassandra. When you call grakn.sh start, this starts a Cassandra instance and then starts the Grakn server. You are not bound to use our instance of Cassandra: you can adjust the settings in the .properties file in the conf/main directory of your Grakn installation, e.g. to make Titan use your own Cassandra instance.

Specifically, you should change the following parameter:

# Host Location
storage.hostname=127.0.0.1

You can also, for example, add the following to specify a custom port:

storage.port = 1234

Please refer to the Titan documentation for more information.
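Before restarting Grakn, it can help to confirm which values the file actually contains. The sketch below assumes a simple key=value properties file with optional spaces around the equals sign; get_property is a hypothetical helper, and the file path is passed in by the caller because it depends on your installation.

```shell
# Hypothetical helper: print the value of a key in a key=value properties file.
# Usage: get_property KEY FILE
get_property() {
  awk -F= -v key="$1" '
    /^[ \t]*#/ { next }                      # skip comment lines
    index($0, "=") > 0 {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)         # trim spaces around the key
      if (k == key) {
        v = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim spaces around the value
        val = v                              # the last definition wins
      }
    }
    END { if (val != "") print val }
  ' "$2"
}
```

For example, calling get_property storage.hostname with the path to your .properties file prints the host that Titan will connect to, or nothing if the key is not set.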

Do applications written on top of Grakn have to be in Java?

Currently, there is no official support for languages other than Java, although you can find blog posts that describe our experiments with Haskell, Python and R. We would be very willing to accept proposals from our community and work with contributors to extend these initial offerings, and/or create bindings to other languages.

How do I visualise a graph?

Grakn comes with a basic visualiser that has a web interface. We appreciate any feedback you give us about it via the discussion boards. To visualise a graph, start Grakn and then use your web browser to visit localhost:4567. Please see the Get Started Guide for more information about the visualiser.

How do I clear a graph?

I want to clear the graph I’ve been experimenting with and try something with a new, different schema and dataset. How do I do it?

If you are using the Java API, it’s as simple as:

GraknGraph graph = Grakn.factory(Grakn.DEFAULT_URI, "my-graph").getGraph();
graph.clear();

If you are using the Graql shell and have not committed what you have in the graph, you can just quit the shell and restart it, and all is clean.

If you’ve committed, then you must stop Grakn and specifically clean the graph:

./bin/grakn.sh stop
./bin/grakn.sh clean

How do I run Graql from a bash script?

If you want to run Graql from a bash script, for example to grep the results, you don’t want to have to filter out things like the license and the command prompt. The best way, therefore, is to use the -e or -f flag, each of which lets you provide a query to the shell. The -e flag accepts a query, while the -f flag accepts a filename. For example:

graql.sh -e "match \$x isa movie;"

Notice that you have to escape the dollar signs to stop the shell from interpreting them as variables. You can then pipe the output into another command or a file.
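The escaping rule above can be illustrated with plain shell quoting. In this sketch, the two variables hold the same query text; the helper count_movie_lines and the GRAQL variable (an assumed path to graql.sh) are hypothetical and only show how the output might be piped into grep.

```shell
# Two equivalent ways to stop the shell from expanding $x.
escaped="match \$x isa movie;"   # backslash-escaped inside double quotes
single='match $x isa movie;'     # single quotes: no expansion at all

# Hypothetical helper: run a query and count the output lines that mention "movie".
# GRAQL is an assumed variable pointing at graql.sh; it defaults to bin/graql.sh.
count_movie_lines() {
  "${GRAQL:-bin/graql.sh}" -e "$1" | grep -c "movie"
}

# Example use (requires a running Grakn, so it is left commented out):
# count_movie_lines "$single"
```

Single quotes are usually the less error-prone choice when the query contains several variables.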
