This document will teach you how to migrate CSV data into Grakn.
Edit me

Introduction

This tutorial shows you how to populate Grakn with CSV data. If you have not yet set up the Grakn environment, please see the setup guide.

Migration Shell Script for CSV

The migration shell script can be found in /bin directory of your Grakn environment. We will illustrate its usage in an example below:

usage: ./migration.sh csv -template <arg> -input <arg> -keyspace <arg> [-help] [-no] [-separator <arg>] [-active <arg>] [-batch <arg>] [-uri <arg>] [-null <arg>] [-quote <arg>] [-r <arg>] [-v]

OPTIONS
 -a,--active <arg>      Number of tasks (batches) running on the server at
                        any one time. Default 25.
 -b,--batch <arg>       Number of rows to execute in one Grakn
                        transaction. Default 25.
 -c,--config <arg>      Configuration file.
 -h,--help              Print usage message.
 -i,--input <arg>       Input csv file.
 -k,--keyspace <arg>    Grakn graph. Required.
 -l,--null <arg>        String that will be evaluated as null.
 -n,--no                Write to standard out.
 -q,--quote <arg>       Character used to encapsulate values containing
                        special characters.
 -r,--retry <arg>       Retry sending tasks if engine is not available
 -s,--separator <arg>   Separator of columns in input file.
 -t,--template <arg>    Graql template to apply to the data.
 -u,--uri <arg>         Location of Grakn Engine.
 -v,--verbose           Print counts of migrated data.

CSV Migration Basics

The steps to migrate the CSV to GRAKN.AI are:

  • define an ontology for the data to derive the full benefit of a knowledge graph
  • create templated Graql to map the data to the ontology
  • invoke the Grakn migrator through the shell script or Java API. The CSV migrator will apply the template to each row of data in the CSV file, replacing the sections indicated in the template with provided data: the column header is the key and the content of each row at that column the value.

Example: Cars

Let’s take a simple example. First, the CSV file, cars.csv:

Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",4799.00

Here is the ontology for the example:

insert

car sub entity
  has name
  has year
  has description
  has price;

name sub resource datatype string;
year sub resource datatype string;
description sub resource datatype string;
price sub resource datatype double;

And the Graql template, car-migrator.gql:

insert                                                                                                                             

$x isa car
  has name @concat(<Make>, "-", <Model>)
  has year <Year>
  has price @double(<Price>)
  if (<Description> != "") do { has description <Description>};  

The template will create a car entity for each row. It will attach year and price resources to each of these entities. If the description resource is present in the data, it will attach the appropriate description to the car.

The template is applied to each row by calling the migration script:

./<grakn-install-location>/bin/migration.sh csv -i ./cars.csv -t ./car-migrator.gql -k grakn

The resulting Graql statement, if printed out, looks as follows:

insert $x0 isa car has name "Ford" has description "ac, abs, moon" has price 3000.0 has year "1997";
insert $x0 isa car has name "Chevy" has price 4900.0 has year "1999";
insert $x0 isa car has description "MUST SELL! air, moon roof, loaded" has price 4799.0 has name "Jeep" has year "1996";

You will note that the second Graql insert is missing the description resource. This is because that value is not present in the data and the template uses an if statement to check if it exists.

Separator

The separator option allows you to specify the column separator. With this we are able to migrate a wider range of formats, including TSV.

Year  Make  Model Description Price
1997  Ford  E350  "ac  abs   moon"  3000.00
1999  Chevy "Venture" ""  4900.00
1996  Jeep  Grand Cherokee  "MUST SELL!
air  moon roof   loaded"  4799.00

This file would be migrated in the same way as the previous example when you specify the separator using the -s \t argument:

./<grakn-install-location>/bin/migration.sh csv -i ./cars.tsv -t ./car-migrator.gql -s \t -k grakn

Where Next?

We have an additional, more extensive, example that migrates genealogy data from CSV. Our sample-projects repository on Github also contains an example that migrates a simple CSV pets dataset, and another example for video games, was described in a separate blog post.

You can find further documentation about migration in our API reference documentation (which is in the /docs directory of the distribution zip file, and also online here.

Comments

Want to leave a comment? Visit the issues on Github for this page (you’ll need a GitHub account). You are also welcome to contribute to our documentation directly via the “Edit me” button at the top of the page.

Tags: migration