Setting up Fuseki
by Harshvardhan J. Pandit
is part of: Semantic Web
database ontologies semantic-web triple-store
Apache Jena is an amazing (Java) framework for working with semantic web ontologies. Fuseki is a SPARQL server that is super easy to set up and use, and TDB is the native triple-store that ships with Fuseki and just needs to be enabled. If the SPARQL end-point or triple-store is mostly for development and doesn't need to be production grade, Fuseki+TDB is a convenient way to experiment.
The download page for the Jena framework lists the Fuseki downloads under Apache Jena Fuseki, and offers two archives: a .tar.gz and a .zip. Fuseki needs Java 8 to be installed, so if you don't have that, you can install it with:
sudo apt-get install openjdk-8-jre openjdk-8-jdk
You might need to add some repositories (or a PPA) to make OpenJDK available to apt. Most online resources detail installing the official version of Java provided by Oracle, though I prefer to use OpenJDK rather than get it from Oracle.
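Before installing anything, it can help to check which Java version (if any) is already on the machine. The following is a minimal sketch, not part of the Fuseki tooling: the helper names java_version_from_banner and java_major are made up for illustration, and it assumes the usual banner format printed by java -version.

```shell
#!/bin/sh
# Extract the quoted version string from a `java -version` banner,
# e.g. 'openjdk version "1.8.0_292"' -> 1.8.0_292
java_version_from_banner() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p'
}

# Reduce a version string to its major number:
# legacy "1.8.0_292" -> 8, modern "11.0.2" -> 11
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # drop the legacy "1." prefix
  esac
  printf '%s\n' "${v%%[.-]*}"
}

# Report what is installed (prints a hint if java is missing).
# Note: `java -version` writes its banner to stderr, hence 2>&1.
if command -v java >/dev/null 2>&1; then
  banner=$(java -version 2>&1 | head -n 1)
  echo "Java major version: $(java_major "$(java_version_from_banner "$banner")")"
else
  echo "java not found; install OpenJDK 8 first"
fi
```

If the reported major version is 8 or higher, the install step above can probably be skipped.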
To download the Fuseki files directly to the server, you can use curl like so:
# this will download the file into the current directory
# link copied from fuseki download page
curl -o fuseki.zip http://www-us.apache.org/dist/jena/binaries/apache-jena-3.4.0.zip
Then unzip the file with
unzip fuseki.zip
Or, if you don't have unzip installed, you can use Java's packaging tool:
jar -xf fuseki.zip
If you downloaded the .tar.gz version, use
tar -xvf fuseki.tar.gz
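The choice between the three extraction commands above can be wrapped in a small helper. This is only a sketch: extract_cmd is a hypothetical name, and it prints the command rather than running it so that you can review it first.

```shell
#!/bin/sh
# Print the extraction command for an archive, based on its extension.
# For .zip, fall back to Java's jar tool when unzip is not installed.
extract_cmd() {
  case "$1" in
    *.tar.gz|*.tgz) echo "tar -xvf $1" ;;
    *.zip)
      if command -v unzip >/dev/null 2>&1; then
        echo "unzip $1"
      else
        echo "jar -xf $1"
      fi ;;
    *) echo "unsupported archive: $1" >&2; return 1 ;;
  esac
}

extract_cmd fuseki.tar.gz
```

To actually run the suggested command, pipe the output into the shell, for example: extract_cmd fuseki.zip | sh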
The Fuseki configuration lives in the file run/config.ttl, which is in the Turtle format. That's some nice dogfooding right there: an RDF triple-store and SPARQL endpoint configured using RDF itself. Further configuration files live in the folder run/configurations/, which Fuseki populates when you add a service, and which can also hold manually added services.
The configuration file documentation specifies the various parameters and options that can be entered into the file. There are two types of entries - services and datasets, with services providing a common endpoint for various datasets and configurations.
A service can be declared as (example from official docs) -
<#service1> rdf:type fuseki:Service ;
    fuseki:name "ds" ; # http://host:port/ds
    fuseki:serviceQuery "sparql" ; # SPARQL query service
    fuseki:serviceQuery "query" ; # SPARQL query service (alt name)
    fuseki:serviceUpdate "update" ; # SPARQL update service
    fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ; # SPARQL Graph store protocol (read and write)
    # A separate read-only graph store endpoint:
    fuseki:serviceReadGraphStore "get" ; # SPARQL Graph store protocol (read only)
    fuseki:dataset <#dataset> ;
    .
which declares that the service has a SPARQL endpoint, with update and upload features, and serves the dataset defined by #dataset. As there is no special configuration, the dataset is 'stored' in memory.
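To make the mapping from service names to URLs concrete, here is a rough sketch of how the endpoints declared above could be addressed with curl. It assumes Fuseki is running on localhost on its default port 3030 and that the service is named "ds" as in the example; the function names are hypothetical helpers, not part of Fuseki.

```shell
#!/bin/sh
# Assumed default: Fuseki on localhost, port 3030. Override via FUSEKI_BASE.
FUSEKI_BASE="${FUSEKI_BASE:-http://localhost:3030}"

# Build the URL of a named endpoint on a service,
# e.g. fuseki_url ds query -> http://localhost:3030/ds/query
fuseki_url() {
  printf '%s/%s/%s\n' "$FUSEKI_BASE" "$1" "$2"
}

# Send a SPARQL query to the service's "query" endpoint
# (needs a running server; results are requested as JSON).
fuseki_query() {
  curl -s -H 'Accept: application/sparql-results+json' \
       --data-urlencode "query=$2" "$(fuseki_url "$1" query)"
}

# Send a SPARQL update to the service's "update" endpoint.
fuseki_update() {
  curl -s --data-urlencode "update=$2" "$(fuseki_url "$1" update)"
}

# Print the URL of the "sparql" endpoint of the example service.
fuseki_url ds sparql
```

With the server running, a call such as fuseki_query ds 'SELECT * WHERE { ?s ?p ?o } LIMIT 10' should return the first few triples of the default graph.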
For storing the database using TDB, define the dataset config as (from official docs) -
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "DB" ; # <----- THIS LINE -->
    # Query timeout on this dataset (1s, 1000 milliseconds)
    ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1000" ] ;
    # Make the default graph be the union of all named graphs.
    ## tdb:unionDefaultGraph true ;
    .
This creates (if not present) a folder called DB and stores all the data files in it. This folder is portable, so you can move it around, take backups, etc.
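Since the DB folder is self-contained, a backup can be as simple as archiving it. The sketch below assumes the folder is reachable at a path you pass in, and that Fuseki is stopped (or at least not writing) while the archive is taken; that precaution is my assumption rather than something stated above. backup_tdb is a hypothetical helper name.

```shell
#!/bin/sh
# Archive a TDB directory into a timestamped .tar.gz and print the archive path.
# Usage: backup_tdb <tdb-dir> <backup-dir>
backup_tdb() {
  db_dir="$1"
  dest_dir="$2"
  [ -d "$db_dir" ] || { echo "no such directory: $db_dir" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1
  archive="$dest_dir/$(basename "$db_dir")-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps the paths in the archive relative to the parent of the DB folder
  tar -czf "$archive" -C "$(dirname "$db_dir")" "$(basename "$db_dir")" || return 1
  echo "$archive"
}
```

For example, backup_tdb ./DB ./backups would write something like ./backups/DB-<timestamp>.tar.gz, which can later be unpacked with tar -xzf next to the Fuseki installation.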
Exposing SPARQL end-point using Nginx
The official documentation describes several more options and is well worth reading. I'll detail a use-case for setting up the server on localhost, exposing it using Nginx, and serving an RDF dataset persisted by TDB.
Setting up Fuseki as a system service
Setting up Fuseki as a system service allows it to be managed using the system utilities (service or systemd). The official docs detail this; alternatively, it can be done by creating a file named fuseki.service in /etc/systemd/system with the contents -
[Unit]
Description=Fuseki server for SPARQL endpoint
After=network.target

[Service]
User=<user>
Group=<usergroup>
WorkingDirectory=<location of fuseki jar>
ExecStart=/usr/bin/java -jar fuseki-server.jar <options>

[Install]
WantedBy=multi-user.target
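Filling in the placeholders by hand is fine for a single machine, but the unit file can also be generated. The following sketch prints a unit with the same layout as above; fuseki_unit is a hypothetical helper, and the user, group, and directory in the example call are illustrative values, not recommendations.

```shell
#!/bin/sh
# Print a fuseki.service unit for the given user, group, and Fuseki directory.
# Usage: fuseki_unit <user> <group> <fuseki-dir> [server options...]
fuseki_unit() {
  user="$1"; group="$2"; dir="$3"
  shift 3   # whatever remains is passed to fuseki-server.jar as options
  cat <<EOF
[Unit]
Description=Fuseki server for SPARQL endpoint
After=network.target

[Service]
User=$user
Group=$group
WorkingDirectory=$dir
ExecStart=/usr/bin/java -jar fuseki-server.jar $*

[Install]
WantedBy=multi-user.target
EOF
}

# Hypothetical values; review the output before installing it.
fuseki_unit fuseki fuseki /opt/fuseki
```

Once the output has been saved as /etc/systemd/system/fuseki.service, running sudo systemctl daemon-reload followed by sudo systemctl enable fuseki and sudo systemctl start fuseki should register and start the service, and systemctl status fuseki shows whether it came up.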
One thing to note about Fuseki is that it offers no security or access control by itself. Instead, Apache Shiro is used to provide a limited amount of security. Shiro allows setting a username and password for access to the running Fuseki server instance, so that the datasets cannot be accessed without the credentials.