Setting up Fuseki
by Harshvardhan J. Pandit
is part of: Semantic Web
database ontologies semantic-web triple-store
Apache Jena is an amazing (Java) framework for working with RDF data and semantic web ontologies. Fuseki is a SPARQL server that is super-easy to set up and use, and TDB is the native triple-store that is already built into Fuseki and just needs to be enabled. If the SPARQL end-point or triple-store is mostly for development and doesn't need to be production grade, Fuseki+TDB is the best way to experiment.
Installing Fuseki
The download page for the Jena framework lists the Fuseki downloads under Apache Jena Fuseki, in two formats: a .tar.gz and a .zip. Fuseki needs Java 8 to be installed, so if you don't have it, you can install it (on Debian/Ubuntu) with:
sudo apt-get install openjdk-8-jre openjdk-8-jdk
On some releases you might need to add a repository (or PPA) to make OpenJDK 8 available to apt. Most online guides describe installing Oracle's official build of Java, but I prefer to use OpenJDK rather than get it from Oracle.
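Before going further, it can help to confirm which Java the shell actually picks up. The following is a minimal sketch (the function name java_major_version is made up for illustration) that parses the first line printed by java -version. Java 8 reports its version as 1.8.x, while Java 9 and later put the major version first (for example 11.0.11), so both styles are handled:

```shell
#!/bin/sh
# Hypothetical helper: print the major Java version from a line such as
#   openjdk version "1.8.0_292"            (Java 8 style, major = 8)
#   openjdk version "11.0.11" 2021-04-20   (Java 9+ style, major = 11)
java_major_version() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) major=${ver#1.} ;;  # "1.8.0_292" -> "8.0_292"
    *)   major=$ver ;;       # "11.0.11" stays as it is
  esac
  # Keep only the leading digits: "8.0_292" -> "8", "11.0.11" -> "11"
  printf '%s\n' "${major%%[!0-9]*}"
}

# java -version writes to stderr, so redirect it before taking the first line
if command -v java >/dev/null 2>&1; then
  line=$(java -version 2>&1 | head -n 1)
  major=$(java_major_version "$line")
  if [ "${major:-0}" -ge 8 ]; then
    echo "Java $major found, which meets Fuseki's Java 8 requirement"
  else
    echo "Java 8 or newer is needed, found: $line"
  fi
else
  echo "java is not on the PATH; install OpenJDK 8 first"
fi
```

If the version cannot be parsed at all, the function prints nothing and the check treats it as too old, which errs on the side of prompting an install.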
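To download the Fuseki files directly to the server, you can use curl like so:
# this will download the file into the current directory
# link copied from the Fuseki download page; note the file is apache-jena-fuseki-*,
# not apache-jena-* (the core framework), and old releases such as 3.4.0
# are later moved to archive.apache.org/dist/jena/binaries/
# -L makes curl follow redirects
curl -L -o fuseki.zip http://www-us.apache.org/dist/jena/binaries/apache-jena-fuseki-3.4.0.zip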
Then unzip the file with
unzip fuseki.zip
or, if you don't have unzip installed, use the JDK's jar packaging tool:
jar -xf fuseki.zip
If you downloaded the .tar.gz version instead, use
tar -xzvf fuseki.tar.gz
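The three extraction options above can be folded into one small function that picks a tool from the file extension and from what is installed. This is a sketch: the name extract_archive is hypothetical, and the jar fallback assumes a JDK is present, since jar ships with the JDK rather than the JRE.

```shell
#!/bin/sh
# Hypothetical helper: unpack a Fuseki download into the current directory,
# choosing tar, unzip or the JDK's jar tool depending on the archive type.
extract_archive() {
  archive=$1
  case "$archive" in
    *.tar.gz|*.tgz)
      tar -xzf "$archive" ;;
    *.zip)
      if command -v unzip >/dev/null 2>&1; then
        unzip -q "$archive"
      elif command -v jar >/dev/null 2>&1; then
        jar -xf "$archive"
      else
        echo "neither unzip nor jar is installed" >&2
        return 1
      fi ;;
    *)
      echo "unrecognised archive type: $archive" >&2
      return 1 ;;
  esac
}
```

For example, extract_archive fuseki.zip unpacks the download from the previous step using whichever tool is available.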
Configurations
The Fuseki configuration lives in the file run/config.ttl, which is in the Turtle format (the run/ folder is created the first time the server is started). That's some nice dogfooding right there: an RDF triple-store and SPARQL endpoint configured using RDF itself. The other bits of configuration are in the folder run/configuration/, which Fuseki populates when you add a service through it, and where you can also add service files by hand.
The configuration file documentation specifies the various parameters and options that can be entered into the file. There are two main kinds of entries: services, which give an endpoint its name and the operations it offers (query, update, upload and so on), and datasets, which define where and how the data behind a service is stored.
Service
A service can be declared as follows (example from the official docs):
<#service1> rdf:type fuseki:Service ;
fuseki:name "ds" ; # http://host:port/ds
fuseki:serviceQuery "sparql" ; # SPARQL query service
fuseki:serviceQuery "query" ; # SPARQL query service (alt name)
fuseki:serviceUpdate "update" ; # SPARQL update service
fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
fuseki:serviceReadWriteGraphStore "data" ; # SPARQL Graph store protocol (read and write)
# A separate read-only graph store endpoint:
fuseki:serviceReadGraphStore "get" ; # SPARQL Graph store protocol (read only)
fuseki:dataset <#dataset> ;
.
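This declares a service named ds that offers SPARQL query, SPARQL update, file upload and Graph Store Protocol endpoints (read-write at data, read-only at get), all serving the dataset defined by #dataset. Unless that dataset is configured with persistent storage, such as TDB, its data is held only in memory and is lost when the server stops.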
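Once the server is running, the endpoints named in the service can be exercised with curl. The sketch below assumes Fuseki's default port 3030 and the service name ds from the example; the variables FUSEKI_URL and FUSEKI_DS and the function names are made up for illustration. Queries and updates are sent as URL-encoded form posts, which the SPARQL protocol allows, and the data endpoint is used through the Graph Store Protocol.

```shell
#!/bin/sh
# Assumed defaults: Fuseki on localhost:3030 serving the "ds" service above
FUSEKI_URL="${FUSEKI_URL:-http://localhost:3030}"
FUSEKI_DS="${FUSEKI_DS:-ds}"

# Build the URL of one of the service's endpoints, e.g. "sparql" or "update"
endpoint() {
  printf '%s/%s/%s\n' "$FUSEKI_URL" "$FUSEKI_DS" "$1"
}

# Run a SPARQL query against the "sparql" endpoint, asking for JSON results
sparql_query() {
  curl -s "$(endpoint sparql)" \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$1"
}

# Send a SPARQL update to the "update" endpoint
sparql_update() {
  curl -s "$(endpoint update)" --data-urlencode "update=$1"
}

# Replace the default graph with a Turtle file via the read-write "data" endpoint
put_default_graph() {
  curl -s -X PUT -H 'Content-Type: text/turtle' \
    --data-binary "@$1" "$(endpoint data)?default"
}
```

For example, sparql_update 'INSERT DATA { <http://example.org/s> <http://example.org/p> "o" }' followed by sparql_query 'SELECT * WHERE { ?s ?p ?o } LIMIT 10' should return the inserted triple as JSON.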
Dataset
To persist the data using TDB, define the dataset config as (from the official docs):
<#dataset> rdf:type tdb:DatasetTDB ;
tdb:location "DB" ; # <----- THIS LINE -->
# Query timeout on this dataset (1s, 1000 milliseconds)
ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1000" ] ;
# Make the default graph be the union of all named graphs.
## tdb:unionDefaultGraph true ;
.
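This creates (if not already present) a folder called DB, relative to the directory the server is started from, and stores all the data files in it. The folder is portable, so you can move it around, take backups, etc.; stop the server first, though, since copying the files while Fuseki has them open can give an inconsistent copy.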
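The two snippets above leave out the prefix declarations a Turtle file needs. As a sketch, a complete service file combining them might look like the following, written with a shell heredoc into run/configuration/. The prefixes are the ones used in the Fuseki documentation and FUSEKI_BASE is Fuseki's own name for the run/ folder, but the file name ds.ttl and the helper write_ds_config are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a TDB-backed "ds" service definition to the given
# file, defaulting to configuration/ds.ttl under FUSEKI_BASE (or ./run).
write_ds_config() {
  base=${FUSEKI_BASE:-./run}
  target=${1:-$base/configuration/ds.ttl}
  mkdir -p "$(dirname "$target")" || return 1
  # Quoted 'EOF' so nothing in the Turtle is expanded by the shell
  cat > "$target" <<'EOF'
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .

<#service1> rdf:type fuseki:Service ;
    fuseki:name "ds" ;                          # http://host:port/ds
    fuseki:serviceQuery "sparql" ;              # SPARQL query service
    fuseki:serviceUpdate "update" ;             # SPARQL update service
    fuseki:serviceUpload "upload" ;             # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;  # Graph Store Protocol (read/write)
    fuseki:serviceReadGraphStore "get" ;        # Graph Store Protocol (read only)
    fuseki:dataset <#dataset> .

<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "DB" ;                         # folder for the TDB files
    # query timeout of 1 second (1000 milliseconds)
    ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1000" ] .
EOF
}
```

Running write_ds_config from the Fuseki folder and then restarting the server should make the ds service appear, backed by the DB folder.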
Exposing SPARQL end-point using Nginx
The official documentation describes many more options and is well worth reading. Here I'll detail one use-case: running the server on localhost, exposing it using Nginx and serving an RDF dataset persisted by TDB.
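As a starting point for the Nginx part, a reverse-proxy server block along these lines forwards requests to a Fuseki instance listening on localhost:3030, Fuseki's default port. The helper name, output file and hostname are placeholders, and the proxy headers are common choices rather than anything Fuseki requires.

```shell
#!/bin/sh
# Hypothetical helper: write an Nginx server block that proxies a public
# hostname to the Fuseki server on 127.0.0.1:3030.
# Usage: write_nginx_site <output file> <server name>
write_nginx_site() {
  out=$1 name=$2
  # Unquoted heredoc so $name is filled in; Nginx's own variables are escaped
  cat > "$out" <<EOF
server {
    listen 80;
    server_name $name;

    location / {
        proxy_pass http://127.0.0.1:3030;
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    }
}
EOF
}
```

On Debian/Ubuntu the output would typically go to /etc/nginx/sites-available/ with a link in sites-enabled/, after which sudo nginx -t checks the syntax and sudo systemctl reload nginx applies it.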
Setting up Fuseki as a system service
Setting up Fuseki as a system service allows it to be managed with the system's service tools (service or systemctl). The official docs detail this; alternatively, create a file named fuseki.service in /etc/systemd/system with the following contents:
[Unit]
Description=Fuseki server for SPARQL endpoint
After=network.target
[Service]
User=<user>
Group=<usergroup>
WorkingDirectory=<location of fuseki jar>
ExecStart=/usr/bin/java -jar fuseki-server.jar <options>
[Install]
WantedBy=multi-user.target
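Filling in the placeholders by hand is easy to get wrong, so here is a sketch of a hypothetical render_fuseki_unit function that prints the same unit for a given user, group and Fuseki folder. The Restart=on-failure line is an addition not in the unit above; it asks systemd to restart Fuseki if it exits with an error.

```shell
#!/bin/sh
# Hypothetical helper: print a fuseki.service unit with the placeholders filled in.
# Usage: render_fuseki_unit <user> <group> <fuseki folder> [fuseki options]
render_fuseki_unit() {
  user=$1 group=$2 dir=$3 opts=${4:-}
  cat <<EOF
[Unit]
Description=Fuseki server for SPARQL endpoint
After=network.target

[Service]
User=$user
Group=$group
WorkingDirectory=$dir
ExecStart=/usr/bin/java -jar fuseki-server.jar $opts
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, render_fuseki_unit fuseki fuseki /opt/fuseki | sudo tee /etc/systemd/system/fuseki.service writes the unit; sudo systemctl daemon-reload makes systemd read it, sudo systemctl enable --now fuseki starts it and enables it at boot, and journalctl -u fuseki shows its logs.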
Security
Fuseki does not implement security or access control by itself. Instead, the full server relies on Apache Shiro, configured through the file run/shiro.ini, to provide a limited amount of it. Shiro allows setting a username and password for access to the running Fuseki instance, so that the datasets cannot be accessed without the credentials.
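As a sketch of such a setup (the helper name and the credentials are placeholders), the following writes a minimal shiro.ini that keeps Fuseki's /$/ping health check open and requires HTTP basic authentication for everything else. Shiro applies the first matching rule in [urls], so the more specific line must come first. With Shiro's default credentials matcher, the passwords in [users] are stored in plain text.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal shiro.ini with a single user.
# Usage: write_shiro_ini <output file> <user> <password>
write_shiro_ini() {
  out=$1 user=$2 pass=$3
  # \$ keeps the literal "$" in Fuseki's /$/ admin paths
  cat > "$out" <<EOF
[users]
# username = password (plain text with Shiro's default matcher)
$user = $pass

[urls]
# Rules are matched top-down; the first match wins
/\$/ping = anon
/** = authcBasic
EOF
}
```

Clients then have to send the credentials, for example by adding -u admin:change-me to the curl commands used against the endpoints.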