Setting up Fuseki

Getting Apache Fuseki up and running with minimal configuration
by Harshvardhan J. Pandit
is part of: Semantic Web
database ontologies semantic-web triple-store

Apache Jena is an amazing (Java) framework for working with semantic web ontologies. Fuseki is its SPARQL server, which is super-easy to set up and use, and TDB is Jena's native triple-store, which ships with Fuseki and only needs to be enabled in the configuration. If you want a SPARQL end-point or triple-store mostly for development rather than production, Fuseki+TDB is one of the easiest ways to experiment.

Installing Fuseki

The Jena download page lists the Fuseki binaries under Apache Jena Fuseki as two archives, one a .tar.gz and the other a .zip. Fuseki needs Java 8 to be installed, so if you don't have it, you can install it with:

sudo apt-get install openjdk-8-jre openjdk-8-jdk

You might need to add some repositories (or a PPA) to get OpenJDK into apt. Most online resources describe installing the official version of Java provided by Oracle, though I prefer to use OpenJDK.
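Before installing anything, it can be worth checking which Java is already on the PATH. A small sketch; the function name check_java is made up for illustration:

```shell
# Report the first line of `java -version`, or a hint if Java is missing.
# check_java is a hypothetical helper name, not part of any tool.
check_java() {
    if command -v java >/dev/null 2>&1; then
        # java prints its version banner to stderr, hence the redirect
        java -version 2>&1 | head -n 1
    else
        echo "java not found on PATH"
    fi
}

check_java
```

A Java 8 installation reports a version string starting with 1.8.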

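To download the Fuseki files directly to the server, you can use curl like so:

# this will download the file into the current directory
# link copied from the Fuseki download page; the Fuseki archive is named
# apache-jena-fuseki-<version>.zip (apache-jena-<version>.zip is the Jena library bundle)
# older releases are moved to archive.apache.org/dist/jena/binaries/
curl -L -o fuseki.zip http://www-us.apache.org/dist/jena/binaries/apache-jena-fuseki-3.4.0.zip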

Then unzip the file with

unzip fuseki.zip

Or if you don't have unzip installed, you can use Java's packaging tool (it comes with the JDK):

jar -xf fuseki.zip

If you downloaded the .tar.gz version, use

tar -xvf fuseki.tar.gz
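Once unpacked, the server can be started straight from the extracted folder. A minimal sketch, assuming the archive unpacked into a folder named apache-jena-fuseki-3.4.0 (the actual name depends on the archive and version you downloaded) and using /ds as an example dataset name:

```shell
# Folder created by unpacking the archive; adjust to what you actually got.
FUSEKI_DIR="${FUSEKI_DIR:-$PWD/apache-jena-fuseki-3.4.0}"

if [ -f "$FUSEKI_DIR/fuseki-server" ]; then
    cd "$FUSEKI_DIR" || exit 1
    # `jar -xf` does not restore the executable bit, so set it to be safe.
    chmod +x fuseki-server
    # Start with an in-memory dataset at /ds that accepts updates.
    # The server listens on port 3030 unless told otherwise.
    ./fuseki-server --update --mem /ds
else
    echo "fuseki-server not found in $FUSEKI_DIR"
fi
```

The web interface should then be reachable at http://localhost:3030/. To keep the data on disk instead of in memory, --loc=DB can be used in place of --mem.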

Configurations

The Fuseki configuration lives in the file run/config.ttl, which is in the Turtle format (the run folder is created the first time the server starts). That's some nice dogfooding right there: an RDF triple-store and SPARQL endpoint configured using RDF itself. The remaining configuration is in the folder run/configuration/, which Fuseki populates when you add a service, and where you can also place service files by hand.

The configuration file documentation specifies the various parameters and options that can be entered into the file. There are two types of entries: services and datasets. A service gives a dataset a name and exposes it through a set of endpoints (query, update, upload and so on), while a dataset entry describes where and how the data is stored.

Service

A service can be declared as (example from official docs) -

<#service1> rdf:type fuseki:Service ;
    fuseki:name                       "ds" ;       # http://host:port/ds
    fuseki:serviceQuery               "sparql" ;   # SPARQL query service
    fuseki:serviceQuery               "query" ;    # SPARQL query service (alt name)
    fuseki:serviceUpdate              "update" ;   # SPARQL update service
    fuseki:serviceUpload              "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph store protocol (read and write)
    # A separate read-only graph store endpoint:
    fuseki:serviceReadGraphStore      "get" ;      # SPARQL Graph store protocol (read only)
    fuseki:dataset                   <#dataset> ;
    .

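This declares a service named ds with SPARQL query endpoints, update and upload features, and graph store access, serving the dataset defined by #dataset. The service entry itself says nothing about storage; that depends on how #dataset is defined. With a plain in-memory dataset the data is only 'stored' in memory and is gone when the server stops.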
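With such a service running, the endpoints can be exercised with curl. A sketch, assuming Fuseki is listening on its default port 3030 on localhost and the service name is ds as in the example; the function names are made up for illustration:

```shell
# Base URL of the service; host and port are assumptions for a local setup.
FUSEKI_DS="${FUSEKI_DS:-http://localhost:3030/ds}"

# Run a SPARQL query against the "sparql" endpoint, asking for JSON results.
sparql_query() {
    curl -s -G "$FUSEKI_DS/sparql" \
        -H 'Accept: application/sparql-results+json' \
        --data-urlencode "query=$1"
}

# Send a SPARQL update to the "update" endpoint (form-encoded POST).
sparql_update() {
    curl -s "$FUSEKI_DS/update" \
        --data-urlencode "update=$1"
}

# Example calls (these need a running server):
#   sparql_update 'INSERT DATA { <http://example.org/s> <http://example.org/p> "o" }'
#   sparql_query  'SELECT * WHERE { ?s ?p ?o } LIMIT 10'
```

Keeping the two endpoints in separate functions mirrors the service definition, where query and update are separate entries and either can be left out.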

Dataset

For storing the database using TDB, define the dataset config as (from official docs) -

<#dataset> rdf:type      tdb:DatasetTDB ;
    tdb:location "DB" ; # <-- this line: the folder TDB stores its files in
    # Query timeout on this dataset (1s, 1000 milliseconds)
    ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000" ] ;
    # Make the default graph be the union of all named graphs.
    ## tdb:unionDefaultGraph true ;
     .

This creates (if not present) a folder called DB and stores all the data files in it. The folder is portable, so you can move it around, take backups, etc.
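Neither the service nor the dataset snippet is a complete file on its own, since both rely on prefix declarations. A sketch of a small stand-alone configuration combining the two, written to a separate file (config-example.ttl is a made-up name) so that it does not overwrite an existing run/config.ttl:

```shell
# Write a minimal Fuseki configuration: one service backed by a TDB dataset.
# The quoted 'EOF' keeps the shell from touching anything inside the file.
cat > config-example.ttl <<'EOF'
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .

<#service1> rdf:type fuseki:Service ;
    fuseki:name          "ds" ;       # http://host:port/ds
    fuseki:serviceQuery  "sparql" ;   # SPARQL query endpoint
    fuseki:serviceUpdate "update" ;   # SPARQL update endpoint
    fuseki:dataset       <#dataset> ;
    .

<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "DB" ;               # folder holding the TDB files
    .
EOF

echo "wrote $(pwd)/config-example.ttl"
```

The server can then be pointed at this file with something like ./fuseki-server --config=config-example.ttl, though the exact behaviour may differ between Fuseki versions.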

Exposing SPARQL end-point using Nginx

The official documentation covers several more options and is well worth reading. Here I'll detail one use-case: running the server on localhost, exposing it through Nginx, and serving an RDF dataset persisted by TDB.
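As a rough idea of the Nginx side of that setup, the sketch below writes a reverse-proxy server block to a local file. The file name fuseki-nginx.conf, the host name sparql.example.org and the service name ds are placeholders. Only the query endpoint is forwarded, so updates are not reachable through Nginx (this only helps if port 3030 itself is not reachable from outside):

```shell
# Write an example Nginx server block; the quoted 'EOF' keeps $host and
# $remote_addr as literal Nginx variables instead of shell expansions.
cat > fuseki-nginx.conf <<'EOF'
server {
    listen 80;
    server_name sparql.example.org;

    # Forward only the SPARQL query endpoint to Fuseki on localhost.
    location /ds/sparql {
        proxy_pass http://127.0.0.1:3030/ds/sparql;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF
```

On Debian-style systems a file like this typically goes under /etc/nginx/sites-available/ with a symlink in sites-enabled, followed by sudo nginx -t to check the syntax and a reload of Nginx.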

Setting up Fuseki as a system service

Setting up Fuseki as a system service allows it to be managed using the system utilities (service or systemctl). There are official docs detailing this; alternatively, create a file named fuseki.service in /etc/systemd/system with the contents:

[Unit]
Description=Fuseki server for SPARQL endpoint
After=network.target

[Service]
User=<user>
Group=<usergroup>
WorkingDirectory=<location of fuseki jar>
ExecStart=/usr/bin/java -jar fuseki-server.jar <options>

[Install]
WantedBy=multi-user.target
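With the unit file in place, the usual systemd commands take over. A sketch, assuming the file was saved as /etc/systemd/system/fuseki.service; the guard only avoids running anything when the file is missing:

```shell
UNIT_FILE=/etc/systemd/system/fuseki.service

if [ -f "$UNIT_FILE" ]; then
    sudo systemctl daemon-reload        # make systemd pick up the new unit
    sudo systemctl enable fuseki        # start automatically at boot
    sudo systemctl start fuseki         # start it now
    systemctl status fuseki --no-pager  # check that it is running
else
    echo "$UNIT_FILE not found, create it first"
fi
```

If the service fails to start, journalctl -u fuseki shows the output of the Java process.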

Security

The one thing about Fuseki is that it implements no security or access control of its own. Instead, it bundles Apache Shiro, configured through the file run/shiro.ini, to provide a limited amount of security. Shiro allows setting a username/password for access to the running Fuseki server instance, so that without the credentials one cannot access the datasets.
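As a sketch of what that can look like, the following writes a small Shiro configuration to a local file (shiro-example.ini is a made-up name; Fuseki reads the real one from run/shiro.ini). The user name, the placeholder password and the choice to protect every URL are assumptions for illustration, modelled on the commented hints in the shiro.ini that Fuseki generates:

```shell
# Write an example Shiro configuration; the quoted 'EOF' keeps "$/" literal.
cat > shiro-example.ini <<'EOF'
[users]
# user = password; replace this placeholder with a real password
admin = change-me

[urls]
# Liveness check stays open to anyone
/$/ping = anon
# Everything else requires HTTP basic authentication as user "admin"
/** = authcBasic,user[admin]
EOF
```

Requests would then need credentials, for example curl -u admin:change-me against the query endpoint. Basic authentication sends the password essentially in the clear, so it makes sense to combine it with HTTPS at the Nginx layer.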