Setting up OpenLink Virtuoso

Setting up Virtuoso as a triple store and serving it with Nginx

OpenLink Virtuoso is a powerful triple store (and also a traditional RDBMS) with many features. Setting up Virtuoso is easy, as packages are available in most distributions. Its documentation is a bizarre collection that is scattered, unorganised, and sometimes missing. Despite this, it is a solid tool that is easy to set up and use, and it comes configured ready for production use.

Installation

The package virtuoso-opensource is available on Debian-based systems, and can be installed with -

sudo apt-get install virtuoso-opensource

which will install Virtuoso and set it up as a system service with the name

virtuoso-opensource-X.x

with X.x being the version number, which for me was 6.1. The service can be managed as:

# start, stop, restart, status
sudo service virtuoso-opensource-X.x start
sudo service virtuoso-opensource-X.x stop
sudo service virtuoso-opensource-X.x restart
sudo service virtuoso-opensource-X.x status

During installation, Virtuoso will ask you to set a password for two users, DBA and DAV, which act as administrators for the web interface and management actions. It is essential to remember this password, as it is required to make changes to Virtuoso and to add other users.

Configuration

The config file is located at -

/etc/virtuoso-opensource-X.x/virtuoso.ini

and contains settings for the storage location and the server. Virtuoso has the option of serving the management interface over SSL; the certificate settings (located in the Parameters section) are commented out by default. The configuration for the web interface is in the HTTPServer section.

ServerPort in that section is the port the Virtuoso web interface runs on. It is 8890 by default and can be changed through this option. A description of the various options is available at link.
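
To check which port a section declares without opening the file, a small helper can read one key from a named section. This is a minimal sketch, assuming the usual key = value layout with ; or # marking commented-out lines. The name ini_get is hypothetical, and the 6.1 in the path simply follows the version mentioned earlier.

```shell
#!/bin/sh
# ini_get FILE SECTION KEY: print the value of KEY inside [SECTION].
# Hypothetical helper, not part of Virtuoso.
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*\[/ {                      # section header such as [HTTPServer]
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            current = $0
            next
        }
        /^[ \t]*[;#]/ { next }             # skip commented-out settings
        current == section {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }
    ' "$1"
}

# Assumed path; adjust X.x to the installed version.
INI=/etc/virtuoso-opensource-6.1/virtuoso.ini
if [ -r "$INI" ]; then
    echo "Web interface port: $(ini_get "$INI" HTTPServer ServerPort)"
fi
```

Reading per section matters here because a ServerPort key can appear in more than one section, so a plain grep for the key name may return several lines.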

Conductor

The Virtuoso web interface is called Conductor, and offers management capabilities for all its features. By default it is served at the /conductor path, relative to wherever Virtuoso is being served.

Linked Data

The Linked Data section in Conductor offers a SPARQL endpoint, a query interface, and management capabilities for graphs and datasets. The default tab, SPARQL, is a query interface that queries the specified (default) graph and displays the results in the page itself. Graphs shows all available graphs in the triple store; Virtuoso ships with a lot of RDF data and some graphs by default, which one can assume are required for its configuration and data settings. The Namespaces tab shows the stored namespaces for RDF graphs, and custom namespaces can be added here.

Quad Store Upload provides a simple way to upload an RDF file as a dataset or import it from a URL. It requires the named graph IRI under which the dataset is stored in the triple store. There is no default graph, so the graph IRI has to be provided.

iSQL

Virtuoso provides a utility called Interactive SQL, or iSQL, which is run with the isql-vt command or through a symlink to /usr/bin/isql-vt. This utility provides SQL-like access to the datasets and can be used to run SPARQL queries or upload data into the triple store.
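
As a rough illustration of both uses, the statements sent to iSQL can be built in the shell and piped into isql-vt. This is a sketch under several assumptions: the SQL port is the default 1111, the user is dba, the password sits in a hypothetical DBA_PASSWORD variable, the Turtle file lives in a directory listed under DirsAllowed in virtuoso.ini, and the file path and graph IRI are placeholders. The two function names are hypothetical. DB.DBA.TTLP_MT and file_to_string_output are Virtuoso's own functions for loading Turtle from a file.

```shell
#!/bin/sh
# load_ttl_sql FILE GRAPH: print the statement that loads a Turtle file
# into the named graph (hypothetical helper).
load_ttl_sql() {
    printf "DB.DBA.TTLP_MT(file_to_string_output('%s'), '', '%s');\n" "$1" "$2"
}

# count_triples_sql GRAPH: print a SPARQL query, prefixed with the SPARQL
# keyword as iSQL expects, that counts the triples in the named graph.
count_triples_sql() {
    printf 'SPARQL SELECT COUNT(*) FROM <%s> WHERE { ?s ?p ?o };\n' "$1"
}

# Example usage (placeholders throughout; uncomment on a machine with Virtuoso):
# load_ttl_sql /tmp/data.ttl http://example.org/graph \
#     | isql-vt 1111 dba "$DBA_PASSWORD"
# count_triples_sql http://example.org/graph \
#     | isql-vt 1111 dba "$DBA_PASSWORD"
```

Printing the statement first makes it easy to inspect what will be run before it is handed to the database.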

SPARQL Endpoint

By default, Virtuoso provides its SPARQL endpoint at /sparql, and it needs no access control configuration to set up or use. Once you have uploaded a dataset through Conductor or iSQL, the endpoint is ready to serve the data for the given graph IRI. The only thing left to decide is the graph IRI under which each dataset is served.
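
A quick way to confirm the endpoint is serving a graph is to send it a query over HTTP. The sketch below assumes curl is installed and that Virtuoso is on its default local address. The sparql_query name and the graph IRI in the example are hypothetical.

```shell
#!/bin/sh
# sparql_query ENDPOINT QUERY: send QUERY as an HTTP GET and print the
# results as SPARQL JSON (hypothetical helper).
sparql_query() {
    curl --silent --get \
        --header 'Accept: application/sparql-results+json' \
        --data-urlencode "query=$2" \
        "$1"
}

# Example (placeholder graph IRI; uncomment once a dataset is loaded):
# sparql_query http://localhost:8890/sparql \
#     'SELECT * FROM <http://example.org/graph> WHERE { ?s ?p ?o } LIMIT 5'
```

Letting curl URL-encode the query avoids hand-escaping the braces, angle brackets, and question marks that SPARQL uses.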

Exposing Virtuoso interfaces using Nginx

By default, Virtuoso runs at localhost:8890, and Nginx can be configured as a proxy that passes traffic to it. However, for some reason I could not get a single reverse-proxy rule to map URLs to the localhost server as required. A workaround is to declare every URL path that Virtuoso uses as its own location and proxy each one to the Virtuoso server. The paths are -

/virtuoso
/conductor
/about
/category
/class
/data
/describe
/delta.vsp
/fct
/issparql
/ontology
/page
/property
/rdfdesc
/resource
/services
/snorql
/sparql-auth
/sparql
/statics
/void
/wikicompany
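
Because every path in the list gets the same proxy settings, the location blocks can be generated rather than typed out by hand. The following is a minimal sketch. The function name virtuoso_locations and the UPSTREAM variable are hypothetical, the upstream address is the default mentioned above, and proxy_pass is written without a trailing slash so that the request path reaches Virtuoso unchanged.

```shell
#!/bin/sh
# Assumption: Virtuoso listens on its default local address.
UPSTREAM="http://localhost:8890"

# virtuoso_locations PATH...: print one Nginx location block per path.
virtuoso_locations() {
    for path in "$@"; do
        cat <<EOF
location $path {
    proxy_set_header Host \$host;
    proxy_set_header X-Real-IP \$remote_addr;
    proxy_set_header X-Forwarded-For \$remote_addr;
    proxy_pass $UPSTREAM;
    proxy_redirect off;
}
EOF
    done
}

# Example: expose only Conductor and the SPARQL endpoint.
virtuoso_locations /conductor /sparql
```

The output can be pasted into the relevant server block, and leaving a path out of the argument list keeps that service unexposed.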

If a particular service is to be restricted or not provided, simply remove its URL from the Nginx configuration. An example of a proxy configuration for one URL is -

location /sparql {
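    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
    proxy_set_header X-NginX-Proxy true;
    # Strips a leading /virtuoso prefix; it has no effect on paths such as /sparql
    rewrite ^/virtuoso/?(.*) /$1 break;
    # No trailing slash, so the request path (/sparql) is passed to Virtuoso unchanged
    proxy_pass http://localhost:8890;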
    proxy_redirect off;