Setting up Pubby

Using pubby to expose resources in a dataset
by Harshvardhan J. Pandit
is part of: Semantic Web
ontologies semantic-web web-dev

Pubby is a nifty little tool for exposing RDF datasets behind SPARQL endpoints as browsable HTML pages. It generates a populated web page for each resource available in the endpoint, using DESCRIBE queries to fetch the data shown on the page. To see it in action, visit the [OPMW](http://opmw.org) example pages; the Similar Words example shows all RDF links in HTML.

Installation

Pubby can be downloaded from the download page, or the source can be accessed through the GitHub project. Usually the latest version is the one to use, but in this case I found an unresolved issue with showing RDF prefixes in the generated documents. A StackOverflow question on the issue has two answers, one proposing adding the prefixes to the config file and the other setting the prefixes as a URI; neither worked in my case. I therefore downgraded from version 0.3.3 to version 0.3.2. An important change between these two versions is that Pubby switched its configuration file format from N3 to Turtle. The two formats look fairly similar, though, so there is not much difference in terms of reading and writing the configuration.

To get Pubby, download the archive with curl (or wget) and unzip the contents:

# download the Pubby archive with curl
curl -o pubby.zip http://wifo5-03.informatik.uni-mannheim.de/pubby/download/pubby-0.3.3.zip
# or, equivalently, with wget:
# wget -O pubby.zip http://wifo5-03.informatik.uni-mannheim.de/pubby/download/pubby-0.3.3.zip
# extract the archive (unzip pubby.zip works as well)
jar -xf pubby.zip

Serving with Jetty

Pubby can be served using Tomcat, Jetty, or any other servlet container. It does not come with a WAR file, but contains a WEB-INF folder that is ready to be served. If Pubby is to be served as the root, meaning it is directly accessible at the address where Jetty is running, such as localhost:8080, then the webapps folder must contain the Pubby contents in a folder named root. Otherwise, Jetty can be configured to run Pubby as a servlet at the desired URL.

Jetty is available as a package, in which case it is installed as a service, or as a portable application that one downloads and sets up manually. In the latter case, Jetty can be set up as a service using the file /etc/systemd/system/pubby.service:

[Unit]
Description=Pubby server using Jetty
After=network.target

[Service]
User=< user >
Group=< dev >
WorkingDirectory=< folder containing jetty >
ExecStart=/usr/bin/java -jar start.jar

[Install]
WantedBy=multi-user.target

Serving using Nginx

Once Jetty is running the Pubby servlet, Nginx can be configured to serve it through a reverse proxy:

location /<DESIRED URL>/ {
    proxy_set_header X-Real-IP $remote_addr ;
    proxy_set_header X-Forwarded-For $remote_addr ;
    proxy_set_header Host $host ;
    proxy_set_header X-NginX-Proxy true;
    rewrite ^/<DESIRED NAMESPACE SET IN PUBBY CONFIG>/?(.*) /$1 break;
    proxy_pass http://<JETTY ADDRESS>/;
    proxy_redirect off;
}

Configuration

The pubby config file is located in the WEB-INF folder and is named either config.n3 for N3 or config.ttl for Turtle, depending on the version of pubby being used.

Prefixes

The prefix declarations at the start of the config file determine the prefixes shown in the HTML page output, as well as those used within the config file itself.
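As a rough illustration, the prefix block at the top of a Turtle config file might look like the sketch below. The conf: namespace is the one used by Pubby's configuration vocabulary, and rdf: and rdfs: are the standard W3C namespaces. The ex: prefix and its URI are placeholders made up for this example.

```turtle
# Prefix for the Pubby configuration vocabulary itself
@prefix conf: <http://richard.cyganiak.de/2007/pubby/config.rdf#> .

# Common vocabularies likely to appear in the data
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical prefix for the dataset's own resources (placeholder URI)
@prefix ex:   <http://example.org/resource/> .
```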

Server configuration section

This is the section marked as an instance of conf:Configuration.

  • projectName - this is the name of project displayed on the page
  • projectHomepage - this is the URI for the project homepage
  • usePrefixesFrom - this defines the location from which the prefixes are loaded; a value of <> indicates the config file itself, or it can be a URI from which the prefixes will be loaded
  • indexResource - this is the URI of the resource that will be displayed on the 'homepage' of Pubby; to put it another way, this is the resource shown on the landing page
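The four properties above could be combined as in the following minimal sketch. All names and URIs under example.org are placeholders, and a working config file needs further entries, such as the dataset section, which are left out here.

```turtle
@prefix conf: <http://richard.cyganiak.de/2007/pubby/config.rdf#> .

# <> refers to the config file itself, declared here as the configuration
<> a conf:Configuration ;
    # Name displayed on the generated pages (placeholder)
    conf:projectName "Example Project" ;
    # Homepage of the project (placeholder)
    conf:projectHomepage <http://example.org/> ;
    # Load the prefixes from this config file
    conf:usePrefixesFrom <> ;
    # Resource shown on the landing page (placeholder)
    conf:indexResource <http://example.org/resource/Home> .
```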

Dataset configuration section

This is a subsection of the server configuration section, defined through the conf:dataset property.

  • sparqlEndpoint - this is the SPARQL endpoint URL from which resources will be loaded
  • datasetBase - this is the common URI prefix of the resources in the dataset, similar to a prefix declaration (@prefix in Turtle, PREFIX in SPARQL)
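A dataset section using these two properties might look like the sketch below. The endpoint address and the base URI are assumptions chosen for illustration and should be replaced with the values for the actual SPARQL server and dataset.

```turtle
@prefix conf: <http://richard.cyganiak.de/2007/pubby/config.rdf#> .

# Attach a dataset description to the configuration (<> is the config file)
<> conf:dataset [
    # SPARQL endpoint from which resources are loaded (placeholder address)
    conf:sparqlEndpoint <http://localhost:3030/dataset/sparql> ;
    # Common URI prefix of the resources in the dataset (placeholder)
    conf:datasetBase <http://example.org/resource/>
] .
```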