SEMANTiCS 2019

SEMANTiCS 2019 conference in Karlsruhe, Germany
published: Tue Sep 10 2019
by Harshvardhan J. Pandit
is about: SEMANTiCS 2019 Conference

Day 1
- Plenary - introduction
- Keynote: "Making sense of and taking control of enterprise silos" - Michael J. Sullivan, Oracle
- LegalTech, "To whom it may concern" - Christian Dirschl, Wolters Kluwer
- LegalTech, Ensuring GDPR Compliance with KG at large German powertool manufacturer - Magnus Knuth, Eccenca
- Open Government and Semantic Web: A Field Report - Guido van der Wolk, Taxonic
- Keynote: FAIR Data - Michel Dumontier
- Talk: An innovative semantic solution to turn transport data in EU compliance - Marco Comerio
- Talk: A legal knowledge graph for improved law accessibility - Erwin Fitz, WU

Day 2
- talk: Industry proven AI applications based on Enterprise KG - Klaus, Jan i-views
- talk: From monolingual to multilingual ontologies: the role of cross-lingual ontology enrichment - Shimaa, Uni. Bonn
- keynote: Looking for common sense in the Semantic Web - Valentina Presutti
- Closing Session

summary: SEMANTiCS took place in Karlsruhe this year over two days (10-11 September), with workshops on the 9th and DBpedia Day on the 12th. There was a mix of industry and academic talks about the use of semantics (~50% industry), which is expected from the conference. This year, there were special tracks for LegalTech (10th) and Cultural Heritage (11th), which consisted of academic research papers as well as invited talks from industry. There were interesting keynote talks: from Oracle on the use of semantics and KG, from Michel Dumontier regarding FAIR data, and from Valentina Presutti regarding representation of common sense using sem-web.

There was quite a variety in the topics presented (though they had the commonality of semantics). There was research regarding Wikipedia/Wikidata/DBpedia, Cultural Heritage, Building/Transport data, and Query Processing. There were almost no papers based on core ML or NLP tasks, which was a good sign of the focus moving back to semantics.

In terms of papers, the conference had 20 full and 8 short papers with an acceptance rate of 27%, and 31 posters with an acceptance rate of 66%. Awards were given to "RSP-QL* - Statement level annotations in RDF streams" for Best Paper and to "Transfer Learning for Biomedical NER with BioBERT" for Best Poster/Demo.

Harshvardhan Pandit @coolharsh55 from @tcddublin @AdaptCentre presents how to provide #semantic annotations to #GDPR consent forms online and enable semi-automatic compliance to the regulation. #semanticsconf https://t.co/hvOR8xR1hY pic.twitter.com/llsjqc6qIJ
— Bianca Pereira (@bianca_oli_per) September 10, 2019

I presented a paper titled "Test-driven approach towards GDPR compliance" in the LegalTech track based on PhD work in the validation and linking of GDPR compliance using sem-web (paper and resources: https://w3id.org/GDPRep/semantic-tests) as well as a poster titled "OPN: Open Notice Network" based on work done on incorporating semantics to create an open notice for transparency with Mark Lizar (industry partner). Additionally, a poster advertising the Data Privacy Vocabulary was also on display at the conference, which was attended by DPVCG members Sabrina, Fajar, and Javier. Some interesting networking opportunities took place based on my PhD work (with Sabrina of SPECIAL for GDPR compliance, and Heiko Paulheim of Uni. Mannheim and Jan of SAP regarding domain specific ontology matching), as well as that of the DPVCG (Maria Pieper from FZI who also works in LegalTech).

@coolharsh55 presenting his work on GDPR compliance @SemanticsConf #gdpr #legaltech #compliance pic.twitter.com/4sF8SKMjlg
— Sabrina Kirrane (@SabrinaKirrane) September 11, 2019

I also had the opportunity to chair Session 4.4 on Knowledge Graphs on the 11th. There were two talks in the session. The first was an industry talk by i-views on the use of KG for exploration of experience in projects. The second was from the research team at Uni. Bonn, presented by Shimaa Ibrahim, and featured their work on multi-lingual ontology enrichment. For me, the multi-lingual talk was of particular interest because of past experience with trying to generate multi-lingual thesauri of GDPR concepts, and the eventual frustration of not being able to use existing techniques due to their ineffectiveness on domain-specific ontologies.

If you are at the Semantics conference be sure to check out the W3C DPVCG vocabulary poster https://t.co/rRzgJdsU5S @SemanticsConf @AxelPolleres @coolharsh55 @specialprivacy #privacy #gdpr #legaltech pic.twitter.com/TASPq94ve3
— Sabrina Kirrane (@SabrinaKirrane) September 11, 2019

Wi-Fi password KA2019FIZ

Day 1

2019-09-10

Plenary - introduction

Semantics 2020
- April 21-23 - Austin, TX, USA
- September 07-10 - Amsterdam, NL
Papers: 88 submissions (20 full, 8 short accepted, 27% acceptance rate); Posters: 47 submitted (31 accepted, 66% acceptance rate)
28 papers, 37 industry presentations, 7 workshops, 2 tutorials, 31 posters

Keynote: "Making sense of and taking control of enterprise silos" - Michael J. Sullivan, Oracle

large-scale print documentation, Linotype to generate, focus on creating general pattern language for technical documentation (inspired by Tufte)
investigate implementing "taxonomy as a service" based on Oracle OCI for use in Oracle CX apps
Oracle DB one of the first to implement RDF, but not on anyone's radar from KG perspective
<50% of structured data used to make decisions, <1% of unstructured data analysed - Harvard Business Review (2017)
Why graphs / RDF / semantics are a solution:
- RDF requires URIs not strings for resources - makes integration easier e.g. no duplicates
- SPARQL/SHACL has reasoners that can make semantic sense out of disparate data e.g. owl:sameAs, differentFrom, inverseOf
- RDF middleware can hide complexity
- Oracle's implementation of RDF/Semantics reuses database features (materialised views, RMAN, RAC, etc.)
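
A minimal sketch of the sameAs point above: once two silos agree that two URIs denote the same thing, their facts can be merged under one identifier. This is not Oracle's implementation; the URIs, predicates, and the union-find approach are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical triples from two silos; all URIs are invented for illustration.
triples = [
    ("crm:cust42", "ex:name", "Ada Lovelace"),
    ("billing:acct7", "ex:invoiceTotal", "120.00"),
    ("crm:cust42", "owl:sameAs", "billing:acct7"),
]

def same_as_closure(triples):
    """Build a lookup that maps every resource to one canonical representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Pick the lexicographically smaller URI as the canonical one.
                parent[max(root_s, root_o)] = min(root_s, root_o)
    return find

def merge(triples):
    """Rewrite subjects to their canonical URI and group the facts per resource."""
    find = same_as_closure(triples)
    merged = defaultdict(dict)
    for s, p, o in triples:
        if p != "owl:sameAs":
            merged[find(s)][p] = o
    return dict(merged)

merged = merge(triples)
```

Here both facts end up under a single resource, which is the "no duplicates" effect the talk attributed to URIs plus inference.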
RDF solving data warehouse challenges
- contexts - schema on read
- conformed dimensions - sameAs inference
- slowly changing dimensions - forward chaining
- time series queries - events, dataWeb, Class/subClass inference, multiple inheritance, forward chaining
Amazon's solution for big data warehouse is complex and requires a lot of tools
Using a semantic data warehouse provides a semantic warehouse that can span across dimensions/silos
Problems: reconciling common URIs is problematic - mapping is active area of research
methodology to solve semantic heterogeneity
- collect set of use-cases / queries to be answered across silos
- create top-level schema (T) to answer use-cases (just enough information)
- map each silo schema to top-level schema using OWL/SHACL axioms (A) just for that silo
- create entailment E using A over T for silos
- create virtual model V for E + T for silos
- query V to answer use-cases
- repeat as use-cases come in
- won't work if one attempts to map all silos at the very start
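
The methodology above can be sketched in a few lines, assuming the per-silo axioms A are simple sub-property mappings onto the top-level schema T (real OWL/SHACL axioms are much richer). All silo names, predicates, and function names are hypothetical.

```python
# Silo data as (subject, predicate, object) triples; all names are hypothetical.
silos = {
    "crm": [("crm:c1", "crm:fullName", "Ada"), ("crm:c1", "crm:mail", "ada@example.org")],
    "shop": [("shop:u9", "shop:customerName", "Grace")],
}

# Top-level schema T: just enough vocabulary for the use-case "list all customer names".
T = {"top:name"}

# Per-silo axioms A: each silo predicate is declared a sub-property of a T predicate.
A = {
    "crm": {"crm:fullName": "top:name"},
    "shop": {"shop:customerName": "top:name"},
}

def entail(silo_triples, axioms):
    """Entailment E: forward-chain the sub-property axioms over one silo."""
    return [(s, axioms[p], o) for s, p, o in silo_triples if p in axioms]

def virtual_model(silos, A, T):
    """Virtual model V: union of entailed triples, restricted to T's vocabulary."""
    V = []
    for name, triples in silos.items():
        V.extend(t for t in entail(triples, A[name]) if t[1] in T)
    return V

# Query V to answer the use-case.
V = virtual_model(silos, A, T)
names = sorted(o for s, p, o in V if p == "top:name")
```

Only the predicates needed by the use-case are mapped (crm:mail is ignored), which mirrors the "just enough information" and "repeat as use-cases come in" steps.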
pattern for reconciling known issues
- create SPARQL endpoint for each silo
- expose knowledge as schemas/data streams, aggregates, analytics - APIs don't work
- use A+T --> E to create read only master views over all silos
- could create multiple virtual models
instead of named graphs, have multiple instances (as silos) for scalability
Oracle DB 19c supports these features / methods
final thoughts: should have people in graphs (knowledge) for serendipity

LegalTech, "To whom it may concern" - Christian Dirschl, Wolters Kluwer

Wolters Kluwer slide: Legal is very local - dependent on language and jurisdiction
Law firms sell advice, consultancy; LegalTech firms sell (digital) service
Global survey about future of law: two outcomes - significant transformation (disruption), rapid acceleration (in next 3 years)
- independently conducted, 700 professionals across US and 10 countries across Europe in law firms, corporate legal departments
LegalTech companies index provided by Stanford techindex.law.stanford.edu
legalcomplex.com collect (financial) data about companies and startups, show analytics of money by sector, domain - sector 8 legaltech, sector is making money (income flow, not profit)
most of the money is going towards AI - search, IR, legal analysis, blockchain
survey outcomes: >60% lawyers expect impact from LegalTech, only fraction think they can cope with them
reasons new technology is resisted
- 36% lack of technology knowledge, understanding of skills
- 34% existing organisational are efficient
- 30% financial constraints
>50% lawyers expect transformational technologies, <24% have a good understanding of them
- big data and predictive analytics, machine learning, AI, robotic process automation, blockchain
smartlaw.de forms, resources for German legaltech
study download: info.wolterskluwer.de/studie-future-ready-lawyer
Q&A: domain knowledge differentiates (provides advantage) from big players who have more data, more resources

LegalTech, Ensuring GDPR Compliance with KG at large German powertool manufacturer - Magnus Knuth, Eccenca

GDPR data is interdependent
- data (ID, name) → processes (CRM, accounting) → purposes (marketing, billing) → legal validation (Consent, contract) → legal framework (GDPR, trade law)
- data is not centrally managed, have separate data owners, for their use-case >200 separate data silos
- there is a directory of processes and applications which lists all data being handled by processes within company or sub-contractor
company gets data subject request to DPO (A15)
- first challenge is to identify applications that contain personal data of requesting data subject
- same for personal data categories
- not practical to ask every data owner
- information about processing is stored in directory
- legal basis can be disparate for different silos
eccenca solution - connect everything via a middleman/middle layer
- data from heterogeneous sources is integrated into one knowledge graph, and GDPR team has access via dedicated interfaces
- personal data search, meta data catalog, compliance dashboards, data import → only import metadata
- PII is stored in search index to identify data subject
ontologies to represent domain knowledge e.g. GDPR
summary
- data is linked with requirements of GDPR (consent, purpose, processing)
- record of processing activities, legal bases, and retention periods
- application/metadata discovery
SAR
- rights: access A15, rectify A16, erasure A17, restriction of processing A18, data portability A20
- upon request, JIRA ticket is created
- identify data subject using search index in different applications / data silos
- level 1 report: meta data about data subject
- level 2 report: data export / deletion / update confirmation
search interface
- personal data categories
- search results returning personal data with source
sub-tickets in JIRA for tasks and targeted applications
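
A minimal sketch of the SAR flow described above, assuming a search index that maps a data subject to applications and a metadata catalogue per application. This is not eccenca's product or API; every name, field, and value below is hypothetical.

```python
# Hypothetical search index: data-subject identifier -> applications holding their data.
search_index = {
    "ada@example.org": ["CRM", "Accounting"],
    "grace@example.org": ["CRM"],
}

# Hypothetical metadata catalogue: application -> categories, purpose, legal basis.
catalogue = {
    "CRM": {"categories": ["name", "email"], "purpose": "marketing", "legal_basis": "consent"},
    "Accounting": {"categories": ["name", "invoice"], "purpose": "billing", "legal_basis": "contract"},
}

def level1_report(subject_id):
    """Return metadata only (no instance data) for every application that knows the subject."""
    return [
        {"application": app, **catalogue[app]}
        for app in search_index.get(subject_id, [])
    ]

def sub_tickets(subject_id, right):
    """One sub-ticket per targeted application, mirroring the ticket-per-task flow."""
    return [f"{right}: {app} for {subject_id}" for app in search_index.get(subject_id, [])]

report = level1_report("ada@example.org")
tickets = sub_tickets("ada@example.org", "A15 access")
```

The level 1 report stays on the metadata side; a level 2 report would require going back to the source applications, which is outside this sketch.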
Auditing and Reporting
- proof of compliance to stakeholders and authorities
- compliance for all internal data processing operations
- personal note: compliance dashboard is handling compliance requests
For internal stakeholders, JIRA is used to track internal issue status, trackers, analytics
explore views and complex queries e.g. instance x data object x consent (personalised offers)
- identify data for future processing activities - compatibility
integrate Power BI for GDPR compliance dashboard - consent missing by data object for data categories
summary
- deliver metadata to external applications (BI, analytics, dashboards)
- existing IT infrastructure is not affected, does not replace legacy system
- DPO: identification of data subjects across applications, control and transparency of compliance, no duplication of data through separating instance data from metadata
- application owners: efficient processing of SAR without interfacing legacy system, operational processes and established workflows are not disrupted
booth at SEMANTiCS

Open Government and Semantic Web: A Field Report - Guido van der Wolk, Taxonic

new Dutch legislation (1/1/2021)
- every government org must publish their notices in official gazette online
- every 18+ citizen will receive customisable IDs
official gazette of NL
- 300,000+ publications a year, official source since July 1, 2009, centralised government (states, parliament) and decentralised government (provinces, municipalities, water boards) - bulk
- XML, HTML, PDF, ODT, Metadata (XML)
- search platform, geo-based email subscription
- open data (CC0)
- officielebekendmakingen.nl
currently the data is not usable
working on data hub using FAIR - retrieve data and metadata, deploy enriched data
- data wrangling, define semantic model, make data linkable, data is represented using XML
- combine with other data, query combined data, monitoring and visualisation
- MarkLogic, Java, Python, vue.js - MarkLogic chosen because data is stored in MarkLogic document storage
- data wrangling challenges - prefixes are not stored in same field, data cleaning required, fields are concatenated
- semantic enrichment - dct, dcam, dcat, foaf, geo, prov, rdf/s, skos, legal domain Dutch law: bwb, ecli, lido, internal references: oep, overheid, overheidop
combine with other data - registry of government organisations, geo and demographic information, judicial information
interactive queries using YASGUI
enriching publications with KG
- enables data quality monitoring/analysis
- enables custom open government
- prepares for ML enrichment

Keynote: FAIR Data - Michel Dumontier

FAIR
- unique identifiers to retrieve all forms of digital content and knowledge
- high quality metadata to enhance discovery of digital resources
- use of common vocab
- establish community standards
- detailed provenance
- registered in appr. repos
- social and technological commitments
- simpler terms of use to clarify expectations and intensify innovation
FAIR != Open - open as possible, closed as necessary
- document your data (with metadata) for potential findability and reuse, not necessarily make it open (publish)
why should I go FAIR?
- easy to use my data for new purpose
- easy for other people to find, use, cite my data, and understand what I expect in return
- easy to verify my work
- ensure data are available in future
- satisfy expectations around management from institution, agencies, peers
semantic web provides ways for publishing data, metadata, frameworks, ecosystems
Bio2RDF - OSS uses sem-web for reusing biomed data
reproduce original research
- reimplement PREDICT: inferring novel drug indications with application to personalised medicine
- original result: AUC 0.91, new result over new data: AUC 0.83
efficiently explore web of data: explore probabilistic drug (re-)use using a KG to identify potential applications of existing drugs and potential candidate drugs
FAIR metadata
- metadata identifier
- resource identifier
- standardized, machine readable format
- use of community vocabularies
- license ???
- provenance ???
W3C HCLS Community Profile w3.org/TR/hcls-dataset/
- ShEx validator (github, convertable to SHACL)
In addition to FAIR, there are 15 guiding principles fairmetrics.org
- 14 universal metrics covering FAIR sub-principles
- Metrics demand evidence, not standards
- machine-readable metadata, resource management plan, additional authorisation procedures
- publicly registered, identifier schemes, access protocols, KR lang, licenses, provenance spec, community standards
- evidence resource can be located in search results
automatically assess FAIRness of digital resource w3id.org/AmIFAIR
- tests metrics
- evaluating FAIR maturity through a scalable, automated, community-governed framework
- each metric is registered as an API service, which can be executed automatically
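
A minimal sketch of automated FAIRness assessment, assuming each metric is a small test that looks for evidence in a metadata record. In the framework described in the talk, metrics are registered API services; plain Python functions stand in for them here, and the field names are illustrative rather than taken from any FAIR specification.

```python
# Hypothetical metadata record; field names are illustrative, not a FAIR standard.
record = {
    "metadata_id": "https://example.org/meta/42",
    "resource_id": "https://example.org/data/42",
    "format": "text/turtle",
    "vocabularies": ["dcat", "dct"],
    "license": None,
    "provenance": None,
}

# Each metric is a named test that looks for evidence in the record.
METRICS = {
    "metadata identifier": lambda r: bool(r.get("metadata_id")),
    "resource identifier": lambda r: bool(r.get("resource_id")),
    "machine-readable format": lambda r: bool(r.get("format")),
    "community vocabularies": lambda r: len(r.get("vocabularies") or []) > 0,
    "license": lambda r: bool(r.get("license")),
    "provenance": lambda r: bool(r.get("provenance")),
}

def assess(record):
    """Run every metric and return per-metric results plus the fraction passed."""
    results = {name: test(record) for name, test in METRICS.items()}
    score = sum(results.values()) / len(results)
    return results, score

results, score = assess(record)
```

The score is a fraction rather than a pass/fail verdict, in line with the Q&A remark that FAIR is a gradient and not an absolute target.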
mine distributed, access restricted FAIR datasets in a privacy preserving manner
- privacy preserving machine learning
- made available through FAIR data stations
semantics, coupled with AI, may enable humans, aided by intelligible machine agents, to exploit internet of shared data and services
Q&A: FAIR is a gradient of increasing competencies, not an absolute target. It will evolve and move as we churn through technologies.

Talk: An innovative semantic solution to turn transport data in EU compliance - Marco Comerio

EU Reg 2017/1926
- requirements
- impact on transport stakeholders
- challenge & opportunities
establish interop framework enabling EU players for interop business applications
- barriers: insufficient accessibility of transport data, lack of service and data interop
- key enablers: data sharing mechanism, data interop by means of common set of data exchange standards
Each EU member state is required to setup NAP by regulation
rely on in-house support for data conversion process, which may lack knowledge and skills related to regulation - or turn to external providers that provide custom and expensive solutions
impact on transport stakeholders:
- obligations: provide datasets to NAP compliant to the requested data formats, provide metadata description of datasets
- challenge: turn available data into requested formats, and enrich them with additional data sources
- benefit: additional data sources
reference ontologies - unambiguously describe operational aspects of transport domain
- metadata profiles - harmonise metadata description of datasets
- data converters: turn available transport data into specific formats, and enrich with additional sources e.g. translate schedule, fare info
contributions
- conceptualisation - acquire domain knowledge, data formats, standards → define reference ontology
- sharing - asset types, asset descriptors
- governance - identify actors, roles, tasks; define lifecycle
SNAP solution
- uplift from source format into reference ontology → chimera provides options for RML (CSV,DB,etc), Java
- downlift to target format → chimera provides two options Apache Velocity, Java annotations
  - Apache velocity template: beginning of template has SPARQL query binding variables to data required in template
  - Java annotations: annotations identify mappings
- chimera converter: lifting, data enrichment, inference enrichment, lowering; based on Apache Camel
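
A minimal sketch of the lifting, enrichment, and lowering stages, assuming a CSV source and an XML-like target. This is not Chimera: Python's string.Template stands in for Apache Velocity, a list comprehension stands in for the SPARQL query that binds template variables, and the vocabulary (ex:StopPlace, ex:name) is invented.

```python
import csv
import io
from string import Template

# Hypothetical source data: a tiny stop list in CSV form.
SOURCE = "stop_id,stop_name\nS1,Central Station\nS2,Market Square\n"

def lift(csv_text):
    """Lifting: turn CSV rows into triples against an assumed reference vocabulary."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"ex:stop/{row['stop_id']}"
        triples.append((subject, "rdf:type", "ex:StopPlace"))
        triples.append((subject, "ex:name", row["stop_name"]))
    return triples

def enrich(triples, extra):
    """Enrichment: add triples from an additional source."""
    return triples + extra

def lower(triples):
    """Lowering: select name bindings and fill a target-format template."""
    template = Template('<StopPlace id="$id"><Name>$name</Name></StopPlace>')
    return [
        template.substitute(id=s.rsplit("/", 1)[1], name=o)
        for s, p, o in triples
        if p == "ex:name"
    ]

graph = enrich(lift(SOURCE), [("ex:stop/S1", "ex:wheelchairAccess", "true")])
xml_lines = lower(graph)
```

Because both directions go through the reference vocabulary, adding a new target format only needs a new lowering template, which is the reusability strength listed in the SWOT.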
SWOT
- strengths: flexibility, reusability
- weaknesses: handmade mappings no tooling, semantic/logic skills required
- opportunities: conceptualisation of domain, applicability to different domains, semantic NAPs with transmodel RDF data
- threats: bad ontology and mapping
transmodel-cen.eu

Talk: A legal knowledge graph for improved law accessibility - Erwin Fitz, WU

Legal data is expressed in natural language, is mostly heterogeneous, and has incomplete metadata
searched 'car' via Eurlex in datasets for Austria, Germany, Italy, EU
- Austria: case number and dates
- Germany: some additional data
- Italy: no central database, heterogeneous data from court cases; 'auto' returned 1 result
problems:
- court decisions have references to other laws, dates, documents
- Austria linking law - problems with versioning as laws might change
- mainly keyword based
- need to filter
- ambiguous terms
Solution:
- central search interface, ideally across EU
- using semantic search
- linked documents to support better information lookup
- add external sources
- standardised document classification schema
desired
- interlinked legal documents
- use standardised identifiers (ECLI, ELI)
- minimum set of metadata for legal documents
ECLI / ELI is not implemented / adopted by all countries in the EU
which sources can be used?
- EU sources influence national law
- EuroVoc and EUR-Lex
information extraction
- patterns using regex
- gazetteers - compare text to lists/trees (existing list)
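
A minimal sketch of the two extraction techniques above, assuming a simplified ECLI pattern (country, court, year, ordinal) and a tiny hand-made gazetteer. The identifier in the example text and the gazetteer entries are made up for illustration and are not from the talk.

```python
import re

# ECLI shape: ECLI:<country>:<court>:<year>:<ordinal>; the exact limits are simplified here.
ECLI_PATTERN = re.compile(r"ECLI:[A-Z]{2}:[A-Z0-9]+:\d{4}:[A-Z0-9.]+")

# Hypothetical gazetteer: surface form -> concept label.
GAZETTEER = {"car": "motor vehicle", "lorry": "motor vehicle", "tenant": "tenancy"}

def extract(text):
    """Return ECLI identifiers found by regex and concepts found by gazetteer lookup."""
    identifiers = ECLI_PATTERN.findall(text)
    tokens = re.findall(r"[a-z]+", text.lower())
    concepts = sorted({GAZETTEER[t] for t in tokens if t in GAZETTEER})
    return identifiers, concepts

ids, concepts = extract(
    "See ECLI:AT:OGH0002:2019:0010OB00012.19X.0521.000 on the car and the tenant."
)
```

The gazetteer maps different surface forms to one concept, which addresses the keyword-only search problem noted earlier (e.g. 'car' vs 'auto').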
EuroVoc for thesaurus
approaches
- TF-IDF
- Word2Vec, Doc2Vec + combine with TF-IDF
- fast.ai deep learning
- JRC-acquis V3, KE-Darmstadt corpus
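
A minimal sketch of the TF-IDF approach listed above, using raw term frequency and log(N/df) weighting with cosine similarity. The three documents are invented, and the talk's actual pipeline (including Word2Vec/Doc2Vec and the corpora named above) is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical mini-corpus: document id -> text.
docs = {
    "d1": "court decision on car accident liability",
    "d2": "income tax law amendment",
    "d3": "car registration law",
}

def tf_idf(docs):
    """Return {doc_id: {term: weight}} using raw term frequency and log(N/df)."""
    tokenised = {d: text.split() for d, text in docs.items()}
    df = Counter(term for tokens in tokenised.values() for term in set(tokens))
    n = len(docs)
    return {
        d: {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for d, tokens in tokenised.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

weights = tf_idf(docs)
sim_car = cosine(weights["d1"], weights["d3"])  # the two documents share "car"
sim_tax = cosine(weights["d1"], weights["d2"])  # the two documents share no terms
```

Terms that occur in fewer documents get a higher weight, so "court" counts for more than "car" when describing d1.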
ADORN - automatic document RDFa annotator
- GUI
- query/store documents in file
- automatically annotate and classify documents
- export in RDFa to display in HTML

Day 2

2019-09-11

talk: Industry proven AI applications based on Enterprise KG - Klaus, Jan i-views

why is semantics difficult to introduce in enterprise?
- experimental applications dominate
- not managed by domain experts
- how do we escape the sandbox?
do knowledge graphs help? old wine in new bottle? hype cycle?
consultants are more interested in talking to other consultants about experience and seeing reference projects
- e.g. project diet - diet consultant simulating real dietician
system and KG should not only close the gap (in knowledge, application) but also allow exploration
- should be able to edit information (easily)
- should be able to react to new vocabularies as they permeate
related terms and applications can be explored (semantics: related to, subtopics)
learn from user behaviour by prioritising edges that lead to desired result (in search)
- personal note: perhaps this constructs a form of weighted graph for IR
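
A minimal sketch of the weighted-graph idea in the personal note above, assuming edge weights are increased whenever a traversal ends in a result the user accepted. This is a guess at the mechanism, not i-views' implementation; the class, method names, and reward value are hypothetical.

```python
from collections import defaultdict

class LearnedGraph:
    """Graph whose edge weights grow when a traversal led to a result the user accepted."""

    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(lambda: 1.0))

    def add_edge(self, source, target):
        self.weights[source][target]  # touching the entry creates it with weight 1.0

    def reinforce(self, path, reward=0.5):
        """Increase the weight of each edge on a path that ended in a desired result."""
        for source, target in zip(path, path[1:]):
            self.weights[source][target] += reward

    def ranked_neighbours(self, node):
        """Neighbours ordered by learned weight, highest first (ties broken by name)."""
        return sorted(self.weights[node], key=lambda n: (-self.weights[node][n], n))

g = LearnedGraph()
g.add_edge("diet", "nutrition")
g.add_edge("diet", "allergy")
g.reinforce(["diet", "nutrition"])
ranking = g.ranked_neighbours("diet")
```

After one accepted search path through "nutrition", that edge is ranked ahead of "allergy" when exploring from "diet".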
Rather than worrying about RDF, SPARQL, worry about CSV and integration of structured sources
learning
- analytics, text analytics, modelling
- along with knowledge engineers we also need to expose end users to the KG otherwise it won't grow
services
- authorisation: on objects/relations, on meta-data, via the graph itself
- auditing: audit log for all data access
- security: access control, integration, encryption

talk: From monolingual to multilingual ontologies: the role of cross-lingual ontology enrichment - Shimaa, Uni. Bonn

multi-lingual ontology
- entities and relations are present in multiple natural languages
- e.g. dbpedia - en, fr, de
processes for multi-lingual
- cross-lingual matching - match source to target in different natural language
- cross-lingual ontology enrichment
  - depends on matching
  - expand the target ontology with additional information extracted from external resources
motivation - 73.46% EN in LOV, 7.92% FR, 4.84% DE
- manual enrichment is error-prone and difficult
- monolingual ontologies are not easily understandable to other language speakers
previous work: OECM (ESWC 2019 Poster)
new approach
- use semantic similarity
- enrich by adding new classes in addition to related classes in hierarchy
- automated
- non EU languages
steps
- extract concepts and translate using Google Translate, for multiple matches, select all
- pre-process: use NLP tokenisation, POS-tagging, stop-word removal, lemmatization, true casing
- identify potential match and select best match based on similarity score - Jaccard (string) & WordNet (semantic)
- output is matched terms between source and target ontologies
- triple-retrieval: takes matched terms and retrieves triples for matched terms and related classes
- enrichment: retrieve triples and enriched terms and add to target ontology
- validation: semantic (reasoners), syntactic (W3C online validation)
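
A minimal sketch of the matching step above, assuming labels are compared with Jaccard similarity over character trigrams after translation. The WordNet-based semantic score is left out, a small dictionary stands in for the translation service, and all labels and the threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity over character trigrams of two lower-cased labels."""
    def grams(s):
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)} or {s}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)

# Hypothetical lookup standing in for the translation step; all candidates are kept.
TRANSLATIONS = {"Konferenz": ["conference"], "Beitrag": ["contribution", "paper"]}

def best_matches(source_labels, target_labels, threshold=0.5):
    """For each source label, keep the best-scoring target label above the threshold."""
    matches = {}
    for label in source_labels:
        candidates = [
            (jaccard(t, target), target)
            for t in TRANSLATIONS.get(label, [])
            for target in target_labels
        ]
        score, target = max(candidates, default=(0.0, None))
        if score >= threshold:
            matches[label] = target
    return matches

matches = best_matches(
    ["Konferenz", "Beitrag", "Gutachter"],
    ["Conference", "Paper", "Workshop"],
)
```

Source labels without a match (here "Gutachter") are the candidates for new classes in the enrichment step, while matched labels anchor where those classes attach in the target hierarchy.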
personal question:
- will Word2Vec from source and target languages reveal similarity in labels???
- labels might not be valid dictionary words, can we also utilise definition and other annotations
use-case: SEO Scientific Events Ontology
- enrich SEO (49 classes) from another ontology Conference (DE, 60 classes)
- new ontology had 20 new classes
- enrich SEO from ConfOf (Arabic)
- new ontology has 37 new classes
evaluation
- MultiFarm benchmark - 7 ontologies, translation into 9 languages
- evaluate effectiveness - compare with reference alignment
- compare with SotA
- evaluate enrichment process quality - manually enrich by expert to create gold standard, compare for evaluation, results 80%

keynote: Looking for common sense in the Semantic Web - Valentina Presutti

adoption and usage of semantic web in industry
smart agents currently
- do not reason
- are not aware of surrounding context
- do not have 'common sense'
- have the answer built-in in the best case
- borrow from Wikipedia or other sources
- issue a query on Google
lack of common sense
- "common sense" is knowledge which we share but do not explain explicitly
- common sense facts/knowledge is needed for reasoning
- existing knowledge graphs encode domain specific knowledge
Role of semantic web - to create a graph of common sense (Dagstuhl Report)
research on common sense in semantic web
- search on scholarly data website → 2197 papers, only 3 papers excluding Wikipedia, only 1 unique paper
conceptnet.io
- labeled graph (semantic network) targeting text processing
- accumulated ~1M English facts
- crowdsourced, reusing Wiktionary and WordNet and aligned partially to DBpedia
- provides JSON-LD APIs
- e.g. knife has three sub-graphs
  - knife as a noun
  - knife as a verb
  - knife as an object
- we need to give it formal semantics (e.g. via OWL) in order to utilise this for reasoning/inferences
- there is no information about situational semantics, validity, and applicability
NELL rtw.ml.cmu.edu
- ML system that reads the web and extracts facts from textual web documents
- running since 2010, ~50M candidate beliefs, ~2.8M high confidence beliefs
- candidate beliefs encoded as KN of facts and ontology of categories and relations
- available as LOD
- no formal semantics no categorisation, no constraints or dependency
Atomic homes.cs.washington.edu/~msap/atomic/
- textual descriptions of inferential knowledge (if-then clauses) based on 3 types of if-then associated with 9 dimensions of inferential and causal types
- accumulated ~877k textual descriptions
- crowdsourcing of blank placeholders put in 24k event phrases
- no formal semantics
Human Know-How dataset datashare.is.ed.ac.uk/handle/10283/1985
- dataset and ontology (PROHOW)
- labeled with a sequence
- nodes are labeled in text, not semantics
FrameNet
- lexical resource, which describes frames which are situations
- frame elements are nodes which add semantics to the frame
- associated actions or situations can be related to 'evoke' the frame e.g. slide for cutting
Framester w3id.org/framester
- LOD resource that connects linguistic data with factual and ontological data
- encodes 50M links between 21 resources
  - resources: DBpedia, WordNet, DOLCE, FrameNet, SentiWordNet, ConceptNet etc
  - linking: skos closeMatch etc
FOX w3id.org/fox
- do foundational distinctions match common sense?
- are they present in LOD?
- class vs instance e.g. is a building a class or an instance

Closing Session

SEMANTiCS 2019 had 426 participants
28 papers, 37 industry presentations, 7 workshops, 2 tutorials, 31 posters
Best Paper RSP-QL* - Statement level annotations in RDF streams
Best Poster/Demo - Transfer Learning for Biomedical NER with BioBERT
Industry Innovation Award - Upstream - Managing Knowledge in the Oil & Gas Industry into the digital age
pre-proceedings https://cutt.ly/semantics2019
proceedings will be Open Access (coming soon)