Karstlink how to

Introduction

The Semantic Web (SW) is more than a set of data formats. If you are accustomed to classical data formats such as spreadsheets and CSV, JSON, XML, or YAML, you are used to data in isolation. The Semantic Web is designed to facilitate links between individual documents. Its key feature is the use of global identifiers instead of local ones such as database keys, XML IDs, etc. This makes it possible to refer to data available elsewhere, and thus to enrich it.

These global identifiers are generally HTTP URLs. When sharing data with the SW, good practice is to use a URL prefix starting with your organization's URL; this way you can be sure that no other organization will use these identifiers. For example, suppose your organization has the website https://myorg.org/ and you have an SQL table of persons that you want to share: the identifier will be https://myorg.org/persons/111 for person number 111. It could also be https://myorg.org/data/persons/111 or https://data.myorg.org/persons/111; you can choose any scheme you want. These URLs can later serve RDF data in one of the formats below, but this is not mandatory.
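As a rough illustration of this identifier scheme, the sketch below builds a global identifier from a local database key. The prefix https://myorg.org/ and the persons table come from the example above; the function name mint_identifier is hypothetical and only serves the illustration.

```python
from urllib.parse import quote, urljoin

# Assumption: the organization publishes its data under this prefix.
BASE_URL = "https://myorg.org/"


def mint_identifier(table: str, local_key: object, base_url: str = BASE_URL) -> str:
    """Build a global identifier from a table name and a local database key."""
    # Percent-encode each path segment so unusual keys still give a valid URL.
    path = f"{quote(table, safe='')}/{quote(str(local_key), safe='')}"
    return urljoin(base_url, path)


person_url = mint_identifier("persons", 111)
print(person_url)  # https://myorg.org/persons/111
```

Passing another prefix, for example https://myorg.org/data/ (with its trailing slash), gives the alternative scheme mentioned above without touching the local keys.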

In the SW, the atomic unit of data is a "triple": subject - property - value. Subject and property are URLs (more generally URIs or IRIs). The concrete format depends on the data export or dump that you may already have; the options are described below. You may send us a sample of your data or data structure (e.g. SQL DDL) in any format that is easy for you to obtain, so that the technical discussion can start from something concrete.

The site that aggregates speleology data is data.grottocenter.org (4 datasets at the moment).

Share data with a .csv file

You can put your data in a CSV file. This file must follow one of the models given on the example page. Please indicate the existence of this CSV file on the wiki. The files differ for each data type (cavity, area, ...), but they share some common elements. All the columns are presented in the table below.

When we speak of "resource" in the table, it is the information available in a cell of the table. If an external resource is specified, the license for that document is not known.

If a field has several values, separate them with a |. For example: "Peter | Eric".

| Column | Explanations | Mandatory | Data type |
|---|---|---|---|
| id | Unique identifier, if possible the one that the resource has in your database | yes | string |
| rdf:type | The type of resource: document, underground cavity, bio-speleo observation. Look at the ontology and the examples to find the list of labels | yes | URI or string |
| dct:rights/karstlink:licenceType | The license attached to the resource, chosen from the values present in the ontology | yes | URI or string |
| dct:rights/dct:created | Resource creation date | no | date, YYYY-MM-DD |
| dct:rights/dct:modified | Resource modification date. This information is useful for managing the updating of data on the SPARQL servers that collect the data | no | date, YYYY-MM-DD |
| dct:rights/cc:attributionURL | URL that points to the organization or person (an agent) making the resource available. This agent plays a publisher role and holds the rights needed to make the resource available under the specified license | yes | URI |
| dct:rights/cc:attributionName | The name the creator of a work would like used when attributing re-use | yes | string |
| karstlink:documentType | The type of document: collection, number, article, database. Choose among the types offered by the ontology | yes | URI or string |
| rdfs:label | The name of the resource | yes | string |
| dct:subject | The subject of the resource. Choose among the subjects appearing in the ontology | no | string or URI |
| dc:language | The primary language used in the resource | no | controlled vocabulary ISO 639-2 |
| gn:countryCode | The country linked to the resource | no | ISO country code, 2 characters |
| dct:date | The date on which the document described in the resource was published | no | string |
| dct:format | Format of the file described in the resource | no | use the list of Internet Media Types (MIME) |
| dct:identifier | An unambiguous reference to the resource | no | ISBN, ISSN, DOI, URI |
| dct:source | A link to the file described in the resource | no | URI |
| dct:creator | The author of the document described in the resource | yes | string or URI |
| dct:publisher | The organization that published the document described in the resource | no | string or URI |
| dct:isPartOf | Indicates that the document described in the resource is part of another document. For example, an article is part of a journal, an issue is part of a collection | no | preferably a URI |
| dct:references | Link to another resource related to the document described in the resource: Point, Observation, Area, Organization | no | preferably a URI |
| karstlink:relatedToUndergroundCavity | Link to an underground cavity related to the document described in the resource | no | preferably a URI |
| karstlink:hasDescriptionDocument/dct:creator | Author of the resource description | no | string or URI |
| karstlink:hasDescriptionDocument/dc:language | Language of the resource description | no | controlled vocabulary ISO 639-2 |
| karstlink:hasDescriptionDocument/dct:title | Title of the resource description | no | string |
| karstlink:hasDescriptionDocument/dct:description | Resource description | no | string |
| gn:alternateName | Another name, when the resource is an underground cavity | no | string or URI |
| schema:containedInPlace | When the resource is an underground cavity, indicates that it is part of another underground cavity (a network) | no | preferably a URI |
| w3geo:latitude | The WGS84 latitude of the resource | yes | decimal degrees |
| w3geo:longitude | The WGS84 longitude of the resource | yes | decimal degrees |
| w3geo:altitude | The altitude of the resource | no | integer |
| dwc:coordinatePrecision | A decimal representation of the precision of the coordinates given in the latitude and longitude. Use -1 for false coordinates | no | integer |
| karstlink:length | The real development of all the galleries of the underground cavity | no | integer |
| karstlink:verticalExtent | The vertical distance between the highest point and the lowest point of the underground cavity | no | integer |
| karstlink:extentBelowEntrance | The vertical distance between the entrance of the underground cavity and the lowest point | no | integer |
| karstlink:extentAboveEntrance | The vertical distance between the entrance of the underground cavity and the highest point | no | integer |
| karstlink:discoveredBy | Agent (person or organization) who discovered the underground cavity | no | URI or string |
| karstlink:hasAccessDocument/dct:creator | Author of the description of the access to the resource | no | URI or string |
| karstlink:hasAccessDocument/dc:language | Language of the description of the access to the resource | | controlled vocabulary ISO 639-2 |
| karstlink:hasAccessDocument/dct:title | Title of the description of the access to the resource | no | string |
| karstlink:hasAccessDocument/dct:description | Description of the access to the resource | no | string |
| dwc:recordedBy | Person who carried out the observation | | |
| dwc:eventDate | Date of observation | | |
| dwc:identifiedBy | Person who determined the taxon | | |
| dwc:dateIdentified | Date of taxon determination | | |
| dwc:individualCount | Number of individuals of a taxon | | |
| dwc:associatedTaxa | Taxon name | yes | string or URI |
| karstlink:relatedToUndergroundCavity | Underground cavity connected to the resource | no | |
| dct:spatial | Location of the observation (link to a point or area, not to an underground cavity) | no | string or URI |
| foaf:firstName | First name of the person described in the resource | yes | string or URI |
| foaf:lastName | Last name of the person described in the resource | yes | string or URI |
| foaf:nick | Nickname of the person described in the resource | no | string or URI |
| foaf:member | Link to an organization of which the person or organization described in the resource is a member | no | string or URI |
| karstlink:visited | Link to a cavity visited by the organization or the person described in the resource | no | string or URI |
| karstlink:pointType | The type of resource. Choose among the types of points existing in the ontology | no | string or URI |
| foaf:mbox | An Internet mailbox | no | mail or URI |
| foaf:homepage | A homepage is a public Web document with a URI | no | URI |
| schema:streetAddress | The street address | no | string |
| schema:postalCode | The postal code | no | |
| schema:addressLocality | The locality in which the street address is | no | string |
| schema:addressCountry | The country code | no | the two-letter ISO 3166-1 alpha-2 country code |
| karstlink:areaType | The type of resource. Choose among the types of areas existing in the ontology | no | URI or string |
| schema:polygon | A polygon is the area enclosed by a point-to-point path for which the starting and ending points are the same. A polygon is expressed as a series of four or more space-delimited points where the first and final points are identical | no | |
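To make the CSV conventions more concrete, here is a minimal sketch that writes one row and reads it back, using the | separator for multi-valued fields. The record is invented, the rdf:type label is a placeholder (the real labels are in the ontology), and only a few columns from the table above are used; a real export must follow one of the models on the example page and include all mandatory columns.

```python
import csv
import io

SEPARATOR = "|"  # multi-value separator described in this section

# Hypothetical cavity record; the column names are taken from the table above.
row = {
    "id": "cave-42",
    "rdf:type": "underground cavity",  # placeholder label, check the ontology
    "rdfs:label": "Example cave",
    "gn:alternateName": ["Grotte exemple", "Example hole"],
    "w3geo:latitude": "45.1234",
    "w3geo:longitude": "5.6789",
    "karstlink:discoveredBy": ["Peter", "Eric"],
}


def to_cell(value):
    """Join lists with ' | ' and leave single values untouched."""
    if isinstance(value, list):
        return f" {SEPARATOR} ".join(value)
    return value


def from_cell(cell):
    """Split a cell on '|' and strip the surrounding spaces."""
    return [part.strip() for part in cell.split(SEPARATOR)]


# Write one header line and one data line to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow({key: to_cell(value) for key, value in row.items()})
csv_text = buffer.getvalue()
print(csv_text)

# Read the file back and recover the individual values of one field.
parsed = next(csv.DictReader(io.StringIO(csv_text)))
discoverers = from_cell(parsed["karstlink:discoveredBy"])
print(discoverers)  # ['Peter', 'Eric']
```

Using the csv module rather than joining strings by hand takes care of quoting when a value itself contains a comma or a quote.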

Share data with an RDF dump

RDF has several concrete formats (syntaxes); see RDF serialization formats. If you already have a JSON dump, it can be upgraded to JSON-LD without impacting current users (see the JSON-LD section below). If you already have an XML dump, it can be the starting point for an RDF/XML dump without too much structural change. If you have no dump or API whatsoever and do not want to go the CSV way, there are two options: N-Triples, and direct mapping from SQL using R2RML.

N-Triples is easy to generate because it presents data in the most atomic way, one triple per line, in line with the basics of the Semantic Web. Here is an example line:
<http://myssite.org/person/111> <http://www.w3.org/2000/01/rdf-schema#label> "Frédéric Urien" . 
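As a minimal sketch of how such lines could be produced, the following Python code serializes one triple per call and escapes the characters that cannot appear unescaped in a quoted literal. The function names are hypothetical; the subject, property and value are those of the example line above.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes and line breaks inside a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def ntriples_line(subject: str, prop: str, value: str, value_is_uri: bool = False) -> str:
    """Serialize one triple as a single N-Triples line."""
    # A URI value is written between angle brackets, a text value between quotes.
    obj = f"<{value}>" if value_is_uri else f'"{escape_literal(value)}"'
    return f"<{subject}> <{prop}> {obj} ."


line = ntriples_line("http://myssite.org/person/111", RDFS_LABEL, "Frédéric Urien")
print(line)
```

The sketch assumes the file is written as UTF-8, so accented characters are kept as they are, and that the subject and property are already valid URLs.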

If you have an SQL database, a so-called direct mapping from SQL is possible using dedicated tools. There is even a W3C standard for mapping an SQL database to RDF: R2RML. Leveraging this mapping language, several tools can either generate an RDF dump or provide a SPARQL server that wraps the source SQL database as a SPARQL database.
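The idea behind such a mapping can be illustrated by hand. The sketch below is not R2RML and does not use any of the dedicated tools; it only shows, under stated assumptions, how each row of a table can become triples: the primary key gives the subject URL and a column gives a property value. The persons table, its columns and the https://myorg.org/ prefix are hypothetical.

```python
import sqlite3

BASE = "https://myorg.org/"  # assumption: prefix chosen by the organization
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical source table, created in memory so the sketch runs on its own.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany(
    "INSERT INTO persons (id, name) VALUES (?, ?)",
    [(111, "Frédéric Urien"), (112, "Peter")],
)


def quote_literal(text: str) -> str:
    """Return the text as a quoted and escaped N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{escaped}"'


def dump_persons(conn):
    """Yield one N-Triples line per row: primary key -> subject, name -> label."""
    for person_id, name in conn.execute("SELECT id, name FROM persons ORDER BY id"):
        subject = f"{BASE}persons/{person_id}"
        yield f"<{subject}> <{RDFS_LABEL}> {quote_literal(name)} ."


lines = list(dump_persons(connection))
print("\n".join(lines))
```

A dedicated R2RML tool would express the same correspondence declaratively in a mapping file instead of in code.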

Turtle is widely used in the Semantic Web; it is based on N-Triples, meaning that every N-Triples document is also a Turtle document. It adds several syntactic features that make it shorter and more human-readable, but also much more complex to generate. So I would not recommend using it when setting up data sharing from scratch.

Share data with a JSON-LD API

If you already have a JSON API, you can use JSON-LD to extend the existing JSON returned by the API without impacting current users. JSON-LD provides a way to interpret a key-value pair in the original data as an RDF triple. It can work even with no modification at all to the JSON API, but it works best when a few dedicated special keys are added: @id, @type, @context.

An @context is a special JSON declaration document that describes the mapping. We can write one for you. Examples of @context JSON documents have recently been written for Karstlink. Here is an example of a JSON API augmented with the JSON-LD special keys: https://beta.grottocenter.org/api/v1/massifs/555 . Note that JSON-LD documents are 100% syntactically correct JSON documents.
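The following sketch shows, under stated assumptions, what adding the special keys to an existing response could look like. The payload, the @context mapping, the base URL and the type URL are all hypothetical; this is not the actual Grottocenter response nor the @context written for Karstlink.

```python
import json

# Hypothetical response of an existing JSON API.
api_response = {"id": 555, "name": "Example massif"}

# Hypothetical @context: maps the existing key "name" to the rdfs:label property.
context = {"name": "http://www.w3.org/2000/01/rdf-schema#label"}


def add_json_ld_keys(payload, context, base_url, type_url):
    """Return a copy of the payload with @context, @id and @type added."""
    augmented = dict(payload)  # existing keys and values stay untouched
    augmented["@context"] = context
    augmented["@id"] = f"{base_url}{payload['id']}"
    augmented["@type"] = type_url
    return augmented


document = add_json_ld_keys(
    api_response,
    context,
    base_url="https://myorg.org/massifs/",
    type_url="https://myorg.org/ontology/Massif",  # placeholder type URL
)
print(json.dumps(document, indent=2))
```

Because the original keys are left as they are, existing clients that ignore the keys starting with @ keep working, while a JSON-LD processor can read the "name" entry as an rdfs:label triple about the @id.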