Relevant concepts

Overview of concepts and definitions 

Various concepts and definitions are used on the National Data Portal. Here you will find an explanation of them.

If a term is missing or you have a question about a particular concept, please feel free to contact us at data@koop.overheid.nl.

Referential data sets

Government organisations in the Netherlands make various data sets available as open data, free for anyone to use. A pilot by Statistics Netherlands (CBS) and the Land Registry (Kadaster) showed that users often rely on a number of data sets as a “reference” when working with other data. A reference here means a pointer to a source that can provide more information about a particular situation or claim. Referential data is generally uniform, changes rarely, and can consist of values or statuses.

Within the government there are data sets that are essential for boosting the use of government data. These data sets form so-called "anchor points" for the use of data and are also referred to as referential data sets. Examples are lists of population numbers and index figures from Statistics Netherlands, the real estate dashboard from the Land Registry, and the list of government organisations from KOOP. The referential data sets are given a prominent place on data.overheid.nl. This supports users in using and applying government data, and is intended to encourage that use.

High value data sets

The government has the ambition to make as much government data as possible available as open data. In doing so, it prioritizes 'high value' data sets: data sets of high value to society, such as the Basic Registration of Addresses and Buildings (BAG) and the cadastral map. In 2016, data.overheid.nl drew up a Municipal High Value List in collaboration with municipalities, the Digital Urban Agenda and VNG/KING. This list is a starting point for municipalities that want to begin opening up their data sets.

The provinces also drew up a Provincial High Value list in 2019.

DCAT

To present data sets in a well-organised manner and to make them searchable in a targeted way, data sets on data.overheid.nl are described with metadata. The W3C has developed DCAT for this purpose: a metadata standard for describing data sets.
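As a rough illustration of what such a description can look like, the sketch below builds a minimal record with terms from the W3C DCAT vocabulary (Data Catalog Vocabulary) and Dublin Core Terms, serialised as JSON-LD. The function name, the title and the example.org URL are placeholders chosen for this example. The record is not a complete or validated data.overheid.nl entry, and the application profiles described in the following sections require more fields than are shown here.

```python
import json

# Namespace prefixes of the W3C DCAT vocabulary and Dublin Core Terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def describe_dataset(title, description, access_url, file_format):
    """Build a minimal DCAT-style description of one data set (hypothetical helper)."""
    return {
        "@context": CONTEXT,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        # A distribution points to the place where the data can be obtained.
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": {"@id": access_url},
                "dct:format": file_format,
            }
        ],
    }


record = describe_dataset(
    title="Example population figures",  # made-up title
    description="Illustrative record, not a real data set.",
    access_url="https://example.org/population.csv",  # placeholder URL
    file_format="CSV",
)
print(json.dumps(record, indent=2))
```

The nesting mirrors the idea that one description can point to one or more locations where the data itself can be downloaded.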

DCAT standard European Union

The European Union has drawn up an application profile of DCAT. The Dutch application profile of DCAT is based on version 1.1 of the EU profile (more about DCAT-AP-EU 1.1 of the EU). The DCAT-AP-EU is updated on an ongoing basis. This work includes, among other things, a mapping to ISO 19115, the metadata standard for geographic data sets. Follow the developments of DCAT-AP-EU.

DCAT standard The Netherlands

The Dutch government has translated the DCAT-AP-EU into a Dutch profile. This is also referred to as the IPM for data sets. The IPM for data sets is the specification of the metadata that the Dutch government uses for the exchange of metadata about data sets between data catalogs. Read more about the forms that exist around DCAT. The IPM for Datasets can be found here.

File formats open data

When registering a dataset at data.overheid.nl, you can choose from various file formats. These file formats are selected on the basis of the DCAT-NL model. All 13 formats are explained in the table below.

FORMAT | EXPLANATION
Atom | XML-based format similar to RSS. It is designed as a universal standard for personal content and weblogs.
JSON | A standard format for storing simple data and objects. The text is human-readable and the syntax is based on JavaScript.
MS Word |
PDF | PDF files can contain text, images, annotations, outlines, and other data.
RDF | Resource Description Framework, a W3C standard that describes data as statements (triples) about resources. It is the basis for linked data.
SOAP |
Excel |
ZIP | An archive file that bundles one or more files, usually in compressed form.
CSV | Comma-separated values (CSV): a text file in which data values are separated by commas. CSV is often used to exchange data.
HTML | The markup language of web pages displayed in a web browser. The browser parses the HTML source code, which is usually not seen by the user.
N3 |
Turtle |
XML | A data format that uses tags to define objects and object attributes, with a syntax that resembles HTML. XML files are a standard way of storing and transferring data between programs and over the internet. Because they are plain text, they can be edited with a simple text editor.

Data.overheid.nl also uses one further format, the shapefile (SHAPE). SHAPE is not formally an open data format, but it has been decided to treat it as open data because the format is widely used by government organisations.
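To make the difference between two of the listed formats concrete, the sketch below reads a small CSV text and writes the same rows as JSON, using only the Python standard library. The column names and values are invented for this example and do not come from a real data set.

```python
import csv
import io
import json

# Made-up sample data: one header line followed by two records.
CSV_TEXT = "municipality,population\nExampleville,1200\nSampletown,3400\n"


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


print(csv_to_json(CSV_TEXT))
```

CSV carries no type information, so every value arrives as text. A re-user who needs numbers has to convert them explicitly.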

Licences for reuse

When you register a data set on data.overheid.nl, you are obliged to link a license to it. A license determines the extent to which a data set may be reused. The table below lists the licenses that are used on data.overheid.nl. For each license it is indicated whether the data counts as "open data". Some licenses impose restrictions on the re-user, in which case the data is not open data that can be reused without restriction.

Closed data sets

Some data sets on data.overheid.nl are 'closed'. This means that the data set is not, or will not become, available for public reuse. If a data set is 'closed', it must also be clear why it is not available or will not become available. Read more here.

Linked data stars

To indicate how useful a data set is, data.overheid.nl uses the Linked data stars by Tim Berners-Lee. The Linked data star classification distinguishes five levels of openness of a data set. The more stars, the better the quality and openness of the data set. The star system is used in England to encourage government organisations to be as 'open' as possible.

NUMBER OF STARS | MEANING | EXAMPLE
1 star | Available on the web, with an open license | PDF
2 stars | Data is machine-readable and has an open license | Excel
3 stars | The data set is available in an open file format | CSV
4 stars | All of the above, plus use of open standards by the W3C (RDF and SPARQL) to identify objects in the data, so that others can refer to those objects | RDF
5 stars | All of the above, plus links from your data to others' data, to give more context | RDF
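
Because each level builds on the previous one, the classification can be read as a cumulative checklist. The sketch below is one possible way to express that; the function name and parameters are made up for this illustration, and it is not how data.overheid.nl actually assigns stars.

```python
def star_rating(open_license, machine_readable, open_format,
                w3c_standards, linked_to_other_data):
    """Count stars cumulatively: a level only counts if all lower levels are met."""
    criteria = [
        open_license,          # 1 star: on the web with an open license
        machine_readable,      # 2 stars: machine-readable
        open_format,           # 3 stars: open file format
        w3c_standards,         # 4 stars: W3C open standards (RDF, SPARQL)
        linked_to_other_data,  # 5 stars: linked to others' data
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed Excel file: machine-readable, but not an open format.
print(star_rating(True, True, False, False, False))
```

Stopping at the first unmet criterion reflects the "all of the above" wording in the table: linking data to other data does not earn extra stars if an earlier criterion, such as an open file format, is missing.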

Data set and data source

The terms "data set" and "data source" are used on data.overheid.nl to refer to a collection of data. There is no fixed definition of these terms. The definition below is how data.overheid.nl uses them.

  • Data set: a description of a collection of data from a data owner. This can be, for example, one table with data or a collection of tables with related data, for example all tables per year over the period 2005-2016.
  • Data source: a reference to the actual location of data that is named in the data set. A data set contains one or more data sources. In the example above, there is either one data source (a single table with data) or several data sources, one for each yearly table.

The following rules of thumb apply:

  • A data set is formed by a description and metadata. There are a number of mandatory fields (enforced by the DCAT standard) and optional fields. The data set describes the content of the underlying data sources.
  • A data set is compiled by the data owner in such a way that it provides the optimal composition of description and data sources for reuse. The data owner decides.
  • A data set contains at least one data source and possibly more. A data source can occur in multiple data sets if the data owner finds this useful to encourage reuse. Duplicate references to the same data source should be avoided as much as possible and used only if there is no other option.
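
The relationship described above can be sketched as a small data model. The class names, field names and example.org URLs below are hypothetical and chosen only for this illustration. The real metadata model is defined by the DCAT profile and has more mandatory fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """A reference to the actual location of the data."""
    url: str


@dataclass
class DataSet:
    """A description of a collection of data plus its data sources."""
    title: str
    description: str
    sources: list

    def __post_init__(self):
        # Rule of thumb from the text: at least one data source per data set.
        if not self.sources:
            raise ValueError("a data set needs at least one data source")


# One data set that groups one table per year over the period 2005-2016.
yearly = DataSet(
    title="Example tables 2005-2016",
    description="Illustrative data set with one data source per year.",
    sources=[
        DataSource(f"https://example.org/tables/{year}.csv")  # placeholder URLs
        for year in range(2005, 2017)
    ],
)
print(len(yearly.sources))
```

Because `DataSource` is immutable, the same object can be referenced from more than one `DataSet`, which matches the rule that a data source may occur in multiple data sets when the data owner considers that useful.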