DCV Curation Tools and Graphical User Interface

Within MD-Paedigree, data curation and validation are crucial to ensure that the data submitted to the repository is relevant, syntactically well-formed, semantically interoperable and properly linked within the system.

The Data Curation and Validation (DCV) tool developed by ATHENA is a web application offering an advanced, (semi-)automatic data cleaning process able to handle the heterogeneous MD-Paedigree data. It builds on an earlier tool developed by UoA during the European FP6 Health-e-Child project. In MD-Paedigree, DCV has been extended with data cleaning mechanisms that detect numeric outliers, missing values and alphanumeric typographical errors. DCV also offers a user-friendly interface for defining and running data cleaning rules over a relation, such as functional dependencies, conditional functional dependencies and denial constraints.
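To make the difference between two of these rule types concrete, the following Python sketch checks a functional dependency (FD) and a conditional functional dependency (CFD) over a small relation held as a list of dicts. It is an illustration under stated assumptions, not DCV's implementation: the function name `fd_violations`, the attribute names and the sample rows are all hypothetical. The CFD is modelled in its simplest form, as an FD that is enforced only on tuples matching a constant pattern.

```python
from collections import defaultdict

def fd_violations(rows, lhs, rhs, condition=None):
    """Find LHS values whose tuples disagree on the RHS (FD lhs -> rhs).

    rows      -- the relation, one dict per tuple (hypothetical layout)
    lhs, rhs  -- tuples of attribute names
    condition -- optional {attribute: constant}; if given, the FD is only
                 checked on matching tuples, i.e. a simple conditional FD
    """
    groups = defaultdict(set)
    for row in rows:
        # Skip tuples that do not match the CFD's constant pattern.
        if condition and any(row.get(a) != v for a, v in condition.items()):
            continue
        key = tuple(row[a] for a in lhs)
        groups[key].add(tuple(row[a] for a in rhs))
    # The dependency is violated wherever one LHS value maps to >1 RHS value.
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


# Hypothetical sample relation: postcode 1000 exists in both CH and BE.
rows = [
    {"country": "CH", "zip": "1000", "city": "Lausanne"},
    {"country": "CH", "zip": "1000", "city": "Lausane"},  # likely a typo
    {"country": "BE", "zip": "1000", "city": "Brussels"},
    {"country": "BE", "zip": "1000", "city": "Brussels"},
]

print(fd_violations(rows, ("zip",), ("city",)))                     # FD: three cities conflict
print(fd_violations(rows, ("zip",), ("city",), {"country": "CH"}))  # CFD: Lausanne vs Lausane
print(fd_violations(rows, ("zip",), ("city",), {"country": "BE"}))  # CFD holds: {}
```

The unconditional FD `zip → city` flags postcode 1000 even though it legitimately exists in two countries; restricting the rule to `country = "CH"` isolates the probable typo, while the Belgian tuples raise no violation.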


Another powerful feature of DCV is the computation of new derived columns, either through discretisation criteria or through arithmetic operations (e.g. for computing medical scores). Complex computations over millions of rows are executed quickly by the madIS engine. DCV also visualises data through interactive bar charts and pie charts, which help users identify the distinct values in a column, as well as scatter plots and line charts, which show how two attributes correlate. Finally, DCV keeps a history of all actions that change data values: users can undo and redo actions from this history, or save workflows and re-run them in other projects or on other data.
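As a rough illustration of derived columns, the sketch below computes an arithmetic column (body mass index: weight in kg divided by the square of height in m) and then discretises it into categories. It is plain Python, not madIS, and the column names, the function `add_derived_columns` and the choice of cut-points are assumptions for the example. The cut-points are the adult WHO BMI thresholds; paediatric data would normally need age- and sex-specific reference percentiles instead.

```python
from bisect import bisect_right

# Adult WHO BMI cut-points, used here purely for illustration.
BMI_CUTS = [18.5, 25.0, 30.0]
BMI_LABELS = ["underweight", "normal", "overweight", "obese"]

def add_derived_columns(rows, cuts=BMI_CUTS, labels=BMI_LABELS):
    """Add an arithmetic column ('bmi') and a discretised one ('bmi_class')."""
    for row in rows:
        # Arithmetic derived column: weight (kg) / height (m) squared.
        bmi = row["weight_kg"] / (row["height_cm"] / 100) ** 2
        row["bmi"] = round(bmi, 1)
        # Discretisation: bisect_right returns the index of the bin the
        # unrounded value falls into, so labels needs len(cuts) + 1 entries.
        row["bmi_class"] = labels[bisect_right(cuts, bmi)]
    return rows

# Hypothetical records.
patients = [
    {"weight_kg": 70, "height_cm": 175},
    {"weight_kg": 50, "height_cm": 180},
    {"weight_kg": 95, "height_cm": 170},
]
for p in add_derived_columns(patients):
    print(p["bmi"], p["bmi_class"])  # 22.9 normal / 15.4 underweight / 32.9 obese
```

Using `bisect_right` places a value exactly equal to a cut-point (e.g. 25.0) in the upper bin, which matches the usual "at or above the threshold" convention.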


Graphical User Interface

A new prototype for the case- and ontology-based retrieval service

HES-SO has developed a graphical user interface for the case-based retrieval service, which aims to let users search the infostructure for a wide variety of information needs. The search modalities will include free-text data (in virtually any language) and data linked to ontological resources developed and maintained within MD-Paedigree. ATHENA has developed a prototype of a generic Ontology-Based Data Access (OBDA) service that will provide a flexible querying front-end for the MD-Paedigree platform, supporting ontology-based query formulation so that users can express queries using familiar vocabularies and conceptualisations. Our goal is to develop a flexible module that supports query formulation by different types of users (ranging from clinicians and researchers to IT experts and computer scientists) and provides a querying interface to other MD-Paedigree subsystems such as the KDD tools and the Sim-e-Child/PCDR repository.
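The core idea behind OBDA can be sketched as query unfolding: a query phrased in the ontology's vocabulary is rewritten, through mappings, into SQL over the underlying database. The toy Python example below is only an assumption-laden illustration, not the design of ATHENA's prototype; the vocabulary (`Patient`, `hasDiagnosis`, `hasSex`), the schema, the `unfold` function and the sample data are hypothetical, and a full OBDA service would also reason over the ontology's axioms, which this sketch omits.

```python
import sqlite3

# Hypothetical mappings from ontology vocabulary to the source schema.
# A class maps to (table, key column); a property maps to
# (table, subject column, value column).
CLASSES = {"Patient": ("patients", "id")}
PROPERTIES = {
    "hasDiagnosis": ("diagnoses", "patient_id", "icd_code"),
    "hasSex": ("patients", "id", "sex"),
}

def unfold(cls, constraints):
    """Rewrite 'instances of cls where each property equals a value' into SQL."""
    table, key = CLASSES[cls]
    joins, where, params = [], [], []
    for i, (prop, value) in enumerate(constraints):
        p_table, subj, val_col = PROPERTIES[prop]
        alias = f"p{i}"
        # Identifiers come from the fixed mapping tables; values are bound as parameters.
        joins.append(f"JOIN {p_table} AS {alias} ON {alias}.{subj} = c.{key}")
        where.append(f"{alias}.{val_col} = ?")
        params.append(value)
    sql = f"SELECT DISTINCT c.{key} FROM {table} AS c " + " ".join(joins)
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql + f" ORDER BY c.{key}", params

# Hypothetical in-memory source database.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, sex TEXT);
CREATE TABLE diagnoses (patient_id INTEGER, icd_code TEXT);
INSERT INTO patients VALUES (1, 'F'), (2, 'M'), (3, 'F');
INSERT INTO diagnoses VALUES (1, 'I42.0'), (2, 'I42.0'), (3, 'M08.0');
""")

sql, params = unfold("Patient", [("hasDiagnosis", "I42.0"), ("hasSex", "F")])
print([r[0] for r in con.execute(sql, params)])  # [1]
```

The user only sees the ontology terms; the mapping tables carry the knowledge of how those terms are stored, so the same query keeps working if the source schema changes and only the mappings are updated.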



Authors: Anna Gogolou, Graduate Student Researcher at the “Athena” Research and Innovation Center, and Patrick Ruch, Professor at the University of Applied Sciences Geneva (HES-SO)

This article was originally published in the MD-Paedigree Newsletter, Issue 3.
