The MD‐Paedigree VPH Infostructure and Digital Repository
Building on the Sim‐e‐Child/PCDR digital repository (http://sec-portal.maatg.fr/), MD‐Paedigree will implement the Service‐Oriented Knowledge Utility (SOKU) vision. The aim is to facilitate the design and development of innovative predictive models as reusable and adaptable workflows of data mining applications, and to turn these workflows into clinically validated decision support tools available to physicians at the point of care.
The Service‐Oriented Knowledge Utility (SOKU)
MD‐Paedigree translates domain‐specific applications and data into services and associated knowledge that physicians and medical data integration experts can publish, discover and semi‐automatically orchestrate in the grid/cloud. This leads to the development of new workflows and enables their personalisation to real patient cases.
Go to the About Section for more information about the SOKU vision.
The MD‐Paedigree system consists of standard services with varying levels of knowledge awareness: basic utilities in the form of classical Web services that wrap computational resources, hybrid services that only consume data and information, and more complex high‐level entities that produce knowledge.
The semantic enrichment of services in the platform enables the system to understand its own constitution and to guide users through the options compatible with the defined execution context (i.e. patient data, applications, objective, etc.).
MD‐Paedigree exploits recent ground‐breaking European research on semantic modelling, ontology‐based data access and scalable query execution to develop an extensible platform based on open standards and protocols. The platform is intended as a complete and generic solution for the targeted paediatric disease areas that remains adaptable to additional disorders in the future. The following sections elaborate on this.
Data Access and Query Formulation
One important goal of the MD‐Paedigree infostructure is to provide the tools and applications that help users access and explore the wealth of heterogeneous data in the digital repository in an easy, intuitive and seamless way. Access spans the care continuum through enhanced connectivity with other hospital information systems and the patient’s electronic health records.
Advanced techniques, tools and languages for accessing federated data are therefore used, both machine‐oriented and user‐oriented. Technologies and research related to Ontology‐Based Data Access (OBDA) are applied, such as new forms of ontology‐based query by navigation and an extensible declarative query language supporting linked data (e.g. through a SPARQL endpoint). Interactive search based on relevance feedback will be applied to improve data recall in the infostructure.
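The relevance feedback idea can be illustrated with a minimal sketch. It uses the Rocchio update, a classic relevance feedback scheme chosen here as an assumption; the text does not say which method the infostructure uses. The record names, terms and weights are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean of a list of sparse vectors; an empty list gives an empty vector."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query towards relevant records and away from non-relevant ones."""
    pos, neg = centroid(relevant), centroid(non_relevant)
    updated = {}
    for t in set(query) | set(pos) | set(neg):
        w = alpha * query.get(t, 0.0) + beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
        if w > 0:  # terms with non-positive weight are dropped
            updated[t] = w
    return updated


# Hypothetical repository records described by weighted terms.
records = {
    "rec1": {"cardiomyopathy": 1.0, "echo": 1.0},
    "rec2": {"cardiomyopathy": 1.0, "mri": 1.0, "fibrosis": 1.0},
    "rec3": {"obesity": 1.0, "echo": 1.0},
}
query = {"cardiomyopathy": 1.0}

# The user marks rec2 as relevant and rec3 as not relevant.
updated = rocchio(query, [records["rec2"]], [records["rec3"]])
ranking = sorted(records, key=lambda r: cosine(updated, records[r]), reverse=True)
print(ranking)
```

After one round of feedback the query also carries the terms of the record marked relevant, so similar records that the original single-term query scored poorly move up the ranking, which is how feedback improves recall.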
Distributed Processing and GPU support
MD‐Paedigree extends the distributed processing capabilities of the Sim‐e‐Child platform along two major axes. On the one hand, it develops compatibility with GPU processing so that validated models can be executed on real patient data, providing real‐time support to physicians at the point of care in the seven participating centres.
Indeed, the introduction of non‐graphics application programming interfaces (APIs) for GPUs has transformed them into general‐purpose processing units. On the other hand, MD‐Paedigree will experiment with operating a sustainable translational service for healthcare professionals and other external centres by integrating an open Cloud API in its abstraction layer, allowing the infrastructure to adapt elastically to end‐user demand. The Athena Distributed Processing (ADP) Engine is being considered as a way to integrate and adapt the distribution of algorithms more easily through the newly integrated abstraction APIs.
Intelligent Mining, Modelling, Reasoning and Simulation Framework
MD‐Paedigree integrates AITION, an evolutionary information processing and knowledge discovery framework developed by the University of Athens (UoA) for biomedical research, which can provide highly accurate predictive and statistical simulation models by combining:
(1) a bottom‐up data driven process to analyse heterogeneous demographic, phenotypic, clinical, molecular, and genomic biomedical data, images and streams; and
(2) a top‐down model‐driven process to incorporate external knowledge coming from domain experts, literature, or model‐guided processes and relational/semantic models.
AITION integrates Probabilistic Graphical Models (PGMs) as a unifying patient/disease modelling approach providing an integrated framework for multi‐scale vertical integration, feature selection, simulation, knowledge discovery and decision support. Initially developed and tested in the Health‐e‐Child project, AITION is based on state‐of‐the‐art techniques for Bayesian Network Learning, Markov Blanket induction and real‐time inference.
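To make the Markov blanket notion concrete, the sketch below reads the blanket of a variable off a known Bayesian network structure: its parents, its children and the other parents of its children. This is only an illustration of the definition. AITION induces blankets from data, which is a harder problem, and the variable names and edges here are hypothetical and carry no clinical claim.

```python
def markov_blanket(parents, node):
    """Markov blanket of `node` in a DAG given as {variable: set of its parents}.

    The blanket is the union of the node's parents, its children,
    and the other parents of those children.
    """
    own_parents = set(parents.get(node, ()))
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (own_parents | children | co_parents) - {node}


# Hypothetical network structure, for illustration only.
network = {
    "genotype": set(),
    "bmi": set(),
    "age": set(),
    "blood_pressure": {"genotype", "bmi"},
    "lv_mass": {"blood_pressure", "age"},
    "outcome": {"lv_mass"},
}

blanket = markov_blanket(network, "blood_pressure")
print(sorted(blanket))
```

Given its blanket, a variable is conditionally independent of every other variable in the network, which is why the blanket is a natural feature set when predicting that variable.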
Ontologies and a priori knowledge will also be incorporated to automate causal discovery and feature selection, providing semantic modelling under uncertainty. In MD‐Paedigree, hierarchical architectures as well as Granular Computing (GrC) and Statistical Relational Learning (SRL) techniques will be extended.
SRL is an emerging research area that aims to combine statistical learning and probabilistic reasoning (such as PGMs) with logical/relational representations, providing multi‐entity reasoning for complex situations involving a variety of objects and the relations among them. SRL relaxes the assumptions of traditional propositional Machine Learning approaches, including the assumption that samples are independent and identically distributed (i.i.d.), while making it possible to capture both uncertainty and similarity.
Moreover, the Hierarchical Layered Architecture, which incorporates hidden (latent) layers/variables and GrC techniques, makes it possible to build an efficient multi‐resolution computational model for complex applications that consume large amounts of data, information and knowledge. MD‐Paedigree will thus deliver mathematically and semantically well‐grounded, scalable, dynamic, hierarchical statistical simulation models that allow efficient Bayesian inference and online learning, addressing multi‐entity, multi‐modal, high‐dimensional spatial data analysis and temporal reasoning over the distributed infrastructure.
Holistic Model‐Guided Personalised Medicine
Ultimately, MD‐Paedigree will provide an evolvable framework for holistic model‐driven medicine and personalised treatment combining knowledge constructs from observational data analysis, statistical and specialized VPH patient‐ or disease‐specific simulation models, domain knowledge representations, as well as patient/disease‐specific profiles.
The goal is to find efficient ways to optimise and combine multiple statistical and/or specialised VPH simulation models in prediction tasks, in support of the creation and validation of model‐driven clinical workflows. Using the PAROS personalisation platform, clinicians and domain experts will create ontology‐based patient‐specific and disease‐specific profiles capturing high‐level concepts and common characteristics.
Similarity search techniques will then be developed to map specific medical cases to pertinent patient/disease profiles. These profiles will be used to adapt and optimise individual simulation models through transformations, and to explore their combination and re‐use in different disease areas.
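As a rough illustration of mapping a case to profiles, the sketch below ranks profiles by the Jaccard overlap between the concepts annotated on a case and the concepts in each profile. The similarity measure, the profile names and the concept labels are all assumptions made for the example; the text does not specify how PAROS profiles are represented or compared.

```python
def jaccard(a, b):
    """Jaccard similarity of two concept sets: shared concepts over all concepts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_profiles(case_concepts, profiles, top_k=3):
    """Return the top_k (profile name, score) pairs, best match first."""
    scored = [(jaccard(case_concepts, concepts), name)
              for name, concepts in profiles.items()]
    # Sort by descending score, breaking ties alphabetically by name.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(name, score) for score, name in scored[:top_k]]


# Hypothetical profiles, each a set of ontology concept labels.
profiles = {
    "profile_A": {"dilated_ventricle", "reduced_ejection_fraction", "arrhythmia"},
    "profile_B": {"joint_swelling", "morning_stiffness", "elevated_crp"},
    "profile_C": {"reduced_ejection_fraction", "elevated_bmi", "hypertension"},
}

# Hypothetical patient case annotated with the same vocabulary.
case = {"dilated_ventricle", "reduced_ejection_fraction", "fatigue"}

matches = rank_profiles(case, profiles, top_k=2)
print(matches)
```

A flat set overlap ignores the ontology hierarchy; a measure that also credits related but non-identical concepts would be a natural refinement.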
Finally, a holistic scheme for model‐driven personalised medicine will be developed that will allow analysing and testing scientific hypotheses, predicting disease evolution and treatment responses (e.g. early diagnosis of poor outcome that needs aggressive treatment) and elaborating individualized treatment plans. This outline is illustrated in the next figure.
Compliance with Guidelines for Model‐Based Drug Development (MBDD)
The need to demonstrate model robustness and reusability, including compliance with guidelines for the design and execution of future clinical trials, calls for special attention to making functional databases available to assist drug developers. With the advent of molecular biology, coupled with advances in screening and synthetic chemistry technologies, drug discovery now combines random screening with knowledge of the receptor.
The complexity of discovery pipelines keeps growing as medicinal chemistry meets personalised medicine to design not only new drugs but also new diagnostic procedures.
MD‐Paedigree can support the drug discovery process. First, it can help identify biomarkers likely to characterise a particular pathology or dysfunction: by modelling the complex process of a disease and the clinical interventions associated with its care (e.g. drug prescription, morbidities, diagnostic procedures…), the project knowledge bases can help identify specific biomarkers (such as vital signs, phenotypes, protein…). Second, it can help design clinical trial protocols (i.e. exclusion/inclusion criteria, statistical power, and cohort identification) by providing a feasibility testbed for clinical research studies, as currently explored by IMI projects such as EHR4CR.
Last but not least, the longitudinal follow‐up of MD‐Paedigree populations can help monitor longer‐term effects of therapeutic treatments, including drug response, phenotype evolution (e.g. neoplastic processes) and rare adverse effects. The resulting views can ultimately help cluster populations according to specific genotypic variations (pharmacogenomics).
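The feasibility testing mentioned above for trial protocol design can be pictured as applying inclusion and exclusion criteria to repository records and counting who remains. The sketch below is a minimal illustration under assumed names: the `Patient` fields, the criteria and the records are hypothetical and do not reflect the actual MD‐Paedigree data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical, heavily simplified patient record."""
    patient_id: str
    age_years: float
    diagnoses: frozenset


def is_eligible(patient, min_age, max_age, required_dx, excluded_dx):
    """Apply the inclusion criteria first, then the exclusion criteria."""
    if not (min_age <= patient.age_years <= max_age):
        return False
    if required_dx not in patient.diagnoses:
        return False
    # Any overlap with the excluded diagnoses rules the patient out.
    return not (patient.diagnoses & excluded_dx)


def identify_cohort(patients, **criteria):
    """Return the patients who satisfy all criteria."""
    return [p for p in patients if is_eligible(p, **criteria)]


# Hypothetical records, for illustration only.
patients = [
    Patient("p1", 9.0, frozenset({"cardiomyopathy"})),
    Patient("p2", 15.0, frozenset({"cardiomyopathy", "renal_failure"})),
    Patient("p3", 19.0, frozenset({"cardiomyopathy"})),
    Patient("p4", 12.0, frozenset({"obesity"})),
]

cohort = identify_cohort(
    patients,
    min_age=5,
    max_age=17,
    required_dx="cardiomyopathy",
    excluded_dx=frozenset({"renal_failure"}),
)
print(len(cohort), [p.patient_id for p in cohort])
```

Comparing the resulting cohort size with the sample size a study needs for adequate statistical power gives a first indication of whether a protocol is feasible before recruitment starts.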