Semantic data integration techniques for transforming big biomedical data into actionable knowledge

Authored by
Maria Esther Vidal, Samaneh Jozashoori
Abstract

FAIR principles and Open Data initiatives have motivated the publication of large volumes of data. In the biomedical domain specifically, data size has grown exponentially over the last decade, and advances in technologies for collecting and generating data are expected to accelerate that growth in the coming years. The available data collections exhibit the dominant dimensions of big data: they are not only large in volume but can also be heterogeneous and present quality issues. These data complexity problems affect typical data management tasks, in particular the integration of big biomedical data sources. We tackle the problem of big data integration and present a knowledge-driven framework able to extract and integrate data collected from structured and unstructured data sources. The proposed framework relies on Natural Language Processing techniques to extract knowledge from unstructured data and short text. Furthermore, ontologies and controlled vocabularies, e.g., UMLS, are used to annotate the extracted entities and relations with their terms. The annotated data is integrated into a knowledge graph, and a unified schema describes the meaning of the integrated data as well as its main properties and relations. As a proof of concept, we show the results of applying the proposed framework to integrate clinical records of lung cancer patients with data extracted from open data sources such as DrugBank and PubMed. The resulting knowledge graph enables the discovery of interactions between drugs in the treatments prescribed to lung cancer patients.
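
The pipeline described in the abstract (annotate extracted mentions with vocabulary terms, integrate them into a knowledge graph, then look for drug interactions within a treatment) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The vocabulary, the concept identifiers (`EX:C001` and so on), the predicate names, the patient records, and the interaction fact are all hypothetical placeholders, and naive substring matching stands in for the Natural Language Processing step.

```python
from itertools import combinations

# Hypothetical controlled vocabulary: surface form -> placeholder concept ID.
# These IDs are illustrative only and are NOT real UMLS identifiers.
VOCABULARY = {
    "drug a": "EX:C001",
    "drug b": "EX:C002",
    "drug c": "EX:C003",
}

# Hypothetical interaction fact, standing in for data from an open source.
KNOWN_INTERACTIONS = {frozenset({"EX:C001", "EX:C002"})}


def annotate(text):
    """Return the concept IDs whose surface form occurs in the text.

    Naive substring matching is an assumption made for brevity; it is a
    stand-in for a real entity extraction and linking step.
    """
    lowered = text.lower()
    return sorted(cid for term, cid in VOCABULARY.items() if term in lowered)


def build_graph(records):
    """Integrate annotated records into a set of (subject, predicate, object) triples."""
    triples = set()
    for patient_id, note in records.items():
        for concept in annotate(note):
            triples.add((patient_id, "ex:prescribed", concept))
    for pair in KNOWN_INTERACTIONS:
        first, second = sorted(pair)
        triples.add((first, "ex:interactsWith", second))
    return triples


def interactions_for(graph, patient_id):
    """List interacting drug pairs among the drugs prescribed to one patient."""
    drugs = sorted(
        obj for subj, pred, obj in graph
        if subj == patient_id and pred == "ex:prescribed"
    )
    return [
        (a, b) for a, b in combinations(drugs, 2)
        if (a, "ex:interactsWith", b) in graph
    ]


# Hypothetical clinical notes keyed by placeholder patient identifiers.
records = {
    "patient:1": "Treatment with Drug A and Drug B.",
    "patient:2": "Treatment with Drug A and Drug C.",
}
graph = build_graph(records)
print(interactions_for(graph, "patient:1"))
print(interactions_for(graph, "patient:2"))
```

In this toy setting, one interacting pair is reported for `patient:1` and none for `patient:2`. Mapping every mention to a shared concept identifier before integration is what lets records from different sources be joined on the same graph node.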

Organizational unit(s)
Forschungszentrum L3S
External organization(s)
Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek
Type
Paper in conference proceedings
Pages
563-566
Number of pages
4
Publication date
June 2019
Publication status
Published
Peer-reviewed
Yes
ASJC Scopus subject areas
Radiology, Nuclear Medicine and Imaging, Computer Science Applications
Sustainable Development Goals
SDG 3 – Good Health and Well-being
Electronic version(s)
https://doi.org/10.1109/CBMS.2019.00116 (Access: Closed)