Tutorial 1: Bio-Ontologies

Applications of Bio-Ontologies in Large-Scale Data-Driven Science: A Practical Introduction

Barry Smith, University at Buffalo, US.
Janna Hastings, European Bioinformatics Institute and University of Geneva.

Date:	Sunday, 9 September, 2012
Time:	9:00 - 17:00
Venue:	Room "Hongkong" Congress Center Basel, Messeplatz 21

Modern experimental techniques in biology generate data at unprecedented rates, but analysis and interpretation still lag behind the data, and data are still rarely combinable and reusable by independent groups. Ontological frameworks such as the Gene Ontology (GO) have been used for more than a decade to provide a shared language for communicating biological information that promotes integration of biological knowledge. To serve this objective, ontologies must be non-overlapping, accepted and used by the broader scientific community, and well-structured, marked by logical rigor and terminological consistency. This tutorial will provide practical training in ontology construction, in modular re-use of parts of ontologies to reduce redundancy between ontologies, in mediated discussion to achieve community agreement on ontology structure, and in the use of ontologies for data-driven science. All of this takes place within a non-technical, biologically oriented programme using examples taken from the chemical biology domain, including relevant existing bio-ontologies for chemistry, screening experiments and targets.

Motivation

Modern experimental techniques in biology are generating data at unprecedented rates, but analysis and interpretation still lag behind the available data, and data are still rarely combinable and reusable by independent groups. Ontological frameworks such as the Gene Ontology (GO) have been used for more than a decade to provide a shared language for communicating biological information that promotes integration of biological knowledge and thereby addresses the analysis and integration bottleneck. Common controlled vocabularies modelled on the GO have now been created in a large variety of different fields, and these ontologies are being used to support navigation through very large volumes of data in ways which can allow formulation and testing of complex hypotheses. Ontologies are also being used to address mandates imposed by funding agencies for reusability of data, by providing the means to describe research results in ways which allow them to be discovered by users.

In recent years, powerful new editing and reasoning tools have made it much easier to develop and use ontologies, for example within the framework of the Semantic Web / Linked Open Data. This tutorial will present the use of these tools in a non-technical, biologically relevant fashion. Unfortunately, many newly created ontologies have been developed in isolation for specific local purposes, with little attention to their applicability to multiple disparate data sets; the result, in many cases, is that they do not in fact serve large-scale cross-community needs. Ontologies will be truly beneficial to biological data integration and organisation only if common best practices are employed in a way that ensures semantic interoperability. The most important prerequisites are that the ontologies are non-overlapping, that they are accepted and used by the broader scientific community, and that they are well-structured and marked by logical rigor and terminological consistency.

This tutorial will focus on practical strategies for the achievement of these objectives. We will introduce the Open Biomedical Ontologies (OBO) Foundry project, which has played a pivotal role in coordinating and standardizing ontology development in the biomedical domain, and the Basic Formal Ontology (BFO), which allows interoperation of multiple biomedical ontologies by providing a shared upper level. The practical components of the tutorial will involve ontology construction, modular re-use of parts of ontologies to reduce redundancy between ontologies, and discussion to achieve community agreement on the ontology structure. We will draw for illustration on examples taken from the chemical biology domain, including relevant existing bio-ontologies for chemistry, screening experiments and targets.

Overall Goals

This tutorial will provide instruction in all areas of ontology development to aid computational biological research, with a special focus on chemical biology. The specific learning outcomes for the students will be:

• Understanding the background to existing bio-ontology development and application efforts;
• Becoming familiar with what is currently available in various biological domains and with the most important success stories and common reasons for failure;
• Gaining practical experience with modern ontology development tools such as the Protégé ontology editor and associated automated reasoners;
• Developing an appreciation for the essential tried and tested principles that are required to build a robust formalization of an ontology that brings maximal benefits in search, integration and reasoning over the data that the ontology is used to annotate.

The overall goal of this tutorial is to foster communication and the adoption of best practices within the community, to encourage cooperative development, and to ensure that the biological ontologies created are sufficiently well principled that they can be reasoned over within an open, cumulatively growing framework.

Prerequisites and intended audience

The tutorial is aimed at computational biologists and bioinformaticians who need to develop ontologies as part of data standardization and annotation efforts.

Participants are expected to be generally familiar with bioinformatics and to have experience in using at least one biological database. Some familiarity with the Gene Ontology will be useful, but is not required. No prior use of tools such as Protégé is required, as participants will gain this skill during the tutorial.

Tutorial Outline

Time		Session Details
9:00		Introduction and Principles: During this session, we will introduce ontology research and development and their application in bioinformatics and computational biology. The principles of good ontology development in support of multiple biologically relevant applications such as standardization and classification will be outlined. To reason upon and draw inferences from data to which an ontology has been applied, it is essential that the relationships be carefully defined; otherwise the data entry is insecure and the results are unpredictable. We will use case studies to illustrate situations to be avoided and the subtleties of intended meaning underlying various relationship types. The primary emphasis will be on illustrating key issues through examples which will be used to guide ensuing discussions. We will introduce the OBO Foundry project, its aims and methodology.
10:00		Basic Formal Ontology: In this session, we will outline the purpose of upper-level ontologies in ensuring interoperability and quality in ontology development. We will introduce Basic Formal Ontology, which is used as a foundational upper-level ontology to ensure interoperability in over 100 projects in the biomedical domain. The latest version of the ontology (2.0) includes significant improvements over the earlier version (1.1), and these will be detailed.
10:30		*Coffee break*
11:00		Relation Ontology: We will continue the previous session by introducing the Relation Ontology (RO), and will illustrate with practical examples how BFO and RO enable interoperability.
11:30		Bio-Ontologies and their Applications in Chemical Biology: In this session, we will provide an in-depth overview of several of the biological ontologies that are currently available, with a focus on those that are relevant for chemical biology. This will include the Chemical Entities of Biological Interest ontology (ChEBI), the Gene Ontology (GO), the Protein Ontology (PRO) and the Ontology of Biomedical Investigations (OBI). We will cover the motivation and scope of each of the ontologies, the relationships used and the logical inferences they support. We will also survey data sets to which these ontologies have been applied. We will further illustrate how the ontologies are being used, with practical examples from recent research in text mining, systems biology modelling, and metabolic network reconstruction.
12:30		Lunch
13:30		Constructing and Using Interoperable Ontologies with Protégé: In this hands-on session, we will provide an introduction to using Protégé and the Web Ontology Language (OWL) for ontology editing. This will include creating and editing ontologies, annotating ontologies with metadata such as synonyms, definitions, evidence codes, and comments, and creating appropriate relationships between entities. We will further introduce some of the more advanced features of the Protégé application, including automatic ontology classification using reasoners such as Pellet and HermiT, and flexible querying using Description Logic-based queries. In preparation for the practical sessions that follow, we will show how multiple ontologies can be edited at the same time, and how modules can be extracted from existing ontologies via the MIREOT mechanism, as applied in the OntoFox tool implementation, in support of re-use of existing ontologies.
14:30		Ontology Construction Practical: During this practical session, students will participate in the construction of an ontology, learning by doing. The students will be separated into groups, and each group will be assigned an example selected from challenging areas of emerging biology where existing standards have not yet matured. Students will then develop ontologies to describe these areas of biology, drawing on existing ontologies where appropriate. This will be highly interactive, as each group will need to discuss the pros and cons of each term and relationship that is added.
15:00		*Coffee break*
15:30		Ontology Construction Practical, continued: We will continue the practical session.
16:00		Ontology Harmonization and Discussion: Students will have generated different ontologies in the previous session. In this final session we will bring the various ontologies under a common root level in order to use them simultaneously for reasoning. The task will illustrate the utility of ontology-based reasoning across integrated interoperable ontology modules. It will highlight the need for harmonization of different representations and emphasise practical principles for ensuring interoperability. Through discussion, students will gain insights into methods to achieve consensus in the community development of shared ontologies.
17:00		End of Workshop

Tutors

Barry Smith

Barry Smith is a prominent contributor to both theoretical and applied research in ontology. He is the author of some 500 publications on ontology and related topics, and his research has been funded by the National Institutes of Health, the US, Swiss and Austrian National Science Foundations, the US Department of Defense, the Volkswagen Foundation, and the European Union. In 2010 he was awarded the first Paolo Bozzi Prize in Ontology by the University of Turin.
Smith is SUNY Distinguished Professor in the Department of Philosophy and Director of the National Center for Ontological Research at the University at Buffalo, where he is also Adjunct Professor in the Departments of Neurology and of Computer Science.
Smith’s work on the science of ontology contributed to the establishment of the OBO (Open Biomedical Ontologies) Foundry, a set of resources designed to support information-driven research in biology and biomedicine. Smith is one of the principal scientists of the NIH National Center for Biomedical Ontology, a Scientific Advisor to the Gene Ontology Consortium, and a PI on the Protein Ontology and Infectious Disease Ontology projects. He has organized over 100 ontology conferences, workshops and tutorials.

Janna Hastings

Janna Hastings is a bioinformatician and ontologist working on ontologies for chemical biology in the Cheminformatics and Metabolism group at the European Bioinformatics Institute in Hinxton, UK. She also has a dual appointment at the Swiss Centre for Affective Sciences in Geneva, Switzerland, where she works on ontologies for emotions, cognition and other mental processes. Her research centres around ontology-based knowledge representation and automated reasoning for chemical data within a biologically relevant context.
Hastings is the lead ontologist for the ChEBI chemical ontology, and is responsible for the ontology-based standardization effort for an EU-wide chemical biology screening project. She is also the lead developer of the Emotion Ontology and a contributor to the Mental Functioning Ontology. She regularly teaches courses on ontologies in bioinformatics as part of the EBI Bioinformatics Roadshow training programme, and has organised multiple ontology workshops and community participation events at the EBI, such as user group workshops for the ChEBI database.
