Technology and Software
T1: Verification of Systems Biology Research in the Age of Collaborative - Competition. Julia Hoeng Ph.D., Manager Computational Disease Biology, Philip Morris International, Research & Development, Neuchâtel, Switzerland
T2: Developing intelligent software for storing, retrieving and analyzing biological data is the core of bioinformatics – Are these assets protectable? Julian Würmser, German and European Patent Attorney, Meissner Bolte & Partner, München, Germany.
T3: Calling Pathogenic Variants from Exome Sequencing Data. Frank Schacherer, CTO, BIOBASE GmbH, Wolfenbüttel, Germany.
T4: Better Computing for Better Bioinformatics. George Vacek, Director, Life Sciences, Convey Computer Corporation, Richardson, TX, USA.
T5: Using globally integrated public expression data to investigate your own hypotheses. Dr. Philip Zimmermann, CEO, NEBION.
T6: The SBKB Structural Biology Knowledgebase and the Protein Model Portal: a one-stop-resource to structure information. Jürgen Haas, SIB Swiss Institute of Bioinformatics & Yi-Ping Wendy Tao, RCSB PDB.
T7: Ion Torrent: Open, Accessible, Enabling. Matt Dyer, Ion Torrent Associate Director, South San Francisco, CA, USA.
T8: ELIXIR - working towards a pan-European research infrastructure for biological information. Søren Brunak, Technical University of Denmark, Chair of the Interim ELIXIR Board.
T9: Putting IBM Watson to Work In Healthcare. Bill Rapp, Chief Architect Watson Solutions Development, IBM Software Group, Rochester, USA.
T10: Pathway Studio is a platform for integrating predicted and measured interactions, enabling pathway and network analysis of high-throughput molecular profiling data. Anton Yuryev, Sales Development Director, Pathway Studio, Elsevier.
T11: Finding unusual peptides on the Internet using plain three letter sequence codes. Hans-Juergen Himmler, AKos Consulting & Solutions Deutschland GmbH, Steinen, Germany.
T12: IBM Watson For Healthcare - the Technology. Bill Rapp, Chief Architect - Watson Solutions Development, Rochester, MN, USA.
Abstracts
T1: Verification of Systems Biology Research in the Age of Collaborative - Competition. Julia Hoeng Ph.D., Manager Computational Disease Biology, Philip Morris International, Research & Development, Neuchâtel, Switzerland
Modern society demands greater scrutiny of the potential health risks and benefits of long-term, and sometimes lifelong, exposure to drugs, chemicals, and substances found in consumer products and the environment.
Organizations such as companies and academic consortia conduct large multi-year scientific studies that entail the collection and analysis of thousands of data points. The individual experiments are often conducted over many physical sites and with internal and outsourced components. To extract maximum value, the interested parties need to verify the accuracy and reproducibility of automated collection and analysis workflows in systems biology before the initiation of large multi-year studies.
Traditional verification through peer review has shortcomings, such as a lack of scalability, that make it insufficient for assessing high-throughput research. A team of researchers at PMI and IBM, aiming to improve the effectiveness of scientific studies and the verification of scientific findings, proposes a scheme called IMPROVER (Industrial Methodology for Process Verification in Research). This methodology evaluates a research program by dividing its workflow into smaller building blocks, each of which can be verified internally or externally through challenge-based "crowd-sourcing" to a research community.
Scientific challenges will be broadcast to potential stakeholders as an open call for participation, giving the community the opportunity to test their computational methods on new data and to take part in a collaborative effort that could contribute to solving a grand scientific problem.
Because cancer is one of the leading causes of death worldwide, we formulate the Diagnostics Signature Challenge to evaluate novel approaches for identifying robust and predictive signatures for this disease. The goal of the Diagnostics Signature Challenge is to verify that transcriptomics data contain enough information for the diagnosis and prognosis of certain human disease states that could benefit from better diagnostic signatures.
Here we will describe the approach, the necessary operational steps, and how we intend to engage the wider scientific community to assess the applicability of the IMPROVER approach to molecular diagnostics (i.e., genomic signatures).
Speaker:
Julia Hoeng Ph.D., Manager Computational Disease Biology, Philip Morris International, Research & Development, Neuchâtel, Switzerland. The SBV 3.0 IMPROVER project (Industrial Methodology for Process Verification in Research) is a scientific collaboration between Philip Morris International (PMI) and IBM's Thomas J. Watson Research Center, funded by PMI.
T2: Developing intelligent software for storing, retrieving and analyzing biological data is the core of bioinformatics – Are these assets protectable? Julian Würmser, German and European Patent Attorney, Meissner Bolte & Partner, München, Germany.
It is often said that the sequencing of the human genome was an achievement in computing rather than in molecular biology. Since then, a number of bioinformatics companies have been active in creating new algorithms and software for storing, retrieving and analyzing biological data.
While the United States Patent and Trademark Office is very accepting of patent applications in this field, the European Patent Office and many national patent offices in the European Union take a somewhat more restrictive approach. Nevertheless, with little public attention, the European Patent Office has granted numerous patents for bioinformatics inventions. Because of misinformation and legal uncertainty, many European companies disregard the options available for protecting their assets.
The session will present the available means of legal protection for software-based inventions in bioinformatics, and will illustrate and discuss the associated risks and how to remedy them.
Speaker:
Julian Würmser graduated from the Technical University of Munich with a Master's degree (Dipl.-Inf.) in Computer Science. He worked for several years as a software designer, focusing on medical imaging and developing algorithms in the field of structural biology, before qualifying as a German and European Patent Attorney. Julian Würmser is also admitted as a European Trademark Attorney. He works at Meissner, Bolte & Partner, one of Germany's most prominent intellectual property law firms. His work focuses on patents for software-implemented inventions and the enforcement of the corresponding rights.
T3: Calling Pathogenic Variants from Exome Sequencing Data. Frank Schacherer, CTO, BIOBASE GmbH, Wolfenbüttel, Germany.
Next Generation Sequencing makes it possible to screen multiple candidate genes for disease-causing variants in parallel. This talk will review how mutations with pathological significance can be identified in whole exome data, and will give an overview of existing approaches:
* Direct application of known genotype-phenotype associations from supporting literature evidence, such as the Human Gene Mutation Database (HGMD)
* Functional assessment of novel variants by filtering against common SNPs and known disease-associated genes
* Prediction of pathologic effects for missense variants, splice-site or nonsense mutations
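The filtering strategy listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's pipeline: the `Variant` record, its fields, the 1% allele-frequency cutoff for "common SNP", and the consequence labels are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record; a real pipeline would read these fields from an annotated VCF.
    gene: str
    consequence: str      # assumed labels: "missense", "nonsense", "splice_site", "synonymous"
    population_af: float  # allele frequency in a population database (assumed available)


COMMON_AF_CUTOFF = 0.01  # assumption: variants at >= 1% frequency count as common SNPs
TRUNCATING = {"nonsense", "splice_site"}


def prioritize(variants, disease_genes):
    """Drop common and synonymous variants, keep those in known disease genes,
    and list truncating variants before missense ones."""
    candidates = [
        v for v in variants
        if v.population_af < COMMON_AF_CUTOFF
        and v.gene in disease_genes
        and v.consequence != "synonymous"
    ]
    # False sorts before True, so truncating variants come first; missense
    # variants would still need an effect predictor to judge pathogenicity.
    return sorted(candidates, key=lambda v: (v.consequence not in TRUNCATING, v.gene))
```

Ordering truncating variants first reflects the third bullet: nonsense and splice-site changes are usually easier to interpret than missense variants, which need a separate prediction step.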
T4: Better Computing for Better Bioinformatics. George Vacek, Director, Life Sciences, Convey Computer Corporation, Richardson, TX, USA.
Advances in sequencing technology have significantly increased data generation, requiring comparable computational advances for bioinformatics analysis. Advanced architectures based on reconfigurable computing can reduce application run times from hours to minutes, while addressing problems that are impractical on commodity servers. The increased capability also improves research quality by making more accurate, previously impractical approaches feasible. This work describes the use of Convey's Hybrid-Core (HC) computing architecture, which combines a traditional x86 environment with a reconfigurable coprocessor, to solve data-intensive problems in next-generation sequencing analysis such as reference mapping, de novo assembly, functional annotation, variant analysis and RNA expression profiling.
Convey has developed a coprocessor personality that accelerates the aln step of the BWA processing pipeline, together with parallelized versions of the samse and sampe processing steps. These allow Convey systems to dramatically reduce time to solution and to increase throughput by more than 18x for a full BWA paired-end mapping of GoNL (Genome of the Netherlands) data. Integrated BAM file generation provides further workflow optimization.
Graph Constructor for Velvet reduces both run time and memory requirements, making larger assemblies possible. Additional performance and workflow optimizations will be discussed, including a fast k-mer counting tool that allows quick identification of the optimal k-mer length and coverage cutoffs for de novo assembly.
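The idea behind using k-mer counts to choose a coverage cutoff can be shown with a small sketch. This is a plain software illustration, not Convey's tool: it counts k-mers on one strand only (no canonical k-mers), and the function names and the "first trough" heuristic are assumptions chosen for the example. In a k-mer multiplicity histogram, k-mers seen only once or twice are often sequencing errors, so a cutoff is commonly placed where that low-coverage peak stops falling.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every substring of length k across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def coverage_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def first_trough(hist):
    """Hypothetical heuristic: the first multiplicity m where hist[m] <= hist[m + 1],
    i.e. where the low-coverage (error) peak stops decreasing."""
    for m in range(1, max(hist)):
        if hist.get(m, 0) <= hist.get(m + 1, 0):
            return m
    return None
```

Repeating the count for several values of k and comparing the resulting histograms is one simple way to explore which k-mer length gives a clear separation between error and genuine k-mers.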
SWSearch, a search and alignment program based on the Smith-Waterman algorithm, dramatically reduces the time needed to perform large numbers of local alignments. SWSearch on an HC-2ex server is 15x faster than the fastest software implementation on a commodity x86 system. When searching Illumina reads against a database of protein sequences, SWSearch on an HC-2ex server is more than 7 times faster than NCBI BLASTx. Furthermore, BLAST uses a heuristic filter and finds only about one third as many matches as the full Smith-Waterman approach. Because each match indicates a read that is part of a gene of interest, any missed match could be significant.
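For readers unfamiliar with the algorithm SWSearch accelerates, the following is a minimal software sketch of Smith-Waterman local alignment scoring. It is not Convey's implementation: it uses a simple match/mismatch score and a linear gap penalty, whereas protein searches would normally use a substitution matrix such as BLOSUM62 and affine gaps. The scoring parameters are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b (linear gap penalty)."""
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at zero is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

Every cell of the len(a) x len(b) matrix is evaluated, which is why the exhaustive method finds matches that a heuristic seed-and-extend filter such as BLAST's can miss, and also why it benefits from hardware acceleration.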
T5: Using globally integrated public expression data to investigate your own hypotheses. Dr. Philip Zimmermann, CEO, NEBION.
During the past 10 years, a large amount of expression data has been generated and made publicly available. The biological information contained in these data is largely under-explored, and most experiments have been analyzed individually. Integrating these data on a large scale provides new means of biological investigation and hypothesis testing. This presentation will show an example of public data integration and the creation of a high-performance search engine, the Genevestigator platform. Genevestigator currently contains manually curated and globally normalized expression data from 14 organisms, covering more than 170 human diseases from 29 disease areas. Scientists can use the Genevestigator online tools to check the expression of genes across thousands of experimental conditions, tissue types, cancers and developmental stages, or to search for genes with particular expression characteristics. In parallel, high-quality aggregated datasets from the Genevestigator compendium will soon be made publicly available so that bioinformaticians can use them as baseline/reference datasets and to test novel methods and algorithms on real-life data. This presentation will discuss these resources and how they can be used to speed up your research projects.
Speaker:
Dr. Philip Zimmermann studied Agronomy at ETH Zurich, with a focus on plant breeding and physiology. His Master's work at Texas A&M dealt with drought resistance in cotton. He then pursued a PhD in plant molecular biology at ETH Zurich. As a post-doc and senior scientist at ETH Zurich, he focused on modeling gene regulatory networks and on systems biology in mammalian and plant species. Together with his team, he developed the Genevestigator platform as an online search engine for gene expression. He currently serves as CEO of NEBION and coordinates several research projects in collaboration with academic and commercial partners.
T6: The SBKB Structural Biology Knowledgebase and the Protein Model Portal: a one-stop-resource to structure information. Jürgen Haas, SIB Swiss Institute of Bioinformatics & Yi-Ping Wendy Tao, RCSB PDB.
The Protein Structure Initiative Structural Biology Knowledgebase (SBKB, http://sbkb.org) is a scientific web portal that integrates biological, experimental, and structural data about proteins. SBKB delivers unique and comprehensive information, including 3D structures from the Protein Data Bank, theoretical models available in the Protein Model Portal, target histories and protocols from PSI TargetTrack, DNA clones from the PSI Material Repository, annotations from 140+ open biological resources, technology reports from the PSI Technology Portal, PSI articles from the PSI Publications Portal, a series on featured structures by David Goodsell, and research and technical highlights from the Nature Publishing Group.
The talk will illustrate in detail the benefits of searching the SBKB and will also highlight the structural coverage of the Protein Model Portal. Visit us at sbkb.org!
T7: Ion Torrent: Open, Accessible, Enabling. Matt Dyer, Ion Torrent Associate Director, South San Francisco, CA, USA.
Ion Torrent has pioneered an entirely new approach to sequencing that directly connects chemical and digital information and leverages decades of advances in semiconductor technology. The result is the first commercial sequencing technology that does not use light, and as a result it delivers unprecedented speed, scalability, accuracy, and low cost. In its first year, the Ion Torrent Personal Genome Machine (TM) became the fastest-selling sequencing platform. Its throughput scaled 100X in that first year, from 10 Mb to 1 Gb, and is expected to scale another 100X over the next year with the new Proton (TM) sequencer, which is intended to enable the $1000 human genome in a single day. Automated data analysis is driven by Torrent Suite, an open-source software suite that provides a simple and intuitive interface to streamline data analysis and deliver results in minutes to hours, not days. Built on top of Torrent Suite is a flexible SDK that lets users extend the analysis capabilities by developing and using plugins and APIs.
T8: ELIXIR - working towards a pan-European research infrastructure for biological information. Søren Brunak, Technical University of Denmark, Chair of the Interim ELIXIR Board.
The mission of ELIXIR is to build a sustainable European infrastructure for biological information, supporting life science research and its translation to medicine, the environment, the bioindustries, and society.
ELIXIR will be a distributed infrastructure arranged as a Hub and Nodes, with the Hub at the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK and Nodes located throughout Europe.
An Interim ELIXIR Board has been established, consisting of scientific and administrative representatives from countries that have indicated interest in joining ELIXIR by signing a Memorandum of Understanding with ELIXIR. The Board is currently moving ELIXIR towards construction, working with the institutes interested in hosting one of its Nodes. These Nodes will contribute data resources, bio-computing capacity, infrastructure for data integration, and services for the research community, including training and standards development.
Speakers:
The Chair of the Interim ELIXIR Board, Søren Brunak, of the Technical University of Denmark, will present an introduction to ELIXIR and the latest developments made towards its construction. Delegates will then be welcomed to put questions to ELIXIR representatives.
T9: Putting IBM Watson to Work In Healthcare. Bill Rapp, Chief Architect Watson Solutions Development, IBM Software Group, Rochester, USA.
The challenge in building a computer system like Watson lies in developing its ability to understand the language and intent of a question, scour millions of lines of human language, and return a single, precise answer - in less than three seconds. This discussion will outline the efforts going on inside IBM to commercialize the Watson engine for use in various industries, starting with Healthcare, and describe how Watson is being adapted to answer complex medical questions.
T10: Pathway Studio is a platform for integrating predicted and measured interactions, enabling pathway and network analysis of high-throughput molecular profiling data. Anton Yuryev, Sales Development Director, Pathway Studio, Elsevier.
Pathway Studio combines a comprehensive and accurate knowledgebase of molecular findings with a powerful analytical framework. The Pathway Studio knowledgebase consists of regulatory and physical interaction relations accurately extracted from peer-reviewed scientific publications by MedScan, a natural language processing technology. Pathway Studio allows import of interaction data from high-throughput experiments or from predictions based on protein sequence similarity. Pathway Studio is therefore an efficient tool for validating new interaction data by comparison with published knowledge. It can also serve as a source of interactions for the training sets of algorithms that predict molecular interactions. All interactions imported into the Pathway Studio knowledgebase can be used to analyze gene expression microarray, GWAS and NGS datasets. This analysis can be used to further validate new interactions as well as to obtain new insights into biological mechanisms using previously published molecular profiling results. Pathway Studio offers application programming interfaces that can be used to integrate novel statistical algorithms into the Pathway Studio analytical framework.
T11: Finding unusual peptides on the Internet using plain three letter sequence codes. Hans-Juergen Himmler, AKos Consulting & Solutions Deutschland GmbH, Steinen, Germany.
Finding peptides with modified amino acids is difficult or impossible when you use plain three-letter sequence codes and BLAST. You can find such peptides by using the structure as a query, but drawing the structure correctly is rather difficult for non-chemists.
We developed CWM Global Search with Proteax, an Internet search engine that allows scientists such as biologists to input plain three-letter sequence codes and then search for the corresponding chemical structures on the Internet, including substructure and structure similarity searches.
The results are mapped back to three-letter sequence codes where possible. This makes the search results much easier to interpret than the structures or systematic names normally provided in the databases. Results can also be easily compared using Proteax for Spreadsheets.
We demonstrate how to enter plain three-letter sequence codes in the Proteax editor, how to run substructure and structure similarity searches for peptides in PubChem and ChEBI, and how easily and powerfully the results can be interpreted.
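The limitation described at the start of this abstract can be made concrete with a short sketch. It does not reflect how Proteax works internally; the hyphen-separated input format and the `parse_three_letter` helper are hypothetical. It converts a three-letter peptide string to one-letter codes and shows that any non-standard residue (for example Aib or Orn) collapses to `X`, which is exactly the information a sequence-only BLAST search loses and a structure-based search can recover.

```python
# The 20 standard amino acids; anything else is treated as unusual here.
STANDARD = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_three_letter(sequence, sep="-"):
    """Hypothetical helper: return the one-letter sequence (X for unknown residues)
    and a list of (position, code) pairs for non-standard residues."""
    one_letter = []
    unusual = []
    for pos, code in enumerate(sequence.split(sep), start=1):
        code = code.strip()
        if code in STANDARD:
            one_letter.append(STANDARD[code])
        else:
            one_letter.append("X")  # the modification is lost in one-letter form
            unusual.append((pos, code))
    return "".join(one_letter), unusual
```

For example, `parse_three_letter("Ala-Aib-Gly-Orn-Phe")` yields `"AXGXF"` together with the two unusual residues and their positions; two different modified peptides can therefore map to the same one-letter string.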
T12: IBM Watson For Healthcare - the Technology. Bill Rapp, Chief Architect - Watson Solutions Development, Rochester, MN, USA.
This interactive technical discussion will take a look behind the scenes at how Watson works and how it is being trained to answer medical questions. We will draw on our experience using Watson to recommend treatment options for oncology patients and then generalize to other healthcare domains.