Confirmed Sessions at Data Day Health 2017

We are still confirming the final speakers for Data Day Health 2017. The final schedule will be published in the next few days. Bookmark this page for updates.

Smart Data Lakes in Pharma

Arthur Keen of Cambridge Semantics will lead this session.

The process of developing a new drug can take over a decade and cost billions of dollars. Most of the time and effort in this process goes into the clinical trial phases, where medicines developed in the earlier drug discovery phases are tested. Planning for clinical trials is a critical part of this process. Efficient and effective planning requires companies to identify the most suitable doctors, global sites, and patients. However, data from a very large number of disparate, distributed data systems needs to be harmonized to support this process, and the unique nature of each clinical trial may lead to diverse and unanticipated questions that are difficult to support with traditional data systems.
This talk describes how pharmaceutical companies are accelerating and improving the quality of clinical trial planning by leveraging the self-service capabilities of the Anzo Smart Data Lake to organize, categorize, and harmonize their data, and to interactively search, analyze, and discover insights in that data in order to answer these kinds of questions.

Machine learning and IOT for medical prevention. A view from the trenches.

Pierre Gutierrez of Dataiku will lead this session.

In this talk, Pierre Gutierrez, a data scientist at Dataiku, will discuss Dataiku's experiences using machine learning on IoT data. We will talk about the challenges of processing and cleaning IoT data, and how to successfully train a model that can be deployed in production. We will illustrate our talk with two examples from our previous work: creating an algorithm for early epileptic seizure detection based on wearable tech, and detecting people's activity through sensor data.

Architecting a data-driven healthcare system from first principles

Nikhil Buduma of Remedy Health will lead this session.

Today's healthcare system is rife with shortcomings. It takes 17 years for clinically validated technologies and guidelines to reach the average physician's practice. It can take weeks, if not months, to see a physician in most cities. And two thirds of the system's most valuable asset, physician time, is wasted on shuffling around paperwork.
Remedy is tackling these issues head on by developing a new health system from scratch, built around a strong software backbone. Nikhil will be telling the story behind Remedy's first product, a virtual primary care clinic that enables patients to receive care from the world's best doctors from the comfort of their smartphone. Leveraging powerful automation technologies, Remedy's software eliminates billing, compliance, and EMR data entry for physicians while dramatically improving quality of care, access, and affordability for patients. Nikhil will also be talking about how Remedy's paradigm for healthcare could one day enable anyone to prototype, validate, and distribute clinical technologies at the front lines of care.

Solving Healthcare’s Dirtiest Data Problems, and I Might Mention Blockchain. . .

Dr. Denise Gosnell of PokitDok will lead this session.

Clinical research grabs attention with its “save your life” intention; but before you can receive the newest, greatest, (insert other “-est”) treatment, you need access to it. And, most people can sympathize that the access to, the cost of, and options within healthcare remain confusing. Even though 2016 introduced personal AI agents and self-driving cars, your local nurse still has to make phone calls, send faxes, and/or attach PDFs to unencrypted emails to transfer your personal health data from one system to another.
To address the most basic, but also most widely ignored, issues in healthcare, this talk is going to tackle the most fundamental of all topics in healthcare advancement: secure, system-wide data plumbing. Atop the right pipes (or, more accurately, blockchain), we can build any of the expected solutions which we all take for granted in other industries, such as cost transparency, in-network provider recommendations, predictably lower claim management costs, universal healthcare identity, and much more.
Nobody wants substandard care and inefficient processes. Let’s tackle the problem of data integration to ensure transparency, access, and options within healthcare.

R you experienced?: Genomics 101 and a real-world R example for genetic analysis

Sanjay Joshi of Dell EMC will lead this session.

Sanjay will talk about Lambda architectures, Spark, and genomics, along with the top 10 algorithm examples in biology. He will also use a real-world R sample dataset for eye and hair color to illustrate the complexity of biology.

Graph Representations of Clinical Standards of Care in Oncology

David Hughes and Stephen Barr of Seattle Cancer Care Alliance will lead this workshop.

The Clinical Pathways Program at the Seattle Cancer Care Alliance is tasked with the development and reporting of processes representing the highest standard of care for a given cancer type. The processes, known as Clinical Pathways, are derived from national guidelines, clinical evidence, and oncology research. Throughout a clinical pathway's development, treatment modalities such as chemotherapy, surgery, and radiation are integrated into a comprehensive documentation of the standard of care. Efficacy, toxicity, and various attributes of the patient's clinical picture (lab tests, comorbidities, genetic markers, clinical evidence, etc.) are incorporated throughout the process. To facilitate advanced reporting on the concordance of actual care with the standard, we developed a graph-based representation of the clinical pathway, stored in Neo4j. This data structure also allows for the embedding of semantic information about pathway nodes. Using this graph representation, we can enforce validity of the pathway and employ graph-based machine learning techniques for detailed pathway analysis in relation to patient data.
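As a rough illustration of the idea, and not the presenters' actual implementation, a pathway can be modeled as a directed graph whose edges are the allowed transitions between care steps. The sketch below uses a plain Python dictionary rather than Neo4j. The node names, the definition of "valid" (known targets, no cycles, every step reachable from the start), and the simple concordance check are all assumptions made for this example.

```python
from collections import deque

# Hypothetical pathway: node names and transitions are invented for illustration.
PATHWAY = {
    "diagnosis": ["staging"],
    "staging": ["surgery", "chemotherapy"],
    "surgery": ["radiation", "follow_up"],
    "chemotherapy": ["surgery", "follow_up"],
    "radiation": ["follow_up"],
    "follow_up": [],
}


def is_valid_pathway(graph, start):
    """Assumed validity rule: all edge targets exist, the graph has no
    cycles, and every node is reachable from the start node."""
    if start not in graph:
        return False
    if any(t not in graph for targets in graph.values() for t in targets):
        return False

    # Cycle check with Kahn's algorithm: a cyclic graph leaves nodes unvisited.
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for t in graph[node]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    if visited != len(graph):
        return False

    # Reachability check: depth-first walk from the start node.
    seen = {start}
    stack = [start]
    while stack:
        for t in graph[stack.pop()]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return len(seen) == len(graph)


def is_concordant(graph, start, care_sequence):
    """True if the recorded care steps begin at start and only follow
    transitions that the pathway allows."""
    if not care_sequence or care_sequence[0] != start:
        return False
    return all(
        nxt in graph.get(cur, [])
        for cur, nxt in zip(care_sequence, care_sequence[1:])
    )
```

Under these assumptions, `is_concordant(PATHWAY, "diagnosis", ["diagnosis", "staging", "chemotherapy", "surgery", "follow_up"])` accepts a record that follows the pathway, while a record that skips staging is rejected.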

NEW - Deconvolving cellular phenotypes in images: a review of deep learning and machine intelligence methods in healthcare imaging

Sanjay Joshi of Dell EMC will lead this session.

As population-scale imaging architectures slowly move into clinical research, new words like "radiomics," which refers to the imaging phenotype, are being added to the vocabulary. Human- and computer-extracted phenotypes, along with feature extraction and classification, are underway on small datasets. What is the journey to real-world clinical utility? Sanjay will discuss several methods and efforts to ramp up deep learning and machine intelligence to population scale.

Semantic Search Applied to Healthcare

Juan Sequeda of Capsenta will host this talk.

In this presentation, Juan will show how graphs can be used to virtually integrate data from different, heterogeneous sources and how semantics can be added to make data smarter and enhance search. Juan will present a patient stratification case study from the Ohio State University Wexner Medical Center. Traditional Enterprise Data Warehouse solutions did not satisfy the need due to cost and data privacy. By using semantic and data virtualization technologies, the medical center was able to identify 30% more patients in need of Left Ventricular Assist Devices (LVAD). Some of the items Juan will cover include:
Methodology to create Knowledge Graphs in conjunction with mappings
Architectural decisions on virtualization (NoETL) vs materialization (ETL)
Intuitive ways to model financial and patient health information
Searching financial and patient information which matches desired criteria

Biorevolutions: a Data Science Approach to Success in Biotechnology

Gunnar Kleemann and Denis Vrdoljak will co-present this session.

Biotechnology will change our lives, but which technologies will move from concept to mainstream? The biotechnology market is a multibillion-dollar industry, with billions more invested into new ventures every year. Which ones are good investments? Using publicly available data on biotech startups at various stages, we developed a machine-learning-based predictive model to provide some insights into the likelihood of success for these startups. In this presentation, we discuss challenges faced and insights gleaned as we applied data science to analyzing biotech startups' success rates. We walk you through our process, from the ETL pipeline and architecture selection and optimization all the way to machine learning model scoring and selection.
Some of the topics that we cover include imputing to compensate for incomplete and missing data, incorporating network graph analysis into a larger machine learning framework, and selecting an architecture to balance vertical and horizontal scalability against future features and product development. We will end by introducing our analytical product: a web-based UI that predicts startup IPO likelihood, while allowing the user to manually adjust the model, input hypothetical startup profiles, or explore the existing profiles to identify trends, patterns, and outliers.

Quantifying Patient Improvement in Equine Assisted Activities and Therapeutics with Kinetic Sensors

Duane Steward will present this session.

What is EAAT and what does it need? What is the scope of disabilities and health conditions that benefit from EAAT? Dr. Steward will discuss the feasibility studies performed to date with horse and rider fitted with remote sensors.
On the analytics side, the talk covers modeling and measuring coordination between complex systems: horse and physically challenged riders. Dr. Steward will then discuss the challenges and limitations: relative data with little or no reference points; volume, velocity, and variability; organic versus mechanical; and measuring the human-animal bond.

Unfold The Mystery of Genome Through Mining Next Generation Sequencing Data

Anna Mengjie Yu will present this session.

Thanks to advances in next-generation sequencing technology, more genomic sequences can be generated in a much shorter time at a substantially lower cost. Mining genomic data can help us better understand the evolutionary relationships between different bacterial species and better target the mutation sites that lead to certain diseases, thus providing better ways to deliver personalized medicine.
In this talk, I will show a de novo genome assembly and visualization pipeline for processing Illumina paired-end (PE) sequencing data. I will show published results and research in progress that use this pipeline to elucidate the evolutionary relationships of different species at the whole-genome level. I will also demonstrate how the interactive visualization tools Shiny and D3 can help us better explore our data and visualize our results.