ARTICLE

What Scientists Need to Know about Big Data

Introduction

From the earliest stages of drug discovery to clinical trial phases, research labs are producing previously unthinkable volumes of data at an increasing velocity. Properly managed and utilized, this data can reveal stunning discoveries and opportunities for producing lifesaving drugs. Informatics solutions hold the power to reduce inefficiencies within every stage of drug development, ultimately reducing costs and speeding up time-to-market.

The average cost of bringing a drug to market is estimated at roughly $2.6 billion, with the entire process often taking more than 10 years. In many cases, companies sink substantial time and money into drug compounds that never hit the market.

One of the biggest barriers to more efficient drug discovery is the inability to properly leverage big data. Not only do pharmaceutical companies often lack the resources to process and analyze massive amounts of data, but they also waste scientific resources on non-core tasks like collecting data from disparate sources and normalizing formats.

Big data is rarely generated in a form that is immediately usable for discovery analysis. It must be processed so that different data types can coexist within the same model. Often, data from one instrument is output in a completely different format than similar data from another device.
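As a minimal sketch of what that processing can look like, suppose two plate readers export the same measurement in different formats, one as CSV and one as JSON with different field names. (The instrument names, field names, and values here are hypothetical, chosen only to illustrate the normalization step.)

```python
import csv
import io
import json

# Hypothetical raw exports from two instruments measuring the same quantity.
reader_a_csv = "well,absorbance\nA1,0.42\nA2,0.37\n"
reader_b_json = '[{"Well_ID": "A1", "OD600": 0.41}, {"Well_ID": "A2", "OD600": 0.36}]'

def normalize_reader_a(raw):
    """Parse instrument A's CSV export into a common record schema."""
    rows = csv.DictReader(io.StringIO(raw))
    return [{"well": r["well"], "absorbance": float(r["absorbance"]), "source": "reader_a"}
            for r in rows]

def normalize_reader_b(raw):
    """Parse instrument B's JSON export into the same schema."""
    return [{"well": r["Well_ID"], "absorbance": float(r["OD600"]), "source": "reader_b"}
            for r in json.loads(raw)]

# Once normalized, data from both instruments coexists in one model
# and can be analyzed together.
unified = normalize_reader_a(reader_a_csv) + normalize_reader_b(reader_b_json)
print(len(unified))  # 4 records, one shared schema
```

In practice this mapping layer is exactly the kind of non-core work that, as noted above, drains scientific resources when bench scientists have to build it themselves.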

Precise data yields higher rates of assay reproducibility, faster drug discovery, and better adherence to compliance requirements.


Unifying operational and scientific data

Collecting operational data is essential to speeding up drug discovery timelines, but several obstacles can prevent laboratories from fully realizing its potential. Namely, researchers may not have the tools to capture every relevant piece of operational data. Indeed, it can be difficult to discern which data is relevant and which isn't.

Does the temperature of the lab environment impact an assay? What about the ambient air pressure? These and similar questions need to be answered before researchers can determine the types of operational data they need to record to make their assays reproducible.

The question of reproducibility is an important one for researchers to consider. All too often, the answer to the question, “Is your data reproducible?” is a firm “we don’t know.”

In fact, a now notorious 2016 Nature survey found that more than 70% of researchers had tried and failed to reproduce another scientist's experiments, and more than half had failed to reproduce their own. It's not laziness or bad science that's to blame; all too often, researchers simply lack the resources to replicate experiments, since doing so can double the time and resource cost of completing an assay.

The inclusion of automatically recorded operational data at the testing phase of preclinical drug discovery provides important context that is often critical to assay success.

Operational data provides context for scientific data. For example, Internet of Things sensors can track ambient air temperature, humidity levels, and similar lab conditions. Later, when scientists are reviewing assay data, they can refer to operational data to make sense of any anomalous findings.
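One simple way to provide that context is to attach the nearest-in-time sensor record to each scientific reading. The sketch below assumes both data streams are timestamped; the readings, sensor values, and field names are hypothetical.

```python
from datetime import datetime

# Hypothetical timestamped logs: scientific readings and IoT sensor records.
assay_readings = [
    {"time": datetime(2023, 5, 1, 9, 0), "well": "A1", "signal": 0.42},
    {"time": datetime(2023, 5, 1, 13, 0), "well": "B3", "signal": 1.95},  # anomalous spike
]
sensor_log = [
    {"time": datetime(2023, 5, 1, 9, 0), "temp_c": 21.1, "humidity": 45},
    {"time": datetime(2023, 5, 1, 13, 0), "temp_c": 27.8, "humidity": 61},  # excursion
]

def nearest_conditions(reading_time, log):
    """Return the sensor record closest in time to a scientific reading."""
    return min(log, key=lambda rec: abs((rec["time"] - reading_time).total_seconds()))

# Attach lab conditions to each reading so anomalies can be reviewed in context.
for r in assay_readings:
    r["conditions"] = nearest_conditions(r["time"], sensor_log)

# The anomalous well coincided with a temperature excursion.
print(assay_readings[1]["conditions"]["temp_c"])  # 27.8
```

A unified view like this is what lets a scientist ask, after the fact, whether an odd result tracks an odd lab condition.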

Capturing operational data is a challenge in its own right. Some instruments can record operational conditions and scientific output simultaneously, but labs may also need wireless IoT sensors to accurately monitor lab conditions.


Harnessing the power of machine learning

Another layer of complexity complicates the proper use of big data in the lab: machine learning. The sheer volume of data produced in a single assay is difficult for humans to parse in a timely manner. Considering the number of assays needed to generate even a single lead, it's easy to understand why machine learning algorithms are necessary to analyze lab data.
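To make this concrete, here is a toy sketch of one of the simplest such algorithms, a nearest-neighbor classifier that labels screening wells as hits or non-hits from two measured features. The feature names, cluster parameters, and labels are invented for illustration; real screening pipelines use far richer models and validation.

```python
import math
import random

# Hypothetical labeled training data: (signal, viability) features per well.
random.seed(0)
non_hits = [((random.gauss(0.3, 0.05), random.gauss(0.9, 0.05)), "non-hit") for _ in range(200)]
hits = [((random.gauss(1.2, 0.10), random.gauss(0.8, 0.05)), "hit") for _ in range(20)]
training = non_hits + hits

def classify_1nn(features, labeled):
    """Label a new well by its single nearest neighbor in feature space."""
    nearest = min(labeled, key=lambda example: math.dist(features, example[0]))
    return nearest[1]

# A strong signal lands in the hit cluster; a weak one does not.
print(classify_1nn((1.15, 0.82), training))  # hit
print(classify_1nn((0.31, 0.91), training))  # non-hit
```

Even this toy version shows why the work belongs with technical experts: choices like features, distance metrics, and training data all have to be made before the model touches assay results.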

This is another case where gaining one efficiency may actually take away valuable scientific resources, such as when bench scientists spend too much time tweaking algorithms and setting up computing environments. That type of work is best performed by technical experts before each assay begins.

When researchers can capture all relevant operational data, use it to provide context for scientific data, and leverage machine learning to quickly and efficiently analyze results, they’re empowered to focus all of their resources on tasks that add value and ultimately lead to profitable discoveries.


Understanding the benefits of informatics solutions

Informatics is the science of processing data for storage, retrieval, and analysis. In the lab, informatics draws on both test-result data and lab-condition information. Best-in-class informatics solutions help researchers become more productive and accelerate their projects by surfacing critical insights through data analytics.

Laboratory leaders thinking about informatics solutions should consider the scientific data they currently have and how it’s isolated from relevant operational data. There are discoveries waiting to be uncovered in all that data — and an informatics partner can help lab stakeholders gain important new insights.

Beyond advantages of speed and productivity, informatics solutions can help labs adhere to critical compliance standards. For example, insights from big data can empower scientists to more easily identify safety markers and similar indicators necessary for in vivo drug testing. Ultimately, these solutions may be able to reduce the need for animal testing and make human trials safer and more effective.

By achieving those benefits, labs may also see a reduction in soft costs, such as the time wasted by scientists on algorithm configuration. An informatics partner can help labs rein in explosive spending by optimizing scientific resources and eliminating delays due to data processing.

Best-in-class informatics providers develop validated machine learning and artificial intelligence models so that scientists can arrive at the bench and get to work right away. This also addresses issues of data integrity and assay reproducibility; because both challenges are tied to costs, labs that reduce data-related delays can better manage their resources.

For example, an informatics and scientific services partner can empower researchers to do work more productively and efficiently by:

  • Identifying necessary IoT sensors to capture relevant operational data, like lab conditions.
  • Ensuring data integrity with storage, processing, and standardization solutions.
  • Unifying operational and scientific data to provide full context for drug discovery assays.
  • Capturing asset maintenance and utilization metrics for complete lab optimization.

From the earliest stages of drug discovery through clinical trials, research labs will keep producing data at unprecedented volume and velocity. Labs that manage and use that data properly, with the support of informatics solutions, can cut inefficiencies at every stage of drug development, reduce costs, and bring lifesaving drugs to market faster.

Sources:

  1. https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970