Mining New Insights In Unstructured Scientific Data | Stories | PerkinElmer
Mining New Insights In Unstructured Scientific Data

April 24, 2017

Drowning In Data

Technology is revolutionizing science and medicine. We see, hear, and experience the results of this revolution every day. Google Analytics can now predict flu strains faster than official sources. Physicians already leverage technology to better diagnose and treat disease down to the personalized level.1 And throughout the global healthcare system, the ability to analyze patient data holds the potential to improve the quality of healthcare delivery and reduce cost significantly.2

There is no question that technology is advancing medicine in ways only dreamed of a decade ago. But there is an elephant in the room in the world's research and clinical labs.

“Science has data coming out of its ears,” Dr. Timo Hannay, Managing Director of Digital Science, noted in an article that appeared in WIRED in 2014. “Yet in this age of big data, science has a big problem,” Dr. Hannay said. “It is not doing nearly enough to encourage and enable the sharing, analysis, and interpretation of the vast swathes of data that researchers are collecting.”3

One reason is that there is simply too much information. In addition to what already exists, we create more than 2.5 quintillion bytes of new data every day.4 At 32 gigabytes apiece, that is enough to fill roughly 78 million iPads daily. Buried beneath all that data, however, are any number of potential breakthroughs, right under our collective noses. The challenge is how best to mine the critical data hidden away in countless clouds, hard drives, clinical reports, lab notes, and published reports, which continue to grow at a mind-numbing pace and in formats that have not easily lent themselves to analytics, until now.

Finding the Right Information

Scientists need ways to easily search for the right data and quickly understand the relationships between sources of information. In effect, users need to find the right data sources for their analysis as easily as they find the right product on Amazon, through an intuitive search experience. “That allows for faster identification of relevant datasets across disparate systems to create a rich visual data and information continuum meant to improve scientific insights and the quality of decisions,” says Dr. Mark Demesmaeker, Vice President of Scientific Analytics at PerkinElmer.

Considering that nearly 90% of information assets in a typical enterprise currently go untapped, this semantic search experience can lead to some pretty powerful findings in the scientific, pharmaceutical, and clinical worlds.5

One company with a long history in data discovery and analysis is PerkinElmer, a global leader focused on innovating for a healthier world. PerkinElmer is the exclusive distributor of the Attivio® platform for Life Sciences, which allows researchers to quickly identify and unify the relevant data sources for their analysis, including structured, semi-structured, and unstructured content, bridging proprietary and public domain sources. The platform samples critical content from the disparate sources to infer their implicit relationships and can also suggest connections between related datasets even when those datasets do not directly reference each other. The resulting virtual “datamarts” can then be sent seamlessly to the TIBCO Spotfire® visual analytics software to generate visually intuitive, interactive displays that support further collaboration and decision making.
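To make the idea of suggesting connections between datasets concrete, here is a minimal Python sketch. It is not the Attivio platform's actual method, which the article does not describe. It assumes one simple heuristic: sample the values in each column of two sources and flag column pairs whose values overlap strongly (Jaccard similarity) as candidate links. The function names, the threshold, and the sample data are all hypothetical.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity between two collections of values."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_links(left, right, threshold=0.5):
    """Suggest column pairs whose sampled values overlap enough to hint at a link.

    `left` and `right` map column names to lists of sampled values.
    Returns (left_column, right_column, score) tuples, best score first.
    """
    suggestions = []
    for (lcol, lvals), (rcol, rvals) in product(left.items(), right.items()):
        score = jaccard(lvals, rvals)
        if score >= threshold:
            suggestions.append((lcol, rcol, round(score, 2)))
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Hypothetical samples from two sources that never reference each other by name.
assay_results = {
    "compound": ["CPD-001", "CPD-002", "CPD-003", "CPD-004"],
    "ic50_nm": [12.5, 340.0, 8.1, 95.0],
}
inventory = {
    "item_code": ["CPD-002", "CPD-003", "CPD-004", "CPD-005"],
    "location": ["Freezer A", "Freezer B", "Freezer A", "Shelf 3"],
}

links = suggest_links(assay_results, inventory)
print(links)
```

Here the `compound` and `item_code` columns share three of five distinct values, so they are suggested as a likely link even though the column names differ. A production system would presumably combine many more signals than value overlap.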

Most of the information being created today is unstructured, captured in free text, PDFs, emails, journal articles, and other forms that could not easily be used for analytics – until now. The Attivio platform incorporates advanced text analytics to fully unlock unstructured data. “Using text mining with scientifically focused ontologies, unstructured data can now be structured into tabular models which are easily utilized with leading analytics platforms like TIBCO Spotfire,” Demesmaeker says.

The SciBite Connection

The latest addition to PerkinElmer’s solutions for big data is SciBite’s TERMite text analytics platform. This application enriches the text mining with dictionaries and ontologies containing millions of scientific and medically relevant terms to greatly improve the text analytics output quality.
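As a rough illustration of dictionary-driven text mining, the Python sketch below tags free text against a tiny ontology and emits tabular rows. It does not use or represent the TERMite API. The two-entry ontology, the synonym lists, the function names, and the sample notes are all hypothetical, and real scientific vocabularies hold millions of terms.

```python
import re

# Hypothetical mini-ontology: canonical term -> (entity type, known synonyms).
ONTOLOGY = {
    "CA19-9": ("biomarker", ["CA19-9", "CA 19-9", "carbohydrate antigen 19-9"]),
    "pancreatic cancer": ("disease", ["pancreatic cancer", "pancreatic carcinoma"]),
}


def build_patterns(ontology):
    """Compile one case-insensitive regex per canonical term."""
    patterns = []
    for canonical, (entity_type, synonyms) in ontology.items():
        # Try longer synonyms first so the most specific phrasing wins.
        ordered = sorted(synonyms, key=len, reverse=True)
        alternatives = "|".join(re.escape(s) for s in ordered)
        regex = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
        patterns.append((canonical, entity_type, regex))
    return patterns


def tag_documents(docs, ontology=ONTOLOGY):
    """Turn free text into rows of (doc_id, type, term, matched)."""
    patterns = build_patterns(ontology)
    rows = []
    for doc_id, text in docs.items():
        for canonical, entity_type, regex in patterns:
            for match in regex.finditer(text):
                rows.append({
                    "doc_id": doc_id,
                    "type": entity_type,
                    "term": canonical,          # normalized name
                    "matched": match.group(0),  # wording found in the text
                })
    return rows


# Hypothetical unstructured notes.
docs = {
    "note-1": "Serum CA 19-9 was elevated in the patient with pancreatic carcinoma.",
    "note-2": "No biomarker data were recorded for this sample.",
}

rows = tag_documents(docs)
for row in rows:
    print(row)
```

Because each synonym maps back to one canonical term, different spellings in the source text land in the same column value, which is what makes the resulting table usable for counting and statistics in a visual analytics tool.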

“Basically, it uses a comprehensive scientific understanding to extract the right information from unstructured data sources such as PubMed, to empower researchers with a complete view of all relevant information versus a partial picture. Imagine asking the question ‘How do the 84,000+ PubMed articles on pancreatic cancer cite CA19-9 as a suitable diagnostic biomarker for a specific sub-form?’ and trying to answer that by reading the top search results. The only way to really understand the full body of literature is for a computer to read all of the articles and provide text mining output for analytical and statistical analysis,” Demesmaeker says. “That is the real strength of the Attivio and SciBite expanded platform.”


  1. Eric Schadt, “The Role Of Big Data In Medicine,” McKinsey Insights, November 2015, accessed February 7, 2017.
  2. Wullianallur Raghupathi, Viju Raghupathi, “Big Data Analytics In Healthcare: Promise And Potential,” Health Information Science And Systems, December 2014, pp. 2047-2501, accessed March 3, 2017.
  3. Timo Hannay, “Science’s Big Data Problem,” WIRED, 2014, accessed February 7, 2017.
  4. Cory Vander Jagt, “Can You Find The Needle In The Haystack? Let’s Talk BI Data Discovery,” GoodData, January 28, 2015, accessed February 7, 2017.
  5. Douglas Laney, “Information Innovation Key Initiative Overview,” Gartner, April 22, 2014, accessed February 7, 2017.
