- Date 1 Dec 2016
- Sectors Technology
Innovative ETL (Extract, Transform, Load) technology frees the 80% of data that is unstructured and trapped in Data Lakes, enabling high-value knowledge discovery and decision support
Game-changing capabilities in I2E 5.0 include normalization of concepts (e.g. dates, measurements, gene mutations) within unstructured text, advanced range search, and a new query language, EASL. These capabilities tackle the variety in big data and accelerate insights from unstructured, semi-structured and structured data sources.
Normalization and range search help users find key information (e.g. a particular temperature or a range of temperatures) in unstructured text sources regardless of how the information is expressed, and boost ETL operations by identifying, extracting and standardizing data. Given that around 80-90% of big data is unstructured, these new text mining capabilities allow huge amounts of data that previously had to be read manually to be processed automatically.
‘Imagine trying to find patients for a clinical trial who are between 18 and 65 years old, have any kind of cancer, weigh over 200lbs, and have a specific gene mutation,’ said Linguamatics CTO and founder, David Milward. ‘There are so many ways to write down age, weight and gene mutations that most search tools fail to comprehensively find the data variants, let alone identify and match values within the correct ranges (e.g. 18-65 years old and greater than 200lbs). I2E 5.0 extracts and standardizes this critical data, so that organizations can discover information buried within unstructured or semi-structured text.’
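To make the idea of normalization and range search concrete, the following is a minimal Python sketch of the clinical trial example above. It is not I2E's implementation and does not use I2E or EASL in any way: the regular expressions, the function names (`normalize_age_years`, `normalize_weight_lb`, `matches_trial`) and the sample sentences are all hypothetical, and real clinical text contains far more variants than these patterns cover. The sketch only shows the general principle: extract a value however it is written, convert it to a standard unit, then compare it against a range.

```python
import re

# Hypothetical patterns for illustration only; they cover a handful of
# common spellings and are not taken from any Linguamatics product.
AGE_RE = re.compile(
    r"(\d{1,3})[\s-]*(?:years?[\s-]+old|yrs?\b|y/?o\b)", re.IGNORECASE
)
WEIGHT_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(lbs?|pounds?|kgs?|kilograms?)\b", re.IGNORECASE
)

KG_TO_LB = 2.20462  # conversion factor used to standardize on pounds


def normalize_age_years(text):
    """Return the first age found in the text as an integer, or None."""
    match = AGE_RE.search(text)
    return int(match.group(1)) if match else None


def normalize_weight_lb(text):
    """Return the first weight found in the text, converted to pounds."""
    match = WEIGHT_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    if unit.startswith("k"):  # kg, kgs, kilogram, kilograms
        value *= KG_TO_LB
    return value


def matches_trial(text, min_age=18, max_age=65, min_weight_lb=200):
    """Range search: age within [min_age, max_age] and weight above the minimum."""
    age = normalize_age_years(text)
    weight = normalize_weight_lb(text)
    if age is None or weight is None:
        return False
    return min_age <= age <= max_age and weight > min_weight_lb


# Made-up notes that express age and weight in different ways.
notes = [
    "54-year-old male, weight 95 kg",
    "Pt is 47 y/o, 210 lbs",
    "Age 72 yrs, 180 pounds",
]
eligible = [note for note in notes if matches_trial(note)]
```

In this sketch the first two notes satisfy the age and weight ranges once their values are standardized, even though one gives the weight in kilograms, while the third falls outside both ranges. The cancer diagnosis and gene mutation criteria from the example are left out for brevity.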
The I2E 5.0 release also includes a new query language, EASL (Extraction And Search Language), that allows text mining queries to be described and written in a human-readable text format. EASL can be generated from outside the I2E platform, supporting custom interfaces and enhanced workflow automation.
Ryan Owens, Strategic Systems Analyst, Spartanburg Regional Healthcare System, commented: “Our team is very excited about the new capabilities in I2E 5.0. The ability to use normalization of concepts and relationships – particularly for TNM stages and genetic mutations – will speed and simplify our searches, significantly enhance accuracy of our results, and improve our understanding and use of our data.”