This blog was originally published by our partners at Brainspace.
According to a report by IDC/EMC, the total amount of digital data in the world is forecast to reach 44 ZB by 2020. Ninety percent of this digital data is unstructured, much of it generated from external sources such as websites, blogs, social media sites, and smartphone applications. However, a large amount of unstructured data is also generated by organizations, and most of that data is untagged, uncategorized, stored indefinitely, and never used. The massive amounts of unstructured data generated from within organizations and from external sources are making it difficult for knowledge workers to do their jobs. IDC estimates that the average knowledge worker spends 36% of their day looking for and consolidating information that is spread across multiple systems. In addition, knowledge workers are unable to find the information they need to do their jobs 44% of the time.
Using machine learning, organizations can help knowledge workers quickly and efficiently find the information they need. Organizations can use machine learning to manage unused operational data (dark data) and redundant, outdated, or trivial data (ROT), reducing the amount of information knowledge workers have to sift through. Machine learning can help improve the search and discovery of information by enabling multi-concept search that produces contextually relevant results. Organizations can also use machine learning to automate information-gathering tasks such as taxonomy building, ontology building, tagging, and document classification, freeing up time for knowledge workers to focus on core tasks.
Machine Learning Can Help Reduce Dark Data and ROT
Organizations are grappling not only with the speed at which unstructured data is now generated, but also with the rapidly growing amount of dark data and ROT. According to the March 2016 Veritas Technologies Global Databerg report, approximately 85% of stored data is either dark data or ROT, leaving only 15% of stored data to be classified by IT leaders as business-critical information. The report also states that if dark data and ROT are not dealt with, they will unnecessarily cost organizations worldwide a cumulative $3.3 trillion to manage by the year 2020.
When it comes to enterprise information ROT, sheer duplication is one of the biggest problems. In searching masses of information, knowledge workers often retrieve the same or similar information again and again. While exact duplicates can be easily detected by hash codes, machine learning methods go beyond that to bring together and prioritize versions, near-duplicates, and conceptually related material. Machine learning applied to search can also shed light on dark data by expanding user searches with conceptually related material and by linking actively used documents with related material that has been gathered but not yet exploited.
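To make the difference between exact and near-duplicates concrete, here is a minimal sketch in Python. It assumes documents are short plain-text strings, uses a SHA-256 hash of normalized text for exact matches, and uses Jaccard similarity over three-word shingles as a simple, non-learned stand-in for near-duplicate detection. The function names, the sample documents, and the 0.5 threshold are illustrative assumptions, not a description of how Brainspace or any other product works.

```python
import hashlib
from itertools import combinations


def content_hash(text: str) -> str:
    """SHA-256 of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences found in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(docs: dict, threshold: float = 0.5):
    """Return (exact_pairs, near_pairs) over all pairs of documents."""
    exact, near = [], []
    for (id_a, text_a), (id_b, text_b) in combinations(docs.items(), 2):
        if content_hash(text_a) == content_hash(text_b):
            exact.append((id_a, id_b))
        elif jaccard(shingles(text_a), shingles(text_b)) >= threshold:
            near.append((id_a, id_b))
    return exact, near


# Hypothetical sample documents: "b" differs from "a" only in spacing,
# "c" is a lightly edited version, and "d" is unrelated.
docs = {
    "a": "The quarterly sales report is attached for your review",
    "b": "The  quarterly sales report is attached for your review",
    "c": "The quarterly sales report is attached for your final review",
    "d": "Lunch menu for the cafeteria this week",
}
exact, near = find_duplicates(docs)
print("exact:", exact)
print("near:", near)
```

The hash catches only the copy that is identical after normalization, while the shingle comparison also groups the edited version. A learned model would go further and relate documents that share meaning but little wording, which this sketch does not attempt.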
“Brainspace’s core technologies in conceptual search, unsupervised learning, and natural language processing allow cutting through masses of unstructured data to find its core value,” says information retrieval pioneer Dave Lewis, who recently joined Brainspace as Chief Data Scientist. “The combination of these technologies with powerful visualizations means that no upfront cleanup, indexing, or coding is necessary before enterprise data can be analyzed and unsuspected connections uncovered.”
Machine Learning Allows Information Gathering Tasks to be Automated
Organizations that have yet to leverage machine learning often have teams of knowledge workers spending much of their time performing information-gathering tasks such as taxonomy building, ontology building, tagging, and document classification. Many of these organizations are finding that the amount of unstructured data generated from within the business is growing at a rate that is simply too fast for knowledge workers to manually tag, classify, and maintain.
Machine learning allows information-gathering tasks to be automated so that knowledge workers can focus on core tasks. It can also perform these tasks at speeds far beyond human capabilities.
A machine learning platform with strong document classification capabilities can even reduce the need for traditional taxonomies and ontologies altogether.
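As an illustration of what automated tagging can look like, the sketch below trains a small multinomial Naive Bayes classifier (with add-one smoothing) on a handful of labeled documents and then assigns a label to new text. Naive Bayes is used here only because it is compact and well known; the training examples, labels, and function names are hypothetical, and a production system would use far more data and likely a different model.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set with two tags.
training = [
    ("invoice payment due net thirty days", "finance"),
    ("quarterly budget forecast and invoice totals", "finance"),
    ("employee onboarding and benefits enrollment", "hr"),
    ("vacation policy and employee handbook", "hr"),
]
model = train(training)
print(classify("unpaid invoice and budget review", model))
print(classify("new employee benefits handbook", model))
```

Once trained, the same model can label any number of incoming documents without manual effort, which is the practical appeal of automating this kind of task.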
Machine Learning Can Help Improve Search and Discovery of Information
Every second, massive streams of unstructured data are generated from emails, social media sites, smartphones, sensors, wearable electronics, and many other data sources. The magnitude, variety, and velocity of unstructured data are staggering; for knowledge workers (and anyone looking for specific information), trying to find the information they need is like looking for a needle in an infinite number of haystacks.
Search engine companies like Google and Microsoft Bing are using machine learning and artificial intelligence to help millions upon millions of people search the web to find information about pretty much everything. According to the Internet Live Stats website, there are 55,218 Google searches every second at the time of this writing.
Knowledge workers need to find and use information from multiple sources
While Google and Microsoft Bing are using machine learning and artificial intelligence to improve the search and discovery of information on the web, knowledge workers often need to search for information that can only be found within organizations. Knowledge workers are often looking for information that may be found within document management systems, intranets, extranets, email archives, and portals. In order to successfully do their jobs, knowledge workers must quickly find and use information from many different sources including the web.
Many organizations have neither an effective data management system in place nor effective search tools for knowledge workers. Organizations need to ensure that knowledge workers can quickly and effectively find relevant information, whether it comes from sources within the enterprise or from the web.
Knowledge workers need more than traditional keyword search
Traditional search engines work best if the user knows exactly what they’re looking for and can provide relevant keywords for their search. Knowledge workers often search for information without knowing exactly what they’re looking for and sometimes without knowing exactly what to ask for. If a knowledge worker doesn’t know the exact keywords to enter, a traditional search engine may not return results that are relevant to the query. If a traditional search engine does return results that are relevant to the query, and the user would like to see similar results, the search engine may not be able to provide results that are conceptually similar. In addition, the search engine may return thousands of results that the worker does not have time to sift through and read.
Machine learning allows for truly semantic, multi-concept search
Traditional keyword search is one-dimensional; it provides results that contain the specific keywords, but often fails to return results that are conceptually related to the original search query. Knowledge workers need search tools that allow them to search for information using concepts instead of keywords. Using machine learning, organizations can give knowledge workers multi-concept search that produces contextually relevant results.
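The gap between keyword matching and concept matching can be shown with a toy example. In the sketch below, the concept groups are written by hand purely as a stand-in for associations that a machine learning system would learn from data; the group names, documents, and function names are all hypothetical.

```python
# Hand-written concept groups, standing in for learned word associations.
CONCEPTS = {
    "vehicle": {"car", "automobile", "sedan", "truck"},
    "contract": {"contract", "agreement", "lease"},
}


def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def keyword_search(query, docs):
    """Return documents sharing at least one literal word with the query."""
    q = tokens(query)
    return [doc_id for doc_id, text in docs.items() if q & tokens(text)]


def expand(terms):
    """Add every word from any concept group that the terms touch."""
    expanded = set(terms)
    for group in CONCEPTS.values():
        if terms & group:
            expanded |= group
    return expanded


def concept_search(query, docs):
    """Return documents sharing at least one concept with the query."""
    q = expand(tokens(query))
    return [doc_id for doc_id, text in docs.items() if q & tokens(text)]


docs = {
    "memo1": "automobile lease renewal terms",
    "memo2": "car rental policy",
    "memo3": "cafeteria lunch menu",
}
print(keyword_search("car agreement", docs))
print(concept_search("car agreement", docs))
```

The keyword search finds only the memo that literally contains "car", while the concept search also returns the memo about an automobile lease even though it shares no words with the query.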
While many semantic search engine platforms use machine learning, they are not all created equal. Dave Copps, Brainspace founder and CEO, says that “semantic search engines are able to go beyond keyword matching and match on concepts. In fact, the more powerful semantic search engines will sometimes produce relevant search results that do not contain any of the original query words bringing an element of serendipity to search.”
Using machine learning and artificial intelligence, Brainspace has built a truly semantic search platform that is capable of automatically and continually learning, scaling intelligence, and understanding the intent and context of the user in order to return the most relevant results. “Typically semantic technologies look to average all concepts in a corpus—find a semantic center. This is a significant deficiency in other semantic search technologies,” says Copps. “Multi-concept enables our machine learning processes to achieve a more human-like learning from any given corpus because of its recognition of multiple themes from within a single section of text.”
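One way to picture the difference between a single "semantic center" and a multi-concept representation is with toy vectors. The sketch below is only an illustration of the general idea in the quote, not Brainspace's algorithm: it assumes each passage of a document has already been turned into a two-dimensional concept vector (the values are made up), then compares a query against the averaged vector and against each passage separately.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise average of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical document with two passages on two unrelated themes.
passages = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical query that is entirely about the first theme.
query = [1.0, 0.0]

# Single semantic center: average the passages, then compare.
centroid_score = cosine(query, mean_vector(passages))
# Multi-concept: keep each passage's vector and take the best match.
multi_score = max(cosine(query, p) for p in passages)

print(round(centroid_score, 3), round(multi_score, 3))
```

Averaging blends the two themes into a vector that matches neither one fully, so the query scores about 0.707 against the center but 1.0 against the passage that is actually about its topic.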
One of the biggest challenges for knowledge workers is finding relevant information from the many vast sources of data available; 44% of the time, knowledge workers cannot find the information they need. According to an IDC report, an enterprise with 1,000 knowledge workers loses $5.7 million on average every year because of lost productivity caused by workers searching for, but not finding, relevant information.
There is simply too much unstructured data generated every second from within organizations and external sources; there are too many vast streams of data moving at lightning speed. The amount of unstructured data generated from within organizations alone has grown far beyond the human capacity to search through, organize, and manage.
Organizations can no longer expect knowledge workers to manually tag, classify, and maintain the massive amounts of unstructured data generated from within the business every day, nor can they expect knowledge workers to find the information they need using only traditional keyword search. What organizations can expect is that modern search tools powered by machine learning, artificial intelligence, and other advanced technologies will help knowledge workers quickly and efficiently find the information they need.
Semantic search platforms powered by artificial intelligence and machine learning, such as Brainspace, can help knowledge workers find the information they need to do their jobs. The Brainspace platform is highly scalable and can analyze the world’s largest unstructured datasets, identifying the relationships between words, phrases, and categories without the need for human intervention. Brainspace can provide conceptually and contextually relevant search results to users, all while actively learning and dynamically adapting to new content.