As we watched the news unfold on the COVID-19 crisis, what became apparent was the importance of empirical data and analytics to inform citizens and governments worldwide and empower them to make policy and healthcare decisions. Johns Hopkins, for example, quickly put together a data lake with information feeds from countries across the globe on COVID-19 statistics. This became the cornerstone for the public and experts alike to monitor day-to-day statistics from all over the world, and modelling and forecasts based on machine learning helped millions.
The beauty of data lakes is that their creation and use are not bound by a rigid structuring of data: data can be ingested in its native format. The cloud makes this possible through:
- Inexpensive, on-demand cloud-based storage
- Built-in cloud tools and blueprints for ingesting and cleaning data
- Multiple cloud security and compliance layers to protect resources
- Latest and most innovative cloud-based analytics tools
Once the data lake is “formed”, it is imperative to put security and compliance measures in place to protect system integrity, personal information, and intellectual property. Cloud data lake security can be governed and protected within a software-defined encryption and dynamic isolation environment. Compliance with standards and regulations (e.g. HIPAA, GDPR, NIST) can be assessed, monitored, and enforced using best-in-class cloud-based services and methodologies.
Cloud analytics, data mining, and machine learning can then be brought to bear to deliver actionable insights. When it comes to data feeds for training the artificial intelligence models, more feeds and richer data typically yield more correlations and insights. Potential data sources include:
- Existing data warehouses
- Mobile, web, and social media data
- IoT and sensor data
- Demographic data
- Customer Relationship Management (CRM) data
According to a recent Aberdeen report, companies that deploy and utilize data lakes saw increases in revenue over businesses that did not. IDC predicts spending on cognitive and AI systems will reach $77.6B by 2022.
Overall, any organization can utilize data lakes and AI to:
Gain insight into innovation choices. A data lake can help your R&D teams test their hypotheses, refine assumptions, and assess results—such as choosing the right materials in your product design resulting in faster performance, doing genomic research leading to more effective medications, or understanding the willingness of customers to pay for different product or service attributes.
Improve customer interactions. A data lake can combine customer data from a CRM platform with social media analytics, a marketing platform that includes buying history, and incident tickets to empower the business to understand the most profitable customer cohort, the cause of customer churn, and the promotions or rewards that will increase loyalty.
Increase operational efficiencies. The Internet of Things (IoT) introduces more ways to collect data on processes like manufacturing, with real-time data coming from internet-connected devices. A data lake makes it easy to store and run analytics on machine-generated IoT data to discover ways to reduce operational costs while increasing quality.
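To make the customer-interaction case above more concrete, here is a minimal sketch of combining two feeds that might sit side by side in a data lake: CRM records and incident tickets. Everything in it is an assumption made for illustration, including the field names, the sample records, the `churn_rate_by_ticket_load` helper, and the threshold of two tickets; a real data lake would use a cloud analytics service rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical records, as they might arrive from two separate feeds.
crm_records = [
    {"customer_id": 1, "segment": "enterprise", "churned": False},
    {"customer_id": 2, "segment": "enterprise", "churned": True},
    {"customer_id": 3, "segment": "small_business", "churned": True},
    {"customer_id": 4, "segment": "small_business", "churned": False},
]
incident_tickets = [
    {"customer_id": 2, "severity": "high"},
    {"customer_id": 2, "severity": "low"},
    {"customer_id": 3, "severity": "high"},
    {"customer_id": 3, "severity": "high"},
    {"customer_id": 3, "severity": "low"},
    {"customer_id": 4, "severity": "low"},
]


def churn_rate_by_ticket_load(crm, tickets, heavy_threshold=2):
    """Join both feeds on customer_id and compare churn for heavy vs. light ticket filers."""
    # Count tickets per customer; customers with no tickets default to 0.
    ticket_counts = defaultdict(int)
    for ticket in tickets:
        ticket_counts[ticket["customer_id"]] += 1

    totals = defaultdict(int)
    churned = defaultdict(int)
    for customer in crm:
        count = ticket_counts[customer["customer_id"]]
        group = "heavy" if count >= heavy_threshold else "light"
        totals[group] += 1
        churned[group] += int(customer["churned"])

    # Fraction of customers in each group who churned.
    return {group: churned[group] / totals[group] for group in totals}


rates = churn_rate_by_ticket_load(crm_records, incident_tickets)
print(rates)
```

With this made-up sample, every customer who filed two or more tickets churned and none of the others did, which is the kind of signal a business might then investigate further.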
Leading healthcare, biological and pharmaceutical research, logistics, financial, and service-based organizations have discovered the competitive and strategic value of data lakes and advanced analytics using machine learning.
Much More Than IT Outcomes
A current Unisys client, a large US public university system, began a multi-cloud, multi-campus integration of data, culminating in the creation of a cloud-based, central data lake. This data repository brought together a variety of structured and unstructured data, including data from existing CRM tools, student and faculty information, attendance and test performance data, and much more from the 500,000+ students, faculty, and staff.
While the IT outcomes are remarkable for this client, their data lake is yielding even greater benefits from predictive analysis, such as identifying at-risk students and intervening before poor performance, attendance, or other factors adversely affect their studies and academic success.
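The source does not describe how the university's predictive analysis works. As a rough illustration of the idea of flagging at-risk students, the sketch below uses a simple rule-based check as a stand-in for a trained model. The `StudentRecord` fields, the `flag_at_risk` function, the thresholds, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    student_id: str
    attendance_rate: float  # fraction of sessions attended, 0.0 to 1.0
    average_score: float    # mean test score, 0 to 100


def flag_at_risk(records, min_attendance=0.8, min_score=65.0):
    """Return IDs of students who fall below either (assumed) threshold."""
    return [
        record.student_id
        for record in records
        if record.attendance_rate < min_attendance
        or record.average_score < min_score
    ]


# Made-up sample data for illustration only.
records = [
    StudentRecord("s001", 0.95, 88.0),
    StudentRecord("s002", 0.60, 72.0),  # low attendance
    StudentRecord("s003", 0.90, 58.0),  # low test scores
]
at_risk = flag_at_risk(records)
print(at_risk)
```

A production system would more likely learn such cut-offs from historical outcomes than hard-code them, but the flow is the same: combine attendance and performance data, score each student, and surface those who may need an intervention.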
At Unisys, we’re not just proponents of data lakes and analytics for our clients; we use them ourselves to continuously improve our systems, services, and commitment to clients. Data lakes for AI-led Operations use cloud-based ingestion, storage, and machine learning services and encompass both historical and real-time data. The data sources include operational tooling, Configuration Management Databases (CMDB), and IT Service Management (ITSM) systems. This allows trends and incidents to be forecast and root causes to be intelligently identified, so that we can provide better service to our clients. [For a more detailed look at how data lakes and AI-led Operations can drive better experiences in the cloud, see Unisys CTO for emerging technology Suzanne Taylor’s latest article in Forbes Magazine.]
Data-driven decisions and predictions using machine learning will provide a competitive advantage to companies across industries, with cloud AI services paving the way for increased adoption. With the right security guardrails in place, don’t hesitate to dive in and find out what insights are swimming in your data lake.