
AWS re:Invent: 9 new solutions to unlock the value of data

At AWS re:Invent, CEO Adam Selipsky unveiled an array of new solutions to empower customers to derive more value from their data

Adam Selipsky, Chief Executive Officer of Amazon Web Services

Global analyst firm IDC predicted that around 175 trillion gigabytes of new data will be generated globally by 2025.

The firm also noted that the amount of digital data that will be created over the next five years will be greater than twice the amount of data created since the advent of the digital age.

In his keynote address at AWS re:Invent 2022, CEO Adam Selipsky launched a slew of new solutions and services to empower customers to derive more value from their data.

“Data is at the centre of applications, processes and business decisions. It is the cornerstone of almost every organisation’s digital transformation,” he said.

“Working with data is tricky,” he said. “You need a complete set of tools that account for the scale and variety of this data and the many purposes for which you want to use it.”

Here are some of the biggest announcements made by the AWS chief:

Amazon DataZone

Amazon DataZone is a new data management service that makes it faster and easier for customers to catalogue, discover, share, and govern data stored across AWS, on-premises, and third-party sources.

“Amazon DataZone enables you to set data free throughout the organisation safely by making it easy for admins and data stewards to manage and govern access to data,” said Selipsky. “It makes it easy for data engineers, data scientists, product managers, analysts and other business users to discover, use and collaborate around that data to drive insights for your businesses.”

Data producers use Amazon DataZone’s web portal to set up their own business data catalogue by defining their data taxonomy, configuring governance policies, and connecting to a range of AWS services, partner solutions and on-premises systems.

It eliminates the heavy lifting of maintaining a catalogue by using machine learning to collect and suggest metadata for each dataset and by training on a customer’s taxonomy and preferences to improve over time.

“To unlock the full value of data, we need to make it easy for the right people and applications to find, access and share the right data when they need it, and to keep data safe and secure with appropriate controls in place,” he added.

“You need governance to find the right balance between control and access. It’s crucial, but it’s different for every organisation.”
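
At the time of the announcement, Amazon DataZone was available in preview, so its final API surface may differ. As a rough illustration of how the catalogue setup described above might be driven programmatically, here is a minimal sketch assuming a boto3-style datazone client; the domain name, role ARN and project name are placeholder assumptions, not values from the announcement.

    import boto3

    # Hypothetical sketch: create a DataZone domain (the root of a business
    # data catalogue) and a project where users collaborate on shared data.
    # All names and ARNs below are illustrative placeholders.
    datazone = boto3.client("datazone")

    domain = datazone.create_domain(
        name="analytics-domain",
        description="Business data catalogue for the analytics organisation",
        domainExecutionRole="arn:aws:iam::123456789012:role/DataZoneExecutionRole",  # placeholder
    )

    project = datazone.create_project(
        domainIdentifier=domain["id"],
        name="marketing-insights",
        description="Where analysts discover and use shared datasets",
    )
    print(project["id"])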

AWS Supply Chain

AWS Supply Chain is a new application that helps businesses increase supply chain visibility to make faster, more informed decisions that mitigate risks, lower costs, and improve customer experiences. It automatically combines and analyses data across multiple supply chain systems so businesses can observe their operations in real-time, find trends more quickly, and generate more accurate demand forecasts that ensure adequate inventory to meet customer expectations.

According to Selipsky, AWS Supply Chain unifies supply chain data, improves visibility and provides ML-powered actionable insights, enabling users to optimise processes and increase customer service levels.

Amazon OpenSearch Serverless

Amazon OpenSearch Serverless, a new serverless option for Amazon OpenSearch Service, automatically provisions and scales the underlying resources to deliver fast data ingestion and query responses for even the most demanding and unpredictable workloads.

“You can use this platform to perform interactive analytics, real-time application monitoring, website search and more without having to worry about provisioning, configuring and scaling infrastructure.

“Now we have serverless options for all of our analytic services. No one else can say that,” said Selipsky.

OpenSearch Serverless simplifies the process of running petabyte-scale search and analytics workloads without having to configure, manage, or scale OpenSearch clusters.

Customers leveraging OpenSearch Serverless only need to pay for the resources they consume.
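
To give a feel for how little setup is involved, here is a minimal sketch that creates a serverless collection with the boto3 opensearchserverless client. The collection name is a placeholder, and in practice encryption, network and data-access policies must also be configured before the collection accepts traffic.

    import boto3

    # Minimal sketch: create an OpenSearch Serverless collection for log
    # analytics. Note that no instance types, node counts or storage sizes
    # appear anywhere; the service scales those automatically.
    aoss = boto3.client("opensearchserverless")

    response = aoss.create_collection(
        name="app-logs",    # placeholder name
        type="TIMESERIES",  # optimised for log analytics; "SEARCH" is also supported
        description="Log analytics without managing OpenSearch clusters",
    )
    print(response["createCollectionDetail"]["status"])  # e.g. "CREATING"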

Amazon Aurora Zero-ETL Integration with Amazon Redshift

Amazon Aurora now supports zero-ETL integration with Amazon Redshift. This capability enables near real-time analytics and machine learning (ML) using Amazon Redshift on petabytes of transactional data from Aurora.

Within seconds of transactional data being written into Aurora, the data is available in Amazon Redshift. This means users don’t have to build and maintain complex data pipelines to perform extract, transform, and load (ETL) operations.

Zero-ETL integration also enables users to analyse data from multiple Aurora database clusters in the same new or existing Amazon Redshift instance to derive holistic insights across many applications or partitions.

With near real-time access to transactional data, you can leverage Amazon Redshift’s analytics capabilities, such as built-in ML, materialised views, data sharing, and federated access to multiple data stores and data lakes, to derive insights from transactional and other data.

“The entire system is serverless. It dynamically scales up and down based on the data volume, so there’s no infrastructure to manage. Now, you really have the best of both worlds. Fast, scalable transactions in Aurora, together with scalable analytics in Redshift, all in one seamless system,” said Selipsky.
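
Here is a hedged sketch of what the consumer side might look like once an integration is active: the Aurora tables surface in Redshift and can be queried like any other Redshift data, shown here through the boto3 redshift-data client. The workgroup, database and table names are placeholder assumptions.

    import boto3

    # Sketch: query transactional data replicated from Aurora via zero-ETL.
    # There is no pipeline to build; the data lands in Redshift automatically
    # within seconds of being written to Aurora.
    rsd = boto3.client("redshift-data")

    result = rsd.execute_statement(
        WorkgroupName="analytics-wg",  # Redshift Serverless workgroup (placeholder)
        Database="aurora_zeroetl_db",  # database backed by the integration (placeholder)
        Sql="SELECT order_status, COUNT(*) AS n FROM orders GROUP BY order_status;",
    )
    print(result["Id"])  # fetch rows later with get_statement_result()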

Amazon Redshift Integration for Apache Spark

Amazon Redshift integration for Apache Spark makes it easy to build and run Spark applications on Amazon Redshift and Redshift Serverless.

This will enable customers to open up the data warehouse for a broader set of AWS analytics and machine learning (ML) solutions.

With Amazon Redshift integration for Apache Spark, you can get started in seconds and effortlessly build Apache Spark applications in a variety of languages, such as Java, Scala, and Python.

“Today, if you work in EMR, you can use Spark to run analytics on data. But if you want to run a Spark query on data located in Redshift, you have to either move the data into S3 or find, download and configure a slow open-source connector to Redshift,” explained Selipsky.

According to Selipsky, a better way would be to simply run Spark queries on the data directly in Redshift.

“We wanted to make this process fast and seamless. And now, it’s incredibly easy to run Apache Spark applications on Redshift data from AWS analytic services,” he added.

“Now, there’s no more need to move any data, no need to build or manage any connectors. Both zero-ETL integration with Amazon Redshift and Redshift integration for Apache Spark make it easier to generate insights without having to build ETL pipelines or manually move data around.”
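
To make the contrast concrete, here is a hedged PySpark sketch of reading a Redshift table through the connector rather than staging data in S3 first. The cluster endpoint, table, bucket and IAM role are placeholders; on Amazon EMR the connector comes pre-packaged, while elsewhere it has to be added to the Spark classpath.

    from pyspark.sql import SparkSession

    # Sketch: read a Redshift table directly into a Spark DataFrame.
    spark = SparkSession.builder.appName("redshift-spark-demo").getOrCreate()

    df = (
        spark.read.format("io.github.spark_redshift_community.spark.redshift")
        .option("url", "jdbc:redshift://my-cluster.example.eu-west-1.redshift.amazonaws.com:5439/dev")
        .option("dbtable", "public.sales")                        # placeholder table
        .option("tempdir", "s3://my-temp-bucket/redshift-spark/")  # staging area used by the connector
        .option("aws_iam_role", "arn:aws:iam::123456789012:role/RedshiftSparkRole")
        .load()
    )

    df.groupBy("region").count().show()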

AWS Clean Rooms

AWS Clean Rooms is a new analytics service that helps companies across industries easily and securely analyse and collaborate on their combined datasets—without sharing or revealing underlying data.

With AWS Clean Rooms, customers can create a secure data clean room in minutes and collaborate with any other company in the AWS Cloud to generate unique insights about advertising campaigns, investment decisions, clinical research, and more.

Selipsky noted that data clean rooms are protected environments where multiple parties can analyse combined data without exposing the underlying data.

“But clean rooms are hard to build. They have complex requirements and can take months to develop.”

The purpose of AWS Clean Rooms is to provide a single service that companies can use to collaborate on data while protecting sensitive consumer information and reducing or eliminating the sharing of raw data.
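
As a rough sketch of how a collaboration might be set up programmatically, the snippet below assumes a boto3 cleanrooms client; since the service was announced in preview, the operation and parameter names here are assumptions, and both account IDs are placeholders.

    import boto3

    # Hypothetical sketch: an advertiser starts a collaboration with a
    # publisher. The publisher contributes data but cannot run queries;
    # raw rows are never exchanged between the two accounts.
    cleanrooms = boto3.client("cleanrooms")

    collab = cleanrooms.create_collaboration(
        name="ad-campaign-overlap",
        description="Measure audience overlap without sharing raw data",
        creatorDisplayName="Advertiser",
        creatorMemberAbilities=["CAN_QUERY", "CAN_RECEIVE_RESULTS"],
        members=[{
            "accountId": "210987654321",  # partner account (placeholder)
            "displayName": "Publisher",
            "memberAbilities": [],        # contributes data only
        }],
        queryLogStatus="ENABLED",
    )
    print(collab["collaboration"]["id"])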

Amazon Security Lake

Amazon Security Lake simplifies the analysis of security data so that users can get a more complete understanding of the entire organisation’s security.

It automatically centralises security data from the cloud, on-premises, and custom sources into a purpose-built data lake stored in the customer’s own account. Security Lake automatically gathers and manages all security data across accounts and Regions.

“It gives you visibility into your security data across your entire organisation,” explained Selipsky.

“Security Lake automatically collects and aggregates security data from partner solutions like Cisco, CrowdStrike and Palo Alto Networks, as well as from more than 50 security tools integrated with the solution.”

Security Lake also manages the lifecycle of data with customisable retention settings and storage costs with automated storage tiering.
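
Because Security Lake normalises sources into the Open Cybersecurity Schema Framework (OCSF) and stores them in the customer’s own S3 buckets, the centralised data can be queried with standard analytics tools. Here is a hedged sketch using Amazon Athena through boto3; the database, table name and results bucket are placeholder assumptions.

    import boto3

    # Sketch: pull recent high-severity findings out of the security data lake.
    athena = boto3.client("athena")

    query = athena.start_query_execution(
        QueryString=(
            "SELECT time, severity, activity_name "
            "FROM security_lake_findings "   # placeholder table name
            "WHERE severity = 'High' LIMIT 20;"
        ),
        QueryExecutionContext={"Database": "security_lake_db"},  # placeholder
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/security-lake/"},
    )
    print(query["QueryExecutionId"])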

“We look forward to seeing how you’re going to use Amazon Security Lake to improve your security posture, reduce the time to resolve security issues and simplify the lives of your security teams,” said Selipsky.

Amazon EC2 Hpc6id Instances

Amazon Elastic Compute Cloud (Amazon EC2) Hpc6id instances are optimised to efficiently run memory bandwidth-bound, data-intensive high-performance computing (HPC) workloads, such as finite element analysis and seismic reservoir simulations. With EC2 Hpc6id instances, you can lower the cost of your HPC workloads while taking advantage of the elasticity and scalability of AWS.

“EC2 Hpc6id is designed to deliver leading price performance for data- and memory-intensive HPC workloads, with higher memory bandwidth per core, faster local SSD storage, and enhanced networking with Elastic Fabric Adapter (EFA).”

Amazon EC2 Hpc6id instances are powered by 3rd Gen Intel Xeon Scalable processors (Ice Lake) that run at frequencies up to 3.5 GHz, with 1,024 GiB of memory, 15.2 TB of local SSD storage, and 200 Gbps of EFA network bandwidth, which is 4x higher than R6i instances.

EC2 Hpc6id instances, built on the AWS Nitro System, offer 200 Gbps Elastic Fabric Adapter (EFA) networking for high-throughput inter-node communications that enable your HPC workloads to run at scale.
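Launching these instances works like any other EC2 launch; there is no HPC-specific API. Here is a minimal boto3 sketch, with the AMI, subnet, security group and placement group as placeholders; EFA is enabled by attaching a network interface of type “efa”.

    import boto3

    # Minimal sketch: launch two Hpc6id nodes in a cluster placement group
    # with EFA attached for low-latency inter-node communication.
    ec2 = boto3.client("ec2", region_name="us-east-2")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder HPC-ready AMI
        InstanceType="hpc6id.32xlarge",   # the Hpc6id instance size
        MinCount=2,
        MaxCount=2,
        Placement={"GroupName": "hpc-cluster-pg"},  # placeholder placement group
        NetworkInterfaces=[{
            "DeviceIndex": 0,
            "SubnetId": "subnet-0123456789abcdef0",  # placeholder
            "Groups": ["sg-0123456789abcdef0"],      # placeholder
            "InterfaceType": "efa",                  # Elastic Fabric Adapter
        }],
    )
    for instance in response["Instances"]:
        print(instance["InstanceId"])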

AWS SimSpace Weaver

AWS SimSpace Weaver is a new computing service to run real-time spatial simulations in the cloud and at scale.

With SimSpace Weaver, simulation developers are no longer limited by the compute and memory of their hardware. It enables organisations to run simulations of situations that are rare, dangerous, or very expensive to test in the real world.

“With SimSpace Weaver, you can run large-scale simulations without being constrained by a single piece of hardware or having to manage the underlying memory or networking infrastructure,” said Selipsky.

The solution allows developers to spend more time building and understanding their simulations and less time deploying and scaling.
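
For a sense of the workflow, here is a hedged sketch that starts a simulation from a schema stored in S3, assuming the boto3 simspaceweaver client; the bucket, schema key, role and simulation name are all placeholder assumptions.

    import boto3

    # Hypothetical sketch: start a spatial simulation from an uploaded schema.
    # The schema describes how the simulated space is partitioned across workers.
    weaver = boto3.client("simspaceweaver")

    sim = weaver.start_simulation(
        Name="city-evacuation-demo",  # placeholder
        RoleArn="arn:aws:iam::123456789012:role/SimSpaceWeaverRole",  # placeholder
        SchemaS3Location={
            "BucketName": "my-simulation-assets",          # placeholder
            "ObjectKey": "schemas/evacuation-schema.yaml",  # placeholder
        },
    )
    print(sim)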