San Francisco-based Databricks today announced that its cloud framework, Delta Live Tables (DLT), has become generally available. The service debuted last year as a gated preview, accessible only to select enterprises upon request. It is now open to any organization that wants to build and manage reliable data pipelines for downstream analytics, data science and machine learning (ML) projects.
Whether big or small, enterprises are well aware of the operational complexity of turning initial SQL queries into production-grade ETL pipelines and keeping them running to ensure a consistent flow of clean, fresh data. The task involves an extensive series of low-level instructions that data engineers have to code to get high-quality, error-free results. As a result, they end up spending most of their time on tooling and infrastructure management instead of deriving value from the data.
How does Delta Live Tables help?
Available as a new capability on Delta Lake, DLT simplifies the process of building and managing ETL pipelines with modern engineering practices and automation.
The solution lets data engineers take a declarative approach: they describe the outcomes they expect from data transformations rather than the steps needed to produce them. From those definitions, DLT tracks the dependencies across the full data pipeline and automates virtually all of the manual operational work, from generating the instructions for data transformation and validation to testing, quality monitoring and error handling (including identifying the root cause of an error).
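To make the declarative idea concrete, here is a minimal sketch in plain Python. It is not the DLT API: the `table` decorator, the `TABLES` registry and `run_pipeline` are hypothetical names invented for illustration, and the sample rows are made up. Each function declares what a table should contain, its parameter names identify the tables it reads from, and the runner infers the execution order from those declarations.

```python
import inspect
from graphlib import TopologicalSorter

# Hypothetical registry of table definitions (illustrative only, not DLT).
TABLES = {}

def table(fn):
    """Register a table definition; its parameter names are its upstream tables."""
    TABLES[fn.__name__] = fn
    return fn

@table
def raw_orders():
    # Made-up sample data standing in for an ingested source.
    return [
        {"id": 1, "amount": 20.0},
        {"id": 2, "amount": -5.0},  # fails the validation rule below
        {"id": 3, "amount": 12.5},
    ]

@table
def clean_orders(raw_orders):
    # Validation rule: keep only rows with a non-negative amount.
    return [row for row in raw_orders if row["amount"] >= 0]

@table
def order_total(clean_orders):
    return sum(row["amount"] for row in clean_orders)

def run_pipeline():
    """Infer dependencies from the declarations and run tables in a valid order."""
    deps = {
        name: set(inspect.signature(fn).parameters)
        for name, fn in TABLES.items()
    }
    results = {}
    # static_order() yields every table after all of the tables it reads from.
    for name in TopologicalSorter(deps).static_order():
        fn = TABLES[name]
        args = [results[param] for param in inspect.signature(fn).parameters]
        results[name] = fn(*args)
    return results

if __name__ == "__main__":
    print(run_pipeline()["order_total"])
```

The author of the pipeline never states that `clean_orders` must run before `order_total`; the order falls out of the declarations, which is the property the declarative approach relies on.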
“By just adding ‘LIVE’ to your SQL queries, DLT will begin to automatically take care of all of your operational, governance and quality challenges. With the ability to mix Python with SQL, users get powerful extensions to SQL to implement advanced transformations and embed AI models as part of the pipelines,” company executives Michael Armbrust, Awez Syed and Sam Steiny wrote in a blog post.
The framework supports both streaming and batch data workloads. In addition, whenever a data table is modified, every table that depends on it downstream is automatically re-executed.
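The downstream propagation described here can be pictured with a small dependency graph. The sketch below is an assumption-laden illustration in plain Python, not how DLT is implemented: the table names in `UPSTREAM` and the function `downstream_refresh_order` are hypothetical. Given the table that changed, it returns the tables that would need to be refreshed, in an order that respects their dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each table maps to the set of tables it reads from.
UPSTREAM = {
    "raw_events": set(),
    "raw_users": set(),
    "clean_events": {"raw_events"},
    "daily_counts": {"clean_events"},
    "user_report": {"daily_counts", "raw_users"},
}

def downstream_refresh_order(changed, upstream=UPSTREAM):
    """Return the tables affected by a change to `changed`, upstream-first."""
    affected = {changed}
    order = []
    # Walking in topological order guarantees a table is visited only after
    # every table it reads from, so `affected` is complete when it is checked.
    for name in TopologicalSorter(upstream).static_order():
        if name != changed and upstream[name] & affected:
            affected.add(name)
            order.append(name)
    return order

if __name__ == "__main__":
    print(downstream_refresh_order("raw_events"))
```

In this toy graph, a change to `raw_users` touches only `user_report`, while a change to `raw_events` ripples through every table derived from it.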
“It’s a game-changing technology that will allow data engineers and analysts to be more productive than ever,” Ali Ghodsi, CEO and co-founder at Databricks, said. “It also broadens Databricks’ reach; DLT supports any type of data workload with a single API, eliminating the need for advanced data engineering skills.”
Currently, multiple leading enterprises are using DLT, including JLL, Shell, Jumbo, Bread Finance, and ADP.
“At ADP, we are migrating our human resource management data to an integrated data store on the lakehouse. Delta Live Tables has helped our team build quality controls, and because of the declarative APIs, support for batch and real-time using only SQL, it has enabled our team to save time and effort in managing our data,” Jack Berkowitz, chief data officer at ADP, said.
The general availability of Delta Live Tables is the latest move from Databricks to strengthen its position in the race for data supremacy. The company competes with players such as Snowflake and Dremio and has been bolstering its Apache Spark-based ‘lakehouse’ offering with vertical-specific products, accompanied by solution accelerators and partner-delivered integrations.
Snowflake, which went public in 2020, is on a similar path with its data cloud and has been strengthening the offering with industry-specific solutions as well as partnerships. The company recently launched a data cloud for retail enterprises and backed it with an AWS partnership that brings Amazon.com sales channel data directly into customers’ Snowflake data warehouse instances. It also announced the acquisition of Streamlit to simplify data application development on its platform. Ultimately, these vendors all want to be the one-stop shop for every data workload, from data engineering and data science to data app development.