Introduction to Data (Delta) Lake with Azure Databricks

09:00 AM Cranmore

Many organizations have responded to their ever-growing data volumes by adopting data lakes as places to collect their data before making it available for analysis. While this has tended to improve the situation somewhat, data lakes suffer from some key challenges of their own:
• Schema changes can break enrichment, joins, and transforms between stages
• Failures may cause data between stages to be either dropped on the floor or duplicated
• Partitioning alone does not scale for multi-dimensional data
• Standard tables do not allow combining streaming and batch
• Concurrent access suffers from inconsistent query results
• Failing streaming jobs can require resetting and restarting data processing

In this session we will take a look at how Databricks Delta addresses these challenges by enabling a much simpler analytics architecture that handles both batch and streaming use cases with high query performance and high data reliability.