Staging data for Azure SQL Services

04:15 PM Jay Peak

Most companies face an ever-growing big data problem. It is estimated that 40 zettabytes of new data will be generated between 2012 and 2020; see the Computerworld article for details. Most of this data will be generated by sensors and machines, yet only a small portion of it is available to users. How can IT professionals help business lines gather and process data from so many different sources?

There have been two schools of thought on how to solve this problem.

Schema on write is represented by the traditional relational database. Raw data is ingested by an extract, transform and load (ETL) process and stored in tables that enforce integrity and allow for quick retrieval. Only a small portion of the total data owned by the company resides in the database.

Schema on read is represented by technologies such as Hadoop and PolyBase. These technologies assume that data integrity was applied when the text files were generated. The table definition is applied only during the read operation, which means all of the data owned by the company can reside in simple storage.

Today, we will learn how to stage data using Azure blob storage. The staged data can then be ingested by either technique.
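To make the contrast between schema on write and schema on read concrete, here is a minimal Python sketch. It is not taken from this post and does not touch Azure, Hadoop, or PolyBase; it only models the two ideas with the standard library. The CSV layout, the `Reading` type, and the functions `schema_on_write` and `schema_on_read` are hypothetical names chosen for illustration.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical raw sensor readings, as they might land in a staged text file.
RAW_TEXT = """sensor_id,reading_time,temperature
s1,2017-01-01T00:00:00,21.5
s2,2017-01-01T00:00:00,not_a_number
s1,2017-01-01T00:05:00,22.0
"""


@dataclass
class Reading:
    sensor_id: str
    reading_time: str
    temperature: float


def schema_on_write(raw: str):
    """ETL style (hypothetical): validate every row before it is stored.
    Rows that violate the schema are rejected and never reach the table."""
    table, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw)):
        try:
            table.append(
                Reading(row["sensor_id"], row["reading_time"], float(row["temperature"]))
            )
        except ValueError:
            rejected.append(row)
    return table, rejected


def schema_on_read(raw_files, min_temp: float):
    """Keep the raw text untouched; apply the schema only when a query runs."""
    results = []
    for raw in raw_files:
        for row in csv.DictReader(io.StringIO(raw)):
            try:
                temp = float(row["temperature"])
            except ValueError:
                continue  # a bad row is only discovered (and skipped) at read time
            if temp >= min_temp:
                results.append(Reading(row["sensor_id"], row["reading_time"], temp))
    return results


if __name__ == "__main__":
    table, rejected = schema_on_write(RAW_TEXT)
    print(len(table), len(rejected))  # 2 1
    # Schema on read: the staged file keeps all rows, including the bad one.
    print(schema_on_read([RAW_TEXT], min_temp=22.0))
```

In the schema-on-write path the malformed `s2` row is rejected at load time, so the "database" holds only clean rows. In the schema-on-read path every row stays in storage, and the type conversion happens inside the query, which is where the malformed row is finally filtered out.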