We are entering an exciting new era for data analytics. At the May 2023 Build conference, Microsoft announced Fabric.
Microsoft Fabric is a powerful software solution that provides a comprehensive SaaS (Software-as-a-Service) data analytics platform. It addresses the challenges of managing and analysing data by integrating various tools into a single offering.
One of the critical strengths of Microsoft Fabric is its ability to support multiple personas, including Data Engineers, Data Scientists, and Data Analysts. These different roles can leverage the platform’s capabilities to perform their tasks efficiently, fostering collaboration and maximising organisational productivity.
A significant advantage of Microsoft Fabric is its ability to eliminate data silos. Data silos have traditionally made data harder to access and made it difficult to gain insights across sources. Microsoft Fabric tackles this issue by using OneLake for storage, which acts as a central repository for data. By consolidating data in OneLake, Microsoft Fabric enables seamless access to data from multiple sources, facilitating data integration and analysis.
To further enhance its data management capabilities, Microsoft Fabric standardises on the Delta Lake format. Delta Lake provides a reliable and scalable solution for managing large volumes of data. By adopting this standard format, Microsoft Fabric ensures consistency, reliability, and compatibility across different data sources and enables efficient data processing and analytics.
The six most significant observations I have from working with Fabric are:
1. Deployment
Deploying Microsoft Fabric is simple and convenient. As a complete, end-to-end SaaS analytics solution, Microsoft Fabric requires no deployment effort on your part.
Unlike traditional approaches, Microsoft Fabric does not require you to deploy physical servers or set up cloud infrastructure using infrastructure as code.
To get started with Microsoft Fabric, you only need to sign up for the service. Once you have created an account, all the necessary components (storage, compute engines, and analytics services) are available and ready to use. There is no need to go through the time-consuming process of deploying and configuring each component individually.
By eliminating the need for deployment, Microsoft Fabric saves you valuable time and effort, allowing you to leverage its powerful analytics capabilities.
2. Delta Lake File Format
Microsoft has gone all in on Delta Lake, an open-source storage framework with several features for managing and processing data in a cloud environment. Some features of Delta Lake are:
- ACID Transactions: Delta Lake provides ACID (Atomicity, Consistency, Isolation, Durability) transactions, ensuring data integrity and reliability. This means each data modification is either fully committed or fully rolled back, preventing inconsistencies.
- Versioned Parquet Files: Delta Lake leverages versioned Parquet files as the underlying storage format for data. Parquet is a columnar storage file format that offers efficient compression and query performance. Delta Lake allows for schema evolution and efficient data access using versioned Parquet files.
- Transaction Log: Delta Lake maintains a transaction log that records all commits to the table or blob store directory. This log serves as a historical record of changes, enabling features like time travel to access data at specific points in time and facilitating data recovery in case of failures.
- Native Integration: Delta Lake is built natively into the Fabric UI, the user interface for managing data in Fabric. It is tightly integrated with Fabric components such as Dataflows, Power BI Direct Lake, and the Fabric Data Warehouse. This integration provides a seamless experience for data ingestion, transformation, analysis, and reporting.
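To make the transaction log and time travel ideas above more concrete, here is a minimal sketch in plain Python. It is a toy, in-memory model and not Delta Lake's actual implementation: the `commit` and `files_at_version` helpers and the file names are hypothetical, and only the general idea (commits made of add and remove actions that can be replayed up to a chosen version) is borrowed from Delta Lake.

```python
import json


def commit(log, actions):
    """Append one atomic commit (a list of add/remove actions) to the log."""
    log.append(json.dumps(actions))
    return len(log) - 1  # version number of the new commit


def files_at_version(log, version):
    """Replay commits 0..version and return the set of live data files."""
    live = set()
    for entry in log[: version + 1]:
        for action in json.loads(entry):
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


# Hypothetical file names; a real table would reference Parquet files on disk.
log = []
v0 = commit(log, [{"add": {"path": "part-000.parquet"}}])
v1 = commit(log, [{"add": {"path": "part-001.parquet"}}])
# An update rewrites a file: the old file is removed and a new one is added
# in the same commit, so a reader sees either both changes or neither.
v2 = commit(log, [
    {"remove": {"path": "part-000.parquet"}},
    {"add": {"path": "part-002.parquet"}},
])

print(sorted(files_at_version(log, v2)))  # ['part-001.parquet', 'part-002.parquet']
print(sorted(files_at_version(log, v0)))  # ['part-000.parquet']
```

Reading the table "as of" version 0 simply means replaying fewer commits. The old Parquet file is never modified in place, which is what makes this kind of time travel possible.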
3. OneLake
Described as "OneDrive for data", OneLake is a storage solution developed by Microsoft that underpins the Microsoft Fabric analytics solution. It is designed to address the challenge of data silos and enable seamless data storage and consumption across the various Microsoft Fabric compute engines.
OneLake employs the Delta Lake format as the standard for storing data within the Fabric ecosystem. Delta Lake provides reliability, performance, and scalability features for managing large datasets. By leveraging Delta Lake, OneLake ensures that data is stored consistently and efficiently across the platform.
Under the hood, OneLake is built on Azure Data Lake Storage Gen2, a scalable and secure cloud-based storage service provided by Microsoft Azure. This means that users can store data in any file format supported by Azure Data Lake Storage Gen2. This flexibility allows organisations to work with various data types, including structured, semi-structured, and unstructured formats.
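Because OneLake sits on Azure Data Lake Storage Gen2, items in it can be addressed with ADLS-style URIs. The sketch below builds such a URI. The host name and the `<workspace>@host/<item>.<itemtype>/<path>` layout reflect my understanding of the OneLake addressing scheme and should be checked against the current Microsoft documentation; the `onelake_uri` helper and the workspace, lakehouse, and file names are hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"  # assumed OneLake endpoint


def onelake_uri(workspace, item, item_type, path):
    """Build an ABFS-style URI for a file or folder stored in OneLake."""
    clean_path = path.strip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{clean_path}"


# Hypothetical workspace, lakehouse, and file names.
print(onelake_uri("Sales", "Bronze", "Lakehouse", "Files/raw/orders.csv"))
# abfss://Sales@onelake.dfs.fabric.microsoft.com/Bronze.Lakehouse/Files/raw/orders.csv
```

A URI of this shape is what tools that already speak the ADLS Gen2 protocol would be pointed at.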
OneLake is automatically included as part of a Microsoft Fabric tenant. When you have a Microsoft Fabric account, you gain automatic access to OneLake, eliminating the need for additional setup or configuration.
Furthermore, OneLake supports the creation of shortcuts to data stored in other OneLake locations, Azure Data Lake Storage Gen2 accounts, or Amazon S3 buckets. By creating these shortcuts, users can access and consume data in place, without building Extract, Transform, and Load (ETL) pipelines to move it between storage locations. This feature enhances data integration and simplifies data processing workflows within the Microsoft Fabric ecosystem.
4. Lakehouse
In general terms, a Lakehouse is a data architecture that combines the capabilities of data lakes and data warehouses. It aims to address the limitations of both by providing a unified and scalable platform for storing, processing, and analysing large volumes of structured and unstructured data.
The Microsoft Fabric Lakehouse incorporates the following features:
- OneLake Integration: The Microsoft Fabric Lakehouse stores your files in OneLake. OneLake supports various file formats and is built on Azure Data Lake Storage Gen2, ensuring secure and scalable storage for your data.
- Delta Lake Tables: The Microsoft Fabric Lakehouse UI provides functionality to load raw data into Delta Lake tables. Delta Lake is a transactional storage layer built on the data lake, enabling features such as ACID transactions, schema enforcement, and time travel capabilities. Loading data into Delta Lake tables ensures data integrity and enables efficient querying and analytics.
- Personas: The Microsoft Fabric Lakehouse caters to different personas, including data engineers, data scientists, and SQL developers. Data engineers and scientists can leverage Spark notebooks for advanced data processing and analytics tasks. SQL developers have the flexibility to use familiar SQL queries to interact with the data stored in the Lakehouse.
- SQL Endpoint: With the Microsoft Fabric Lakehouse, you automatically receive a SQL Endpoint for your data, which is listed separately in the Fabric workspace. This endpoint allows you to execute SQL queries directly against the data stored in the Lakehouse, providing a convenient and efficient way to access and analyse the data.
- Power BI Dataset: By default, the Microsoft Fabric Lakehouse provides a Power BI Dataset containing your tables for downstream analytics. This feature means you can seamlessly connect Power BI to the Lakehouse and utilise the tables and views within your Power BI reports and visualisations. The Power BI dataset is automatically updated as new tables and views are added to the Lakehouse, ensuring real-time or near-real-time insights.
5. Data Warehouse
The Microsoft Fabric data warehouse is a modern version of the traditional data warehouse, designed to provide a unified and streamlined approach to data analysis and reporting. It encompasses the following key features:
- Unified View: Microsoft Fabric’s data warehouse organises data from various sources into a cohesive view. This unified view eliminates data silos and enables analysts and stakeholders to access and analyse data from different systems in a consistent and integrated manner.
- Full SQL Support: The data warehouse in Microsoft Fabric offers comprehensive SQL support, allowing users to leverage their SQL skills and knowledge. This support includes executing complex SQL queries, performing aggregations, applying transformations, and more. With full SQL capabilities, users can efficiently interact with the data stored in the warehouse.
- Data Manipulation: Fabric’s data warehouse goes beyond read-only operations by allowing you to insert, update, and delete data in its tables. This enables data engineers and analysts not only to retrieve information but also to modify and manage the underlying data, supporting data governance and lifecycle management.
- Relational Layer: The Microsoft Fabric data warehouse provides a relational layer on top of the underlying data stored in the Lakehouse architecture. This means that it leverages the data management capabilities of the Lakehouse, such as Delta Lake, while providing a familiar relational database experience for analysts. Analysts can use T-SQL (Transact-SQL), the SQL dialect used in Microsoft SQL Server, and Power BI to explore and analyse the warehouse data.
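The data manipulation point above can be illustrated with a few SQL statements. The sketch below uses Python's built-in `sqlite3` module purely as a local stand-in so that it runs anywhere; a Fabric warehouse is queried with T-SQL, and the `customer` table and its columns are hypothetical.

```python
import sqlite3

# sqlite3 is only a local stand-in here; it is not how you connect to Fabric.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Insert, update, and delete rows, going beyond read-only access.
conn.execute("INSERT INTO customer (id, name, city) VALUES (1, 'Ada', 'Leeds')")
conn.execute("INSERT INTO customer (id, name, city) VALUES (2, 'Grace', 'York')")
conn.execute("UPDATE customer SET city = 'Bath' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 2")
conn.commit()

rows = conn.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()
print(rows)  # [(1, 'Ada', 'Bath')]
```

The statements themselves are standard SQL, so the same INSERT, UPDATE, and DELETE shapes should carry over to T-SQL with little change.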
6. Billing
Billing for the Microsoft Fabric analytics platform offers significant simplifications and cost-saving opportunities compared to previous Azure Lakehouse solutions. The key points are:
- Simplification: With Microsoft Fabric, the billing process is greatly simplified. You only need to purchase one F-SKU (Fabric SKU) to get started, consolidating the platform’s billing. This unified SKU eliminates the need to manage multiple Azure services with different billing levers and options, simplifying the overall billing experience.
- OneLake: While the Fabric SKU covers the analytics platform’s compute, OneLake storage is paid for separately. OneLake is Fabric’s scalable data lake storage, and its storage costs are billed independently of the Fabric SKU.
- Cost Savings: A notable advantage of Microsoft Fabric is the ability to pause or scale down the capacity during periods of light usage. Doing so can reduce costs significantly, because you are billed only for the periods when the capacity is active. This flexibility allows you to optimise costs based on your specific workload requirements.
- Architectural: As a data architect, the ability to pause or scale down Microsoft Fabric during light usage becomes highly relevant, because it influences how you organise your Fabric resources. Pausing and scaling apply to a capacity, and workspaces are assigned to capacities. By separating workloads with different usage patterns into different workspaces on separate capacities, you can maximise your ability to pause or scale down the parts that are idle, optimising costs even further.
- Power BI: Depending on your Fabric SKU, individual Power BI user licences may still be required.
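A quick back-of-the-envelope calculation shows why pausing matters. The sketch below compares a capacity left running around the clock with one that runs ten hours a day. The hourly rate is a made-up placeholder and not a real Fabric price, the `monthly_compute_cost` helper is hypothetical, and the model assumes compute is billed only while the capacity is running; OneLake storage and Power BI licences are left out.

```python
def monthly_compute_cost(hourly_rate, active_hours_per_day, days=30):
    """Compute cost for a capacity that is billed only while it is running."""
    return hourly_rate * active_hours_per_day * days


HOURLY_RATE = 1.00  # hypothetical price per hour, NOT a real Fabric rate

always_on = monthly_compute_cost(HOURLY_RATE, 24)
business_hours = monthly_compute_cost(HOURLY_RATE, 10)
saving = 1 - business_hours / always_on

print(f"always on:      {always_on:.2f}")       # 720.00
print(f"business hours: {business_hours:.2f}")  # 300.00
print(f"saving:         {saving:.0%}")          # 58%
```

Under these assumptions the saving depends only on the share of hours the capacity is paused, so the same percentage applies whatever the actual rate is.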
Microsoft Fabric includes a range of features beyond those covered above: Data Science, OneSecurity, Real-Time Analytics, data governance integration with Microsoft Purview, Data Activator (which allows actions to be triggered based on data conditions), and Power BI for data visualisation and insight. These features enhance the overall functionality and versatility of the Microsoft Fabric platform.
While the goal with Fabric is simplicity, we still have architectural decisions to make. Users can choose from Pipelines, Dataflows, Spark Notebooks, or SQL options (or any combination) for efficient and scalable data processing. Additionally, users can present their data in a Lakehouse, Data Warehouse, or Datamart, depending on their specific needs and use cases. The choice depends on many factors, including skill level, experience, performance, cost and personal preference.
As Microsoft Fabric progresses towards its general availability, it is expected to evolve further, incorporating refinements and enhancements based on user feedback. The ongoing development and expansion of the Microsoft Fabric ecosystem will undoubtedly bring exciting possibilities and advancements in data management, analytics, and decision-making capabilities for organisations leveraging the platform.