Introduction
Dell EMC PowerStore is a robust and flexible storage and compute option that is well suited for SQL Server 2019 Big Data Clusters. This section provides an overview of PowerStore and SQL Server 2019 Big Data Clusters.
PowerStore overview
PowerStore achieves new levels of operational simplicity and agility. It uses a container-based microservices architecture, advanced storage technologies, and integrated machine learning to unlock the power of your data. PowerStore is a versatile platform with a performance-centric design that delivers multidimensional scale, always-on data reduction, and support for next-generation media. PowerStore brings the simplicity of public cloud to on-premises infrastructure, streamlining operations with an integrated machine-learning engine and seamless automation. It also offers predictive analytics to easily monitor, analyze, and troubleshoot the environment. PowerStore is highly adaptable, providing the flexibility to host specialized workloads directly on the appliance and modernize infrastructure without disruption. It also offers investment protection through flexible payment solutions and data-in-place upgrades.
SQL Server 2019 Big Data Clusters overview
SQL Server 2019 introduced a new data platform component, SQL Server 2019 Big Data Clusters (BDC). Big Data Clusters address many of the traditional challenges of building big-data and data-lake environments. For an overview, see the Microsoft page SQL Server 2019 Big Data Cluster Overview and the GitHub page SQL Server Big Data Cluster Workshops.
In addition to the product documentation, the following subsections cover specific benefits when deploying BDC on PowerStore.
Platform choice
SQL Server 2019 BDC deploys on the Kubernetes platform. As a result, several Kubernetes distributions are supported, along with the various Linux distributions that run Kubernetes. While SQL Server 2019 BDC can be deployed either in the public cloud or on premises, this paper focuses on the PowerStore on-premises deployment. Besides the design addressed in this paper, Dell Technologies also provides many Kubernetes hosting platforms and validated designs, depending on the required solution. Regardless of the deployment, cluster management and user experience are largely the same. For administrators and IT professionals coming from Microsoft SQL Server on Windows Server, the Kubernetes platform can make the move to SQL Server Big Data Clusters a bit daunting. At the time of publication, there are 90 certified Kubernetes offerings from the Cloud Native Computing Foundation. Also, the Kubernetes platform is rapidly evolving, and updates are generally published on a quarterly basis. These factors can make finding, setting up, and running a solution extremely challenging. Dell Technologies™ addresses this challenge by providing step-by-step instructions on how to set up and deploy a Big Data Cluster with only three commands on PowerStore X models. This process is fully documented.
Scale
When planning a big data environment, scaling can sometimes be an afterthought. If an environment that will inevitably grow is not planned with scalability in mind, problems are likely to surface later. SQL Server 2019 Big Data Clusters have been designed with scalability in mind. The default installation creates a cluster of three nodes, enabling performance and scale from the start. Using proven components such as SQL Server, Spark, and Kubernetes provides massive power and scale. To add power to the cluster, just add nodes. This complements the clustered scale-out architecture of PowerStore, making PowerStore a natural choice for this solution.
Deployment
Building out a big data environment typically requires defining a stack of products that provide the required capabilities. It also involves configuring multiple components such as Apache® Hadoop® and Spark®, and selecting and installing monitoring and analytical components. SQL Server 2019 Big Data Clusters simplify this complex deployment process. Using a containerized architecture on the Kubernetes platform can simplify deployment, since Kubernetes manages networking, resiliency, and load balancing. The SQL Server 2019 BDC installation tools enable deploying an entire Big Data Cluster on Kubernetes with a single command. The capabilities of PowerStore X models and Ansible support allow automated deployment of Kubernetes clusters.
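The single-command deployment mentioned above can be sketched as follows. This is a minimal illustration, not the procedure documented for PowerStore: it assumes the Microsoft azdata command-line tool is installed, that a configuration profile directory already exists (the name custom-bdc is hypothetical), and that the administrator credentials are supplied through the AZDATA_USERNAME and AZDATA_PASSWORD environment variables. By default the helper only returns the command it would run.

```python
import os
import shlex
import subprocess
from typing import List


def build_bdc_create_command(profile_dir: str) -> List[str]:
    """Return the argument list for deploying a BDC from a config profile.

    Assumes the azdata CLI; profile_dir is a hypothetical directory that
    holds the deployment configuration files.
    """
    return [
        "azdata", "bdc", "create",
        "--config-profile", profile_dir,
        "--accept-eula", "yes",
    ]


def deploy(profile_dir: str, dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    cmd = build_bdc_create_command(profile_dir)
    if not dry_run:
        # Assumption: credentials are passed through these environment variables.
        missing = [v for v in ("AZDATA_USERNAME", "AZDATA_PASSWORD")
                   if v not in os.environ]
        if missing:
            raise RuntimeError("missing environment variables: " + ", ".join(missing))
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Dry run: print the command without touching any cluster.
    print(deploy("custom-bdc"))
```

Keeping the dry run as the default makes it possible to review the exact command before it is run against a Kubernetes cluster.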
Data movement and external data sources
Typically, in big data and data analytics environments, data must be prepared for analysis. Often, this preparation includes data extraction, transformation, and load (ETL) processes in a separate data store. These processes can be expensive and time consuming in terms of development, maintenance, and administration. The capabilities of SQL Server 2019 Big Data Clusters enable choice in how to analyze data and access data with expanded PolyBase capabilities. Big Data Clusters can be used as a data store, but they can also be used to analyze data where it resides. This data could reside in existing relational databases, Hadoop clusters, or unstructured storage. This BDC capability enables scaling compute and storage separately, horizontally, and dynamically. With the abilities of PowerStore X models, hosts running external data sources can be placed on the same cluster to optimize performance.
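To illustrate analyzing data where it resides, the sketch below generates the T-SQL statement that registers a remote database as a PolyBase external data source. It is an illustration under stated assumptions rather than a configuration taken from this paper: the data source name, host, port, and credential name are hypothetical, the list of location prefixes is a partial one, and the database-scoped credential is assumed to exist already.

```python
import re

# Partial list of PolyBase location prefixes (assumption: not exhaustive).
SUPPORTED_PREFIXES = ("sqlserver", "oracle", "teradata", "mongodb")

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_HOSTNAME = re.compile(r"^[A-Za-z0-9.-]+$")


def external_data_source_sql(name: str, prefix: str, host: str,
                             port: int, credential: str) -> str:
    """Build a CREATE EXTERNAL DATA SOURCE statement for PolyBase.

    All arguments are illustrative; the credential must already exist
    in the database as a database-scoped credential.
    """
    if prefix not in SUPPORTED_PREFIXES:
        raise ValueError("unsupported location prefix: " + prefix)
    for ident in (name, credential):
        if not _IDENTIFIER.match(ident):
            raise ValueError("invalid identifier: " + ident)
    if not _HOSTNAME.match(host):
        raise ValueError("invalid host: " + host)
    return (
        f"CREATE EXTERNAL DATA SOURCE [{name}]\n"
        f"WITH (LOCATION = '{prefix}://{host}:{int(port)}', "
        f"CREDENTIAL = [{credential}]);"
    )


if __name__ == "__main__":
    # Hypothetical SQL Server instance hosted alongside the cluster.
    print(external_data_source_sql(
        "SalesDb", "sqlserver", "sales-sql.example.local", 1433, "SalesCred"))
```

Once such a data source is registered, external tables defined on it can be queried with ordinary T-SQL, so the data does not have to be copied into the cluster first.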
A demo showing PowerStore hosting Microsoft SQL Server 2019 BDC can be seen below.
The reference architecture can be downloaded by clicking the screenshot below.