Reference index for architectural validation, platform evaluation, and specifications.
Modern enterprise infrastructure demands storage engines capable of keeping pace with high-velocity cloud-native workloads, automating multi-site disaster recovery protocols, and delivering bare-metal physical performance without locking data into closed, proprietary storage arrays. QuasarSDS is a Kubernetes-native, software-defined storage (SDS) platform designed precisely to satisfy these three fundamental requirements.
By eliminating traditional structural trade-offs between speed and administrative flexibility, QuasarSDS delivers sub-millisecond P99 latency and millions of concurrent IOPS utilizing native NVMe over Fabrics (NVMe-oF TCP). It runs entirely on commodity x86/ARM hardware while integrating seamlessly with production orchestration layers via a standard Container Storage Interface (CSI) driver.
Beyond raw, physical IOPS performance, QuasarSDS provides a unified and secure data lifecycle pipeline. This comprises synchronous replication for absolute high availability (RPO=0), asynchronous deduplicated backups directly to Object Storage (S3) for optimized disaster recovery, and an incremental, zero-downtime cloud migration path to AWS with a native bidirectional fallback option.
Infrastructure engineers and enterprise IT organizations currently face several converging operational bottlenecks. Traditional storage options fail to satisfy modern cloud-native standards across multiple vectors:
| Operational Vector | Legacy Approach | Inherent Architectural Problem |
|---|---|---|
| High Availability | Proprietary hardware-based SAN/NAS controllers. | Extreme CAPEX, complex custom fabrics, and heavy vendor lock-in. |
| Kubernetes Storage | Generic CSI plugins mapping network file systems or standard cloud volumes. | Heavy kernel I/O taxes, severe IOPS degradation, and high inter-zone traffic fees. |
| Disaster Recovery | Periodic hypervisor snapshots or traditional file-level scheduled backups. | Hours of data loss (high RPO), complex multi-step recovery runbooks, and massive manual overhead. |
| Cloud Mobility | Block-level off-line disk copies or heavy application-level network migrations. | Extensive production maintenance windows and high risks of data corruption during transit. |
| FinOps Efficiency | Pre-purchased, over-provisioned local physical storage arrays. | Massive capital tied to idle capacity and unpredictable expansion costs. |
QuasarSDS replaces these disjointed mechanisms with a single unified, lightweight, and programmable storage plane.
QuasarSDS is an open-architecture, Kubernetes-native software-defined storage (SDS) platform that aggregates local NVMe SSDs into a high-performance, distributed storage fabric. The platform is managed entirely through standard declarative APIs, making it natively compatible with GitOps automation pipelines.
Storage targets are presented directly to the node environment via two key interfaces:
The architectural foundation of QuasarSDS enforces a strict separation of concerns by decoupling the control plane from the active data plane. The platform comprises four lightweight, containerized components:
| Component | Role & Operational Scope | Deployment Model |
|---|---|---|
| Agent | Direct physical disk management, SPDK user-space initialization, replication logic, and backup streaming. | Kubernetes DaemonSet on storage provider nodes. |
| Controller | Global orchestration operator, state management, and cluster-wide volume provisioning. | High-Availability Deployment. |
| Witness | Stateless consensus arbiter used to prevent split-brain states in HA failover events. | 3-Replica Deployment. |
| CSI Driver | Implements standard CSI specifications to map, mount, and manage lifecycle events of volumes. | Kubernetes DaemonSet across worker nodes. |
The data plane executes entirely in user space using the Storage Performance Development Kit (SPDK). By bypassing the Linux kernel's storage layers and avoiding interrupt handling and context switches, QuasarSDS achieves bare-metal efficiency, moving data between applications and physical NVMe media via poll-mode drivers.
To guide IT decision-makers, CTOs, and Infrastructure Architects, the following production blueprints demonstrate how QuasarSDS solves real-world bottlenecks, ranging from state-of-the-art AI clusters to legacy physical storage array modernization.
The Challenge: Legacy storage architectures such as shared NAS (NFS/SMB) and object storage (S3/HTTP) are poorly suited to active AI model training pipelines. File systems (NAS) introduce significant kernel translation overhead and lock contention when thousands of training workers access multi-gigabyte datasets concurrently. Object storage (S3) lacks low-latency random-access block operations, forcing training engines to download full objects or absorb high-latency API round trips, which leads to GPU starvation and idle compute cycles.
The QuasarSDS Block-Level NVMe-oF Solution: QuasarSDS bypasses the limitations of both File and Object storage by delivering high-performance, raw storage at the Block Level consumed via the optimized NVMe over Fabrics (NVMe-oF TCP) protocol.
<150 µs to prevent GPU idle cycles.
The Challenge: Relational and non-relational enterprise databases (PostgreSQL, MongoDB, Cassandra) require ultra-high transactions per second (TPS) and predictable latencies in the cloud, but traditional cloud storage limits IOPS or charges prohibitive premium fees.
The QuasarSDS Solution:
The Challenge: Organizations hold legacy, highly capital-intensive SAN/NAS storage arrays that cannot communicate natively with Kubernetes or that fail to scale for dynamic microservice workloads, yet they cannot write off or retire this hardware until its amortization cycle ends.
The QuasarSDS Solution:
The Challenge: High data egress fees, vendor lock-in, and static cloud storage performance tiers severely restrict hybrid cloud architectures and drive up infrastructure operational costs.
The QuasarSDS Solution:
To ensure maximum durability, QuasarSDS replicates block write commands synchronously across multiple physical nodes using a software RAID-1 mechanism:
Because write operations are committed in a blocking, synchronous fashion, a write is acknowledged to the application only after both replicas have persisted it. The standby secondary replica therefore holds every acknowledged write that the active primary holds, maintaining a Recovery Point Objective (RPO) of zero.
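The acknowledgement rule described above can be illustrated with a minimal Python sketch. The `Replica` class and `mirrored_write` function are hypothetical names invented for this illustration and are not part of any QuasarSDS API; in-memory dictionaries stand in for NVMe media.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one storage node's block device."""
    name: str
    healthy: bool = True
    blocks: dict = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> bool:
        # A real replica would persist to NVMe media before returning.
        if not self.healthy:
            return False
        self.blocks[lba] = data
        return True


def mirrored_write(replicas: list, lba: int, data: bytes) -> bool:
    """Acknowledge the write only if every replica has committed it.

    Assumption: a failed replica simply causes a negative acknowledgement.
    A production system would also need to handle degraded mode and resync.
    """
    results = [replica.write(lba, data) for replica in replicas]
    return all(results)


primary = Replica("node-a")
secondary = Replica("node-b")
acknowledged = mirrored_write([primary, secondary], lba=42, data=b"payload")
print(acknowledged, primary.blocks == secondary.blocks)
```

Since the caller sees success only when both copies exist, any write the application believes is durable is present on the standby, which is the property that yields RPO=0.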
In the event of an active primary node outage, the surviving components automatically trigger a failover sequence:
T+0s Primary storage node experiences unexpected hardware failure.
T+6s gRPC heartbeat monitoring triggers a timeout detection on the standby node.
T+6s Quorum consensus sequence initiated. Standby requests Witness vote.
T+6s Witness evaluates etcd state via optimistic lock. Vote authorized.
T+6s Secondary replica promoted to active Primary role.
T+7s CSI clients execute path redirection via NVMe-oF Asymmetric Namespace Access (ANA).
The application workload resumes I/O operations in approximately 6 seconds (RTO). The redirection occurs natively at the NVMe protocol layer, ensuring zero application crashes, zero required pod restarts, and zero configuration changes.
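The detection step in the timeline above can be sketched as a simple timeout check. This is a hedged illustration only: `HeartbeatMonitor` is a hypothetical name, the 6-second default is taken from the T+6s step in the timeline, and the actual heartbeat interval and gRPC details are not specified in this document.

```python
from typing import Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from a peer and flags a timeout."""

    def __init__(self, timeout_s: float = 6.0) -> None:
        # Assumption: 6.0 mirrors the T+6s detection step in the timeline.
        self.timeout_s = timeout_s
        self.last_seen: Optional[float] = None

    def record(self, now: float) -> None:
        """Call whenever a heartbeat arrives from the peer."""
        self.last_seen = now

    def peer_timed_out(self, now: float) -> bool:
        """True once the peer has been silent for at least timeout_s."""
        if self.last_seen is None:
            return False  # no heartbeat observed yet, nothing to time out
        return now - self.last_seen >= self.timeout_s


monitor = HeartbeatMonitor()
monitor.record(now=0.0)               # last heartbeat from the primary
print(monitor.peer_timed_out(now=3.0))  # still within the window
print(monitor.peer_timed_out(now=6.0))  # window elapsed, start quorum vote
```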
To prevent network partitions from creating two active primary writers, QuasarSDS enforces strict quorum consensus requiring at least 2 out of 3 active votes. The three voters are the two storage replica nodes plus the stateless Witness service. The Witness interacts with the Kubernetes control plane's `etcd` database using optimistic concurrency controls (`resourceVersion`), preventing race conditions in complex failover scenarios.
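The combination of a 2-of-3 vote and a version-checked update can be sketched as follows. This is an illustrative model, not the QuasarSDS implementation: `LeaseStore`, `VersionConflict`, `has_quorum`, and `try_promote` are hypothetical names, and the in-memory store only mimics the compare-and-update behaviour that Kubernetes provides through `resourceVersion`.

```python
class VersionConflict(Exception):
    """Raised when an update is based on a stale version of the record."""


class LeaseStore:
    """Hypothetical stand-in for an object guarded by resourceVersion."""

    def __init__(self, holder: str) -> None:
        self.holder = holder
        self.resource_version = 1

    def read(self):
        return self.holder, self.resource_version

    def update(self, new_holder: str, expected_version: int) -> None:
        # Optimistic concurrency: reject writers that read an older version.
        if expected_version != self.resource_version:
            raise VersionConflict("record changed since it was read")
        self.holder = new_holder
        self.resource_version += 1


def has_quorum(votes: dict, required: int = 2) -> bool:
    """votes maps voter name -> True if that voter approves the promotion."""
    return sum(1 for approved in votes.values() if approved) >= required


def try_promote(store: LeaseStore, candidate: str, votes: dict) -> bool:
    """Promote candidate only with quorum and an unchanged record."""
    if not has_quorum(votes):
        return False
    _, version = store.read()
    try:
        store.update(candidate, version)
    except VersionConflict:
        return False
    return True


store = LeaseStore(holder="node-a")
votes = {"node-a": False, "node-b": True, "witness": True}
print(try_promote(store, "node-b", votes), store.holder)
```

A node that is partitioned away from both its peer and the Witness can collect only its own vote, so `has_quorum` fails and it never attempts the update.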
Rather than relying on resource-intensive VM snapshots, QuasarSDS streams changes continuously to any S3-compatible object store (Amazon S3, MinIO, Ceph RGW) using a Full + Incremental Backup Chain:
The declarative `SDSDRVolume` Custom Resource continuously monitors the target S3 catalog, retrieving, decompressing (via `zstd`), and applying incremental blocks to a remote disaster recovery volume. This architecture supports 1-to-Many replication topologies, allowing multiple geographical targets to maintain read-only standby copies of a single production volume with minimal overhead.
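The restore side of a full-plus-incremental chain can be sketched as replaying block maps in order. This is a hedged illustration under several assumptions: `apply_backup_chain` is a hypothetical helper, each backup is modelled as a mapping from block index to compressed payload, and `zlib` from the Python standard library is used as a stand-in for `zstd` so the example stays self-contained.

```python
import zlib


def apply_backup_chain(volume: bytearray, chain: list, block_size: int) -> bytearray:
    """Apply a full backup followed by incrementals, oldest first.

    chain: list of dicts mapping block index -> compressed block payload.
    Later entries overwrite blocks written by earlier ones.
    """
    for backup in chain:
        for index in sorted(backup):
            data = zlib.decompress(backup[index])  # stand-in for zstd
            if len(data) != block_size:
                raise ValueError(f"block {index} has unexpected size {len(data)}")
            offset = index * block_size
            if offset + block_size > len(volume):
                raise ValueError(f"block {index} lies outside the volume")
            volume[offset:offset + block_size] = data
    return volume


BLOCK = 4
full = {0: zlib.compress(b"AAAA"), 1: zlib.compress(b"BBBB")}
incremental = {1: zlib.compress(b"CCCC")}  # only block 1 changed

dr_volume = apply_backup_chain(bytearray(2 * BLOCK), [full, incremental], BLOCK)
print(bytes(dr_volume))
```

Each read-only standby in a 1-to-many topology could replay the same chain independently, since the replay only reads from the catalog.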
The DR controller continuously measures and exposes the `lagSeconds` metric (the age of the newest unapplied S3 block at the destination). These metrics are natively exposed via a Prometheus exporter, enabling operations teams to monitor real-time compliance with business SLAs on unified dashboards.
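Following the definition given above, the lag value can be sketched as a small calculation. The function names and the idea of comparing the lag against an SLA threshold are assumptions made for illustration; how the DR controller timestamps blocks internally is not described in this document.

```python
def lag_seconds(now: float, unapplied_upload_times: list) -> float:
    """Age, in seconds, of the newest block uploaded but not yet applied.

    unapplied_upload_times: upload timestamps (epoch seconds) of blocks
    that are present in the S3 catalog but not applied at the destination.
    Returns 0.0 when the destination is fully caught up.
    """
    if not unapplied_upload_times:
        return 0.0
    newest = max(unapplied_upload_times)
    return max(0.0, now - newest)


def within_sla(lag: float, max_lag_s: float) -> bool:
    """Hypothetical SLA check: the lag must not exceed the agreed bound."""
    return lag <= max_lag_s


lag = lag_seconds(now=1_000.0, unapplied_upload_times=[940.0, 985.0])
print(lag, within_sla(lag, max_lag_s=60.0))
```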
The decoupled architecture of QuasarSDS enables seamless block-level data mobility across hybrid cloud topologies without requiring complex application re-architecting.
Migrations are performed online utilizing a non-disruptive, multi-phase hydration model:
Unlike tools that trap workloads in the public cloud, QuasarSDS maintains an incremental reverse path, so data movement stays bidirectional. Backups can stream from the newly active AWS volume back to S3, allowing the on-premises site to act as a disaster recovery destination. If rollback is required, only the incremental blocks written during the cloud execution phase are copied back, minimizing egress costs and downtime.
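The reason a rollback copies only a fraction of the volume can be shown with a dirty-block tracking sketch. `TrackedVolume` and `roll_back` are hypothetical names, and the in-memory byte arrays are stand-ins for the cloud and on-premises volumes; the actual change-tracking mechanism is not specified in this document.

```python
class TrackedVolume:
    """Volume that remembers which blocks were written after cutover."""

    def __init__(self, data: bytes, block_size: int) -> None:
        self.data = bytearray(data)
        self.block_size = block_size
        self.dirty = set()  # indices of blocks written during the cloud phase

    def write_block(self, index: int, payload: bytes) -> None:
        if len(payload) != self.block_size:
            raise ValueError("payload must be exactly one block")
        start = index * self.block_size
        self.data[start:start + self.block_size] = payload
        self.dirty.add(index)

    def changed_blocks(self) -> dict:
        size = self.block_size
        return {i: bytes(self.data[i * size:(i + 1) * size])
                for i in sorted(self.dirty)}


def roll_back(on_prem: bytearray, cloud: TrackedVolume) -> int:
    """Copy only changed blocks back on-premises; return bytes transferred."""
    changed = cloud.changed_blocks()
    for index, payload in changed.items():
        start = index * cloud.block_size
        on_prem[start:start + cloud.block_size] = payload
    return len(changed) * cloud.block_size


on_prem = bytearray(b"AAAABBBBCCCC")           # state at cutover
cloud = TrackedVolume(bytes(on_prem), block_size=4)
cloud.write_block(1, b"ZZZZ")                  # written while running in AWS
transferred = roll_back(on_prem, cloud)
print(transferred, bytes(on_prem))
```

In this toy case one block out of three crosses the network, which is where the egress saving comes from.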
By eliminating intermediate file systems, block mapping abstractions, and operating system kernel contexts, QuasarSDS delivers extreme performance directly to containerized workloads.
| Performance Metric | QuasarSDS Engine (SPDK) | Standard Enterprise SAN / NAS |
|---|---|---|
| Sequential Read/Write Throughput | > 12.5 GB/s | 1.5 – 3.2 GB/s |
| Random 4K Read/Write IOPS | > 1,200,000 IOPS | 80,000 – 250,000 IOPS |
| P99 Transaction Latency | < 150 µs | 1.2 – 4.5 ms |
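To relate the IOPS and throughput rows of the table, the bandwidth implied by a random-I/O rate is simply the rate multiplied by the block size. The helper below is a hypothetical convenience function and assumes 4K means 4,096 bytes and GB means 10^9 bytes.

```python
def iops_to_gb_per_s(iops: float, block_bytes: int = 4096) -> float:
    """Bandwidth in GB/s (10^9 bytes) implied by an IOPS rate and block size."""
    return iops * block_bytes / 1e9


# 1,200,000 random 4K operations per second correspond to roughly 4.9 GB/s,
# which sits below the sequential figure quoted in the table.
print(round(iops_to_gb_per_s(1_200_000), 2))
```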
The following manifests provision a synchronous, two-way replicated, high-availability storage class and a claim that consumes it:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: quasarsds-high-availability
    provisioner: quasarsds.io
    parameters:
      poolName: nvme-pool-direct
      strategy: RF2 # Synchronous two-way mirroring
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: postgres-primary-pvc
    spec:
      storageClassName: quasarsds-high-availability
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 250Gi

To confirm bit-for-bit data integrity, QuasarSDS performs a full block-level digest verification at the completion of every block sync or cloud hydration loop. This process checks that the source and target volumes produce identical digests, meaning their contents match.
This validation can be executed and audited transparently via the `qsdsctl` command-line utility:

    $ qsdsctl verify --source local-spdk --target aws-ebs
    [INFO] S3 Catalog synchronization complete.
    [INFO] Running block-level cryptographic verification...
    ✓ Source Volume MD5: 4a8e9b10c23d4e5f6a7b8c9d0e1f2a3b
    ✓ Target AWS EBS MD5: 4a8e9b10c23d4e5f6a7b8c9d0e1f2a3b
    [SUCCESS] 100% Data Integrity Verified. Bit-by-bit identical copy.

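The comparison that such a verification performs can be sketched with Python's standard `hashlib` module. This is not how `qsdsctl` is implemented; `volume_digest` and `volumes_match` are hypothetical helpers, and SHA-256 is used as the default here as an assumption, with the algorithm left as a parameter.

```python
import hashlib
import io


def volume_digest(stream, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a volume (any binary file-like object) in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def volumes_match(source, target, algorithm: str = "sha256") -> bool:
    """True when both streams produce the same digest."""
    return volume_digest(source, algorithm) == volume_digest(target, algorithm)


# In-memory buffers stand in for the source and target block devices.
source = io.BytesIO(b"\x00" * 4096 + b"data")
target = io.BytesIO(b"\x00" * 4096 + b"data")
print(volumes_match(source, target))
```

Reading in chunks keeps memory use constant regardless of volume size, which matters when the inputs are multi-terabyte devices rather than small buffers.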
| Technical Architecture Metric | Standard Supported Specification Value |
|---|---|
| Replication Methods | Synchronous multi-node mirroring (RF1, RF2, RF3) |
| Target RPO / RTO Bounds | Zero (0) RPO / ~6 Seconds Automated Cluster Failover |
| Transport Storage Protocol | NVMe over Fabrics TCP (NVMe-oF TCP) standard spec |
| Backup Formats | Raw blocks guided by logical JSON catalogs, compressed via `zstd` |
| Active Data Storage Core | Intel Storage Performance Development Kit (SPDK) in User-Space |
| Supported Linux Platforms | Red Hat Enterprise Linux (RHEL 9+), Ubuntu Server 22.04 LTS+ |
| Provisioning Abstractions | Kubernetes Custom Resource Definitions (CRDs), fully GitOps-ready |