Teeling Cloud Courant

making hybrid cloud a little less cloudy

Volume 1, Issue 1 · St. Louis, MO

Teeling Cloud Courant is a running commentary on hybrid cloud reality — especially Microsoft Azure Local, Azure Arc, and the connective tissue between on‑premises and cloud. I focus on these topics because they sit at the intersection of architecture, operations, and real‑world constraints — the place where Microsoft's intentions meet what customers can actually deploy. While much of the content centers on Microsoft's hybrid ecosystem, TCC isn't limited to a single vendor. Hybrid cloud is bigger than any one platform, and the publication occasionally features broader perspectives and guest contributors. If you work with hybrid cloud, Azure Local, or modern datacenter design, this publication keeps you current without drowning you in noise.

Latest Articles
Article 1 of 3
Feb 17, 2026

When Azure Local's Reserved IP Ranges Collide With Your Network

A recovery path when your datacenter network overlaps with Microsoft's reserved ranges

If you're already mid-deployment or can't renumber your network, there's a supported workaround that most customers don't know about.

Article 2 of 3
Feb 3, 2026

Making the Case for 100 GbE Networking with Azure Local

Why dense NVMe nodes change the networking conversation sooner than most designs expect

As NVMe becomes the default choice for modern hyperconverged infrastructure, one question comes up repeatedly in both Storage Spaces Direct (S2D) and Azure Local designs: When does 100 GbE networking actually make sense?

For many years, 10 GbE — and later 25 GbE — was sufficient for most storage workloads. Even today, dual‑port 25 GbE designs are often considered "more than enough." But once you start building dense all‑NVMe nodes, those assumptions begin to break down.

Article 3 of 3
Feb 3, 2026

Azure Local Support Deadlines and Stretched Clusters

Critical deadlines for 22H2 and 23H2 environments

Still on 22H2? Running a stretched cluster? Or sitting on 23H2 hoping it buys you time?

This is your wake‑up call — with clear facts and supported directions, not "just wait." Azure Local's 22H2 and 23H2 releases are on borrowed time. 22H2 is already out of support, and 23H2 — including stretched clusters — stops receiving security and quality updates after April 2026.

Back to Articles

Making the Case for 100 GbE Networking with Azure Local and Storage Spaces Direct

Why dense NVMe nodes change the networking conversation sooner than most designs expect

As NVMe becomes the default choice for modern hyperconverged infrastructure, one question comes up repeatedly in both Storage Spaces Direct (S2D) and Azure Local designs: When does 100 GbE networking actually make sense?

For many years, 10 GbE — and later 25 GbE — was sufficient for most storage workloads. Even today, dual‑port 25 GbE designs are often considered "more than enough." But once you start building dense all‑NVMe nodes, those assumptions begin to break down.

This article explains:

  • Why 100 GbE becomes justified earlier than expected in all‑NVMe S2D and Azure Local deployments
  • Why roughly 12 NVMe drives per node is a practical, conservative inflection point
  • How the economics of 4 × 25 GbE versus 2 × 100 GbE work out in real designs
  • Why the same logic applies across vendors like Dell and Cisco

Design Assumptions (Made Explicit)

Before going further, it's important to anchor the discussion in realistic design practice:

No serious production S2D or Azure Local design uses a single storage network port. All scenarios discussed here assume:

  • At least two network ports per node
  • SMB Multichannel enabled
  • SMB Direct (RDMA) in use
  • A baseline of 2 × 25 GbE networking for storage/compute traffic

This reflects real‑world Azure Stack HCI and Azure Local designs and avoids straw‑man comparisons like "100 GbE vs a single 10 GbE link."
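If you want to verify those assumptions on an existing node, the in-box SMB and NetAdapter cmdlets make it a quick check. This is a minimal sketch, assuming a Windows Server or Azure Local node where RDMA is expected to be enabled:

# Adapters that are RDMA-capable and currently have RDMA enabled (the SMB Direct prerequisite)
Get-NetAdapterRdma | Where-Object Enabled

# Interfaces SMB Multichannel will consider on the client side, including link speed and RDMA capability
Get-SmbClientNetworkInterface

# The server-side view of the same interfaces
Get-SmbServerNetworkInterface

If RDMA shows as disabled here, you're not yet testing the design this article assumes.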

How NVMe Moves the Bottleneck to the Network

A single enterprise NVMe drive is capable of multiple gigabytes per second of throughput. When a node contains 12, 16, or 24 NVMe drives, the aggregate storage capability becomes enormous. In Storage Spaces Direct, all east‑west storage traffic — including mirrored writes — flows over the network using SMB 3 with RDMA. Azure Local uses the same underlying S2D mechanisms for its local storage.

As NVMe density increases, the limiting factor stops being the media and quickly becomes available network bandwidth. At that point:

  • Adding more NVMe increases peak potential but does not improve realized performance unless the network scales with it
  • If the network doesn't keep up, you effectively pay for NVMe performance that never reaches the workload

What VMFleet Tells Us About Real‑World Limits

VMFleet, Microsoft's official S2D performance and validation tool, is designed to generate highly parallel I/O, stress storage, CPU, and network simultaneously, and expose architectural bottlenecks rather than idealized peaks. In all‑NVMe S2D clusters, VMFleet runs consistently show that nodes can sustain well over 5 GB/s of storage throughput, network links saturate before NVMe does, and scaling flattens once aggregate NIC bandwidth is consumed. This behavior appears even when SMB Multichannel is active and multiple NICs are available. Azure Local, running on the same S2D foundations, exhibits the same pattern when you push it with dense NVMe and realistic I/O mixes.
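You can watch this ceiling form during a VMFleet run (or any sustained load test) by sampling the NIC counters alongside storage throughput. A rough sketch; counter names and availability for RDMA depend on the NIC and driver, and SMB Direct traffic shows up under the RDMA counters rather than the regular interface counters:

# Sample NIC and RDMA throughput every 2 seconds for one minute while the load test runs
$counters = '\Network Interface(*)\Bytes Total/sec',
            '\RDMA Activity(*)\RDMA Inbound Bytes/sec',
            '\RDMA Activity(*)\RDMA Outbound Bytes/sec'

Get-Counter -Counter $counters -SampleInterval 2 -MaxSamples 30 |
    ForEach-Object {
        foreach ($s in $_.CounterSamples) {
            # Convert each sample to GB/s so it can be compared against per-node storage throughput
            '{0,-70} {1,8:N2} GB/s' -f $s.Path, ($s.CookedValue / 1GB)
        }
    }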

Dual‑Port 25 GbE: Good, But Not a 50 GbE Pipe

With 2 × 25 GbE, SMB Multichannel allows traffic to flow across both links concurrently. This meaningfully increases available bandwidth compared to a single port, but it's not a perfect doubling.

It's crucial to understand what SMB Multichannel does — and does not — provide:

  • It balances connections, not individual I/O operations
  • Storage traffic includes replica writes, metadata, and control flows
  • Mirror traffic competes with client I/O for the same NICs
  • Utilization across ports is rarely perfectly even under sustained load

As a result, dual‑port 25 GbE does not behave like a single flat 50 GbE pipe for storage workloads.

That's a substantial improvement over a single link — but it's also a ceiling that dense NVMe nodes can hit surprisingly quickly.
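You can observe this distribution directly on a node that is actively serving storage traffic. A small sketch using the in-box SMB cmdlets; expect the spread across interfaces to be useful but not perfectly even:

# One row per SMB connection/interface pairing currently in use
Get-SmbMultichannelConnection | Sort-Object ServerName | Format-Table -AutoSize

# Any Multichannel constraints that restrict which interfaces SMB may use
Get-SmbMultichannelConstraint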

Where NVMe Density Meets the Network Ceiling

Once we assume 2 × 25 GbE as the baseline, a clear pattern emerges:

  • NVMe drives per node: ≤ 8 — Network rarely limits performance
  • NVMe drives per node: ~12 — Network pressure becomes visible under sustained load
  • NVMe drives per node: 16+ — Network is consistently the bottleneck in stress scenarios

At roughly 12 NVMe drives per node, the node is often capable of driving more throughput than dual‑port 25 GbE can sustain under continuous load. Beyond that point, performance gains from additional NVMe are increasingly constrained by the network. This is the point where 100 GbE stops being "nice to have" and becomes architecturally justified.
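The arithmetic behind that inflection point is simple enough to sketch. The per-drive number below is an assumption; substitute the sustained (not peak) throughput of the drives you actually plan to deploy:

$drivesPerNode   = 12
$perDriveGBps    = 3.0      # assumed sustained throughput per enterprise NVMe drive, GB/s
$nicCount        = 2
$nicSpeedGbps    = 25

$nvmeAggregateGBps    = $drivesPerNode * $perDriveGBps      # raw media capability
$networkAggregateGBps = ($nicCount * $nicSpeedGbps) / 8     # theoretical line rate, before overhead

'NVMe aggregate:    {0,6:N1} GB/s' -f $nvmeAggregateGBps
'Network aggregate: {0,6:N1} GB/s (before protocol overhead and mirror traffic)' -f $networkAggregateGBps

Even when only a fraction of that media capability is exercised at once, mirrored writes ride the same links, so the 2 × 25 GbE ceiling arrives well before the media limit does.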

Why 100 GbE Changes the Equation

Moving to 100 GbE does more than increase peak bandwidth:

  • It removes the network as the primary limiter
  • It improves performance consistency under contention
  • It provides headroom for growth and future workloads
  • It avoids costly retrofits later (re‑cabling, new switches, downtime)

With dual‑port 100 GbE:

  • Aggregate bandwidth is high enough that storage or CPU once again become the limiting factors
  • NVMe density can scale without immediately flattening performance
  • The system behaves as customers expect when they invest in NVMe: adding drives actually results in more usable performance

This is why many all‑NVMe S2D and Azure Local reference architectures move directly to 100 GbE rather than trying to "stretch" 25 GbE with additional ports.

"The hybrid cloud is no longer a compromise—it's the optimal architecture for modern business."

— Core principle in modern HCI design

The Real Design Choice: 4 × 25 GbE vs 2 × 100 GbE

Once you accept that 2 × 25 GbE will not be enough for roughly 12–16 NVMe per node under sustained load, the design conversation changes:

You are no longer choosing between 25 GbE vs 100 GbE. You are choosing between 4 × 25 GbE and 2 × 100 GbE.

Option A: 4 × 25 GbE per node

  • Doubles the NIC ports per server (more NICs or higher‑port adapters)
  • Consumes 4 TOR ports per node
  • Requires twice as many DACs/optics
  • Increases cabling complexity and configuration overhead

Option B: 2 × 100 GbE per node

  • Keeps you at 2 TOR ports per node
  • Uses half the cables
  • Delivers 2× the aggregate bandwidth of a 4 × 25 GbE (100 Gbit/s) design
  • Simplifies operations and leaves more free ports for growth

Even if a 100G optic or port costs more per unit than its 25G equivalent, multiplying the 25G option out to four ports (and four optics) per node erodes the apparent savings very quickly. In real designs, the total fabric cost for 4 × 25 GbE versus 2 × 100 GbE frequently lands in the same ballpark, especially once you include support and cabling. In some cases, the 100 GbE option is actually slightly cheaper, while delivering more per‑node bandwidth, fewer TOR ports consumed, fewer points of failure, and a cleaner topology.
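If you want to make that comparison concrete for a specific cluster size, a throwaway calculation like the one below does the job. The per-port costs here are placeholders, not quotes; plug in the numbers from your own vendor pricing. The structural point is the 4× multiplier on ports, optics, and cables:

# Illustrative only: unit costs below are hypothetical placeholders, not real pricing
$nodes = 8

$optionA = [pscustomobject]@{ Name = '4 x 25 GbE';  PortsPerNode = 4; GbpsPerNode = 4 * 25;  CostPerPort = 400 }
$optionB = [pscustomobject]@{ Name = '2 x 100 GbE'; PortsPerNode = 2; GbpsPerNode = 2 * 100; CostPerPort = 900 }

$optionA, $optionB | ForEach-Object {
    [pscustomobject]@{
        Option       = $_.Name
        TorPortsUsed = $_.PortsPerNode * $nodes
        CablesNeeded = $_.PortsPerNode * $nodes
        GbpsPerNode  = $_.GbpsPerNode
        FabricCost   = $_.CostPerPort * $_.PortsPerNode * $nodes
    }
} | Format-Table -AutoSize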

A Simple, Defensible Rule of Thumb

If you want a guideline that's easy to communicate and hard to argue with, use this:

If a Storage Spaces Direct or Azure Local node can sustain more than ~5 GB/s of storage throughput, 100 GbE networking is justified.

Dense NVMe nodes reach that line quickly — often before they hit 16 drives per system. Equivalently:

If you are planning for roughly 12 or more NVMe drives per node, design for 100 GbE from day one. Don't assume 2 × 25 GbE will scale with you.

This is deliberately conservative. It doesn't require exotic workloads, perfect tuning, or lab‑only conditions. It's based on what real VMFleet tests and production systems actually do under load.

Conclusion

All‑NVMe Storage Spaces Direct and Azure Local fundamentally change where performance bottlenecks appear. Even with SMB Multichannel and dual‑port networking, dense NVMe nodes can saturate available bandwidth earlier than many designs anticipate. Recommending 100 GbE networking for systems with more than about 12 NVMe drives per node is conservative rather than aggressive, architecturally sound based on real tools and behavior, aligned with the economics of 4 × 25 GbE vs 2 × 100 GbE fabrics, and focused on delivering usable performance, not just impressive component specs. If you're investing in dense NVMe, the network needs to keep up — or it will define your limits. In that world, 100 GbE is not a luxury upgrade; it's the rational default once you pass a certain NVMe density.

Back to Articles

Azure Local Support Deadlines, Stretched Clusters, and Rack‑Aware Clustering

What You Need to Do Before April 2026

Still on 22H2? Running a stretched cluster? Or sitting on 23H2 hoping it buys you time? This is your wake‑up call — with clear facts and supported directions, not "just wait."

Why April 2026 Matters

Azure Local (formerly Azure Stack HCI) has moved away from the old annual 22H2 / 23H2 cadence to a monthly release train with versions like 2507, 2508, 2509, and so on. These monthly releases align to a specific OS build (for example, 24H2 = 26100.xxxx) and follow Microsoft's Modern Lifecycle model, which requires customers to stay within six months of the latest release to remain supported. That change is good for features and fixes — but it also means older H2 releases have a hard runway.
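A quick way to see where each node actually sits is to read the OS build straight off the nodes. A sketch, assuming the FailoverClusters module is available and that you can remote into every node:

# Map each node's OS build to the release families discussed in this article
$buildMap = @{
    '20349' = '22H2 (out of support)'
    '25398' = '23H2 (supported until April 2026)'
    '26100' = '24H2 (current monthly train)'
}

Get-ClusterNode | ForEach-Object {
    Invoke-Command -ComputerName $_.Name -ScriptBlock {
        Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion' |
            Select-Object @{ n = 'Node'; e = { $env:COMPUTERNAME } }, CurrentBuildNumber, UBR
    }
} | ForEach-Object {
    [pscustomobject]@{
        Node    = $_.Node
        Build   = "$($_.CurrentBuildNumber).$($_.UBR)"
        Release = $buildMap[$_.CurrentBuildNumber]
    }
} | Format-Table -AutoSize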

So there are three overlapping issues:

  1. You're still on 22H2 (possibly with stretched clusters).
  2. You're on 23H2, but haven't moved to 24H2 and risk falling out of support in April 2026.
  3. You're using stretched clusters and need to understand what replaces them.

This post covers all three — at the support and architecture boundary level.

Baseline Assumptions

To keep this grounded:

  • Production environments run on hardware validated and supported for Azure Local, deployed in supported configurations.
  • There is an active Azure support plan that allows Azure Local technical cases. At minimum, this requires Azure Standard, Professional Direct, or Developer support. The Basic plan is not sufficient.
  • The discussion focuses strictly on Azure Local OS and solution versions that Microsoft explicitly documents and supports — namely the transition from 22H2 → 23H2 → 24H2 (26100.xxxx) under the current monthly release model.

Scenario 1: Still on 22H2 — You're Already Out of Support

If you're still on Azure Local / Azure Stack HCI 22H2 (20349.xxxx) today:

  • 22H2 reached end of support on May 31, 2025
  • Monthly security and quality updates have stopped

Remaining on 22H2 means:

  • No new security fixes
  • Increasing compatibility risk with drivers, agents, and integrations
  • Any support case starts with: "upgrade first"

The good news: a direct path to 24H2

Originally, the supported upgrade path was 22H2 → 23H2 → 24H2. That has changed. Starting with the 2505 release, Microsoft introduced a direct OS upgrade path from 22H2 (20349.xxxx) → 24H2 (26100.xxxx). This skips the 23H2 hop and reduces reboots and maintenance windows. Microsoft explicitly recommends taking this direct path where applicable.

If you're on 22H2:

  • Plan a direct OS upgrade to 24H2 (26100.xxxx)
  • Complete the Azure Local solution upgrade so your solution version aligns with the 24H2 OS and places you on the monthly train

That restores:

  • A supported OS baseline
  • Monthly security and quality updates
  • A future‑proof lifecycle posture

"22H2 is already out of support. Don't wait for April 2026 — upgrade now."

— Microsoft Support guidance

Scenario 2: On 23H2 — Don't Fall Off in April 2026

If you're running 23H2 OS (25398.xxxx), you're in better shape — but only temporarily. 11.2510 (October 2025) was the final 23H2 solution release, and all 23H2 configurations are supported only until April 2026.

After April 2026:

  • 23H2 receives no monthly updates
  • Support cases will require upgrading before troubleshooting continues

Case A: 23H2 OS with the Azure Local solution installed

If the solution is already installed:

  • Solution updates are mandatory, not optional
  • They are the mechanism that moves you onto 24H2 (26100.xxxx)

Azure Local follows a Modern Lifecycle cadence: you must stay within six months of the most recent release to remain supported.
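To check where a given system sits on the solution train, the update cmdlets that ship with Azure Local can report the installed solution version and any available updates. A minimal sketch, assuming the update service cmdlets are present (23H2 and later) and that you run it on, or remote into, a cluster node:

# Current solution version and overall update/health state for this environment
Get-SolutionUpdateEnvironment | Format-List

# Updates the orchestrator currently knows about, including their state
Get-SolutionUpdate | Format-Table -AutoSize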

Case B: 23H2 OS only (solution not completed)

Some environments upgraded the OS but never finished the solution upgrade.

Microsoft explicitly treats this as a temporary state and expects you to:

  • Complete post‑OS upgrade tasks
  • Validate solution readiness
  • Apply the solution upgrade

Sitting in OS‑only 23H2 as April 2026 approaches is not a safe long‑term position.

Microsoft's Special Case: Stretched Clusters

Microsoft's generic OS‑upgrade guidance includes this statement: "There is an exception to the preceding recommendation if you are using stretch clusters. Stretch clusters should wait to directly move to version 26100.xxxx (24H2)."

This sentence is easy to misread. Context for stretched clusters: you have effectively been on borrowed time. Microsoft accommodated stretched clusters on 23H2 for customers that already implemented them on 22H2, but that accommodation ends when 23H2 support ends in April 2026. In practical terms:

  • There is no supported in‑place upgrade path for stretched clusters to 24H2
  • After April 2026, stretched clusters are unsupported

Your next supported state must be a non‑stretched 24H2 design, delivered through deployment and migration, not in‑place conversion.

Rack‑Aware vs. Stretched: The 1 ms Line

Rack‑aware clustering introduces a strict architectural constraint:

≤ 1 millisecond round‑trip latency between racks or zones

Rack‑aware clustering is intended for:

  • A single site
  • Fault domains within the same datacenter (racks or rooms), not distant locations

Rule of thumb:

  • < 1 ms latency → Rack‑aware clustering is supported
  • > 1 ms latency → Rack‑aware clustering is not supported
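A coarse first check of that latency line can be done with nothing more than ICMP. This is only a sketch: ping on Windows PowerShell resolves whole milliseconds (on PowerShell 7 the property is Latency rather than ResponseTime), so treat repeated 0 ms readings as "probably sub-millisecond" and validate properly, under load, before committing to a rack-aware design. The host name is a placeholder:

$peerNode = 'node-rack2-01'    # a node (or test host) in the other rack / fault domain

$samples = Test-Connection -ComputerName $peerNode -Count 20
$stats   = $samples | Measure-Object -Property ResponseTime -Average -Maximum

'Average RTT to {0}: {1} ms (max {2} ms)' -f $peerNode, $stats.Average, $stats.Maximum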

New deployments only

Rack‑aware clustering is supported only for new deployments. Converting an existing standard or stretched cluster in place to rack‑aware is not supported. Migration requires a side‑by‑side deployment and workload move. If your current "stretched" design is actually two racks or rooms in the same datacenter with sub‑millisecond latency, rack‑aware clustering is often the simpler and more future‑proof replacement. If your sites are truly separate, rack‑aware is not an option.

Azure Local‑Managed VMs and the Arc Resource Bridge Boundary

When changing cluster topology, VM type matters.

Plain Hyper‑V VMs

  • Managed via Hyper‑V, Windows Admin Center, or SCVMM
  • Can be replicated, migrated, backed up, and restored between clusters
  • Remain plain Hyper‑V VMs on the destination

Azure Local‑managed VMs

  • Created and managed through Azure via the Azure Local solution
  • Their Azure identity is tied to a specific Arc Resource Bridge and custom location

If you move or replicate the underlying VM to another cluster:

  • It appears there as a plain Hyper‑V VM
  • There is currently no supported mechanism to preserve or transfer its Azure Local VM identity across clusters

You can still deploy the Azure Connected Machine (Arc) agent in the guest OS for OS‑level management, but that does not reconstitute it as an Azure Local‑managed VM. This is a design decision, not a migration footnote.
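To make the boundary concrete: moving a VM between clusters with plain Hyper‑V tooling looks like the sketch below, and the result on the destination is exactly what's described above: a plain Hyper‑V VM. Names and paths are placeholders, and Hyper‑V Replica, backup/restore, or a migration tool follows the same principle:

# On the source cluster/host: export the VM's configuration and disks
Export-VM -Name 'app-vm-01' -Path 'D:\Export'

# On the destination host: import the copy as a new (plain Hyper-V) VM
$vmcx = Get-ChildItem 'D:\Export\app-vm-01\Virtual Machines' -Filter *.vmcx | Select-Object -First 1
Import-VM -Path $vmcx.FullName -Copy -GenerateNewId

# The imported VM is manageable through Hyper-V, WAC, or SCVMM, but it is not an
# Azure Local-managed VM: its Azure-side identity stays tied to the original
# cluster's Arc Resource Bridge and custom location.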

Pulling It Together: A Practical Action Plan

If you're on 22H2

Accept that you're already out of support. Use the direct 22H2 → 24H2 OS upgrade. Complete the Azure Local solution upgrade.

If you're on 23H2

Treat April 2026 as a real deadline. Keep taking solution updates to land on 24H2 (26100.xxxx). If you're OS‑only, finish the solution upgrade now.

If you're running a stretched cluster

Recognize that stretched clusters are supported only through April 2026. Plan for a non‑stretched 24H2 architecture. Expect this to involve deployment and migration, not conversion.

One‑Paragraph Summary for Stakeholders

Azure Local's 22H2 and 23H2 releases are on borrowed time. 22H2 is already out of support, and 23H2 — including stretched clusters — stops receiving security and quality updates after April 2026. Microsoft now operates Azure Local on a monthly release train based on the 24H2 OS (26100.x). To remain supported, environments must move to 24H2 and the current solution train. This transition is also the point at which stretched cluster designs must be replaced with supported non‑stretched architectures, such as rack‑aware clusters for low‑latency single‑site deployments or independent clusters with replication for multi‑site scenarios.

Back to Articles

When Azure Local's Reserved IP Ranges Collide With Your Network

A recovery path when your datacenter network overlaps with Microsoft's reserved ranges

Microsoft's documentation for Azure Local includes a strict requirement:

Two IP ranges must not be used anywhere in your Azure Local deployment:

  • 10.96.0.0/12 — Reserved for Kubernetes services
  • 10.244.0.0/16 — Reserved for Kubernetes pod networking

These ranges are reserved for internal Azure Local and AKS Arc components. If your existing datacenter network uses any part of these blocks — and many do — deployment can fail or behave unpredictably.

This is especially common with 10.244.0.0/16, which appears in many legacy Kubernetes clusters, lab environments, and inherited network designs.
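A quick way to spot the collision before deployment is to compare the prefixes your hosts already know about against the two reserved blocks. The sketch below only sees what the local routing table sees, so treat it as a first pass and confirm against your IPAM or network documentation:

function Test-PrefixOverlap {
    param([string]$PrefixA, [string]$PrefixB)

    # Convert an IPv4 CIDR prefix to its network address and mask as unsigned integers
    function ConvertTo-Network([string]$Prefix) {
        $ip, $len = $Prefix -split '/'
        $bytes = ([System.Net.IPAddress]::Parse($ip)).GetAddressBytes()
        [Array]::Reverse($bytes)
        $addr = [System.BitConverter]::ToUInt32($bytes, 0)
        $mask = [uint32]([math]::Pow(2, 32) - [math]::Pow(2, 32 - [int]$len))
        [pscustomobject]@{ Network = ($addr -band $mask); Mask = $mask }
    }

    $a = ConvertTo-Network $PrefixA
    $b = ConvertTo-Network $PrefixB

    # Two prefixes overlap when either network falls inside the other's range
    (($a.Network -band $b.Mask) -eq $b.Network) -or (($b.Network -band $a.Mask) -eq $a.Network)
}

$reserved = '10.96.0.0/12', '10.244.0.0/16'

Get-NetRoute -AddressFamily IPv4 |
    Where-Object { $_.DestinationPrefix -ne '0.0.0.0/0' } |
    ForEach-Object {
        foreach ($range in $reserved) {
            if (Test-PrefixOverlap $_.DestinationPrefix $range) {
                [pscustomobject]@{ LocalPrefix = $_.DestinationPrefix; OverlapsReserved = $range }
            }
        }
    } |
    Sort-Object LocalPrefix -Unique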

Microsoft's guidance is simple: don't use these ranges.

But real customer networks are rarely simple.

  • Sometimes you inherit a network.
  • Sometimes you're mid‑migration.
  • Sometimes you discover the overlap only after Azure Local deployment has already begun.

Fortunately, there is a clean, repeatable workaround.

The Workaround: Deploy First, Then Re‑Create the Arc Resource Bridge

Azure Local enforces reserved‑range validation during deployment. But Arc Resource Bridge (ARB) — the component that hosts AKS on Azure Local — can be safely deleted and re‑created afterward.

This gives you a path forward:

  1. Deploy Azure Local using the IPs you want, even if they overlap with Microsoft's reserved ranges.
  2. Delete the automatically created AKS Arc instance.
  3. Re‑create it manually with your own service CIDR, pod CIDR, and DNS service IP.

This allows you to avoid the reserved ranges and avoid redesigning your entire network.

Step 1 — Delete the Default AKS Arc Instance

Start by removing the Arc Resource Bridge that was automatically created during Azure Local setup.

Using PowerShell or the Azure CLI:

Remove-AksHci -ArcResource -Force

You can also do this through the Azure CLI for Arc‑managed resources. Either way, the operation cleans up the associated Kubernetes cluster and all of its networking configuration.

Step 2 — Re‑Create AKS Arc With Your Own IP Ranges

Once removed, you can deploy a new AKS Arc instance with custom networking parameters:

New-AksHci -Name <cluster-name> `
  -ServiceCidr <service-cidr> `
  -PodCidr <pod-cidr> `
  -DnsServiceIP <dns-service-ip>

Example with custom ranges:

New-AksHci -Name azlocal-aks `
  -ServiceCidr 172.20.0.0/16 `
  -PodCidr 172.21.0.0/16 `
  -DnsServiceIP 172.20.0.10

Replace the ranges with whatever fits your environment, as long as they don't collide with your existing infrastructure.

Important Notes

The DNS service IP must always end in .10 (the .10 address inside your service CIDR, as in the example above) — that final octet is non‑negotiable for AKS Arc.

The service CIDR and pod CIDR can be anything that fits your environment, as long as they don't overlap with the rest of your network.

This process modifies only the ARB/AKS layer — not your Azure Local deployment itself. Your VM hosts, storage, and fabric remain unchanged.

Why This Works

Azure Local enforces reserved ranges during deployment because ARB and AKS rely on them internally. But once the system is deployed, you can override the Kubernetes networking configuration by re‑creating the AKS Arc instance.

This gives customers with constrained or inherited networks a viable path forward without renumbering DNS servers, proxies, gateways, or entire IP blocks.

It's not documented. It's not obvious. But it's fully supported because you're using standard lifecycle operations.

When This Workaround Makes Sense

Use this approach when:

  • Your existing network uses 10.96.0.0/12 or 10.244.0.0/16.
  • You cannot renumber critical infrastructure.
  • You want to deploy Azure Local now, not after a long network redesign.
  • You need AKS Arc to run with custom service/pod CIDRs.

It's also a great recovery path when the initial deployment fails due to IP conflicts and you want to avoid tearing everything down.

The Bottom Line

Azure Local's reserved IP ranges are strict — but they don't have to be blockers.

By deploying first and re‑creating the Arc Resource Bridge with your own CIDRs, you can run Azure Local in networks that would otherwise be incompatible.

This is one of those real‑world hybrid‑cloud challenges that customers hit all the time. Now you've got a clean, repeatable way to handle it.

Back to Articles