Article
Jan 27, 2026

8 Lessons in Edge Computing from The Home Depot

The Home Depot deployed Kubernetes to manage their connected retail environment across 2,300+ stores. Here are 8 lessons from building resilient edge computing.

Retail

The Home Depot (“THD”) has accomplished one of big box retail's most ambitious edge computing deployments. Two months ago at KubeCon + CloudNativeCon North America 2025, THD’s Distinguished Engineer Dillon TenBrink took the stage to tell all about it.

Over his 14+ years with THD, TenBrink has led the design and deployment of a Kubernetes-powered edge platform that now runs across every Home Depot store in North America. The platform processes 5.5 billion documents per month and enables 4-hour chain-wide deployments.

TenBrink’s work at The Home Depot is a technical success story full of lessons about timing and constraints, resilience and the inevitable challenges of operating at the edge. 

TenBrink recounted eight lessons learned as his team designed and deployed THD’s approach to edge computing, sharing them in a presentation at last fall’s KubeCon in Atlanta.

What follows is a written exposition of TenBrink’s presentation. 

(For the entire presentation, scroll to the end. You can also check in on the work of TenBrink via EdgeMonsters.dev.)

The intro screen from Dillon TenBrink's presentation at KubeCon titled "Flip That Stack" (YouTube)

Backstory

Prior to deploying their current edge computing architecture, THD’s approach to retail tech had powered stores for nearly two and a half decades. “That platform was built in a time where connectivity was on the end of really slow leased lines, maybe T1s, things that were not terribly reliable," TenBrink explained. "Plus, the retail experience wasn't anything like it was today. You had parts and tools in stores, but there was no online experience, no digital manifestation of the store."

Their old edge computing approach reflected these constraints: authoritative data like inventory lived in the store and was synchronized back to central systems through [[batch processes]]. This worked as it should for as long as it could, even as the retail experience moved from in-store to online to everything in-between. Overhauling The Home Depot’s edge computing architecture took multiple years, and TenBrink learned much along the way, including these eight lessons.

The Home Depot's prior edge computing platform had been in place for over 20 years.

Lesson 1: Understand Your Window of Transformation

Timing is everything. "You don't get this opportunity every day to go build a new edge platform," TenBrink said. For The Home Depot, that window opened about ten years ago (2015-2016), driven by three converging forces.

The first force was “interconnected retail” (sometimes also referred to as “omnichannel retail”), which shifted the aforementioned store-centric model to selling in-store, online (delivery), and buy online, pickup in-store (BOPIS). Customers even began to expect same-day delivery to their doorstep and job site delivery. All of these channels had to be integrated.

Next came the [[agile transformation]] where developers wanted to ship features quickly. Before, there was a three-month deployment cycle based on traditional rollout schedules. According to TenBrink, "That just was not keeping up with the pace of our business.”

Finally (and fortunately), when TenBrink and his team looked at hardware, software, operating systems, and database refresh roadmaps, they found that these items were all coming up for refresh around 2019. "We looked at the road maps and we aligned them all and said, 'Wow, there's a really interesting opportunity right here in which nearly everything in the store is coming up for refresh. So what could we do? We could take a real blue sky approach.'"

Core Design Principles

In 2017, TenBrink and his team sketched their core design ideas on a whiteboard. They aimed for their edge computing system to be “A unique expression … focused on autonomous resiliency, services, and speed to value.”

TenBrink recalls whiteboarding these four design ideas around resilience, service management, deploying applications, and supporting modern data services.

Resilient Core Platform

The stores remain the hub of retail operations, which required the edge computing platform to be stable and fit within the existing footprint. There could be no massive capacity additions or “infinite compute.”

Service Management

Meeting the needs of application developers meant shipping software more quickly and with greater agility.

Application Deployment

The new system needed to be easy and quick to deploy and able to “activate chain-wide in minutes.”

Modern Data Services

With a stated aim to “Make it easy for applications to consume, divide, and forward data across many locations,” the team had to architect how data flowed to and from the edge.

With these principles in mind, TenBrink and team had to work within the inherent constraints at the edge.

Lesson 2: Design for the Constraints of the Edge

TenBrink noted that, "The edge is not the cloud.” The team would have to work within the constraints of a store’s limitations: "Just like your home build … There's limited square footage available on your lot. You can only build a foundation so big. It's defined by terrain and conditions and the environment. That's true of our edge environment as well."

In a Home Depot store, there is no data closet and there are no environmental controls. Whereas in the cloud you can expand capacity almost without limit, the edge (a THD store) is constrained. Edge presence was a requirement for THD: each store had to be able to run autonomously, even when disconnected from the network or cloud. That way, each THD store could continue serving its community (even during disasters, when customers need it most).

With constraints understood, the team needed to build their first service: a [[container runtime]]. So they built it on [[Kubernetes]]. TenBrink shared, “It wasn't so much about Kube itself. It was how do we create the conditions for Kube to succeed out [at the] edge."

Their answer centered on composability, which means separating image from configuration from platform base, then pulling these together on demand through a build orchestration program.
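To make the idea concrete, here is a minimal sketch of composing a cluster from three independent inputs. Every name in it (`PlatformBase`, `ClusterSpec`, `compose_cluster_spec`, the image and distribution strings) is hypothetical and chosen for illustration; the talk does not describe THD's actual build orchestration at this level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformBase:
    """Hypothetical platform base: OS image plus Kubernetes distribution."""
    os_image: str
    kube_distribution: str


@dataclass(frozen=True)
class ClusterSpec:
    """The composed result that a (hypothetical) build step would consume."""
    target: str
    base: PlatformBase
    app_images: tuple
    config: dict


def compose_cluster_spec(target, base, app_images, config):
    """Pull the three independently managed inputs together on demand."""
    return ClusterSpec(target=target, base=base,
                       app_images=tuple(app_images), config=dict(config))


# Illustrative inputs only; none of these values come from the talk.
images = ["registry.example/pos:1.4.2"]
config = {"site_type": "store", "offline_capable": True}
old_base = PlatformBase(os_image="os-2024.1", kube_distribution="distro-a")
new_base = PlatformBase(os_image="os-2024.1", kube_distribution="distro-b")

# Swapping the distribution touches only the base; images and config are reused.
store = compose_cluster_spec("store-0001", old_base, images, config)
cloud_test = compose_cluster_spec("cloud-test-01", new_base, images, config)
```

Because the base is just one input among three, replacing it does not require touching the images or the configuration, and the same composition can target a store or a cloud test environment.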

Over the course of building and operating their edge platform, they've changed the core [[Kubernetes distribution]] three times. Each time, they were able to drop in the new distribution without rebuilding their entire build pipeline.

Composability also made it possible for THD to build clusters elsewhere. "I could build one for you right now. I could build one tomorrow. I could build 10," TenBrink said, making it possible to spin up testing environments or places to troubleshoot without going to physical stores.

Lesson 3: “Be Eventually Consistent”

"A key consideration when you're building out on the edge, you have to decide to be eventually consistent," TenBrink explained. "We think this is the only successful model for edge operation."

THD’s heritage model pushed applications from a central location and batch-reconciled them back. The team faced an architectural decision regarding how to deploy applications to 2,000+ clusters. A [[push-based model]] forces you to manage all the complexities yourself: [[retry logic]], handling stores that are down, working across different time zones, adapting to varying network conditions, and so on. "[These complexities] just become a tree of failure to have to manage."

The new THD edge computing system built by TenBrink and team was designed to be resilient, despite constraints (like connectivity) at the edge.

Under TenBrink’s concept of eventual consistency, everything on the edge instead pulls from a centralized repository. THD settled on a [[GitOps]] and [[Flux]]-based model for pulling configurations.
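The pull pattern can be illustrated with a short sketch. This is not Flux and not THD's code; `reconcile`, `pull_once`, and the shape of the state (application name mapped to version) are assumptions made for the example. The point is that each store converges on its own schedule, so a disconnected store needs no central retry logic.

```python
def reconcile(desired, actual):
    """Return the actions that move `actual` toward `desired`.

    Both arguments map an application name to a version string.
    """
    actions = []
    for app, version in sorted(desired.items()):
        if actual.get(app) != version:
            actions.append(("apply", app, version))
    for app in sorted(actual):
        if app not in desired:
            actions.append(("remove", app, None))
    return actions


def pull_once(fetch_desired, actual):
    """One pass of a store-side loop: pull, diff, converge.

    `fetch_desired` is a caller-supplied function that returns the desired
    state or raises ConnectionError. When the repository is unreachable,
    the store keeps running what it has and tries again on its next pass.
    """
    try:
        desired = fetch_desired()
    except ConnectionError:
        return actual, []
    return dict(desired), reconcile(desired, actual)


def offline():
    """Stand-in for a fetch attempted while the WAN link is down."""
    raise ConnectionError("WAN link down")


state = {"pos": "1.0"}
state, actions = pull_once(offline, state)                  # nothing changes
state, actions = pull_once(lambda: {"pos": "1.1"}, state)   # converges later
```

A push model would have to track which of the 2,300 stores missed an update; here a store that was offline simply picks up the latest desired state the next time it can reach the repository.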

Lesson 4: Remove Toil from Your Devs

Scaling the model to over 2,000 sites created new challenges. TenBrink pointed out that having developers write configuration [[YAML]] for 2,300 sites required flexibility, e.g. having V1 go to certain stores, V2 to another region, V3 piloting in 10 Atlanta locations. So the team used a custom templating engine to remove toil from developers. "This is some of the custom code that we had to create when creating this edge platform and it saved our devs from having to manage mountains of YAML."
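A sketch of how such templating might work follows. THD's engine is custom and its design is not described in the talk, so the rule format, `resolve_version`, `render`, the store IDs, and the registry name below are all hypothetical. The idea shown is that developers maintain one template plus a few rollout rules rather than one YAML file per store.

```python
from string import Template

# Hypothetical rollout rules, most specific first.
RULES = [
    {"match": {"store_id": {"0121", "0145"}}, "version": "v3"},  # pilot stores
    {"match": {"region": {"southeast"}}, "version": "v2"},       # one region
    {"match": {}, "version": "v1"},                              # default
]

MANIFEST = Template(
    "app: $app\n"
    "store: $store_id\n"
    "image: registry.example/$app:$version\n"
)


def resolve_version(store, rules=RULES):
    """Return the version from the first rule whose conditions all match."""
    for rule in rules:
        if all(store.get(key) in allowed
               for key, allowed in rule["match"].items()):
            return rule["version"]
    raise LookupError("no rule matched and no default rule was defined")


def render(app, store):
    """Render one store's manifest from the shared template."""
    return MANIFEST.substitute(app=app, store_id=store["store_id"],
                               version=resolve_version(store))


pilot = {"store_id": "0121", "region": "southeast"}
print(render("pos", pilot))
```

Changing which stores receive a version then means editing a rule, not regenerating thousands of files by hand.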

“Removing toil” became a guiding principle for TenBrink’s team.

"What is the one thing that we are all trying to do as platform engineers? Remove toil from our devs, make their lives easier. Right? And that is the goal ultimately, one of the two goals of the edge platform: make your dev's lives easier and provide the service to the business that they need."

With containers running successfully on the edge, developers soon wanted access to data there as well. Several suggested building their own replication models. After hearing this two or three times, the platform team realized they needed to step in. Applying the concept of toil reduction, they decided to create a data cache service.

A flowchart depicting how data is pushed and pulled at the edge. Jump to ~minute 14 in TenBrink's presentation to hear this in his words.

The Home Depot’s [[Edge Data Cache]] became the first service on the platform. It enabled teams to pull down frequently accessed data from the cloud: pricing data, SKUs, coupons, inventory. These were things that, when looked up at the cash register, would return answers instantly by avoiding a 20-50 millisecond round trip to the cloud (a trip that would be required millions of times daily).

THD’s Edge Data Cache supports offline operation too, which even works during natural disasters:

"If you've ever been in the path of a natural disaster, fire, flood, hurricane, any of those things, you know that The Home Depot is one of the last places that people go when the disaster is impending and one of the first places they go once it has passed so that they can begin to put their lives back together. We cannot wait on network services."

The cache comes with the expectation that it's ephemeral and consistent. In the case where things get out of sync, the team can repopulate a cache in an hour or less.
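The behavior described above (local reads, offline operation, and a cache that can be thrown away and rebuilt) can be sketched in a few lines. `EdgeCache` and its methods are hypothetical names for illustration and say nothing about how THD's Edge Data Cache is implemented.

```python
class EdgeCache:
    """Hypothetical in-store cache: reads never leave the store."""

    def __init__(self):
        self._docs = {}

    def repopulate(self, fetch_all):
        """Rebuild the cache from the cloud source of truth.

        `fetch_all` is a caller-supplied function returning a mapping of
        key to document. The new snapshot replaces the old one only after
        the fetch succeeds, so a failed pull leaves the previous data in
        place and the store keeps serving it.
        """
        snapshot = dict(fetch_all())
        self._docs = snapshot
        return len(snapshot)

    def get(self, key, default=None):
        """Answer from local memory; no network round trip is involved."""
        return self._docs.get(key, default)


cache = EdgeCache()
cache.repopulate(lambda: {"sku:1001": {"price": 4.97}})
price = cache.get("sku:1001")["price"]
```

Because the cloud remains authoritative, nothing in the cache is precious: if it drifts, it is discarded and refilled rather than repaired.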

With data flowing into stores, the next request was inevitable: "How do I get data back out of the store?" [[The Home Depot’s Edge Messaging Service]] prioritizes guaranteed delivery. Unlike cloud [[pub/sub]] systems optimized for throughput, the edge messaging service acts as a guardian of messages until they reach the cloud, necessary when environmental conditions at the edge can be unpredictable.
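One common way to get guaranteed delivery over an unreliable link is a store-and-forward outbox, sketched below. The class name, the `send` callback, and the in-memory queue are assumptions for the example; a production service would persist pending messages to disk, and the talk does not say how THD's service is built.

```python
from collections import deque


class StoreAndForwardQueue:
    """Hypothetical outbox: a message leaves only after it is acknowledged."""

    def __init__(self):
        self._pending = deque()

    def publish(self, message):
        """Accept a message locally, whether or not the link is up."""
        self._pending.append(message)

    def flush(self, send):
        """Deliver in order; stop at the first failure and keep the rest.

        `send` is a caller-supplied function that returns True when the
        cloud acknowledges the message and raises ConnectionError when
        the link is down.
        """
        delivered = 0
        while self._pending:
            try:
                acknowledged = send(self._pending[0])
            except ConnectionError:
                break
            if not acknowledged:
                break
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self):
        return len(self._pending)
```

A message is removed only after acknowledgement, which gives at-least-once delivery: a store that loses its link keeps accumulating messages and drains them in order once the link returns.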

Getting data back out of the stores made it possible to know the state of the tech stack across all 2,300 stores at any given time.

Lesson 5: Edge Cases Will Be Found at Scale

The move to cloud-authoritative data introduced new challenges. "When you think about like an edge case that might happen, one in a million chance, right? Maybe one in a million chance the document that's retrieved is wrong. Well, when you're doing tens of millions of queries a day, that's more than I can count on my fingers."
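The arithmetic behind that remark is simple to check. The query volumes below are assumed values within "tens of millions"; only the one-in-a-million rate comes from the quote.

```python
def expected_bad_reads(queries_per_day, one_in=1_000_000):
    """Expected number of wrong documents per day at a 1-in-`one_in` rate."""
    return queries_per_day / one_in


# Illustrative daily volumes, not figures from the talk.
for queries in (10_000_000, 30_000_000, 50_000_000):
    bad = expected_bad_reads(queries)
    print(f"{queries:,} queries/day -> about {bad:.0f} bad reads/day")
```

Even at the low end of that range, a one-in-a-million event is expected about ten times a day, which is why rare failures have to be designed for rather than dismissed.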

The solution, built after [[SREs]] requested more visibility, was The Home Depot’s Replication Tracking Service (ECRT - [[Edge Cache Replication Tool]]). By tagging metadata onto the data cache, the team could exercise their own cache and messaging pipelines rather than relying solely on customer reports.

The service became a self-service tool for SRE teams to check cache state at any time, which is another example of giving teams the tools to be successful.
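One way tagged metadata could be used to check replication is a tracer document: publish a known version centrally, then compare it against what each store's cache holds. This is a guess at the general technique, not a description of ECRT; `find_lagging_stores` and the integer version scheme are hypothetical.

```python
def find_lagging_stores(published_version, store_versions):
    """Return store IDs whose cached tracer is missing or out of date.

    `store_versions` maps a store ID to the tracer version found in that
    store's cache, or None when the tracer is absent.
    """
    return sorted(
        store_id
        for store_id, seen in store_versions.items()
        if seen is None or seen < published_version
    )


# Illustrative data: store 0002 is one version behind, 0003 has no tracer.
lagging = find_lagging_stores(42, {"0001": 42, "0002": 41, "0003": None})
```

A check like this lets the platform team find a stale cache before a customer or store associate does.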

Testing at Impossible Scale

To recreate issues in such a massive environment required innovation. The [[Edge Test Environments]] service builds on that composability principle from the container runtime. TenBrink describes it, "We can compose the platform pretty much anywhere. So, it didn't have to be built out at the edge. It could be built in the cloud."

This enabled something the team thought impossible: building 2,300 test environments in the cloud to do real-world scale testing, then tearing them back down. The service became self-service too. Developers can request a test environment built within minutes, offering a realistic environment that matches a physical store environment, a place where developers can iterate on code.

THD developed a system for testing that made it easier for devs to test based on what was happening in the real world, out at the stores. "What's cool about this service is we also turn this into that self-service model. So devs can go out there, they can request a test environment, it's built within a few minutes for them, and it assists them in their own testing and troubleshooting as well. They get something that is tangible. It's real."

Lesson 6: Invest in Observability Early

TenBrink was candid about lesson 6. The team had created powerful resources running everywhere, but when someone asked, "Are they up or down?" TenBrink realized they had not yet answered the first question from their initial goals: being able to observe the state of the platform in real time, as part of their “Service Design” principle.

Once they invested in observability and aligned it with THD's core goals (seeing fleet state at any time), they gained visibility and insights that were previously impossible. The telemetry infrastructure gave them comprehensive, real-time data across all edge locations. The test came two years ago when hurricanes moved through the Southeast. The old disaster recovery process relied on phone calls to a war room tracking store status. Now:

"We opened up a dashboard. We said we can tell you in real time now up to the minute and you could follow the graph on the map of stores going down and up as power failures moved across the southeast and then were recovered. Several times during that particular incident, we as a platform team knew before the operations center even knew that a store was back up and ready for retail."

The platform moved THD to a real-time state, which also led to new services. Teams asked if they could observe the heritage platform too. It turned out that, with a few extensions, they could. The telemetry and observability service expanded to monitor UPS systems, environmental conditions, and more. "What else can we look at in the stores?" became the new question as they gained more insights day after day.
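The up-or-down question can be sketched as a heartbeat check. The function name, the 120-second timeout, and the heartbeat format are assumptions for illustration; the talk does not describe how THD's telemetry derives store status.

```python
def fleet_status(last_heartbeat, now, timeout_seconds=120):
    """Classify each store as 'up' or 'down' from its latest heartbeat.

    `last_heartbeat` maps a store ID to a Unix timestamp in seconds.
    A store is 'down' when its latest heartbeat is older than the timeout.
    """
    return {
        store_id: "up" if now - seen <= timeout_seconds else "down"
        for store_id, seen in last_heartbeat.items()
    }


# Illustrative timestamps: store 0002 has been silent for 260 seconds.
status = fleet_status({"0001": 1000, "0002": 800}, now=1060)
```

Plotting that status per store on a map over time gives the kind of view in which outages can be watched moving across a region and recovering.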

Observability offered metrics, and from there, the team developed plugins for their internal developer portal giving full insights into edge deployments in a self-service way. 

Developers can see down to each individual change, pod state across the fleet, any location, any environment. "This is an absolutely powerful control plane in the hands of our SRE teams and our platform team and our developer teams," TenBrink said.

Lesson 7: Storage is Still a Challenge

"We've got this great foundation. We've got this platform built. This house is looking fantastic. Green grass in the garden, the flowers are coming up." But some issues persisted. Continuing with his house metaphor, TenBrink noted, "Usually, it's drainage. I don't know why that's a thing. Our drainage problem [for edge computing] was storage." He added, "Our drainage problem is still storage."

"There are tons of storage solutions for Kubernetes. I almost need a sandbox up here on top of the stage. There are tons of storage solutions for Kube, but they're all designed for not edge."

To illustrate the difficulty of the problem, TenBrink recounted a typical conversation with vendors at KubeCon. The vendor asks, "Oh, what cloud are you running in?" TenBrink answers, "I'm not. I'm running on my own boxes." The reply: "We're not for you."

Every [[Kubernetes storage]] solution they've encountered is massive in terms of resource utilization, complexity, and cost. "I personally would have no storage at the edge if possible, but we know that we need to store some things in motion. We need to have that offline capability …" TenBrink concluded with a request to his audience, “I think there is a marketplace for an edge focused storage capability."

Lesson 8: Manage the (Transformation) Timeline

Dillon TenBrink’s final lesson addressed a challenge beyond technology: managing the transition itself. "You can't infinitely live in two houses, right?" TenBrink explained. "You're busy filling the new house with new shiny furniture. You still have the old house to deal with."

Because they're at the edge, scaling the system is a big, expensive deal. Deploying another $5,000 server to 2,300 sites is an $11.5 million ask. "That's not happening." 

TenBrink goes on to observe how, "If you build this, you will be a victim of your own success. You will create a platform that people want to use and they will rush to it and you need to take an active role in managing the transition." Paraphrasing him further, without active management, you wind up in the squeeze: the new platform grows, the old platform stays the same size, and you get caught in the middle. The backlog grows, and "nobody likes you as a platform team again."

Solving this dilemma mirrors how you move from one house to another. You set a close date, know when you must be out, know when you must be in the new house. "Set that expectation early in your platform build as well."

A slide from TenBrink's Flip That Stack presentation providing overview of The Home Depot's edge computing capabilities.

Results at the Edge: Scale and Agility at The Home Depot

So how does it all work? TenBrink was transparent about the results. By the numbers:

  • All US + Canada stores plus select distribution centers
  • 15,000+ edge servers observable and secure
  • 45 minutes to build a complete platform foundation
  • 4 hours for chain-wide deployment of new code
  • 5.5 billion documents processed per month through edge cache
  • 452 million messages delivered
  • 245,119 pods running across the fleet
  • 35 TB backups encrypted and replicated for disaster recovery

The platform runs on [[CNCF technologies]] and includes six core platform services: Edge Data Cache, Edge Messaging, Edge Config, Edge Run, Edge CX (Customer Experience), and Edge Foundational Services. Below that sit observability, connectivity, and security layers.

Looking Forward

"We've unlocked incredible agility for our business so that we can respond to customer needs," TenBrink said. Congratulations to Dillon TenBrink and The Home Depot team responsible for edge computing!

Dillon TenBrink, of The Home Depot.

For those interested in diving deeper into edge computing best practices, TenBrink is part of an industry working group called Edge Monsters, which publishes articles on industry practices and standards in edge operating environments. The group recently published an article specifically addressing the storage challenges discussed in this presentation.

The Home Depot's journey from a 25-year-old store-centric platform to a modern, cloud-native edge architecture demonstrates that successful execution requires timing, respect for constraints, relentless focus on developer experience, and careful management of the transition itself.

As edge computing continues to grow across retail, manufacturing, and other industries operating thousands of distributed sites, these lessons learned at scale offer a valuable blueprint for platform engineers embarking on their own edge journeys.

Watch the full presentation on YouTube: