Chick Fil A's Edge Enterprise Architecture

medium.com

93 points by freshrap6 3 years ago · 57 comments

billllll 3 years ago

> The goal of the Restaurant Edge Compute platform was to create a robust platform in each restaurant where our DevOps Product teams could deploy and manage applications to help Operators and Team Members keep pace with ever-growing business, whether in the kitchen, the supply chain, or in directly serving customers.

> (Previous article) Our hypothesis: By making smarter kitchen equipment we can collect more data. By applying data to our restaurant, we can build more intelligent systems. By building more intelligent systems, we can better scale our business.

I must admit, from an outsider's perspective, it really sounds like a bunch of buzzwords justifying a solution in search of a problem. Their examples of forecasting waffle fries reminds me of a failed startup that forecasted how many checkout lines to open via computer vision (which I can't find on Google). In the end, it turned out it was a lot easier for a human manager to simply open a new line when required, and the computer vision forecasts were not accurate enough to be useful. I wonder what CFA's success criteria and metrics are for this project.

Tech-wise, wouldn't it be a lot simpler to do a single node, single application that gets updated via something like RAUC? Especially if you have a small team (which they emphasized), it seems to me like adding a Kubernetes cluster at the edge adds complication without much benefit, other than "redundancy" (how redundant is a single rack with the same power source anyways?). Also, how would they get an important security update to the host, if it becomes necessary?

It's a lot of nitpicks, but the project overall is very cool. Sounds like they solved a lot of hard tech problems and executed well on the ops.
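
To put rough numbers on the redundancy question above, here is a minimal sketch. Every figure in it is a made-up assumption, not CFA data. It assumes node failures are independent, that any single surviving node can keep the site running (a quorum-based control plane would need a majority instead), and that the shared power feed is one common dependency for all nodes.

```python
# Hypothetical numbers for illustration only; none come from the article.
NODE_AVAILABILITY = 0.99      # assumed availability of one node
POWER_AVAILABILITY = 0.999    # assumed availability of the shared power feed


def cluster_availability(node_avail: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is up."""
    return 1.0 - (1.0 - node_avail) ** nodes


def site_availability(node_avail: float, nodes: int, power_avail: float) -> float:
    """Cluster availability multiplied by the shared power source's availability."""
    return power_avail * cluster_availability(node_avail, nodes)


if __name__ == "__main__":
    for n in (1, 3):
        value = site_availability(NODE_AVAILABILITY, n, POWER_AVAILABILITY)
        print(f"{n} node(s): {value:.6f}")
```

Under these assumptions, going from one node to three lifts the site from roughly 98.9% to roughly 99.9%, but no number of nodes can push it past the availability of the shared power feed.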

  • FooHentai 3 years ago

    Fundamentally what they've deployed here is something a lot of organisations struggle with or don't even recognize as worth having - A reliable edge which you can trust sufficiently to form the functional core of a site.

    For a restaurant chain this is something worth putting the development effort into, because once you've figured it out and run it for a few years to demonstrate its reliability, you can pitch the shift from a network-optional edge at each site to a network-dependent site with intelligent components hanging off it and depending on it. That's a pathway to having a major competitive advantage in the medium term that your competitors won't be able to put into place overnight once they realize you've left them behind.

    You can't get there with the amount of effort often put into untrusted edge sites like this - aka a PC in a cupboard. You also can't get there with cloud when the weakest element in the chain is unreliable site connectivity.

    They could have done it in a lot of different ways, but going with cheap commodity hardware and avoiding expensive cluster license nonsense (vSphere etc) were smart choices. Spend that money on a competent centralized tech team rather than vendor shininess, and you can do a hell of a lot more (and often move faster, to boot).

  • mrandish 3 years ago

    > Their examples of forecasting waffle fries reminds me of a failed startup that forecasted how many checkout lines to open via computer vision (which I can't find on Google).

    CFA leads the industry in revenue per site. I think more accurate forecasting is a significant factor contributing to this. Their sites aren't larger or better located than their competitors. In fact, they're often right next to their competitors in a similar footprint. Since I have a CFA nearby which I drive by multiple times a day, I've seen first hand that they always have substantially more cars in the drive-thru line and parking lot than their competitors in the same strip mall parking lot.

    Customers will see the overflowing CFA line at the drive-thru yet still choose to pull into line because they've learned that CFA's throughput is dramatically faster than their competitors'. In my experience I'd guess about 2x-3x faster, which is incredible when you think about it. They achieve that by getting a lot of things right, but it seems obvious that clearing their order backlog as fast as possible through more accurate load prediction would be a key factor.

  • c4ptnjack 3 years ago

    Buzzwords or not, Chick-fil-A has an incredibly efficient setup that has really improved significantly in the last four years.

    • fatnoah 3 years ago

      All I know is that I can be car #105 in line at Chick-fil-A and I'll get my food faster than being car #2 in line at McDonald's. Whatever they're doing is working.

      • mrandish 3 years ago

        Yeah, in my experience they are by far both the fastest and most consistent fast food restaurant. I've read they are also the highest revenue per-site chain.

        • psychphysic 3 years ago

          I read that they have an incredibly efficient setup, sometimes in the drive thru you get served 100 places before some competitors.

          And they have the best revenue per site.

  • throwawaaarrgh 3 years ago

    I agree that K8s is adding unnecessary complexity here. But the edge computing idea might actually apply in a couple ways.

    Mostly though, any data they collect will be very valuable, as forecasting is a core component of fast food logistics. Fast food lives and dies on efficiency.

  • Traubenfuchs 3 years ago

    > Their examples of forecasting waffle fries reminds me of a failed startup that forecasted how many checkout lines to open via computer vision

    Maybe this?

    https://ixr.com/queue-prediction

  • berebere 3 years ago

    Instead of building the entire stack using (OS + K8s), why not use Azure IoT Edge or AWS Greengrass for fleet management? These services seem to have solved a lot of the problems (low-footprint, redundancy, cloud management) already.

    • rektide 3 years ago

      Picking standard open source starting places seems like a more than obvious move.

      Saying you want to invest in ongoing intense data-driven store innovation, then building the whole thing atop a platform that you cannot rely on (may get discontinued, price may become huge, may become a barrier to technical innovation) and that you don't control seems like an obviously bad move.

      Finding smart people, rolling up your sleeves, & recognizing this as a core competency, an enabler, a driver of your business, & not outsourcing the problems, is the right move. If future teams do a better job building edge kubernetes, there should also be good portability.

    • klysm 3 years ago

      Cloud services like those tend to be limited in annoying ways. Also sometimes they simply vanish (@google). The lock in is also a negative.

      • throwawaaarrgh 3 years ago

        The lock in isn't a negative, it's just a cost. If you didn't build it yourself, that was a time and expertise savings. If they go away, you either just use the other vendor, or you have to pay for the time and expertise now, which you would have been paying anyway if you didn't use the vendored solution to begin with.

        • klysm 3 years ago

          > The lock in isn't a negative, it's just a cost.

          > just use the other vendor,

          The point of lock in is to make it not a “just” to use a different vendor.

          I also disagree with characterizing it as a cost. It is a negative because of its risk.

      • barbazoo 3 years ago

        And they mention that in the article, too.

pm90 3 years ago

They talked about this at a KubeCon iirc. I wasn’t sure if it was an elaborate prank, but they were seriously smart folks who patiently explained why they needed to do this, and I remember being very impressed.

greatpostman 3 years ago

People laughing at Chick-fil-A should look at Domino's stock price, which has absolutely exploded due to technical innovations (and a new recipe).

rajnathani 3 years ago

HN discussion on their initial 2018 blog post: https://news.ycombinator.com/item?id=17820626 (570 points, 392 comments)

jwsteigerwalt 3 years ago

Refreshing to see dynamic and nimble solutions from a large organization.

  • psychphysic 3 years ago

    It's how they dominate.

    It's all the more important as walk-ins and drive-thrus decline while deliveries continue to rise.

recuter 3 years ago

> Our hypothesis: By making smarter kitchen equipment we can collect more data. By applying data to our restaurant, we can build more intelligent systems. By building more intelligent systems, we can better scale our business.

> As a simple example, imagine a forecasting model that attempts to predict how many Waffle Fries (or replace with your favorite Chick-fil-A product) should be cooked over every minute of the day. The forecast is created by an analytics process running in the cloud that uses transaction-level sales data from many restaurants. This forecast can most certainly be produced with a little work. Unfortunately, it is not accurate enough to actually drive food production. Sales in Chick-fil-A restaurants are prone to many traffic spikes and are significantly affected by local events (traffic, sports, weather, etc.).

> However, if we were to collect data from our point-of-sale system’s keystrokes in real-time to understand current demand, add data from the fryers about work in progress inventory, and then micro-adjust the initial forecast in-restaurant, we would be able to get a much more accurate picture of what we should cook at any given moment. This data can then be used to give a much more intelligent display to a restaurant team member that is responsible for cooking fries (for example), or perhaps to drive cooking automation in the future.

> Goals like this led us to develop an Internet of Things (IOT) platform for our restaurants. To successfully scale our business we need the ability to 1) collect data and 2) use it to drive automation in the restaurant.

The football game next door is over and the home team won? Start extra burgers in anticipation of hungry fans - great. I buy that.

The whole thing can be one app running on an iPad with multiple redundant data plans enabled, esims from AT&T and Verizon or whatever. You're going to need a touchscreen tablet for the POS anyway, no need for additional hardware or Kubernetes.
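
For what the in-restaurant "micro-adjust" step from the quoted article might look like, here is a minimal sketch. The weighted blend, the `FryStation` type, and the `local_weight` parameter are all my own stand-ins for illustration; the article does not describe the actual model.

```python
from dataclasses import dataclass


@dataclass
class FryStation:
    """Hypothetical inputs for one forecasting interval."""
    cloud_forecast: float     # portions the cloud model expects next interval
    recent_pos_demand: float  # portions actually ordered last interval (from POS)
    in_progress: float        # portions already in the fryers or holding area


def portions_to_cook(station: FryStation, local_weight: float = 0.5) -> float:
    """Blend the cloud forecast with live demand, then subtract work in progress."""
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must be between 0 and 1")
    adjusted = (
        (1.0 - local_weight) * station.cloud_forecast
        + local_weight * station.recent_pos_demand
    )
    # Never return a negative number of portions.
    return max(0.0, adjusted - station.in_progress)


if __name__ == "__main__":
    after_the_game = FryStation(cloud_forecast=20, recent_pos_demand=40, in_progress=10)
    print(portions_to_cook(after_the_game))
```

Nothing here needs a cluster by itself, which is my point; the open question is what else has to run next to it.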

  • joezydeco 3 years ago

    The game is over and carryout orders are starting to flood in from hundreds of customers on the smartphone app. And now Grubhub, Doordash, and UberEats are sending orders too.

    The iPad is going to handle that and signal to the cooks to drop more chicken tenders?

  • hgsgm 3 years ago

    Do people go out to eat more after wins than losses?

    • recuter 3 years ago

      No idea, but I'm sure a super fancy machine learning big data model can run on the iPad/POS itself instead of 2000 Kubernetes clusters.

lovetocode 3 years ago

Could we have replaced the EdgeCommander with infrastructure tools like Puppet or Ansible?

almost_usual 3 years ago

Year 2023, FAA systems fail nationwide due to a “corrupt file” and Chick Fil A is highly redundant with a Kubernetes cluster running every store.

  • decremental 3 years ago

    The private sector will almost always be more efficient than the public sector, but that said, the last time that system in question was in the news it was because it had been given a more "inclusive" name. It speaks to the priorities of the regime.

  • MBCook 3 years ago

    Isn’t NOTAM run by a private company already? I don’t think the FAA runs the actual hardware.

ramesh31 3 years ago

But why?

  • mike_d 3 years ago

    When I was at Walmart the holy grail was edge compute in every store (which I think is now live). There is a mainframe in each store that powers point-of-sale, inventory, etc. But in terms of building modern apps, you couldn't assume 24/7 local connectivity was a thing or that it would be fast enough for what you wanted to do.

    Making each location its own failure domain was also a huge win. Imagine a cloud outage taking out hundreds of stores.
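
    A common way to cope with unreliable connectivity is store-and-forward: record everything locally and push it upstream when the link is available. The sketch below is a generic illustration, not Walmart's or CFA's design; the `StoreAndForward` class and the `send` callback are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Dict

Event = Dict[str, object]


class StoreAndForward:
    """Buffer events locally and forward them when the uplink works."""

    def __init__(self, send: Callable[[Event], None], max_buffered: int = 10_000) -> None:
        self._send = send
        # Oldest events are dropped first if the buffer fills during a long outage.
        self._buffer: Deque[Event] = deque(maxlen=max_buffered)

    def record(self, event: Event) -> None:
        """Always succeeds locally, regardless of connectivity."""
        self._buffer.append(event)

    def flush(self) -> int:
        """Forward buffered events in order; stop at the first failure."""
        sent = 0
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                break  # uplink is down; keep the event and retry later
            self._buffer.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        """Number of events still waiting to be forwarded."""
        return len(self._buffer)
```

    The store keeps taking sales while the link is down, and the backlog drains in order once `flush()` stops hitting `ConnectionError`.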

    • sithadmin 3 years ago

      > There is a mainframe in each store

      Nope. vSphere clusters in every store. Possibly the biggest ‘field’ footprint of vSphere that exists.

      • mike_d 3 years ago

        That might be true, but the heart of every store is an AS/400 or whatever IBM now calls the iSeries replacement.

  • 2d8a875f-39a2-4 3 years ago

    HA design for stores is interesting. You want the store to stay up (dispensing goods and accepting payments generally, additionally preparing food in this case) as much as possible. At the same time you want a solution that is cheap to deploy and maintain.

    Making everything in the store a dumb client is cheap and easy, but also fragile. Doing as much computing in each store (and even on each POS) as possible is great for HA but now you have more complicated hardware and software deployment problems. Different merchants trade these off in different ways.

    CFA seem to have gone for a lot of computing in the store, and the rest of the design is about mitigating those deployment and maintenance problems and costs. I like the NUC cluster, GitOps, API and support team stories. Am less keen on the K3s deployment per store; it seems like a questionable choice of orchestration engine for this scenario, but maybe there are details of the rest of their store architecture that I'm missing.

  • shrubble 3 years ago

    I remember 10 years ago when my friend asked me to go look at his restaurant client's computer and diagnose/fix if possible.

    The 486 based PC had a mix of grease and lint/dust on every possible surface including the power supply fan, all cabling and the entirety of the motherboard. It had been placed on a shelf near one of the deep fryers and had run without problems for years. Certainly the other end of the 'long tail' of computing!

  • Rebelgecko 3 years ago

    The previous article linked in the post goes more into the "Why"

    https://medium.com/chick-fil-atech/enterprise-restaurant-com...

  • cdirkx 3 years ago

    It sounds a lot better than what we had when I worked at a fast food restaurant: every location remote desktopping to some overloaded server in the regional office, so simple things like printing an inventory list would involve a lot of waiting.

  • koz1000 3 years ago

    Why do all the tech? The goal is to have the ultimate model of just-in-time manufacturing: to never run out of supplies and, without sounding creepy, to know precisely what the customer plans to order and have it in a bag and in their hands the moment they arrive at the store.

    There are upstream benefits for marketing to see the feedback of their campaigns in real-time, but this is mostly about keeping inventories low and service times lower. That equates directly to profit.

    CFA isn't the only chain working on this, but they're the most open about how they're implementing the infrastructure.

  • KennyBlanken 3 years ago

    Justifying the most expensive franchise 'take' of any chain in the nation, apparently.

    They're basically the mob of the fast food industry. They only want $10k from you to start your franchise; they cover all the costs of starting the business. The tradeoff is that they have the highest percentage take of any franchise.

    • annadar 3 years ago

      Almost like they gave you a bunch of capital and they want their money back. Weird huh? "mob of the fast food industry" That's funny.
