PID Controller for controlling the number of servers in a data-center
Why not something more intelligent than PID?
Control theory offers a lot more algorithms. PID is arguably simple to implement, but it is not a particularly good algorithm.
It kinda seems to me as if everybody read only the first page on control theory, decided they didn't need to read further, and based their solution on that.
PID will basically give you either large overshoots (which you will experience as overcorrection to changes in demand) or slow adaptation to changes.
There is also the possibility that your system changes and your PID parameters then cause the whole controller to misbehave.
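(For reference, the entire algorithm fits in a few lines -- a minimal sketch in Python; kp, ki and kd are exactly the parameters that go stale when the system changes:)

    # minimal discrete PID step; kp, ki, kd are the gains you have to tune
    def make_pid(kp, ki, kd):
        integral, prev_error = 0.0, 0.0
        def step(setpoint, measurement, dt):
            nonlocal integral, prev_error
            error = setpoint - measurement
            integral += error * dt
            derivative = (error - prev_error) / dt
            prev_error = error
            return kp * error + ki * integral + kd * derivative
        return step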
I have implemented a controller for espresso machine boiler water temperature. Replacing PID with moving horizon estimator allowed me to cut time from startup until stable temperature by at least half and eliminate any measurable over or undershoots.
PID is a damn sight more sophisticated than most datacenter dynamic capacity control algorithms - most autoscalers barely even qualify as ‘bang bang’ controllers - they detect a need for more capacity, and add nodes at some artificially constrained rate until capacity is reached or they hit a max cluster size limit. Even rudimentary control theory is an improvement.
Of course the problem with applying PID to server capacity is that compute resources come in discrete chunks that are slow to bring online (‘computers’) rather than being a continuously variable resource.
If you have enough of anything it starts acting continuous.
> Most autoscalers barely even qualify as ‘bang bang’ controllers - they detect a need for more capacity, and add nodes at some artificially constrained rate until capacity is reached or they hit a max cluster size limit. Even rudimentary control theory is an improvement.
Why do you think PID is an improvement here? IMHO the bang bang approach is preferable because it behaves extremely predictably. Operationally, knowing how the system will react to extreme conditions might be more valuable than being a little more optimal.
> Operationally, knowing how the system will react to extreme conditions might be more valuable than being a little more optimal.
With today's systems, we typically have neither.
> Operationally, knowing how the system will react to extreme conditions might be more valuable than being a little more optimal.
But there's a lot of theory around PID control to show how PID control systems behave under all sorts of circumstances. Hell, there are even things that are mathematically provable, like stability, which I strongly suspect you often cannot even reason about in the case of many an ad-hoc system like the ones GP is talking about.
Parent's point is that they are worse than a bang-bang controller.
It's not very clear from your comment, but are you claiming that PIDs are not predictable?
Huh? Parent was arguing in favor of it: "PID is a damn sight more sophisticated", "even rudimentary control theory is an improvement".
Yes, PID upset is a real problem when things start going sideways, as they occasionally do in the real world.
My point was more that the idea of applying control theory is an improvement over doing what amounts to bang-bang control without really thinking about it.
Not that PID is better, not that bang-bang is better, just that.. learning from control theory is likely a fruitful direction to go in.
Most autoscaling algorithms have heuristically hacked hysteresis built in to stop them oscillating between provisioning a node, dropping a node, provisioning, dropping. That kind of threshold-tuning stuff is ripe for replacement with something more rigorous. PID control is an example of a rigorous way to solve that in continuous control. It likely doesn't directly work for capacity, but control theory in general does have better answers for this kind of thing than just PID.
Ok, I can agree with that.
> Huh? Parent was arguing in favor of it: "PID is a damn sight more sophisticated", "even rudimentary control theory is an improvement".
You clearly failed to read (and quote) what OP said, and therefore failed to understand his point.
Please pay attention to OP's point: PID is more sophisticated than the current state of affairs, which doesn't even qualify as bang-bang control.
The OP quite clearly and unambiguously stated that "Even rudimentary control theory is an improvement."
Just because you feel the need to one-up other people's work, that does not mean it is OK to distort and strawman the points they made.
I suspect that datacenters forgo PID not because nobody there is aware of control theory, but because they tried it and decided against it.
I think Hanlon's razor applies; whenever I have interacted with people in datacenters, most of them had not heard of control theory.
The ones that did were usually from an academic background and might have had some side-track with robotics, industrial control software development or just plain theoretical studies.
I suppose it's not as much that nobody wants it, or nobody wants to know about it, but there is so much else to be known that it might not 'fit' when assembling study materials or in-house learning systems.
Maybe the best way to integrate control theory into the datacenter (or cloud) from an ops perspective would be to start by getting some traction with control systems in general first, just as a general software engineer might have had a semester or two of software architecture and patterns.
I work in datacenter optimization and we're aware of control theory.
I guess progress must be made in small steps... sigh...
Just curious, how did you replace PID with MHE? PID is a control algorithm, while MHE is a state estimation algorithm -- it does not perform control.
Did you mean MPC by any chance? (which MHE is often used in tandem with)
If so, MPC is indeed a superior algorithm, but it also requires a dynamic model (LTI, or state space). Such a model may or may not be easy to identify -- it would require the characterization of the dynamics of data center operations.
PID on the other hand, while less optimal, is "model-free" (technically it has a model, i.e. its tunings can be thought to derive from IMC or direct synthesis, even though in practice hand tuning is common) in that it can respond to a wide variety of circumstances without knowing much a priori about the underlying dynamics. PID tunings are also amenable to optimization (products like Loop Pro are used in industry)
Due to their simplicity, PIDs are capable of operating at much higher frequencies than more complex algorithms like MPC. PIDs operate on the order of milliseconds or faster, while MPC operates on the order of seconds to minutes because it has to solve an optimization problem at every iteration, which is too slow for fast loops. In the hierarchy of control there are supervisory layers on top (RTOs), then MPCs, then PIDs. It's usually not either-or, but all together, working at different layers.
Even in industries where MPC is dominant, PID control is still ubiquitous and used alongside it, especially for local regulatory control loops. I don't have enough insight into data center ops to know if PID control is good enough but in my experience PID can be good enough for many applications -- most control loops in the world are essentially still PID. Not because more advanced algorithms don't exist, but because PID has the advantage of being just good enough for most purposes. (cost is also an issue: PIDs are cheap, while licensing costs for industrial MPC software range from 10s to 100k$)
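(To make the IMC remark above concrete: for a first-order-plus-dead-time model with gain K, time constant tau and dead time theta, one common form of the IMC-PI tuning rule fits in two lines. A sketch; the example numbers are made up:)

    # one common IMC-PI tuning rule for a FOPDT process (a sketch)
    def imc_pi(K, tau, theta, lam):
        # K: process gain, tau: time constant, theta: dead time,
        # lam: desired closed-loop time constant (the speed knob)
        Kc = tau / (K * (lam + theta))  # controller gain
        Ti = tau                        # integral (reset) time
        return Kc, Ti

    # e.g. a sluggish process: K=2.0, tau=60s, theta=5s, lam=10s
    print(imc_pi(2.0, 60.0, 5.0, 10.0))  # -> (2.0, 60.0)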
In my case I built a model of the espresso machine consisting of various thermal masses with different properties and impedances between them (boiler heater element, water in the boiler, scale on the boiler walls, boiler walls, grouphead, water in the pipe between boiler and grouphead, water in the pipe and pump between reservoir and boiler, water in the reservoir tank).
This model can estimate future temperature of brew water based on current and past temperature measurements in various points in the system as well as on amount of water being pumped through the system.
There are four temperature measurements being made:
- ambient temperature
- water reservoir temperature (the thermometer touches the reservoir), though I am pretty sure this one could be estimated from past operation of the boiler
- boiler temperature (a thermometer glued to the outside of the boiler at a selected point below the water line)
- group head temperature (a thermometer glued to the outside of the group head at a selected point). This one could also potentially be estimated from past operation, but I tried it and it complicates my model too much.
In particular, the model is designed to calculate a single parameter: what the water temperature will be, if used for brewing coffee, at a point in the future, assuming the heating element keeps adding a given amount of energy and the pump moves a given amount of water.
The way the model is used depends on what the machine is waiting for. For example, when you want to brew coffee, it delays the start of brewing until it can achieve a stable temperature.
To achieve this as fast as possible (i.e. power-on from cold to stable temperature), the machine heats the water at maximum power while the model is executed 50 times a second to estimate what the maximum temperature attained by the brew water will be if we cut power now. The idea is to run the heater at max power for as long as possible and shut it off at the exact moment so that the heat that spreads brings the brew water to the exact desired temperature.
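(A sketch of that decision loop; the toy first-order blend below stands in for my actual thermal model, which tracks many more states:)

    # run ~50x/s during warm-up: full power until the model predicts that
    # cutting power *now* will coast the brew water exactly to the target
    TARGET = 92.0  # desired brew temperature, deg C

    def predict_peak_brew_temp(boiler_temp, group_temp):
        # toy stand-in: residual boiler heat keeps bleeding into the brew
        # path, so the brew water coasts up toward a weighted blend
        return 0.8 * boiler_temp + 0.2 * group_temp

    def warmup_power(boiler_temp, group_temp):
        peak = predict_peak_brew_temp(boiler_temp, group_temp)
        return 1.0 if peak < TARGET else 0.0  # full power or shut off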
Mind that brew water temperature is not measured directly. I can only measure it experimentally with a modified portafilter.
The parameters of the system are observed, and I use other filters to correct them as the system changes. For example, the system can detect the amount of boiler scale developing rather precisely, mainly through how it affects the impedance between the water in the boiler and the temperature sensor.
---
Now, I am pretty sure it is overkill for an espresso machine. I am doing this to teach myself some control theory. But the effects are real and the algorithm works like magic -- the machine starts and achieves optimal temperature in the shortest possible time, then keeps it stable with no over- or undershoots that could affect the brew.
In today's world, chips that can run this model sell for pennies; the only real complications are the rather precise, noise-free temperature measurement you need, plus measurement of the amount of water being pumped.
This sounds like an on-off controller with a predictive model. In this configuration, the predictive model (PM) predicts the temperature p-steps ahead (using 4 measurements) and then feeds the predicted output to the on-off controller, i.e. something like
PM(yₖ, uₖ) → Tₖ₊ₚ
uₖ = {uₘₐₓ if Tₖ₊ₚ < Tₛₑₜ else 0} (on-off control law)
where k = current time, yₖ = 4 temperature measurements, uₖ= manipulated variables (heating), Tₖ₊ₚ = controlled variable, i.e. temperature p-steps ahead (k + p), Tₛₑₜ = setpoint temperature.
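(The same law as a couple of lines of Python; the PM below is a toy stand-in, not a real model:)

    # on-off law driven by a p-step-ahead prediction (PM is a toy stand-in)
    def PM(y_k, u_k):
        # pretend prediction: average of the 4 temperatures plus heating effect
        return sum(y_k) / len(y_k) + 5.0 * u_k

    def control(y_k, u_prev, T_set=92.0, u_max=1.0):
        T_pred = PM(y_k, u_prev)                 # predicted T at k + p
        return u_max if T_pred < T_set else 0.0  # on-off control law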
If you're doing predictions, I'm not sure the estimator is technically called an MHE since MHEs only estimate the current state, not future states (it cannot because one technically cannot know future measurements), so I call it a predictive model here. (unless you're doing MHE to estimate the current state xₖ and then feeding that into the predictive model to get Tₖ₊ₚ predictions...)
This can indeed provide smoother, more accurate control than PID alone because (1) it uses a model that accounts for more variables than just the controlled variable; (2) it requires less tuning to avoid over/undershoot because it is guided by a model rather than just by setpoint errors; (3) it is a special, simpler case of predictive control. I think it's a very practical control configuration for on-off temperature control, which has tons of applications, from oven control to immersion-circulator control etc.
I don't think this configuration has a name, but it resembles a simplified version of model predictive control (MPC), except for the lack of an optimizer (the control law is on-off rather than an optimal control law computed via an optimizer). It is, however, a much less general configuration than a PID, because in the general case (not on-off) you still need a control law to do trajectory tracking, and that law tends to be a PID or similar.
> This sounds like an on-off controller with a predictive model.
Well, not exactly. At startup, yes, but that's because the goal is to get to a stable point as soon as possible. Think of this move like the suicide burn made by SpaceX boosters coming back to Earth :) The most efficient burn is done at 100% thrust with minimum margins. In the case of the heater, the fastest way to get to temperature is obviously to use 100% of the heating power immediately and calculate exactly when to turn it off (or transition to idle power).
But later the controller switches to maintaining the temperature by producing a power setting for the heater. The controller produces the desired power level as a value between 0 and 1, which is then mapped (dithered) to turn the heater on/off for half-cycles of AC power. The heater is only switched on/off at zero crossings to reduce emitted interference. For this I built a separate controller board equivalent to a solid-state relay. I actually used an SCR diode pair + opto-isolator + a mechanical relay for safety (because SCRs may fail as a dead short).
> If you're doing predictions, I'm not sure the estimator is technically called an MHE since MHEs only estimate the current state, not future states (it cannot because one technically cannot know future measurements).
I guess this is why it is called an estimator. Not only does it estimate the future, it also estimates a parameter that cannot be directly measured.
I test the controller against a modified portafilter with a precise temperature sensor in it, but in normal operation, obviously, I don't get this measurement. In some sense this is an open-loop controller.
Yes, I am using MHE to estimate current state (including model parameters) and use it to feed predictive model.
The current state of the system is quite complex, and it is possible to describe it in many different ways. I like to think about it as various pulses of energy traveling through the system.
One of my approaches to build predictive model was to measure effect of step pulses on the system (for example with the cold machine I apply 10s of full power, then measure the change over time). Then I tried to divide the operation of the machine into separate pulses (for example 1s of 70% of power) and superimpose predicted effects of those past pulses on future evolution. This is quite easy if you are just heating but quickly gets complicated when you also try to move water through the system. But I found it fun and gained some understanding of the thermal system.
This all sounds extremely cool. What is the base espresso machine you are building this on top of? Do you have a blog where you document your process?
So many people ask this. The base machine is Rancilio Silvia, selected because it is relatively cheap, well made, simple, and has no electronics in it. It is a collection of AC switches, thermostats and other AC-powered components.
I don't care about making a blog. I do it for fun. Given choice I would prefer to spend the time doing some more cool projects.
I had no idea this stuff existed. This sounds pretty exciting actually.
Is MPC really so hard/complex to compute that it can't be stuck inside a loop that runs at 1 Hz or slower? I'm talking about decent generic or dedicated hardware. Some tiny SoC is a different story.
MPC on chip was tried a decade ago [1]. It was based on the idea of parametric programming, where solutions of an optimization problem (a quadratic program, QP) were pre-computed and the control law essentially only involved doing lookups. Lookups were fast and avoided the computational cost of solving an optimization problem on-line.
However, as you can imagine, pre-computing optimal solutions is of exponential complexity so scaling up this technique was difficult at the time. (I'm not sure what the state of the art is).
Computation, however, has come a long way since; today it might be possible to solve a decent-sized quadratic program (QP) on chip at very high frequencies in real time.
[1] https://www.sciencedirect.com/science/article/abs/pii/S00981...
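(A caricature of the parametric idea in Python: do the optimization offline over a grid of states, then only do lookups online. A real explicit MPC solves a QP per polytopic region; this just shows the shape of the trick:)

    import numpy as np

    # offline: brute-force the best input over a grid of states (toy system)
    states = np.linspace(-10, 10, 201)     # state grid, 0.1 resolution
    candidates = np.linspace(-1, 1, 101)   # admissible inputs

    def cost(x, u):
        x_next = 0.9 * x + u               # toy linear model
        return x_next**2 + 0.1 * u**2      # quadratic stage cost

    table = {round(float(x), 1): min(candidates, key=lambda u: cost(x, u))
             for x in states}

    # online: the control law is a table lookup, cheap enough for fast loops
    def control(x):
        return table[round(float(np.clip(x, -10, 10)), 1)]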
The difficulty with MPC is in the name -- Model predictive control. You need a model.
The model need not be very complex; think of it as a first approximation. It's just that the better your model, the faster you will achieve the result.
https://pidtuner.com can help with that
I think a better argument against the PID as implemented is that it does not appropriately take into account the actual penalties of the system. As written, the controller treats errors symmetrically - it takes the same effort to correct for over- and under-provisioning.
We know that this is not realistic. Over-provisioning results in an immediate financial cost (which can easily be modeled in $$$), but under-provisioning results in a far more complex penalty. I think it'd be important to understand these costs (along with the general shape of your traffic) before implementing a control system.
Furthermore, it's very likely that you'll want to implement a deadzone, and almost certainly you'll want a low-pass filter, especially if you're sampling processing time significantly faster than ~30 seconds (the estimated startup time). Oh, and the usual things like anti-windup on the integral, and hard limits so you don't bankrupt yourself.
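(All of those trimmings bolted onto a PI loop look roughly like this -- a sketch with placeholder constants; which side of the error gets the heavier weight depends on your cost model:)

    # PI loop with the usual production trimmings (all constants illustrative)
    DEADZONE = 0.05              # ignore tiny errors
    ALPHA = 0.2                  # low-pass filter weight for noisy measurements
    I_MIN, I_MAX = -50.0, 50.0   # anti-windup clamp on the integral term
    OUT_MIN, OUT_MAX = 0, 1000   # hard server limits, so we don't go bankrupt

    def make_controller(kp, ki, under_weight=3.0):
        integral, filt = 0.0, None
        def step(setpoint, raw_measurement, dt):
            nonlocal integral, filt
            filt = raw_measurement if filt is None else \
                ALPHA * raw_measurement + (1 - ALPHA) * filt
            error = setpoint - filt
            if abs(error) < DEADZONE:
                error = 0.0
            elif error > 0:      # under-provisioned: often the costlier side
                error *= under_weight
            integral = min(max(integral + ki * error * dt, I_MIN), I_MAX)
            return min(max(kp * error + integral, OUT_MIN), OUT_MAX)
        return step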
Hi OP here. Yes you are right about the asymmetrical nature of the penalties. Actually I do handle this case by using an asymmetric `shrinkage` on the error. See the function called shrinkage in the notebook.
Good day :)
But that's trivial to correct with an asymmetrical error function.
Oh absolutely. But once you start adding all of these bits, then suddenly PIDs become much less simple, and more annoying to analyze. Like all of the LTI assumptions start getting significantly broken. Suddenly you have a fork in the road and you say:
A) Make it "MORE complex" - go model predictive for example, as suggested by OP (or whatever). Now that your PID is a gain-scheduled, asymmetric, dead-zoned beast, maybe the difference between more complex systems and PID seems less daunting.
B) Make it simpler! Just effectively make it bang bang (or pure P) with a deadzone. Leave some performance on the table, but gain the confidence that less will probably go wrong.
C) Double down on PIDs. Gain scheduling is fun! You can figure out how to constrain your system, carve out regions of LTI goodness, and be confident in your transitions.
These are all valid solutions. As a lazy engineer, I think B) should be the first choice of any business. And honestly, I think that's where a lot of real businesses ended up.
There is one more dimension to this problem: how engineers are going to interact with the model, which includes the ability to understand, predict, and configure it.
A mechanism that scales a server farm cannot be too surprising to the engineers who work on it. They need to understand, more or less, what it does, so they can figure out what is going on and deal with problems.
This is even more important in corner cases. What is the startup behavior going to be? What happens when a large number of servers goes down? Etc. These behaviors may be encountered very rarely, but engineers must be able to predict and reason about them.
I've used PID at some very large systems. The main reason is simplicity - dynamic control introduces a huge amount of chaos into your system. Having a simple algo like PID (which you need to tune carefully, and retune after each big change, true) has a big benefit: you can reason easily about the behavior of the system. And for big systems, that's an _extremely_ valuable property that's often underestimated.
> Why not something more intelligent than PID?
I'm curious if there exists something intelligent enough to be able to rely on it for a large-scale deployment without human oversight? In general, I would think dead simple and manually controllable is a feature in the context of expenditure.
> Replacing PID with moving horizon estimator allowed me to cut time from startup until stable temperature by at least half and eliminate any measurable over or undershoots.
I'm familiar with PID controllers, but haven't used MHE before. The formulas look really similar at first glance to me. Three terms and three weights? Is a slow PID controller due to poor tuning, or is MHE intrinsically better? What is the reason that MHE would adapt more quickly with less overshoot than PID? MHE appears to be three integrals instead of one, but I don't see immediately why that would be better, is there an intuitive and/or fundamental reason?
> I'm curious if there exists something intelligent enough to be able to rely on it for a large-scale deployment without human oversight?
I think without a context that's a no. Some systems will want good throughput, some want low latency and require pre-scaling on some cues (time of day, day of week), some want minimal cost but do want to allow bigger bursts for a max of N minutes, etc.
Any intelligent scaling without human oversight has a good chance of either burning your money or not optimising for what you care about.
Yes, chemical and oil manufacturing has been running on fully automated economic model predictive control for literally decades.
PID is often good enough and more robust than ad hoc algorithms to do the same.
PID is a poor algorithm in this case because there is a relatively large delay between the signal to spin up a server and its observed effect. PID requires a lot of iterations to stabilize, which, multiplied by the delay, adds up to a lot of time.
A model predictive controller needs far fewer iterations because it actually tries to predict the number of servers necessary based on some kind of model of the server farm.
Parameters for that model can even be learned/adjusted over time, automatically.
> in this case because there is relatively large delay between signal to spin up a server and observed effect of it. PID requires a lot of iterations to stabilize, which multiplied by delay will require a lot of time.
Then you can tune the PID parameters.
Exactly, you can use PID to turn a small craft or a large ship, or to heat a tiny copper vessel or an enormous iron one. Delayed effect and tuning time to setpoint and overshoot are, like, well within its territory.
PID doesn't care how long it takes to achieve stability but sometimes you do.
In a lot of cases the time to settle is so short that it doesn't matter, but sometimes it does. When you operate a datacenter, you might not want to wait hours for PID to spin up the right number of servers.
Still, the performance of PID is limited if it takes three minutes to spin up a server.
That means either you'd have to run the control loop comparatively slowly (e.g. once every five minutes), or you'd need to keep the gain very low, or you'd have to tolerate a lot of overshoot. None of which will produce great results.
A more sophisticated controller would be able to take into account the fact there are servers currently starting up, and only call for more if those already on the way won't be sufficient.
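(Even a crude version of that is just arithmetic: count the servers already booting before asking for more. A sketch with made-up numbers:)

    import math

    # request only what the in-flight servers won't already cover (a sketch)
    def servers_to_launch(demand_rps, rps_per_server, running, booting):
        needed = math.ceil(demand_rps / rps_per_server)
        return max(0, needed - running - booting)

    # 4200 rps at 100 rps/server = 42 needed; 30 running + 8 booting -> launch 4
    print(servers_to_launch(4200, 100, running=30, booting=8))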
I agree with your points but this particular point that you mention ...
> A more sophisticated controller would be able to take into
> account the fact there are servers currently starting up,
> and only call for more if those already on the way won't be sufficient.
can't that be modeled into the transfer function characterizing the system? Even linear dynamics are quite capable of capturing sluggish behavior, so PIDs should be able to handle it fine. If there are strong nonlinearities and the system may start far from a desired set point (to the extent that local linearizations are far too inaccurate), or may venture far out, then yes, PID may indeed have trouble.
PID roughly occupies the same space that logistic regression does in ML. Sure, one can use more complicated DNNs when the task really needs it, but for many, many classification or conditional probability estimation problems a regularized LR goes a long way, and it's hard to beat on the axis of simplicity. Of course your resume won't look as shiny.
I'll bite. What espresso machine? Temperature control and temperature profiling is actually quite bad on most machines. It's actually mostly that people don't even have a good way to measure temperature at the grouphead to really know if their coffee is improving or if they've just improved their temps at the boiler.
I built this for Rancilio Silvia (see explanation of the model in another answer in this thread).
I chose Silvia mainly because it doesn't have its own electronics, it is all 230V AC wiring, thermostats, switches, etc.
I made myself a precise thermometer with a PT1000 probe inside the coffee puck, and I learned the temperature can vary by as much as 10-15 degrees.
I first used PID but was not satisfied with the long settling time, because even when the water is at the right temperature in the boiler, it still needs to pass through a huge hunk of metal that determines, to a large extent, the final temperature of the brew water.
So I built a somewhat more complex algorithm with the aim of first getting the water a little hotter in the boiler (while the grouphead is still cold) and then slowly adjusting the boiler setting as the grouphead heats up.
But there were still problems; for example, pumping cold water into the boiler threw everything into chaos. Then there's the problem that you get temperature readouts with delays and offsets.
That's when I decided to just build a more complete model of the system, one that takes not only the current but also past states of the system into account.
Can anyone recommend a “Control theory for the layman” type textbook? Specifically I’m interested in finding something that gives good overviews and examples, and while I’m not afraid of math, I’m not looking for something overly academic or advanced.
The gold standard introductory book on control theory is "Feedback Control of Dynamic Systems" by Franklin, Powell and Emami-Naeini. While math is absolutely unavoidable when studying the subject, FCDS starts very gently and is quite accessible to the layman.
For someone with a strong mathematical background, what would be a better book? Also, do you know of any that balance the theory with examples, say simulations using R?
I am using "Optimal Control Theory: An Introduction" by Donald Kirk and "Modern Control Engineering" by Katsuhiko Ogata.
Both are heavy in math, but are not that difficult (I studied theoretical math some 20 years ago and I can read them).
I recommend Ogata as well. Another classic.
Hi OP here.
To answer why PID: basically somebody I follow asked this question on twitter, and I thought, yeah, why not, seems like a reasonable thing to try :)
Actually I do conclude that PID is not quite the right thing for this problem. For me the learnings from making a PID sort of work for this problem were:
1. You must use the right error function, e.g. based on frequency, not time.
2. You must use shrinkage on the error to handle the discrete number of servers (a sketch of one plausible form is below).
3. You have to run the controller at a multiple of the server startup delay to avoid perturbations.
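(On points 2 and 3, roughly: the shrinkage below is a hypothetical asymmetric soft-threshold, not necessarily the exact function in the notebook, and the loop period is simply pinned to a multiple of the boot delay:)

    # hypothetical asymmetric soft-threshold shrinkage (see the notebook for
    # OP's actual version); more reluctant to scale down than up
    def shrinkage(error, t_up=0.5, t_down=1.5):
        if error > t_up:
            return error - t_up
        if error < -t_down:
            return error + t_down
        return 0.0

    SERVER_BOOT_DELAY = 30                   # seconds (rough estimate)
    CONTROL_PERIOD = 2 * SERVER_BOOT_DELAY   # run the loop at a multiple of it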
Also I discuss the basic assumptions that a PID controller makes that are suboptimal in the video.
My problem with PID is people don't wrap it around a model and use it to correct the errors in the model. Often their tuning ends up "carpet-bumping" between multiple critical points in the system. But as others have mentioned, it gets you ok, predictable performance.
> Replacing PID with moving horizon estimator allowed me to cut time from startup until stable temperature by at least half and eliminate any measurable over or undershoots.
PIDs are simple to implement but tricky to tune.
In your case you could probably have solved most issues by improving the tuning.
Not sure if that's the case for others, but I have a really hard time connecting control theory with actual applications. My classes were full of complex math and toy examples, but very weak on actual engineering and heuristics.
Well, you need control theory whenever you need to keep something at a certain level. House temperature, cruise control speed, etc.
As soon as I saw the title I thought, "huh, that really doesn't sound like a good idea". It would likely be over- or under-sensitive and likely require lots of continual tuning tweaks. Not to mention the discrete steps wouldn't be smoothed out until you have quite a lot of server resources in play.
Have you practiced control theory in an actual process plant? PID is not simple to tune at all - well, PI; nobody uses the derivative part. Countless studies published in the industry show that up to 30% of the PID control loops in operation are set to MANUAL by the operators, and of the rest, more than half have wrong parameters. Even then, 90%+ of controllers are PID, because despite all their problems they work better than the more sophisticated alternatives: they are more reliable, more maintainable, and cheaper to license.
> PID will basically have you experience either large overshoots (which you will experience as overcorrecting to changes in demand) or slow adaptation to changes.
Such a blanket statement is meaningless without a description of the system you are controlling. That kind of overshoot can be attributed to wrong parameters, badly sized control elements, or even bad measuring devices; the PID algorithm has nothing to do with those cases.
> I have implemented a controller for espresso machine boiler water temperature. Replacing PID with moving horizon estimator allowed me to cut time from startup until stable temperature by at least half and eliminate any measurable over or undershoots.
Did you try a "bang-bang" controller? I would not be surprised if you got the same results with 1% of the complexity.
The problem with espresso machine startup is that there is a 1.5 kg piece of metal (the grouphead) through which around 60 ml of water from the boiler flows.
What do you think the relation is between the boiler water temperature and the temperature of the water that actually reaches the coffee?
When the machine starts, the grouphead is cold. If you want a good brewing temperature, you need to either:
a) Wait 40 minutes until the grouphead slowly heats up and everything stabilizes.
b) Heat the water for 10 minutes and then pump a lot of water through the grouphead to get it hot from the water, and pray everything works well.
c) Build a model that predicts the correct boiler water setting given the predicted temperature of the grouphead, so that when you push water through the grouphead it cools by just the right amount (see the sketch below). Unfortunately, the correct brew temperature is 92 C, so you only have a couple of degrees to work with. Still, heating the water to a higher temperature makes the grouphead heat up faster, and you don't need to heat it as much because it will receive hotter water.
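(A first-order caricature of option c: if the brew water leaves at a blend of the boiler and grouphead temperatures, you can solve directly for the boiler setpoint. beta is a made-up mixing coefficient, not a measured one:)

    # toy mixing model: T_brew ~ (1 - beta) * T_boiler + beta * T_group;
    # invert it to get the boiler setpoint for a target brew temperature
    def boiler_setpoint(T_target, T_group, beta=0.3):
        return (T_target - beta * T_group) / (1 - beta)

    # cold grouphead at 40 C, target 92 C -> run the boiler hotter, ~114 C
    print(boiler_setpoint(92.0, 40.0))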
Commercial machines do not have this problem because they are started in the morning and turned off only at night, and they have huge tanks of brewing water in them, so inflowing water does not affect the temperature as much.
Useful if you’ve never heard of PID controller: https://en.wikipedia.org/wiki/PID_controller
Thanks; I hadn't heard of it before
Do you work in a datacenter by any chance?
Nope, just found the topic interesting:)
Not sure how relevant, but this reminded me of a thesis [1] which is based on resource closure operators [2]. The thesis applies the model to CPU frequency scaling, but I guess a model could be made for something like scaling the number of compute nodes.
From the abstract of [2]:
We evaluate a specific design for a resource closure operator by simulation and demonstrate that the operator achieves a near-optimal balance between cost and value without using any model of the relationship between resources and behavior. Instead, the resource operator relies upon its control of the resource to perform experiments and react to their results. These experiments allow the operator to be highly adaptive to change and unexpected contingencies.
Not my field so not sure if anything significant has been done using this in the past 10 years, or if it fizzled out.
[1]: https://www.duo.uio.no/handle/10852/8753
[2]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.304....
Useful if you need to adjust the PID gains https://pidtuner.com
Why? A PID controller is always a kludge, here extremely so. Something ad hoc could easily be both more optimal and mathematically simpler to analyze and test.
> Something ad hoc could easily be both optimal and mathematically simpler to analyze and test.
Could you give an example? I don't think PIDs are chosen for their optimality properties.
More optimal, you deleted a word. Not optimal, but closer to optimal. For the problem we're looking at, the bang bang controller is a far better choice IMHO.
I'd like to see an example of when you think PID is an ideal choice. I've never found a real use case. Whenever I've hacked one into anything, I've quickly replaced it with something simpler and better (thermostats being the most obvious example).
You are throwing around words like 'better', 'optimal', and 'mathematically simpler' in a way that does not give me a lot of confidence. It seems padded with weasel words to the point of being unfalsifiable and vapid. Engineering is a quantitative field, after all.
I did not claim PID to be optimal. It was your claim that you found ad hoc methods to be more optimal (whatever that is supposed to mean). Surely you would be able to give examples: in what quantitative way have you found them "better" or "more optimal", and in what ways have you found PID to be mathematically complex?
I gave you an example: a thermostat. It's amusingly common to try and apply PID control to these, and the replies to that parent comment highlight several superior alternatives, both in terms of simplicity and optimality.
Simplicity is hard to quantify, but it's one of the most important properties of robust and reliable systems. That's not vapid at all: that's engineering.
If thermostats are indeed better, why doesn't the chemical industry use thermostats to control the temperature of reagents? PID seems to rule the roost there. What problems does a PID face in comparison to a thermostat? Does bang-bang track the setpoint better (in terms of some quantifiable error) than a tuned PID?
I would be surprised if missile guidance, drone guidance use bang bang. My hunch would be they use something more sophisticated than PID. Same for something more banal like cruise control in a car set to some cruising speed.
PID requires very fragile tuning, and that makes it generally inferior to anything that doesn't.
> If thermostats are indeed better, why doesn't the chemical industry use non-PID to control the temperature of reagents? PID seems to rule the roost there.
For the same reason they still use constant speed compressors: it's how it has been done forever, it's good enough, and all the equipment still works. We know better now, we build new ones with VFDs.
I'm not saying "let's tear down everything using PID and rewrite it", it obviously works. But if you're writing new code using PID, you're probably wrong.
If you are going to sell me a bang-bang controller for cruise control of a mini car, trajectory tracking of welding robots, or automated gun/fire control that tracks a moving target while firing, blaming the lack of bang-bang on institutional inertia, then I unfortunately have to call you out for talking through your hat.
Bang-bangs are just not smooth enough and do not give enough control over tracking. With tuning, a PID can at least perform within the desired smoothness envelope. PIDs have been surprisingly robust even when the gains are set far from optimal. There are satisfactory tools and best practices for tuning PIDs.
A stuttering car on bang bang cruise control will certainly not be a pleasant ride.
Bang-bangs cannot do the job where you have a low inertia, low damping system that cannot tolerate jerks. PID introduces exactly those properties, inertia and damping, but through the controller.
You're arguing with a point I didn't make: the bang bang controller was a random counterexample.
> Bang-bangs cannot do the job where you have a low inertia, low damping system that cannot tolerate jerks. PID simulates exactly those properties, inertia and damping, but through the controller.
This is exactly what I'm saying: the control theory should model the actual system being controlled. Most uses I have seen of PID in the software industry do not, and consequently have to be hacked to death and are very sensitive to tuning. Old industrial stuff can be like that too sometimes.
Does datacenter load have analogous properties? Absolutely not. That's my point: it's a bad usecase for PID.
> Bang-bangs are just not smooth enough and does not give enough control over tracking
Well, in fairness it could if the timestep could be made arbitrarily small. But in practice, yes, this obviously wouldn't work.
> Well, in fairness it could if the timestep could be made arbitrarily small. But in practice, yes, this obviously wouldn't work.
I agree. This would essentially be pulse-width modulation. You need sufficient inertia and damping, though, so that the vibration does not have any adverse effect.
> Does datacenter load have analogous properties? Absolutely not. That's my point: it's a bad usecase for PID.
I am not knowledgeable about that, but I would presume there would be inertia-like effects and delays. Bringing lots of servers up simultaneously may choke a contended shared resource, leading to damping. In such cases one can't boot up too many servers too fast. Jerks might be better tolerated, except perhaps in high-frequency trading scenarios.
> I am not knowledgeable about that, but I would presume there would be inertia-like effects and delays. Bringing lots of servers up simultaneously may choke a contended shared resource, leading to damping. In such cases one can't boot up too many servers too fast. Jerks might be better tolerated, except perhaps in high-frequency trading scenarios.
Sure, you can model it like that. But I can actually go measure all the physical properties of the car and calculate a tuning that will work for cruise control, because the car follows known physical laws (within the limits of my ability to measure things).
I can't "weigh" the datacenter. I must, to some extent, observe the system and empirically derive a tuning. That tuning basically overfits the PID to second- and higher-order effects in the system without taking the time to understand their root causes.
Can that work? Sure. That doesn't change the fact that, IMHO, there is almost always a better way.
This is probably more controversial... but personally, I'd rather have a hacky ad hoc approach than an overfit PID, because the former is so much easier to debug and understand.
I am not sure what you mean -- measuring impulse response in cars is good, but measuring impulse response of datacenters is bad (or impossible)? Measuring the weight of a car is an empirical observation as well. I also don't see how it follows that PID will necessarily overfit second- and third-order effects whereas bang-bang wouldn't.
https://diffeq.sciml.ai/dev/extras/timestepping/
The current best method for adaptive step-size control when solving ODEs is PI control. (At least it was when I checked a few years ago.)
A thermostat is certainly simpler than a PID controller. But it can't solve the problem that a PID does.