Introducing Chronos: A Replacement for Cron

nerds.airbnb.com

149 points by AirbnbNerds 13 years ago · 62 comments

tptacek 13 years ago

I'm surprised there aren't more things like this, because Cron replacements are really valuable.

We wrote a small C program that serves as a scheduling daemon with Redis; we have a keyspace in Redis that can be used to schedule millisecond-granular periodic or one-shot tasks with a flexible "Chronic"-like specification syntax.

It is hugely more convenient and usable than cron. Once you have it, you immediately spot lots of opportunities to factor systems into scheduled jobs that you might have avoided doing if it meant you had to deal with cron.
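
tptacek's daemon isn't published, so what follows is only a hypothetical, in-memory Python sketch of the core idea, not his code. It handles one-shot and periodic tasks with millisecond due times, kept in a min-heap ordered by due time. In a Redis-backed version, a sorted set scored by due time in milliseconds could play the heap's role. Time is passed in explicitly rather than read from a clock, and the "Chronic"-style spec parsing is left out; all names are invented for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(order=True)
class _Entry:
    due_ms: int
    seq: int  # tie-breaker: equal due times fire in insertion order
    name: str = field(compare=False)
    period_ms: Optional[int] = field(compare=False)  # None means one-shot
    action: Callable[[], None] = field(compare=False)


class Scheduler:
    """Hypothetical millisecond scheduler (illustration only)."""

    def __init__(self) -> None:
        self._heap: List[_Entry] = []
        self._seq = itertools.count()

    def once(self, name: str, due_ms: int, action: Callable[[], None]) -> None:
        heapq.heappush(self._heap, _Entry(due_ms, next(self._seq), name, None, action))

    def every(self, name: str, start_ms: int, period_ms: int, action: Callable[[], None]) -> None:
        if period_ms <= 0:
            raise ValueError("period_ms must be positive")
        heapq.heappush(self._heap, _Entry(start_ms, next(self._seq), name, period_ms, action))

    def run_due(self, now_ms: int) -> List[str]:
        """Fire every task due at or before now_ms; return the names fired, in order."""
        fired = []
        while self._heap and self._heap[0].due_ms <= now_ms:
            entry = heapq.heappop(self._heap)
            entry.action()
            fired.append(entry.name)
            if entry.period_ms is not None:
                # Reschedule from the previous due time, not from now, so late calls don't drift.
                entry.due_ms += entry.period_ms
                entry.seq = next(self._seq)
                heapq.heappush(self._heap, entry)
        return fired
```

Because periodic tasks are rescheduled from their previous due time, a late `run_due()` call catches up on every missed tick; whether catch-up or skip-ahead is the right policy depends on the job.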

  • minimax 13 years ago

    I don't think it's uncommon at all to roll your own job scheduler. I know I've done it at the last two places I have worked, though neither had nearly as fancy a front-end as Chronos appears to have. This particular scheduler (Chronos) looks like a cross between traditional cron and HPC cluster job scheduling systems (a la Sun Grid Engine). It looks like a really cool project.

    • tptacek 13 years ago

      That depends on what you mean by job scheduling, right? Lots of people would call Resque a job scheduling system, but they still reach for cron when they want to run something every 5 minutes.

      • minimax 13 years ago

        I'm about as unaware of the Ruby ecosystem as anyone who regularly reads HN can be, so while I've heard of Resque I'm not really sure where it fits in.

        I suppose the definition of job scheduling is dependent on what your workload entails. Is it something you expect to complete in less than a second, a minute, or a big resource intensive HPC job you expect to run for hours? What sorts of dependencies do you have between the things you want to run? Will it run in the main event loop, separate thread, separate process, or on a separate host? I think this will reflect in the tool you choose for scheduling, or cause you to write your own.

        It sounds to me like the target workload for Chronos is big slow batch jobs (ETL, data analysis). HPC job schedulers take care of things like resource management (run this job on this machine because it has the lowest load) and dependency management (don't run job b until job a completes), whereas cron handles running periodic tasks and that's about it. Chronos is nice because it looks like it is combining the two and includes a nice looking web based control panel to boot.
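
The dependency-management half of that description ("don't run job b until job a completes") can be sketched with Python's standard `graphlib` module (Python 3.9+). This is a hypothetical illustration with made-up job names; it groups jobs into waves that could run in parallel and does nothing about resource management.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def ready_batches(deps: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group jobs into waves; a job appears only after all its prerequisites.

    deps maps each job to the jobs it must wait for.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # A real scheduler would dispatch these to workers and call done()
        # as each one finishes; here every job "finishes" immediately.
        sorter.done(*ready)
    return batches
```

For example, with `load` and `transform` both waiting on `extract`, and `report` waiting on both, the waves come out as `extract`, then `load` and `transform` together, then `report`.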

  • jeremyw 13 years ago

    I'll mention solving a lateral problem. We liked crond fine, but wanted a minimal-impact way to make cron jobs survive machine stalls; they needed to run consistently somewhere.

    A small bit of Python is enough to perform ephemeral leader election (for the current minute) in ZooKeeper. Prefix it to otherwise stock invocations and a set of machines can all run the same thing: one of them wins, and nobody gets paged.

      * * * * * my long and winding command
      [becomes]
      * * * * * cron-coord somename my long and winding command
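
cron-coord itself isn't shown in the thread, so this is only a hypothetical Python sketch of the idea: every host derives the same key from the job name and the current UTC minute, and only the host whose atomic create-if-absent succeeds runs the command. The claim backend is injected; the in-memory one exists only for illustration. Against ZooKeeper, the claim could be a znode create, for example kazoo's `KazooClient.create(path, ephemeral=True, makepath=True)`, which raises `NodeExistsError` when another host got there first. The key layout `/cron-coord/<name>/<minute>` is invented.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence, Set


def minute_key(name: str, now: float) -> str:
    """Key that every host computes identically for the same job and UTC minute."""
    return f"/cron-coord/{name}/{time.strftime('%Y%m%d%H%M', time.gmtime(now))}"


class InMemoryClaims:
    """Stand-in for the coordination service: each key can be claimed once."""

    def __init__(self) -> None:
        self._taken: Set[str] = set()

    def claim(self, key: str) -> bool:
        if key in self._taken:
            return False
        self._taken.add(key)
        return True


def coordinated_run(
    name: str,
    argv: Sequence[str],
    claim: Callable[[str], bool],
    now: Optional[float] = None,
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> Optional[int]:
    """Run argv only if this host wins the claim for the current minute.

    Returns the command's exit status, or None if another host already won.
    """
    key = minute_key(name, time.time() if now is None else now)
    if not claim(key):
        return None  # lost the election: exit quietly, nobody gets paged
    return runner(list(argv))
```

The in-memory backend only coordinates callers within one process; across machines the `claim` callable has to be backed by a shared service such as ZooKeeper.
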
  • recuter 13 years ago

    > It is hugely more convenient and usable than cron. Once you have it, you immediately spot lots of opportunities to factor systems into scheduled jobs that you might have avoided doing if it meant you had to deal with cron.

    That's actually pretty interesting. Can you give a few examples that fall out of having finer-grained (millisecond) resolution?

    • tptacek 13 years ago

      One example that jumps to mind: I was able to rip out several hundred lines of fiddly retry and backoff code and replace them with fine-grain-timed repeating events.

      One advantage to doing this in Redis was that our events aren't just "run a program" (though we can do that); we can also push onto a queue that consumers BLPOP from, or send a pubsub message to a bunch of consumers, or increment counters.
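
Events that do more than run a program could be dispatched roughly as below. This is a hypothetical sketch, not tptacek's code: the event format (a `"kind"` plus target fields) is invented, and the scheduling side is omitted. It calls `rpush`, `publish`, and `incr`, which are real redis-py client methods; the small `FakeRedis` class only lets the sketch run without a server.

```python
import shlex
import subprocess
from typing import Any, Dict, List, Tuple


def fire_event(client: Any, event: Dict[str, str]) -> Any:
    """Carry out one scheduled event.

    `client` is assumed to behave like a redis-py client (redis.Redis).
    """
    kind = event["kind"]
    if kind == "enqueue":
        # RPUSH here plus BLPOP in the consumers gives a FIFO work queue.
        return client.rpush(event["queue"], event["payload"])
    if kind == "publish":
        # Every subscriber of the channel receives a copy of the message.
        return client.publish(event["channel"], event["payload"])
    if kind == "count":
        return client.incr(event["key"])
    if kind == "exec":
        # The plain cron-style case: run a program, return its exit status.
        return subprocess.call(shlex.split(event["command"]))
    raise ValueError(f"unknown event kind: {kind!r}")


class FakeRedis:
    """Tiny in-memory stand-in so the sketch can be tried without a server."""

    def __init__(self) -> None:
        self.lists: Dict[str, List[str]] = {}
        self.messages: List[Tuple[str, str]] = []
        self.counters: Dict[str, int] = {}

    def rpush(self, key: str, *values: str) -> int:
        self.lists.setdefault(key, []).extend(values)
        return len(self.lists[key])

    def publish(self, channel: str, message: str) -> int:
        self.messages.append((channel, message))
        return 0  # real Redis returns the number of subscribers that received it

    def incr(self, key: str) -> int:
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]
```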

      Later: it makes you think. It's nice that Redis is minimal, but timer support is a core feature that might work really nicely with Redis.

      • thezilch 13 years ago

        You will get said timer support in Redis 2.8 with the ability to subscribe to keyspace events, which includes keys expiring (with millisecond resolution). https://github.com/antirez/redis/issues/594

          config set notify-keyspace-events Kx
          psubscribe __keyspace@0__:cron_*
        
          psetex cron_run-me-in-1s 1000 0
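
In Python, a hypothetical listener built on redis-py might look like the sketch below. It assumes Redis 2.8 or later with keyspace notifications enabled (`notify-keyspace-events` including `K` for keyspace channels and `x` for expired events) and an invented naming convention of `cron_<job>` for timer keys. On a keyspace channel, the channel name carries the key and the message data carries the event name. The message parsing is separated out so it can be checked without a server.

```python
from typing import Optional

KEYSPACE_PREFIX = "__keyspace@0__:"
JOB_PREFIX = "cron_"  # hypothetical naming convention for timer keys


def expired_job(message: dict) -> Optional[str]:
    """Return the job name if a pubsub message reports an expired cron_* key."""
    if message.get("type") != "pmessage":
        return None  # e.g. the psubscribe confirmation
    channel, data = message["channel"], message["data"]
    if isinstance(channel, bytes):
        channel = channel.decode()
    if isinstance(data, bytes):
        data = data.decode()
    prefix = KEYSPACE_PREFIX + JOB_PREFIX
    if data != "expired" or not channel.startswith(prefix):
        return None
    return channel[len(prefix):]


def listen(host: str = "localhost") -> None:
    import redis  # redis-py; imported here so expired_job() works without it

    r = redis.Redis(host=host)
    r.config_set("notify-keyspace-events", "Kx")  # K: keyspace channel, x: expired events
    p = r.pubsub()
    p.psubscribe(KEYSPACE_PREFIX + JOB_PREFIX + "*")
    r.psetex(JOB_PREFIX + "run-me-in-1s", 1000, 0)  # expires in about one second
    for message in p.listen():
        job = expired_job(message)
        if job is not None:
            print("run", job)
```

Per the Redis keyspace-notification documentation, the `expired` event is generated when Redis actually deletes the key (on access or during its background expiry cycle), not necessarily the instant the TTL reaches zero, so firing times are approximate.
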
        • kelnos 13 years ago

          It's not clear to me from the writeup whether you'll get a notification when the key actually expires, or only when it is expired because someone tried to access it.

          Currently keys don't actually expire from the datastore until someone tries to access them after their TTL runs out. Changing Redis so it actively expires keys at the precise moment the TTL runs down sounds expensive... but I would love that feature nonetheless.

          • thezilch 13 years ago

            Redis's expiry strategy changed as of Redis 2.6, which does have Redis actively expiring keys.

            • kelnos 13 years ago

              Ah! I did not know this! This may make a few things simpler for me...
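
The two expiry strategies discussed here can be illustrated with a toy store. This is a hypothetical sketch, not Redis source code: it shows lazy expiry on access, plus an active cycle modelled on the one the Redis `EXPIRE` documentation describes (sample 20 random keys that have a TTL, delete the expired ones, and repeat if more than 25% of the sample had expired). Time is passed in explicitly so it can be tested without sleeping.

```python
import random
from typing import Any, Dict, Optional


class TTLStore:
    """Toy key-value store with lazy and active expiry (illustration only)."""

    def __init__(self, sample_size: int = 20, repeat_ratio: float = 0.25, seed: int = 0) -> None:
        self._data: Dict[str, Any] = {}
        self._expires: Dict[str, int] = {}  # key -> absolute expiry time in ms
        self._sample_size = sample_size
        self._repeat_ratio = repeat_ratio
        self._rng = random.Random(seed)

    def set(self, key: str, value: Any, now_ms: int, ttl_ms: Optional[int] = None) -> None:
        self._data[key] = value
        if ttl_ms is None:
            self._expires.pop(key, None)
        else:
            self._expires[key] = now_ms + ttl_ms

    def _delete(self, key: str) -> None:
        self._data.pop(key, None)
        self._expires.pop(key, None)

    def get(self, key: str, now_ms: int) -> Optional[Any]:
        # Lazy expiry: an expired key is removed when someone touches it.
        if key in self._expires and self._expires[key] <= now_ms:
            self._delete(key)
        return self._data.get(key)

    def active_expire_cycle(self, now_ms: int) -> int:
        """Sample keys with a TTL, delete expired ones, repeat while many were expired."""
        removed = 0
        while self._expires:
            keys = list(self._expires)
            sample = self._rng.sample(keys, min(self._sample_size, len(keys)))
            expired = [k for k in sample if self._expires[k] <= now_ms]
            for k in expired:
                self._delete(k)
            removed += len(expired)
            if len(expired) <= self._repeat_ratio * len(sample):
                break  # few expired keys left in the sample: stop for this cycle
        return removed

    def __len__(self) -> int:
        return len(self._data)
```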

        • jeromeparadis 13 years ago

          Awesome! I've been waiting for this for a while.

  • codewright 13 years ago

    Don't suppose you could be convinced to release that scheduling daemon as open source, could you?

    • tptacek 13 years ago

      Seems like it's all downside and not much upside; it's an unambitious little C project that would mostly give people an opportunity to write blog posts about my C programming style. :)

      I'm not saying I'll never publish code, just that I've become choosy about it in my advancing years.

      • sillysaurus 13 years ago

        > We wrote a small C program that serves as a scheduling daemon with Redis; we have a keyspace in Redis that can be used to schedule millisecond-granular periodic or one-shot tasks with a flexible "Chronic"-like specification syntax.

        Please open source this. Projects like these are some of the most valuable ones. It's a generalized solution to a very common problem. In fact, what you've described is one of the most elegant designs for this type of problem. It enables any program to schedule other programs just by using a standard Redis interface, for example. Almost every programming language already has a Redis library, meaning it's effectively zero additional work to use your system. Something like your project is sorely needed.

        If you want help with documentation, I'm happy to offer it.

        > Seems like it's all downside and not much upside; it's an unambitious little C project that would mostly give people an opportunity to write blog posts about my C programming style. :)

        So was Unix, originally. Those who would deride elegant design due to the choice of language are both short-sighted and mean, and aren't worth worrying about. I know how hard that can be to deal with -- people's negativity tends to bother me a lot, too -- but the world needs more production-grade solutions to real problems. It sounds like your solution has been in use for some time now, and has proven itself effective in the field.

      • codewright 13 years ago

        My C is terrible, probably far worse than you perceive yours to be.

        I'm sorry you feel that way. I hope you change your mind, advancing years or no.

        I'm with Sillysaurus, this is a great solution to a common problem.

        Pawn off the code onto me and I'll publish it under my name if you're that worried about getting bad publicity. :P

        • tptacek 13 years ago

          I actually feel very good about my C code; C is my native language. I just feel very bad about the Internet. :)

          • Surio 13 years ago

            >> I actually feel very good about my C code [....] I just feel very bad about the Internet. :)

            I laughed first, then realised there is more than a ring of truth to that line :/

            If you do release it (pseudonymously or otherwise), please announce it to those of us who are genuinely interested in it in the first place :)

          • codewright 13 years ago

            Then ignore them and please publish.

  • thezilch 13 years ago

    We've utilized a similar setup using https://github.com/benliles/TxScheduling (Python, Twisted Scheduling) backed by a DB store.

zobzu 13 years ago

So... cron is a 200 KB self-contained C program: super simple, does one thing, and does it well.

Chronos is built on top of frameworks, needs a few services to run.

Certainly it has value, but marketing it as a cron replacement is wrong, IMO. Chronos is one of those "web-server-service-tool" thingies, but no, it's not cron.

  • bitwize 13 years ago

    Every time I come across one of these heavyweight, dependency-encrusted "replacements" for basic Unix tools, I'm reminded of L33tStart[0], a fictional, satirical init(1) replacement I wrote about on Reddit -- a sort of parody of systemd.

    Every day, in a bizarre manifestation of Poe's Law, L33tStart seems less and less a parody.

    [0] http://www.reddit.com/r/programming/comments/14ay0r/hacker_k...

  • rsync 13 years ago

    Ding!

    Complex, interdependent systems without slack. Now add production pressures. You have all the ingredients of a "Normal Accident"[1].

    You should do the opposite of this.

    [1] http://en.wikipedia.org/wiki/Normal_Accidents

  • jargonjustin 13 years ago

    It's solving a rather different problem than cron, but it's a problem that many people first attempt to use cron to solve. It's a replacement not in the sense of being a better cron, but as solving a problem for which cron is often used inappropriately.

eksith 13 years ago

This is almost like saying "A Replacement for Gravity".

The benefit of having something so simple and singular in function, with no dependencies, is that there's so little to break. While I appreciate the need for something with more capability, calling it a "replacement" is a bit facetious.

I wish them success, and I'll call it an airplane. I'll stick to my hang glider.

  • dice 13 years ago

    What the ops team isn't telling them (DevOps only goes so far) is that there's a cron job on each node which periodically bounces the Chronos processes (and its 50 dependencies) in order to mitigate memory leaks.

hendzen 13 years ago

Interesting to see that this was built on top of Apache Mesos, a dynamic cluster partitioning framework started by some researchers at Berkeley [1]. So far I've heard about Twitter (and now Airbnb) using Mesos in production. Is anyone else evaluating or using it currently?

[1] - http://incubator.apache.org/mesos/papers/nsdi_mesos.pdf

stfp 13 years ago

Looks great, but calling it a replacement for cron is like calling Word a replacement for cat.

verelo 13 years ago

This is brilliant! I've had to do a lot of work creating queue-based systems in recent years, and I really like off-the-shelf solutions to problems like this. Cron is a great tool, but it's a mess once you have a bunch of machines, and it really just does not scale.

  • bdunbar 13 years ago

    By itself it does not. If you have a bunch of machines, you also have a config system (right?), and those (Puppet, Chef, and so on) have the means to manage cron tasks as a class.

    You still get cron, and the means to manage them across one's servers.

stiff 13 years ago

I really hope someone will succeed in rolling out a good and eventually popular replacement for Cron, because Cron completely sucks.

The crontab file has a weird syntax that in some respects looks like a shell script but isn't really one: some shell constructs work, others don't. There are lots of ways to make a mistake in the commands so that the command you intended won't run but will fail silently, without any trace of an error. It is hard even to extract common parts of commands into a variable. I wasted hours and hours debugging weird cron errors. In one case, a user's crontab was moved over to be the system crontab and strangely didn't work. It turns out the system-wide crontab has one more field (the user), but cron will not signal an error even in an obvious case like this; it just fails silently (which, in my experience, is the thing cron is really good at).
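
Some of those silent failures can be caught by a linter. The sketch below is hypothetical and deliberately small: it checks for five schedule fields (or an `@` shortcut such as `@reboot`), checks that a system crontab line has a user field naming a known account, and flags an unescaped `%`, which crontab(5) says cron turns into a newline, sending the text after the first `%` to the command's standard input. It does not validate field ranges.

```python
import re
from typing import Iterable, List, Optional

SHORTCUTS = {"@reboot", "@yearly", "@annually", "@monthly", "@weekly", "@daily", "@midnight", "@hourly"}
FIELD_RE = re.compile(r"^[0-9A-Za-z*/,-]+$")  # numbers, names (mon, jan), and * / , -
ENV_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*=")  # VAR=value lines


def lint_crontab_line(
    line: str, system: bool = False, known_users: Optional[Iterable[str]] = None
) -> List[str]:
    """Return warnings for one crontab line; an empty list means nothing obvious is wrong."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or ENV_RE.match(stripped):
        return []
    tokens = stripped.split()
    if tokens[0].startswith("@"):
        if tokens[0] not in SHORTCUTS:
            return [f"unknown shortcut {tokens[0]!r}"]
        rest = tokens[1:]
    else:
        if len(tokens) < 6 or not all(FIELD_RE.match(t) for t in tokens[:5]):
            return ["expected five schedule fields followed by a command"]
        rest = tokens[5:]
    warnings: List[str] = []
    if system:
        # /etc/crontab has an extra user field between the schedule and the command.
        if not rest or rest[0] not in set(known_users or ()):
            warnings.append("system crontab: token after the schedule is not a known user")
        else:
            rest = rest[1:]
    if not rest:
        warnings.append("missing command")
    # An unescaped % ends the command; the remainder is fed to stdin.
    if re.search(r"(?<!\\)%", " ".join(rest)):
        warnings.append("unescaped % in command")
    return warnings
```

Run with `system=True` and the `pw_name` values from `pwd.getpwall()`, a check like this would have flagged the moved crontab in the anecdote above.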

DannoHung 13 years ago

Awesome! If you've ever had to use Control-M or AutoSys, you know well the supreme horridness of other solutions in this space.

philp 13 years ago

I know a couple guys that will be rather unhappy about the choice of moniker. http://www.getchronos.com/

solidsnack9000 13 years ago

Tools like this can make a lot of ops housekeeping easier, like log rollups, deleting old backups, and so forth. One doesn't want to depend on particular machines being up, but on the other hand, there's no 99.999999% uptime requirement.

3amOpsGuy 13 years ago

Always happy to see more tools like this - the bigger the toolbox, the easier our lives are.

Of the various approaches I've had to depend on in recent memory, from Perl scripts querying a central DB right through to the eye-wateringly expensive Control-M or AutoSys in larger environments, it's plain old cron fronted by config management (CFEngine, Puppet, Chef, Salt; it doesn't matter which) that has proven most dependable, easiest to train others on, and simplest to debug.

erikbern 13 years ago

We've built Luigi at Spotify to solve a lot of similar problems: https://github.com/spotify/luigi

Might be worth checking out if you are building large data flows. We probably run 10k Luigi "tasks" every day, the majority of which are Hadoop jobs. They are all organized in a large dependency graph expressed in Python, and you also get visualization, exception handling, atomic file operations, etc.

Zenst 13 years ago

Many people do much of the fancy stuff with systems management products (Tivoli or BMC Patrol spring to mind from past experience). With those you can centrally monitor servers, services, and software; with the right forethought, automate responses to defined conditions; and get an alert callout when exceptions to the rules occur and things need looking at.

Now, all that said, I'm not up to date on that side of things, and even less up to date on open source alternatives, though my quick look at this does indicate that it is a start in the right direction and can only get better. So a quick glance gave me a good gut feeling, which is always nice to have.

revscat 13 years ago

Anyone know why launchd hasn't taken off in this regard? I've used it on my personal systems with success, but rarely (if ever) see it mentioned as a cron replacement. Instead, you frequently see these solutions done from scratch.

  • andrewflnr 13 years ago

    Link for the ignorant and lazy, as I once was: http://en.wikipedia.org/wiki/Launchd

    Seems like a decent idea. According to Wikipedia, Ubuntu considered it in 2006 when looking for a new init system, but didn't like that it was under Apple's own license, which was changed to the Apache license shortly thereafter.

    I'll remember it if I want to do something unconventional on a system.

  • SeoxyS 13 years ago

    Because writing property lists is even more annoying than writing crontab syntax.

    Seriously, plists suck.
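
If hand-writing plists is the obstacle, one workaround is to generate them. The hypothetical helper below uses Python's standard `plistlib` to build a launchd job with the `Label`, `ProgramArguments`, and `StartCalendarInterval` keys from launchd.plist(5); the label and command are placeholders, and loading the file with `launchctl` is not shown.

```python
import plistlib
from typing import Dict, List, Optional


def launchd_job(label: str, argv: List[str], minute: int, hour: Optional[int] = None) -> bytes:
    """Build an XML plist for a job that runs at the given minute (and hour, if set).

    launchd treats omitted calendar keys as wildcards, so leaving out Hour
    means "every hour at this minute", much like cron's `M * * * *`.
    """
    interval: Dict[str, int] = {"Minute": minute}
    if hour is not None:
        interval["Hour"] = hour
    job = {
        "Label": label,
        "ProgramArguments": argv,
        "StartCalendarInterval": interval,
    }
    return plistlib.dumps(job)  # XML plist by default


if __name__ == "__main__":
    # Rough equivalent of the crontab line `30 2 * * * /usr/local/bin/cleanup`.
    print(launchd_job("com.example.cleanup", ["/usr/local/bin/cleanup"], minute=30, hour=2).decode())
```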

rjurney 13 years ago

I've seen it in action, and it can schedule down to the millisecond across machines.

nemesisj 13 years ago

And....that blog is going offline in a few weeks. Thanks Posterous/Twitter!

  • benatkin 13 years ago

    pls...they used a custom domain and they have plenty of time to migrate to a new blogging service and point their DNS to it

sandGorgon 13 years ago

I'm not sure what Mesos brings to the table (possibly distributed dependency management), but for a powerful replacement for cron with dependency management, logging, etc., one could use systemd timers [1].

1. http://jason.the-graham.com/2013/03/06/how-to-use-systemd-ti...
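
For comparison with a crontab line, a systemd timer is a pair of unit files, and by default a timer activates the service unit of the same name. The hypothetical helper below only renders them as text; `OnCalendar=`, `Persistent=`, `WantedBy=timers.target`, `Type=oneshot`, and `ExecStart=` are standard systemd directives, while the unit name and command are placeholders. Installing the files and running `systemctl enable --now <name>.timer` is left out.

```python
from typing import Dict


def timer_units(name: str, command: str, on_calendar: str) -> Dict[str, str]:
    """Render a oneshot service plus a timer that triggers it on a calendar spec."""
    service = (
        "[Unit]\n"
        f"Description={name} job\n"
        "\n"
        "[Service]\n"
        "Type=oneshot\n"
        f"ExecStart={command}\n"
    )
    timer = (
        "[Unit]\n"
        f"Description=Run {name} on schedule\n"
        "\n"
        "[Timer]\n"
        f"OnCalendar={on_calendar}\n"
        # Persistent=true runs a missed activation once the timer is active again,
        # which plain cron does not do.
        "Persistent=true\n"
        "\n"
        "[Install]\n"
        "WantedBy=timers.target\n"
    )
    return {f"{name}.service": service, f"{name}.timer": timer}
```

The calendar spec `*-*-* 02:30:00` corresponds to cron's `30 2 * * *`.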

philipcristiano 13 years ago

I was planning to do something similar after I learned about Mesos, although I haven't found the time. At the moment I'm using a few processes with MySQL, publishing via RabbitMQ to handle distribution and node failures. It doesn't have nearly as nice an interface, but RabbitMQ isn't terribly difficult to manage. Thanks for open sourcing!

darose 13 years ago

Seems like Chronos covers a lot of the same functionality as Rundeck.

tszming 13 years ago

I am using a similar tool for job automation, history & alert, and it is called Jenkins :)

  • gzur 13 years ago

    Yeah, I'm amazed that people consistently run around reinventing the wheel, when there are 18-wheeled chariots out there free for the grabbing.

  • cowmix 13 years ago

    I second this. I run 100s of jobs with Jenkins all day. Works awesome.

gnuvince 13 years ago

As far as Cron replacements go, I prefer whenjobs [1].

[1] http://people.redhat.com/~rjones/whenjobs/

dschiptsov 13 years ago

in Java?

Something is deeply wrong with this world.)
