We decided to go for the big rewrite

tech.channable.com

167 points by duijf 6 years ago · 74 comments

royjacobs 6 years ago

Reading this article it seems like yet another example of "you don't have big data". Most of the features that are unique to Spark (or Spark-like setups) were not needed, so in the end it's mostly...just an app talking to Postgres?

I'm not sure, but reading other articles[0] on the blog, it seems like they've been jumping on bandwagons before, so it's probably good to revisit those decisions every now and again.

Edit: Not trying to come off as too snarky, though I've found that this type of thing is pretty common in startups where everyone from the CTO on down has some experience but not a lot of experience. I've fallen into that trap too, at some point saying "Sure, Scala will work great! It's future-proof and everyone will love it!" cue crickets

[0] https://tech.channable.com/posts/2017-02-24-how-we-secretly-...

  • duijfOP 6 years ago

    > Reading this article it seems like yet another example of "you don't have big data".

    Yes definitely. This thing was a big exercise in "You aren't going to need it".

    > so in the end it's mostly...just an app talking to Postgres?

    Yes. We solve problems for our customers. It's nice if we can do that without also creating problems for ourselves :)

    > seems like they've been jumping on bandwagons before

    Just wanted to point out that this system is also written in Haskell. We didn't really switch bandwagons.

    • royjacobs 6 years ago

      > Just wanted to point out that this system is also written in Haskell. We didn't really switch bandwagons.

      Fair enough! In the end it's great that you've been able to take a step back and look at your ultimate goal (i.e. the customer problem solving). That doesn't always happen enough, and it's a good sign for your IT teams :) Also the fact that you were able to get the time to actually do this.

  • mumblemumble 6 years ago

    There's that, but Spark is also a little bit tricky because it has such a big feature set that it's not just attractive as a big data tool. On paper, it can be attractive as a tool for easy single-node data parallelism, for easy streaming data processing, for easy machine learning on the Java platform, stuff like that. I'm looking at migrating off of Spark, too, and finding that Spark is still the only way to get a decently ergonomic (for developers) data table library for a JVM language.
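
    For illustration, a minimal sketch of that single-node use, assuming pyspark is installed (the input file and column names are made up):

      # Hypothetical: Spark in local mode, used purely for parallelism on one machine.
      from pyspark.sql import SparkSession

      spark = (
          SparkSession.builder
          .master("local[*]")          # all cores on this one node, no cluster
          .appName("single-node-demo")
          .getOrCreate()
      )

      df = spark.read.csv("events.csv", header=True, inferSchema=True)
      df.groupBy("user_id").count().show()  # parallel aggregation on local cores
      spark.stop()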

    So there's all of that stuff, and then you think, "Oh, and it gives us an easy scaling path if we ever find our data volumes growing at an unexpectedly rapid clip." And at first you think it's just a cherry on top.

    It may seem too good to be true at first, but with everyone using it, and with most of the alternatives looking like they'll involve at least as much dev effort during the initial analysis, it's pretty easy to miss that.

    And it's only a while later, when you're already pretty heavily invested in the ecosystem, that you start to really understand some things that the bloggers and book authors don't talk about in public. Like how Spark makes Big Data and scale-out a self-fulfilling prophecy. You will need to scale out, even if your data should fit in memory, because Spark wastes memory like a 22-year-old football player wastes money.

    And maybe you're an appropriately cynical jerk, and can therefore spot that it couldn't possibly live up to the hype from a mile away. Congratulations. Hopefully you're the technical lead. Even if you are, though, too bad, because nobody else in the meeting room is as jaded about tech as you are. Certainly not the management folks who don't program or have been out of the game for years. And, since you are cynical and jaded, that means there's still a good chance you'll go with Spark anyway. Because that's a hard argument to win - unless you've already been burned by Spark in the past, you aren't going to have enough intimate knowledge to make an argument that's concrete enough to sound convincing. And because this isn't the hill you want to die on. And because, being appropriately cynical, you realize that, at the end of the day, it's not really your problem. Spark will get the job done. It'll be inefficient, sure, but the extra server and development costs aren't coming out of your pocket, they're coming out of the pocket of the person who wants to be using Spark.

    • pbourke 6 years ago

      Also, if you collaborate with data scientists and don't have the resources to use fully distinct systems for the production and research/reporting functions then Spark is very nearly the only game in town.

    • mason55 6 years ago

      Curious what you recommend instead of Spark.

      • commandlinefan 6 years ago

        Almost 100% of the time when somebody says “curious what you recommend instead of _____” the best answer is “write a program”, but for some reason programmers aren’t supposed to do that any more.

      • mumblemumble 6 years ago

        That's a question that has no single answer. Spark is a giant omnibus project with all sorts of bells and whistles, and the viable alternatives are going to vary wildly, depending on which subset of them you actually need/use.

        If you want a single, simple answer that's easy to sell to management, I'd refer you back to the last paragraph of my original post.

      • claytonjy 6 years ago

        I'm not a user myself, but just from watching the chatter in the world it seems dask and rapids are quickly overtaking spark in mindshare, or at least enthusiasm, and that spark is following hadoop/mapreduce to the Big Data Graveyard.

  • nicoburns 6 years ago

    I think it is, but it's a worthwhile point to reiterate. It's also worth pointing out that a rewrite to a less scalable architecture can pay off if it simplifies the system and provides more flexibility.

    The company I work for is currently undergoing a similar rewrite from Firebase to Postgres (although we're doing a gradual migration), and it's amazing how much code we have been able to throw away, and how much quicker we can move in the parts of the code that have been migrated.

  • fulafel 6 years ago

    Trying Haskell if you happen to land a Haskell expert on your team sounds entirely reasonable. It's an old language by now and even used in the conservative FAANG companies. And if you're stuck on the JVM, so does checking out the competition to Java. Lots of people are happy with Clojure or Scala and don't look back.

  • macspoofing 6 years ago

    > Not trying to come off as too snarky

    I had a similar thought. Postgres is a great choice, but then they also went with Haskell ... I look forward to another blog post in 2-3 years detailing all the ways that Haskell failed them and how, at the end of the day, they should have just gone with an industry-standard language.

    • samprotas 6 years ago

      https://tech.channable.com/posts/2017-02-24-how-we-secretly-...

      FWIW they have this post from almost 3 years ago about adding Haskell to their stack. I'm guessing the time for your prediction has come and gone.

      On the other hand, I find it quite encouraging that Haskell was barely mentioned. It seems they viewed the risk as "changing the project's language" rather than "using a non-industry-standard language".

  • kod 6 years ago

    > "Sure, Scala will work great! It's future-proof and everyone will love it!"

    Scala is still the least-bad option for a JVM language.

    Anyone who can't be productive in Scala (not "better java", not "worse haskell", Scala) isn't someone you want on your team anyway.

    • nemothekid 6 years ago

      > Anyone who can't be productive in Scala (not "better java", not "worse haskell", Scala) isn't someone you want on your team anyway.

      It's funny that you gloss over one of the big productivity issues with Scala. Somehow everyone who joins your team should be acutely aware of your flavor of Scala.

      If the community decided it wanted to be a better Java or a worse Haskell, then I bet more people would be productive in Scala.

      • kod 6 years ago

        It's not "my" flavor of Scala, it's what the language was designed for. It's a pragmatic, ML-family language.

        Read Odersky's book, write code, still have access to all JVM libraries and a lot less brain damage.

        There are people that try to write worse Haskell in everything from Perl to Kotlin too. The reputation is overblown and has little to do with the actual language.

      • vips7L 6 years ago

        They also gloss over the huge productivity killers that are sbt and scalac. Compile times are almost as bad as C++ where I work. Sbt will say "Done compiling." and then hang for 15-20 minutes.

    • royjacobs 6 years ago

      I would argue that in a sufficiently large organization you're better off using plain Java or perhaps Kotlin. The latter hits the sweet spot between 'expressive' and 'unreadable' whereas Scala can miss the mark.

      I'm sure if you are very disciplined when writing Scala then this will not happen, etc etc. But the fact that people need to decide upfront which parts of Scala to use and which ones to avoid seems like a red flag to me (and in fact was one, in my experience).

    • apta 6 years ago

      With Java getting value types, record types, and pattern matching it will end up giving all the other JVM languages a run for their money. Today, Kotlin is much more approachable than Scala, not to mention better tooling. Once Java catches up though, it will be a different story.

      • vips7L 6 years ago

        Lots of good things coming for the JDK. I'm really looking forward to Project Loom (fibers) landing.

        I expect Kotlin to fall off everywhere except Android once Loom and the rest of Project Amber land.

      • mbo 6 years ago

        When are higher-kinded types roadmapped for Java?

        • apta 6 years ago

          I'm not aware that they are. It's a nice feature, but it might introduce unnecessary indirection if used inappropriately. That, and the vast majority of languages get by without it without any significant hindrance.

    • julienfr112 6 years ago

      Yeah the team is productive. But only in the 20% left for real work between the interminable arguing over monads or circe or play or sbt or whatever.

      • kod 6 years ago

        Doing real work in Scala for a decade. Never argued about any of that.

    • s_Hogg 6 years ago

      > Scala is still the least-bad option for a JVM language.

      Source, please?

      > Anyone who can't be productive in Scala (not "better java", not "worse haskell", Scala) isn't someone you want on your team anyway.

      Why is that?

alexpotato 6 years ago

Rewriting an application can mean different things:

1. "We are going to start over from scratch and rewrite the whole thing!"

Joel Spolsky famously said "Never do this!"

2. "We are going to slowly refactor the whole codebase."

This can, eventually, lead you to a place where none of the original code is there so it's like a rewrite but much simpler.

3. "We are going to slowly add new places to replace the old system till there is no new system."

This is called the "Strangler App" model as described by Martin Fowler (https://martinfowler.com/bliki/StranglerFigApplication.html)
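
As a minimal sketch of the pattern (hosts and paths here are hypothetical), a thin facade routes already-migrated endpoints to the new system and everything else to the legacy one:

  # Hypothetical strangler facade.
  from http.server import BaseHTTPRequestHandler, HTTPServer
  import urllib.request

  MIGRATED_PREFIXES = ("/orders", "/invoices")  # endpoints already rewritten
  NEW_BACKEND = "http://new-system:8080"
  OLD_BACKEND = "http://legacy-system:8080"

  class StranglerFacade(BaseHTTPRequestHandler):
      def do_GET(self):
          backend = NEW_BACKEND if self.path.startswith(MIGRATED_PREFIXES) else OLD_BACKEND
          with urllib.request.urlopen(backend + self.path) as resp:  # forward the request
              body = resp.read()
          self.send_response(resp.status)
          self.end_headers()
          self.wfile.write(body)

  HTTPServer(("", 8000), StranglerFacade).serve_forever()

As more prefixes migrate, the old system is gradually "strangled" until nothing routes to it.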

Granted, for some reason, "I retired a legacy system and rolled out a brand new one" seems to look better on a resume than "I refactored a legacy system into a better system," so YMMV.

  • steve_adams_86 6 years ago

    I'm not sure about the resume part. My experience refactoring legacy stuff into more modern stuff is actually a big talking point for my resume. People are always interested in why I did it, how I did it, why I didn't just build anew, etc.

    Maybe that's because most jobs I apply for tend to have at least one 'legacy' system laying around that needs some help. But this is somewhat common outside of the startup scene.

    Legacy in this case means an app that's maybe 7-10 years old, often written like everything happened on the back of a napkin. Sprawling, inconsistent, monolithic stuff. But not true legacy where it's written in Fortran or something in the 1500s.

    At any rate, I think it's worth having refactors on your resume. I know I'd be interested in people who have done it - it's always such an educational experience.

    • alexpotato 6 years ago

      Should have been more specific as to which industry I was referring to regarding the resume comment.

      I worked for a big bank for 5 years supporting large trading systems. After the 3rd attempt to retire a legacy trading system by rolling out a new system, someone pointed out that if you are a mid level manager you don't get any resume points for refactoring old systems.

      Hence, you keep trying to roll out new systems even if those systems have bugs like "switch all buy orders to sells" (true story).

  • collyw 6 years ago

    I guess if you are replacing a system part by part, as opposed to a big-bang rewrite, you will be more likely to make sure interfaces are well defined and components are more modular.

gtsteve 6 years ago

This doesn't really sound like a big-bang rewrite as such but an incremental development process. The sort of rewrite that would be a major strategic mistake is where you start with a new repository and begin re-implementing the entire product, but this seems all perfectly sensible to me.

This just sounds like the sort of incremental Ship of Theseus [0] development that many of us are doing. The product I'm working on has had enough key internal portions rewritten over a long enough time (including interestingly the job management system) that you could say it's a rewrite compared to the product from 2 years ago.

[0] https://en.wikipedia.org/wiki/Ship_of_Theseus

  • barking 6 years ago

    I'd like to see some idea of what they mean by big. An order of magnitude in terms of lines of code and the number of engineers they had for the job and how long it all took versus how long the original took to write etc. One of the things that scares me about something like this is all the seemingly illogical bits of code that were added over time and are actually that way for a reason, because they fix some edge case or whatever. It's hard to see those not getting swept away in a rewrite, along with a whole bunch of new rarely occurring issues being added. A complete rewrite strikes me as something that's almost guaranteed to be buggier than what it replaces, at least for a while.

    • mgkimsal 6 years ago

      > A complete rewrite strikes me as something that's almost guaranteed to be buggy

      Likely buggy, but buggy in different ways. The real test will be in measuring the number and quality and severity of bugs, as well as the ability to identify and repair them in both systems. A testable system with good test code written may still be 'buggy', but will help foster a sense of confidence when making the fixes required.

    • LandR 6 years ago

      > I'd like to see some idea of what they mean by big. An order of magnitude in terms of lines of code and the number of engineers they had for the job and how long it all took versus how long the original took to write etc.

      Yeah, at my last place before I left, they were looking to do a rewrite of their main project that had been the result of years of hundreds of developers working on it. The time-scale was wooly, but was expected to be between 5 and 10 years and probably require ~200 developers and testers.

      This was rewriting millions of lines of code, very little would be reusable.

      I don't know if they went ahead with it or not, but it was looking like it would descend into chaos.

      • eej71 6 years ago

        What prompted everyone to want to rewrite so much code?

        • LandR 6 years ago

          They were on a system where, if bugs remained unfixed past a certain time after being logged, there were financial penalties that had to be paid.

          Also, adding new features was incredibly slow, painful and dangerous. Even a minor change resulted in so much manual regression testing needing to be done. It was absurd.

  • baud147258 6 years ago

    The company where I work is doing (and is close to finishing) a big rewrite of one of its products. The base version is deployed on-prem at our clients; the new one will be a cloud service with the same features. But the on-prem version is still worked on and supported (I'm working on it) alongside the cloud one, which is meant to offer a less cumbersome alternative.

    • barking 6 years ago

      Is the on-prem one Windows based? Also, by less cumbersome do you mean the cloud one has fewer features, or is it just easier to get up and running because it's browser based? Is the company expecting the on-prem one to fade away over time? Could you give any information on what each was written in, and an order of magnitude of size - hundreds of thousands of LOC etc.? Sorry for all the questions but I'm nosey! No problem if you can't.

      • baud147258 6 years ago

        The on-prem is Windows-based; we're delivering new versions as an MSI. The goal of the cloud version is to be easier to get up and running, since there are no installation steps and (I guess) less configuration to do. The company might want the on-prem to fade, but it won't happen in the short term since we still have clients. The on-prem is 15-20 years old, but not always actively worked on, with ~1M LOC (including SQL scripts, setup, config files, but not the test suite). The cloud project started last year and I think there are already clients using it, but I don't have access to the code, so I have no idea about the LOC.

        • barking 6 years ago

          Thanks for the info. I am involved with a Windows-based product and we've thought now and again about doing something like this. We have a competitor that is browser and cloud based, but their product seems to be a lot less featured. It feels like they are quite limited in what locally connected hardware (cameras, modems, card readers, barcode readers, printers and scanners) they can interact with, as well as in seamless bridges to other locally installed software. With desktop software this is fairly straightforward, but I don't know if this is/was a problem for cloud software.

          • baud147258 6 years ago

            I'm afraid I wasn't clear: it's installed on a Windows server and users access it via Internet Explorer (because of ActiveX; in theory we'll make the switch to Chrome soon). To interact with card readers, cameras and printers, we're using a bunch of extensions, including a Java applet (!) that we'll have to migrate.

Darkstryder 6 years ago

> Prematurely designing systems “for scale” is just another instance of premature optimization

> Examples abound: (...) using a distributed database when Postgres would do

This is the only part of the article that bugged me a little, because in my experience the choice between single-machine and distributed databases is not so much about scale as it is about availability and avoiding a single point of failure.

Even if your database server is fairly stable (a VM in a robust cloud, for instance), if you use Postgres or MySQL and you need to upgrade to a newer version of the database (let's say for an urgent security update), you have no choice but to completely stop the service for a few seconds / minutes (assuming the service cannot work without its database).

Depending on the service and its users, this mandatory down-time might or might not be acceptable.

Anecdotally I suspect services requiring high SLAs are more common than ones requiring petabyte scale storage.

  • duijfOP 6 years ago

    Re availability: We had a hard time keeping the system based on Spark available. There were days when the cluster would freak out multiple times. The 'fix' would be: restart a bunch of Spark workers. We spent a lot of time debugging/finding this out (some parts documented in [1]) but couldn't work out what the problem was. (EDIT: Assuming there even was a single problem.)

    In this particular case, I'd take the single point of failure over the previous situation.

    That being said: we have successfully used PostgreSQL's fail-overs multiple times. In my experience, they work quite alright.

    [1]: https://tech.channable.com/posts/2018-04-10-debugging-a-long...

    • Darkstryder 6 years ago

      Yeah, I agree. It was more of a general comment, because you seem to have one Postgres instance for every client, which is already a big step against SPOF.

      At $previous_job we had a "one service" = "one MySQL instance" policy. Every time a MySQL server went down, all clients would lose access to that service at the same time. It was stressful and much less robust than your setup.

  • devonkim 6 years ago

    High-availability Postgres setups are a minimum for a production system, along with a staging system to understand how your system behaves during a failure event. These failure scenarios should be tested, not necessarily on every commit, but often enough that there's confidence that during a failover you're not going to drop queries on the floor and pretend it's all good, and that your monitoring systems actually report on the event.

    • Darkstryder 6 years ago

      Yeah. I guess after having been bitten myself a few times with failed MySQL failovers and especially after having read the GitHub October 2018 incident postmortem [1], I stopped considering failover solutions as a reliable availability solution altogether.

      However this is just a personal opinion that I might revisit at some point.

      [1] https://github.blog/2018-10-30-oct21-post-incident-analysis/

      • devonkim 6 years ago

        High-availability setups are also absolutely required to upgrade/patch running databases without significant downtime. The engineering and business costs of trying to work around these issues are from the 90s and have no place in a modern business environment. Heck, commercial, proprietary DBs figured out HA decades ago. Things are much more reliable now with OSS tools than even 4 years ago, to the extent that few talk about it anymore. There are definitely mistakes and bugs possible, but the number of _successful_ failover and failback events must be considered in the calculus.

        Upgrades aren’t to be taken lightly of course but again, it’s now a cost of doing business and a reality that we need to engineer properly for.

  • mannykannot 6 years ago

    I had the same reaction... this statement seems to be an over-generalization, but it can be resolved by being careful about what 'premature' means. In this case, "we can trivially shard datasets of different projects over different servers, because they are all independent of one another", so it seems the scaling issue had a solution from the outset.

    Warnings about the risks of premature optimization should not stop people thinking about issues that are likely to arise in future, and what they might do about them. On the other hand, this does not mean that you should necessarily implement that solution now.

  • StreamBright 6 years ago

    You can use a replication setup for Postgres, which gives you pretty good availability.

    Also, Spark has some single points of failure that are not obvious at first.

    https://gist.github.com/aseigneurin/3af6b228490a8deab519c6ae...
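
    As a rough sketch of what that looks like from the client side (host names here are hypothetical, and this assumes libpq >= 10), a multi-host connection string lets the app follow a failover:

      # Try hosts in order; connect to whichever one currently accepts writes.
      import psycopg2

      conn = psycopg2.connect(
          "host=pg-primary,pg-replica port=5432 dbname=app user=app "
          "target_session_attrs=read-write"
      )
      with conn.cursor() as cur:
          cur.execute("SELECT 1")
          print(cur.fetchone())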

  • smilliken 6 years ago

    Downtime is not mandatory while upgrading to a new PostgreSQL version. I've upgraded from 9.0 to 11 and most versions in between without downtime on large busy databases.

    There's not a single approach that works for every case, but they all involve a replica and upgrading one at a time.

  • wvenable 6 years ago

    You don't need a distributed database to have replication and a hot backup. Stack Overflow runs this kind of configuration -- if it works for them, it can work for you.

  • nicoburns 6 years ago

    > Depending on the service and its users, this mandatory down-time might or might not be acceptable.

    True. But if it is acceptable, then it may well be a good trade-off to make.

    • Darkstryder 6 years ago

      I agree completely. But it is a trade-off you need to be aware of, and need to be confident it will still be true in the future.

      I've had situations where a client wanted a 99.95% availability SLA for our SaaS service instead of 99.5% as we were providing at the time.

      He thought it was a minor change on our side, but it required completely rethinking our architecture from the ground up.

      • nicoburns 6 years ago

        Yeah, makes sense. Conversely, our SLA is only 98%! (We actually have much better uptime than that and have only had about ~1 or 2 hours of downtime in the last 6 months - but we have plenty of wiggle room if we need it.)

jermaustin1 6 years ago

This is a story of one of my first product launches, and the inevitable rewrites that ensued. You can read more of it here: https://jeremyaboyd.micro.blog/2016/11/05/my-first-product.h...

Years ago I was building SEO software. One of the products was originally written as an internal tool, and handled our workload without skipping a beat. Then we decided to release it to the public, so I did a small refactor to implement accounts. We launched with it hosted on a small Dell PC under my desk (where it had been running as our internal tool). Within 2 hours of launch, it was completely overwhelmed and shutting down due to overheating.

It was "rewrite" time.

While doing that, I had to come up with SOME workaround. So I opened the case and stuck a box fan on it to try and exhaust some of the heat. That lasted about 8 more hours before the server shut down and I got a call from the boss.

I went in to the office in the middle of the night, and started profiling the application. I found a VPS host, quickly spun up the largest Windows VM they offered, and that helped for a few days while I rewrote large swaths of the application. Even after a ~80% rewrite and splitting the application in two, we had more users than we'd ever anticipated and I was out of my depth with scaling. So we got a few (much larger) physical servers at Softlayer.

This was the setup that this website ran under for the next couple of years with minor tweaks - more space with an iSCSI array, more RAM, migrating to more CPU, etc. - but all staying at Softlayer. Eventually, when the hosting bill was getting into the high four figures a month, we reevaluated and decided a rewrite was in order to switch to Microsoft Azure, utilizing Azure SQL, Azure Table Storage, Azure Queue Service, and offloading all of the complicated tasks from the web server onto the Azure infrastructure. For all I know it is still on Azure.

goto11 6 years ago

Is the current system so badly architected that it cannot be refactored gradually or rewritten piecemeal? Then the forces which caused these problems will also be in effect during the rewrite, so it will end up in the same place by the time it reaches feature parity.

I can only think of a few places where a full rewrite is justified:

* You lost the source code

* The application is almost purely integration with some third-party platform or component, and you need to replace that platform. (E.g. you are developing a registry cleaner and need to port it to Mac.)

* You don't have any customers or users yet and time-to-market is not a concern.

* You are not a business and are writing the code purely for your own enjoyment

But these are business level considerations. For individual developers there may be compelling reasons to push for a rewrite:

* You find it more fun to work on green-field projects than to perform maintenance development.

* The new platform is more exciting or looks better on the CV than the old

  • ryanelfman 6 years ago

    That's not necessarily true. Those forces will still be there, but the lessons about what not to do have also been learned. People can improve and learn over time.

    • goto11 6 years ago

      Yes, but if you have improved over time, then at least the most recently developed parts of the code would be high quality and would not need rewriting. So you would only need to rewrite some encapsulated legacy parts of the codebase, which is completely different from a full rewrite.

mark_l_watson 6 years ago

I have also found Spark (and Hadoop before that) a little clunky to prototype and develop on, but when you need to handle very large data sets with good throughput performance, then systems like Spark/Hadoop are great. One problem they had was maintaining infrastructure; to be honest, when I used MapReduce as a contractor at Google, or AWS Elastic MapReduce as a consultant, I didn't have to deal too much with infrastructure.

Anyway, it makes sense that they backed off from Spark and HDFS, given the size of their datasets.

The original poster mentioned that their data analytics software is written in Haskell. I would like to see a write up on that.

EDIT: I see that they do have two articles on their blog on their use of Haskell.

z3t4 6 years ago

"A creature from another dimension would see us as noise as our atoms are constantly being replaced"

Don't wait for a big rewrite. Constantly keep deleting and rewriting. Just make sure you are solving real problems while doing it.

lifeisstillgood 6 years ago

>>> One of our main reasons for choosing Apache Spark had been its ability to handle very large datasets (larger than what you can fit into memory on a single node) and its ability to distribute computations over a whole cluster of machines ... We cannot fit all of our datasets in memory on one node, but that is also not necessary, since we can trivially shard datasets of different projects over different servers, because they are all independent of one another.

So this seems to be the massive takeaway - if you need to operate on a whole dataset that is larger than one node's memory capacity, then you have to go distributed. Otherwise it still seems like an overhead barely worth the effort.
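
To make the takeaway concrete, here is a toy sketch of the sharding idea (server names are made up): each project hashes to one node, so no single dataset ever needs to span machines.

  # Hypothetical shard routing: a project's data always lives on one server.
  import hashlib

  SERVERS = ["db-0.internal", "db-1.internal", "db-2.internal"]

  def server_for(project_id: str) -> str:
      # Stable hash, so a given project always maps to the same node.
      h = int(hashlib.sha256(project_id.encode()).hexdigest(), 16)
      return SERVERS[h % len(SERVERS)]

  print(server_for("acme-shop"))  # always the same node for this project

(Plain modulo hashing reshuffles projects when the server list grows; consistent hashing avoids that, but the principle is the same.)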

So Google: the dataset is all web pages on the internet - yes, that's too large, go distributed.

Tesco / Walmart : dataset might be all the sales for a year. Probably too large. But could you do with sales per week? per day?

Having the raw data of all your transactions etc. lying around waiting for your spiffo business query sounds good, but ... is it?

I would be interested in hearing folks' cut-off points for going full Big Data vs "we don't really need this"

jimbokun 6 years ago

We decided to go for a Big Rewrite for a completely different reason. The initial license for the proprietary NoSQL database we had negotiated was about to expire, and the company was going to charge us an order of magnitude more to renew.

So we immediately set out redesigning our system to use other, fully open source technologies. It also gave us an opportunity to reconsider architecture decisions that had not scaled well. In our case, moving from a monolith to microservices has had major benefits. Maybe the biggest is being able to quickly see which microservice is the bottleneck and needs to be scaled up to handle the load. With the monolith, if it got slow, it was very difficult to figure out which part of the workload was making it slow.

bluedino 6 years ago

Our company runs on a pile of VBA/Access. At least it talks to a MariaDB server on Linux.

The biggest problems are trying to run/develop this code on machines made in the last ten years, and the fact that it's a horrible, horrible codebase, with code practices from the early 90s.

To make things worse, objects are 'evil', all HTML/SQL/XML is built by appending strings, and there are no data sanity checks...
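
To illustrate the string-appending problem (table and input are made up), compare naive concatenation with parameter binding:

  # sqlite3 here is just a stand-in for any DB-API driver.
  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE users (name TEXT)")

  name = "O'Brien"  # legitimate input that breaks string-built SQL

  # Fragile: a single quote in the input causes a syntax error (or injection).
  # conn.execute("INSERT INTO users VALUES ('" + name + "')")

  # Safe: the driver handles quoting and types.
  conn.execute("INSERT INTO users VALUES (?)", (name,))
  print(conn.execute("SELECT name FROM users").fetchone())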

I started a proof of concept replacement system that was written in Python and ran on the web.

It was met with "We can't go with a web-based system, since if anything changes in the browser we'll be up shit creek."

:-/

  • viraptor 6 years ago

    Ah... The fine line between "better the devil you know" and "I don't understand the options so every change is too scary".

sandGorgon 6 years ago

In case you are using PySpark, a good framework to move to is Dask: https://docs.dask.org/en/latest/

It is also natively integrated with K8s - https://github.com/dask/dask-kubernetes - and YARN: https://yarn.dask.org/en/latest/
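
As a rough sketch of what the move can look like (the file glob is made up), a typical PySpark-style aggregation translates almost directly:

  # Dask dataframes: lazy, partitioned, running on local cores or a cluster.
  import dask.dataframe as dd

  df = dd.read_csv("events-*.csv")                 # lazy, partitioned read
  counts = df.groupby("user_id").size().compute()  # triggers parallel execution
  print(counts.head())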

roland35 6 years ago

I am currently evaluating whether we should rewrite a large chunk of our embedded controller, which handles motion control, so this is a timely write-up! I think the lessons here are the same in embedded code - we can keep the existing black-box (consultant-written!) code for now while the new motion control code is written in a parallel branch. Luckily, the more modular the project, the easier this is!

sebastianconcpt 6 years ago

This begs the question: in which situations is it appropriate to decide on a full rewrite?

In theory, there is an easy answer to this question: If the cost of the rewrite, in terms of money, time, and opportunity cost, is less than the cost of fixing the issues with the old system, then one should go for the rewrite.

In our case, the answer to all of these questions was yes.

One of our original mistakes (back in 2014) had been that we had tried to “future-proof” our system by trying to predict our future requirements. One of our main reasons for choosing Apache Spark had been its ability to handle very large datasets (larger than what you can fit into memory on a single node) and its ability to distribute computations over a whole cluster of machines. At the time, we did not have any datasets that were this large. In fact, 5 years later, we still do not. Our datasets have grown by a lot for sure, both in size and quantity, but we can still easily fit each individual dataset into memory on a single node, and this is unlikely to change any time soon. We cannot fit all of our datasets in memory on one node, but that is also not necessary, since we can trivially shard datasets of different projects over different servers, because they are all independent of one another.

With hindsight, it seems obvious that divining future requirements is a fool’s errand. Prematurely designing systems “for scale” is just another instance of premature optimization, which many development teams seem to run into at one point or another.

AzzieElbab 6 years ago

This article is insane. I wouldn't even know how to begin configuring an HDFS/Spark cluster for 10 GB of data.

apta 6 years ago

Next up: why we decided to re-write our Haskell re-write in $HYPED_LANGUAGE.
