Ask HN: Appropriate platform for a data crunching app
I'm in the process of building a social Bayesian suggestion engine. I've coded out a large portion of it from scratch in PHP following an MVC design pattern, but I'm starting to wonder if this is the right environment for doing some Bayesian inference. Will good ol' MySQL work on a grand scale?
Reading about the obstacles Hunch has encountered has made me reconsider PHP, but that's what I know best. I'm fairly proficient in Python, so it wouldn't be such a leap to use Django, but I'd prefer not to if there's not a big performance difference. I've looked at MongoDB, and it looks like I could replace MySQL with it without having to rewrite too much of my code base. Node.js looks pretty exciting, but I know very little about its proper application or whether it's stable enough.
I'm up for learning to work in a new environment if there's something that's just _right_ for the job. Any suggestions?
Python is generally faster than PHP, particularly if you're not using APC or Zend. But the real benefit of using Python is the vast amount of software out there to do so many things. For example, PyBrain (http://pybrain.org/) may be right up your alley. The book "Programming Collective Intelligence" is all Python-based and covers a wide range of algorithms, including classifiers, some of which use SQL backends. I thought the book was awesome, practical and inspiring, but I'm not a hardcore AI person. MySQL vs NoSQL: you may find that SQL is totally adequate for what you're trying to do. You may be able to factor the data access out to a data layer so that you can switch the underlying storage without too much work. If your goal is to just get it finished, go with what you know. If your goal is to grow and/or earn street cred, leave your comfort zone asap!
I've definitely been trying to abstract all my models out so that I can swap in a new DAL if needed. I've got a pretty extensive SQL class going, but the app still has some hard-coded SQL queries in it. I'm not too sure I'd be enticed to use lots of Python modules. I think this may be a programming character flaw, but I am averse to using pre-built stuff in my code; otherwise I'd be using CakePHP or CodeIgniter instead of my own framework right now. I always want to build everything from scratch.
If you're committed to the style points that you get from writing good software, seriously consider starting to do that in a language other than PHP. But aside from that issue, consider this: you have probably created quite a war chest of software for rocking and rolling problems, in PHP. Python's package structure makes it a lot easier to re-use software than it is in PHP.
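Going back to the data-layer idea raised earlier in the thread (keeping storage behind one interface so the backend can be swapped), here is a rough Python sketch of what that might look like. Everything in it is made up for illustration: the `RatingStore` interface, the `ratings` table and its columns, and the `like_ratio` helper are hypothetical, and the standard library's `sqlite3` stands in for MySQL only so the example runs without a server.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List, Tuple


class RatingStore(ABC):
    """Hypothetical storage interface; app code talks only to this."""

    @abstractmethod
    def add_rating(self, username: str, item: str, liked: bool) -> None:
        ...

    @abstractmethod
    def ratings_for(self, item: str) -> List[Tuple[str, bool]]:
        ...


class MemoryStore(RatingStore):
    """Keeps ratings in a plain list; handy for tests."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, str, bool]] = []

    def add_rating(self, username, item, liked):
        self._rows.append((username, item, liked))

    def ratings_for(self, item):
        return [(u, l) for (u, i, l) in self._rows if i == item]


class SqlStore(RatingStore):
    """SQL-backed store; sqlite3 is an assumption standing in for MySQL."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS ratings "
            "(username TEXT, item TEXT, liked INTEGER)"
        )

    def add_rating(self, username, item, liked):
        # Parameterized query: all SQL lives in this one class.
        self._db.execute(
            "INSERT INTO ratings VALUES (?, ?, ?)",
            (username, item, int(liked)),
        )
        self._db.commit()

    def ratings_for(self, item):
        cur = self._db.execute(
            "SELECT username, liked FROM ratings WHERE item = ? ORDER BY rowid",
            (item,),
        )
        return [(u, bool(l)) for (u, l) in cur.fetchall()]


def like_ratio(store: RatingStore, item: str) -> float:
    """App-level logic that never sees which backend is in use."""
    rows = store.ratings_for(item)
    if not rows:
        return 0.0
    return sum(1 for _, liked in rows if liked) / len(rows)
```

The point of the design is that `like_ratio` (and anything else in the app) only depends on `RatingStore`, so any remaining hard-coded queries would move into one class per backend.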
There are some amazing micro-frameworks out there that let you use other people's code, just the parts you need, without any bloat. But even if you ignore what other people have written and want to write it all yourself, Python is a way better place to be. Using WSGI you have all kinds of options for ultra-fast, lean and mean hosting of code that you write for frameworks. But things like Flask (http://flask.pocoo.org/) may appeal to you. It's a collection of a handful of tiny things that together form an alternative to something like Django. Considering you've eschewed CakePHP, Flask may interest you greatly. There are actually a number of micro-frameworks like this. Playing with them, reading their code, or using some of them may be fun for you. Edit: "FLASK" not FLASH. Oops.
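For anyone who wants to see how little WSGI asks of you when writing everything from scratch, here is a minimal sketch using only the standard library's `wsgiref`. The `app`, `call_app`, and `serve` names, the port, and the response text are all illustrative assumptions, not anything from a particular framework.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A bare WSGI application: a callable taking environ and start_response."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        status, body = "200 OK", b"hello from plain WSGI\n"
    else:
        status, body = "404 Not Found", b"not found\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]  # WSGI expects an iterable of bytes


def call_app(path):
    """Call the app in-process, with no server, for quick checks."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def serve(host="127.0.0.1", port=8000):
    """Run the reference server; fine for local tinkering, not production."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts the reference server locally. Since `app` is just a callable that follows the WSGI convention, the same function could in principle be handed to any other WSGI-compliant server unchanged.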