Ask HN: Is it still worth learning Lisp nowadays for data mining tasks?
Seeking advice on whether I should continue investing time in Lisp.
My background: I switched to CS in graduate school, I'm familiar with Python, and I deal with JSON and CSV-like data every day. Python has been handy for the job, but I just started learning Lisp for fun. The language looks powerful, but for many simple tasks I have to start from scratch, while Python usually has a convenient library and can do the same thing in one line.
My friends say Lisp is too old and not suitable for data mining tasks, that nobody actually uses it for work, and they all prefer Python. But I take their words with a grain of salt, since they are not familiar with Lisp. (In our generation, people don't seem to learn functional programming any more?)
Is it still worth investing time in learning Lisp for data mining purposes? Any advice is welcome. Thanks :-)

I'm torn about how to respond. As usual, I think the answer is "it depends". A few years ago I got bored/frustrated with using SAS for very specific custom-built work/research that also had to run at large scale, and went searching for a replacement. For my uses I wanted something compiled, fast but also flexible, free, with functions, macros, and an object system (and I wanted to learn something from it even if I ended up not using it). I settled on Common Lisp. The rest of the world seems to be Python/R these days. I really do like Common Lisp, and Lisp is easily my favorite language so far, but there are some realities which, two years or so on, I feel able and qualified to share.

Some observations on Common Lisp.

Cons:

- It's a big language. I still feel like I don't get all of it, but that doesn't necessarily matter.

- It's not batteries-included. That means you're basically going to be coding A LOT yourself, and that requires a lot of work and knowledge about what you're doing. If you don't understand good data structures and compsci fundamentals, you aren't going to beat the implementations that already exist in other languages. I want to dismiss this because I'm generally trying to write new stuff that doesn't exist anyway, and I want low- and high-level access simultaneously for performance reasons, but it seems there is always some supporting library/infrastructure underlying your work that you didn't really think of and that doesn't exist in sufficient shape yet. THIS IS REALLY THE BIG NUMBER 1 STRIKE AGAINST THE LANGUAGE. Do not underestimate how much you'll be writing if you think you can just implement techniques that have already had X years of work put into them in other languages.

- It is not dominant, or even widely known, any more. Workmates/friends will ignore you for writing in it. Your work probably won't let you use/install it.

- You'll resent other languages if you successfully learn it.

Pros:

- Of course, I find that the current libraries in other languages often don't do what I want, so I frequently have to rewrite things anyway.

- It's great for solo, exploratory work, or work that doesn't exist yet.

- It's fun/liberating to code in.

- It will beat the pants off Python/R performance-wise if you ever get it up and running, and if that's important to you. It is to me. But you can only get that performance if you know what you're doing (there's a small sketch of what I mean at the end of this comment).

- SBCL gives you something like the best of both the dynamic/static and compiled/interpreted worlds.

- I found it really does open your eyes to a lot of compsci-theory aspects that other languages gloss over. Of course, by glossing over those aspects, Python/R can make your job a whole lot easier if they aren't important to you.

Python is actually pretty cool, but it's also pretty slow. I really do prefer Lisp, but most of the world prefers Algol-esque syntax. I think Python is really up-and-coming in the machine learning/stats world.

R is liberating coming from SAS. It has a huge stats community behind it and a huge number of stats packages. Coming from Lisp, however, it's the horribly disfigured plastic-surgery old-Hollywood nightmare of a beautiful starlet you remember from your youth, who has now carved up her face to try to look like the other young starlets :P (ha ha, only serious). Which is to say, it has enough Lisp influence behind it to seem familiar, but from a programmer's background it's a horribly designed/implemented language. More accurately: if Python is slow, then R is SLOOOOOOOOOOOOOOOOOW. Really, it's painful to type things at the R REPL after coming from Common Lisp. It has a bit of a cult-like following amongst stats people, though.

Hope I've offered at least a little bit of useful feedback. YMMV.
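Since I keep saying "if you know what you're doing": here's a minimal sketch of the kind of type and optimization declarations I mean. It's a toy dot product I made up for illustration, not something from a real project; leave the declarations out and SBCL still compiles it, just with generic arithmetic instead of tight unboxed float code.

    ;; Toy example: dot product over vectors of double-floats.
    ;; The declarations tell SBCL the element types and trade safety
    ;; for speed, which is where the near-C numeric performance comes from.
    (defun dot (xs ys)
      (declare (optimize (speed 3) (safety 0))
               (type (simple-array double-float (*)) xs ys))
      (let ((sum 0d0))
        (declare (type double-float sum))
        (dotimes (i (length xs) sum)
          (incf sum (* (aref xs i) (aref ys i))))))

    ;; Callers must pass arrays built with :element-type 'double-float,
    ;; e.g. (make-array n :element-type 'double-float ...).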
Thanks for sharing your valuable experience! I have the same feeling (for now): much of the time I have to implement things from scratch in Common Lisp, and that pushes me to understand both the essence of the algorithm and how the computation actually works. Since I am still at an early stage of learning Lisp and only recently switched to CS, I am learning many mind-blowing things at once (SBCL + Emacs + SLIME), and the combination adds up to a steep learning curve. They are real time black holes: I have invested more time than expected in them every day, and sometimes I have to use Python to get the job done first to keep my advisor happy, then rewrite it in Lisp in my spare time. Maybe it is good for me to learn a bit of Lisp every day, because I don't expect to understand all the ideas in SICP and remember all the rules in Practical Common Lisp in 21 days.

For my research, Python is indeed effective for exploring data and testing preliminary models, but as I just experienced, it sometimes gets unbearably slow (by unbearably I mean more than 15 minutes on a PC). Performance matters to me not only for the sake of "big data" but also for self-fulfillment: it feels much better when the code gives a result in 10 seconds rather than several days. (By the way, does CL have mature support for parallelization?) My lab mates also use Mathematica a lot, and it is even more convenient for testing models on data mining tasks, but I have been warned that some packages are buggy at the moment, and most modules are black boxes that don't allow fine-grained tuning. I also tried R and most of the time don't quite understand the logic of the language. Maybe, as you pointed out, it is the genes of the language that determine its shape. Thanks again for your response; it is good to see someone doing real work with Lisp on data mining tasks nowadays.

If you're going to try to do practical things in Common Lisp, make sure you get well acquainted with quicklisp, for easy library installation, and with http://www.cliki.net/ for finding out about half of those libraries. I said you will want to do a lot yourself, but you don't have to do it all.

Be aware that SICP is written for Scheme. Both are Lisp dialects, but they are different. I've done bits of it here and there in Common Lisp, but the two variable namespaces, the lack of first-class continuations, the different function names, the lesser emphasis on recursion (because of Common Lisp's specific iteration forms/macros), and the native efficient data structures offered by Common Lisp may make some parts non-applicable or difficult even if you're already familiar with the language. Tail call optimization is, I believe, not required by the Common Lisp standard, but I believe SBCL performs it, at least above certain low optimization settings.

I've only just started looking at multi-threading in Common Lisp myself. I imagine the honest answer about "mature" parallelization is no, relative to some other languages (I have no real basis for this, just side-knowledge that other languages focus on such issues more; hell, it's probably better than R). Multi-threading is not in the language specification, but in practice implementations provide their own version of it, including SBCL, which is the implementation I'm sticking with for numerical work. If you're looking into it, you'll probably also want to get acquainted with Bordeaux Threads, a portability layer over the implementations' own thread APIs. Once you've got quicklisp installed, you can install it easily from there :P
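A minimal sketch of the basic shape, assuming quicklisp is already set up (parallel-sums and the input lists are made up purely for illustration, not from any library):

    ;; Load the portability layer over each implementation's threads.
    (ql:quickload "bordeaux-threads")

    ;; Toy example: sum several lists concurrently, one thread per list.
    (defun parallel-sums (lists)
      (let ((threads (mapcar (lambda (xs)
                               (bt:make-thread
                                (lambda () (reduce #'+ xs))))
                             lists)))
        ;; join-thread blocks until a worker finishes and returns
        ;; the value its thread function produced.
        (mapcar #'bt:join-thread threads)))

    ;; (parallel-sums '((1 2 3) (4 5 6)))  =>  (6 15)

Locks and condition variables come from the same library, so anything fancier gets built on top of those few primitives.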
That being said, when you're running something 10-500 times faster than the interpreted, memory-hungry mess that is R, the complication of multi-threading can fall down the priority list...

Totally independent of my decision to try the language, I later found this paper, which suggests that one of the authors of R is well aware of, and in agreement with, some of my own observations. There is also a benchmark of Common Lisp, R, and Python in there, though take all such microbenchmarks with a grain of salt, because you will be spending more programmer time in Common Lisp: https://www.stat.auckland.ac.nz/~ihaka/downloads/Compstat-20...

Unlike some language zealots, I feel it's my responsibility to remind you that this is the road less traveled. I don't think it's the way industry or academia is moving. I think there is, perhaps arrogantly, a selection bias in who becomes a Common Lisp programmer, such that inquisitive/smart/alternatively-thinking people tend to be attracted to the language; but if you think it will magically make you 100 times better than you already are because some other smart person uses it, put that idea right out of your head.

Honestly, though, I've found the experience extremely rewarding, and I now write my own projects in Common Lisp where I can. I came in the top 20 of http://www.kaggle.com/c/facebook-recruiting-iii-keyword-extr... by literally coding something up from scratch in Common Lisp, and my model had easy room for improvement and was among the fastest, so you can most certainly compete writing in Lisp. Now, aside from my eternal side project, I'm trying to put something together in the field of data linking in Common Lisp. I think it's one of the only options that will let me combine the flexibility of dynamic languages with the performance of C (I'm actually hoping to beat the US Census Bureau's C program). I already know it blows the current Python/R options out of the water, but time will tell whether I can match C performance-wise.

Sure. Lisp is always worth the time, even if you cannot use it at work. Technically it is still far superior to everything else out there. BugSense (now part of Splunk) uses Lisp for quick analytics: http://highscalability.com/blog/2012/11/26/bigdata-using-erl...

Which dialect of Lisp are you studying? Common Lisp?

Yes, I am trying to learn SBCL + Emacs + SLIME, using the book "Practical Common Lisp" and SICP.