Don't Assume Your Data Scientist Is a Software Engineer. A Thread:

twitter.com

43 points by dynamicwebpaige 8 years ago · 16 comments

Xcelerate 8 years ago

I started working as a data scientist 9 months ago, coming primarily from a research background. I had never heard of tools like Docker or Airflow in grad school. After reading about them though, their value to a small but growing team of data scientists was quite apparent, so our team took some time to learn how to use them. We now have a reproducible, versioned workflow that removes a lot of headaches that previously existed.

I don’t think it’s too much to expect data scientists to quickly learn some DevOps skills, as long as you can motivate the value for using them.

bllguo 8 years ago

If you really need your data scientist to know these things, then just invest some time in training. So many skills can be quickly picked up on the job, at least to a "good enough" level, yet people insist on looking for unicorns that appear to tick all the unrealistic checkboxes.

  • smallnamespace 8 years ago

    This is one reason I ask brain teasers in interviews.

    I don't just care about what you know now, I need to know your willingness and ability to think on your feet when confronted with a seemingly random puzzle and actually persevere towards an answer if necessary.

    Not everything will fit neatly into the box of tools that were previously learned.

    • yodon 8 years ago

      Be careful about focusing on brain teasers during interview sessions. In the early days at Microsoft they focused heavily on brain teaser interviews, and hired what was quickly seen to be an incredibly smart bunch of devs BUT much later seen to be a highly concentrated monoculture that over indexed on one set of skills at the expense of bringing in team members with other strengths. It took a lot of work to break away from that monoculture driving interview style and get the full set of skills and capabilities the company actually needed.

      Not all smart people play chess, or know every line of Dr Who dialog, or are Makers, or like brain teasers.

      • smallnamespace 8 years ago

        Fully agree with that, brain teasers should definitely not be the dominant component of any interview process.

        That said, I've run into interviewees who have simply refused to fully engage with a brain teaser -- like they just shut down and gave up, even after I provided hints. For me, that seems to be a signal that they're not likely to deal well with unexpected puzzles that appear in the normal course of engineering work.

        • yodon 8 years ago

          It may also be that they dislike the artificiality of the question style, particularly if the questions look like the “find the twist hidden deep in the description because this isn’t a normal real world problem” sort.

justherefortart 8 years ago

Lmfao, why would a data scientist need to know TCP/IP, Server Setup, SOAP/REST Web Services, SDLC, etc?

Sounds like someone looked up a list of IT stuff you should know and applied it to data scientists randomly. In fact, most of those things in the list may or may not apply to a "software engineer".

  • clintonb 8 years ago

    I agree that this knowledge is not necessary, but it could be useful for certain scenarios.

    TCP/IP: networking between cluster nodes
    Server setup: deploy a map-reduce cluster
    SOAP/REST: read/write data from services
    Software development life cycle: plan/deploy a reporting system for end users

wohlergehen 8 years ago

I agree that there is a big issue in the field w.r.t. "unknown unknowns", where more effort needs to be put into making useful knowledge available. However, I do not think that many of these technologies are hard for someone who understands data science, at least at the level necessary to use them. Doing productive development in these more systems- or CS-focused topics is a wholly different topic though...

  • ztjio 8 years ago

    There is no such implication. In fact she specifically implies otherwise. It's just a matter of not making assumptions of specific knowledge.

thisisit 8 years ago

The classic problem of software engineering. Talking about how your specialist doesn't know other stuff. Then during interviews lamenting the fact that while you are getting well-rounded generalists they are not up to par.

He/she knows SOAP/REST but is unaware of that NN model.

A human can only retain so much. Invest in a team which has its own specializations.

cdancette 8 years ago

Data scientists have a variety of backgrounds: CS, applied mathematics, pure mathematics...

I don't think a data scientist needs to know all that stuff to be good at his job.

  • rcoveson 8 years ago

    I imagine what prompted this thread was the growing tendency of software companies to hire for "Data Scientist" positions and imagine that what they'll be getting is analogous to a Database or Distributed Computing specialist--someone who has a strong software engineering background plus deep knowledge of their specialty.

  • calt 8 years ago

    Yes. That's the point. They don't know them, and they can still be productive. However, if you require the knowledge it can be taught and you might have to help teach it.

jinonoel 8 years ago

After looking at the stuff listed, no worries. Most software engineers don’t know all of these either.

kapauldo 8 years ago

Don't assume your data scientist is a scientist. It's a made up non-credentialed title.
