What is a "real" data scientist? (credentials to be distinguished from "fake" DS: Excel/SQL junkie, BIE or data analyst)

Google ΜLE
Oct 13 36 Comments

One who has mastered knowledge of...

58 VOTES MULTIPLE SELECTIONS ALLOWED

comments

TOP 36 Comments
  • Cisco 🍦Pied
    Simple.

    Data scientists come up with formulas to represent an environment of data. They know how to fine tune the model and know how to respond to unfamiliar results.

    BI simply plugs data into variables and reports it to a BU.

    Source: former data scientist
    Oct 13 2
    • Oracle numbaz
      OP is asking what it takes to create that representation
      Oct 13
    • Cisco 🍦Pied
      No they’re not, they’re asking where the separation line is and I clearly presented it with the assumption that a true data scientist knows how to apply the mathematical concepts in the survey to this framework.

      Picking one of the responses is an insufficient response to the underlying question.
      Oct 13
  • E*Trade / Finance cbEV72
    You need a hard science PhD and Python. That’s all. The rest is BS.
    Oct 13 12
    • Google ΜLE
      OP
      We don't usually ask such implementation-heavy and algorithm-specific tasks, although for an "equivalent" level dynamic programming question, one would typically have around half an hour. In this line of thought, it is similar to the probability brainteasers for quant/trading jobs to test quick thinking under pressure.
      Oct 13
    • E*Trade / Finance cbEV72
      That’s interesting. There’s no way someone can come up with a solution in 30 minutes unless they did a similar problem recently.

      I actually hate brain teasers, precisely because they’re biased towards people who train on them. They’re all out there for folks to prepare, so they lose their actual teasing quality.
      Oct 13
    • Google ΜLE
      OP
      I'd estimate the top programming contest participants have implemented convex hull at least a dozen or so times, and can type it out bug-free under 10 minutes. This is a relatively easy problem in terms of actual thinking involved.
      Oct 13
    • E*Trade / Finance cbEV72
      I was actually going to say that the problem is similar to the school competition type. So it’s just a matter of preparation, then. I know what a convex hull is but never had to code one, so I’d be at a disadvantage. On the other hand, maybe that’s exactly what you need: someone who has coded it in the past.
      Oct 13
    • Google ΜLE
      OP
      Yep, they are all closed (solved) problems. It's a quick and dirty sports competition.
      Oct 13
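
For readers who, like cbEV72, know what a convex hull is but have never coded one: below is a minimal sketch of one standard approach, Andrew's monotone chain. The thread does not say which algorithm contest participants type out, so the choice of algorithm and the names `cross` and `convex_hull` are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left (counter-clockwise) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return hull vertices in counter-clockwise order (illustrative sketch).

    Collinear points on the hull boundary are dropped.
    """
    pts = sorted(set(points))  # sort by x, then y; remove duplicates
    if len(pts) <= 2:
        return pts

    # Build the lower hull left to right, popping points that do not turn left.
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull right to left with the same rule.
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


square_with_extras = [(0, 0), (1, 1), (2, 2), (2, 0), (0, 2), (1, 0)]
hull = convex_hull(square_with_extras)
```

The sort dominates the running time at O(n log n); each point is pushed and popped at most once per chain, which is part of why the routine is short enough to memorize.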
  • Amazon / Data JohnnyUta
    God I hate you
    Oct 13 1
    • Google ΜLE
      OP
      Why? Also, I am not God.
      Oct 13
  • Microsoft 0eo39dk2o
    Real data scientists make sense from nonsense. Fake data scientists make nonsense from sense.
    Oct 13 0
  • Oracle numbaz
    Surprised Bayesian theory is so high up there. Being truly Bayesian is very hard and almost no one actually does it. Or are those votes coming from people who don't actually know their way around Bayesian stats, but have memorized the conditional probability formula and on rare occasions use naive Bayes?
    Oct 13 0
  • Google ΜLE
    OP
    How did the voters in this poll develop a solid grasp on the theory behind

    ML models & NNs
    mathematical statistics
    optimization theory
    stochastic processes
    probability theory

    without the necessary prerequisites in linear algebra, diff eqs., and real analysis? It seems contradictory.
    Oct 13 5
    • New / R&D tonyperkin
      I was thinking the same thing.

      Stats, prob, lin alg, and optimization will get you pretty far.
      Oct 13
    • Oracle numbaz
      Having used it in the past to prove something to yourself doesn't mean you still actively use it. So much is prewritten that you can escape having to actually use any "actual math" on your own directly once you're out of school, even as a researcher. Not saying it won't handicap you, but it's possible to get by
      Oct 13
    • Google ΜLE
      OP
      Consequences of black box hacking: Erroneously assuming i.i.d. and strong sense stationarity, failure to transform data properly, applying the wrong models, misinterpreting results -> garbage output
      Oct 13
    • Oracle numbaz
      Not writing the math from scratch yourself doesn't make something black box unless you have no idea what it is you're calling
      Oct 13
    • Google ΜLE
      OP
      How would you know about a particular algorithm's numerical stability, for instance, without reading its paper and implementation in code? I don't mean to say that one needs to be the original author, but at least personally, I find it helpful to read through the derivation.
      Oct 13
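
To make the numerical stability point concrete, here is a small sketch using log-sum-exp, a computation that sits inside many ML libraries. It is one illustrative case chosen for this note, not something named in the thread, and the function names `naive_logsumexp` and `stable_logsumexp` are hypothetical.

```python
import math
from typing import Sequence


def naive_logsumexp(xs: Sequence[float]) -> float:
    """Textbook formula log(sum(exp(x))); overflows once any x is large."""
    return math.log(sum(math.exp(x) for x in xs))


def stable_logsumexp(xs: Sequence[float]) -> float:
    """Same quantity, computed after subtracting the maximum.

    Every exponent is then <= 0, so exp() cannot overflow.
    Assumes xs is non-empty.
    """
    m = max(xs)
    if math.isinf(m):
        return m  # all -inf, or at least one +inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


logits = [1000.0, 1000.0]

try:
    naive_result = naive_logsumexp(logits)
except OverflowError:
    naive_result = None  # math.exp(1000.0) is out of range for a float

stable_result = stable_logsumexp(logits)  # 1000 + log(2)
```

Both functions are the same formula on paper. The difference only shows up if you know how the implementation treats large inputs, which is the kind of detail a derivation or a read through the source reveals.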
  • EA cfnE68
    Quick off-topic but related question: do you have to have a PhD to be a data scientist?
    Oct 13 2
    • New / R&D tonyperkin
      No, but good luck getting interviews for really good roles
      Oct 13
    • Spotify Atinlay3
      Yeah, it’s really hard without previous DS roles at top companies or a PhD.
      Oct 13
  • Airbnb wodkeo
    Dunno but every DS I’ve met here has a PhD
    Oct 15 0
  • New / R&D tonyperkin
    Probably any reasonable subset of the above would indicate the ability to learn enough of the rest on the job to succeed.

    Can any real DS comment on the most advanced thing they have used?
    Oct 13 2
    • Spotify Atinlay3
      I don’t think it’s about how advanced the stuff you use is, but about having deep knowledge of the underlying theory, so that you know when to apply those methods. I think this is the most underrated skill of a data scientist, and often the hardest.

      For me, the most advanced thing was to create an algorithm that turned into a paper at a top-3 ML/DL conference.
      Oct 13
    • Spotify Atinlay3
      And just to be clear: most tech companies have different data scientist roles, such as product and research.
      Oct 13
  • Google ΜLE
    OP
    Additionally, it's interesting to observe that the top choices tend to form a cluster of nonparametric techniques (& modern statistics), while people in tech (CS-heavy background?) apparently de-emphasize mathematical analysis and scientific modeling from traditional physics or Wall Street.
    Oct 13 0
  • Oracle numbaz
    Without statistical computing and understanding of probability/stochastic processes, there is no data science. You won't be able to solve problems at scale. Linear algebra is a runner-up, but I can imagine someone doing flips to avoid writing it out themselves, and usually succeeding.

    Everything else is interchangeable, though at least a few boxes should be checked per individual.
    Oct 13 0
  • Cruise Automation hqudy651
    I know some of these words
    Oct 13 0
