One of my possible career paths was to take my research skills deeper into data science. Even though only blockchain can compete with DS in popularity nowadays, I’ve already made a different choice.
That’s why the correct title for this story is “What’s wrong with data science from my biased point of view”.
This year a couple of my friends and I organized a meetup devoted to data science, where I was also a speaker, giving advice on how to get started efficiently.
Before we start, I’d like to mention one detail. We tend to confuse the responsibilities of a data <engineer/scientist/analyst>. An engineer is someone between a backend developer and DevOps, a scientist is someone very close to a PhD in statistics, and an analyst is a math storyteller. These are completely different areas of expertise, each requiring a specific background, but the real world mixes them up, which leads to misunderstandings.
And still, what’s wrong with data science?
The first thing is that it turns out to be not as interesting as in the articles telling how AI owns chess, Go, and anyOtherGame players, draws paintings, and tells dogs from cats. It’s even less interesting than the “how to” articles containing several lines of Python code and describing a recipe for the N-th place in yet another machine learning contest. Less interesting, that is, unless you are developing the Tesla autopilot.
Real-world data science is probably 5% having fun with cool machine learning models and 95% preparing raw data. Raw data may be not only dirty but also poorly suited to the X-Y objective that ML algorithms can solve. (Though it’s VERY interesting to turn real-world problems into ML ones.)
Another part of a data scientist’s job is an endless loop of putting forward and checking hypotheses that may or may not improve the model. Every job has its routine parts, but for me this can be much more boring than the most boring bugfix in the life of a frontend developer. Have you ever heard of a “data monkey”?
Furthermore, it can be so boring that algorithms have been created to fight this routine: they search for the optimal combination of feature transforms, learning algorithms, and their hyperparameters.
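To make that less abstract, here is a minimal sketch of what such a search does, under my own assumptions: the toy data, the three feature transforms, the one-feature ridge regression, and the `alpha` grid are all made up for illustration and are not taken from any real AutoML tool. It simply tries every combination and keeps the one with the lowest validation error.

```python
import random

def make_data(n, rng):
    # Hypothetical toy data: y depends on x squared, plus a little noise.
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [2.0 * x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

def fit_ridge_1d(feats, ys, alpha):
    # Closed-form ridge regression for a single feature, no intercept:
    # w = sum(f * y) / (sum(f * f) + alpha)
    num = sum(f * y for f, y in zip(feats, ys))
    den = sum(f * f for f in feats) + alpha
    return num / den

def mse(w, feats, ys):
    # Mean squared error of the prediction w * f.
    return sum((w * f - y) ** 2 for f, y in zip(feats, ys)) / len(ys)

# The search space (an assumption): feature transforms x regularization strengths.
TRANSFORMS = {"identity": lambda x: x, "square": lambda x: x * x, "abs": abs}
ALPHAS = [0.0, 1.0, 100.0]

def search(train, valid):
    # Exhaustive grid search: fit on train, score on validation, keep the best.
    best = None
    for name, transform in TRANSFORMS.items():
        train_feats = [transform(x) for x in train[0]]
        valid_feats = [transform(x) for x in valid[0]]
        for alpha in ALPHAS:
            w = fit_ridge_1d(train_feats, train[1], alpha)
            score = mse(w, valid_feats, valid[1])
            if best is None or score < best[0]:
                best = (score, name, alpha)
    return best

rng = random.Random(0)
train = make_data(200, rng)
valid = make_data(200, rng)
best_score, best_transform, best_alpha = search(train, valid)
print(best_transform, best_alpha, round(best_score, 4))
```

Real tools search far bigger spaces and do it more cleverly than a plain grid, but the loop is the same routine a human would otherwise repeat by hand.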
Data science can hardly be imagined as an independent product. It’s not something I can do on holidays as a hobby side project.
Every second article claims that the world is going to need e^k data scientists over the next several years.
What I see is that the data science market is not big compared to web and mobile development from the outsourcing point of view. And that is primarily the point of view I look from.
While small companies don’t have enough data, big companies keep their data to themselves and set up their own data science departments. So finding your place as an outsourcing data scientist might be much trickier than as an outsourcing developer.
I’m sure this order of things will change in the not too distant future, but the content of the DS job will change as well.
I believe that 99% of the routine will be more or less automated, so the main challenge will be building and maintaining infrastructure.
DS is at least 1/3 statistics. And statistics is a great place for deceiving yourself and others. Any result obtained on one data set can fall apart on another for many reasons, from real-world changes to simply poor modelling. The difference may be very slight, which makes the criterion for a completed job very hazy.
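Here is a small sketch of the "real world changes" case, with everything in it assumed for illustration: the two-class Gaussian data, the shift of 1.5, and the one-number threshold "model" are hypothetical. The model is fitted once and then scored on fresh data from the same world and on data from a world that has drifted.

```python
import random

def sample(n, rng, shift=0.0):
    # Hypothetical data: class 0 is centred at 0 + shift, class 1 at 2 + shift.
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label + shift, 1.0)
        data.append((x, label))
    return data

def fit_threshold(data):
    # The "model": the midpoint between the two class means.
    xs0 = [x for x, label in data if label == 0]
    xs1 = [x for x, label in data if label == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2.0

def accuracy(threshold, data):
    # Predict class 1 when x is at or above the threshold.
    hits = sum(1 for x, label in data if (x >= threshold) == (label == 1))
    return hits / len(data)

rng = random.Random(42)
train = sample(2000, rng)
same_world = sample(2000, rng)                # same distribution as training
changed_world = sample(2000, rng, shift=1.5)  # the world has drifted

threshold = fit_threshold(train)
acc_same = accuracy(threshold, same_world)
acc_changed = accuracy(threshold, changed_world)
print(round(acc_same, 3), round(acc_changed, 3))
```

Nothing about the model changed between the two numbers, only the data did. With a smaller shift the gap shrinks toward the noise level, and that is exactly where deciding whether the job is done gets hazy.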
Frankly, a couple of my friends and I had an idea of outsourcing data science jobs. To be more specific: supercharging civilized Western countries with the preterhuman powers of Russian mathematicians. But after some discussions with developers and owners of outsourcing companies, I came to the conclusion that there are better options.
And one more thing.
I’m sure that almost any experienced developer with a math background is able to more or less succeed at solving a forecasting problem with the machine learning stack. It will be just like picking up one more framework.
But I doubt that there is backward compatibility.