One possible option for my further career path was to take my research skills deeper into data science. Even though only blockchain can compete with DS in popularity nowadays, I’ve already made another choice.
That’s why the correct title for this story is “What’s wrong with data science from my biased point of view”.
This year a couple of my friends and I organized a meetup devoted to data science, where I was also a speaker and gave advice on how to get started efficiently.
Before we start, I’d like to mention one detail. We tend to confuse the responsibilities of a data <engineer/scientist/analyst>.
An engineer is someone between a backend developer and DevOps, a scientist is someone very close to a Ph.D. in statistics, and an analyst is a math storyteller. These are completely different areas of expertise that require specific backgrounds, but the real world mixes them up, which leads to misunderstandings.
And still, what’s wrong with data science?
The first thing is that it turns out to be not as interesting as the articles about how AI beats human players at chess, Go, or any other game, or how a brand-new deep artificial neural network draws paintings and tells dogs from cats.
It’s even less interesting than the “how to” articles that contain several lines of Python code and describe a recipe for reaching N-th place in yet another machine learning contest.
Less interesting, of course, unless you are developing the Tesla autopilot.
Real-world data science is probably 5% having fun with cool machine learning models and 95% preparing raw data. Raw data may be not only dirty but also hardly suitable for the X-Y objectives that ML algorithms can solve.
However, I have to admit that it’s VERY interesting to turn real-world problems into ML ones.
Another part of a data scientist's job is the endless loop of putting forward and checking hypotheses that may or may not improve the model. Every job has a routine part, but for me this one can be much more boring than the most boring bugfix in the life of a frontend developer.
Furthermore, it can be so boring that there are algorithms created specifically to fight this routine: they search for an optimal combination of feature transformations, learning algorithms, and their hyperparameters.
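To make the idea of such a search concrete, here is a minimal sketch in plain Python. It is not how any particular AutoML tool works: the synthetic data, the two transformations, the k-nearest-neighbours model, and all function names are my own assumptions, picked only to keep the example small. It simply tries every combination of a transformation and a hyperparameter and keeps the one with the best validation accuracy.

```python
import itertools
import math
import random
import statistics


def make_data(n, rng):
    """Hypothetical toy data: one informative small-scale feature, one large-scale noise feature."""
    xs, ys = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        xs.append([rng.gauss(label, 0.3), rng.gauss(0, 100)])
        ys.append(label)
    return xs, ys


def fit_identity(train_x):
    """'No transformation' option: returns rows unchanged."""
    return lambda row: row


def fit_standardize(train_x):
    """Learn per-column mean and std on the training rows only."""
    cols = list(zip(*train_x))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return lambda row: [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, row, k):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], row))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def search(train_x, train_y, valid_x, valid_y, ks=(1, 3, 5)):
    """Exhaustively try every (transformation, k) pair; return the best and all results."""
    transforms = {"identity": fit_identity, "standardize": fit_standardize}
    results = []
    for (name, fit), k in itertools.product(transforms.items(), ks):
        transform = fit(train_x)
        tx = [transform(r) for r in train_x]
        preds = [knn_predict(tx, train_y, transform(r), k) for r in valid_x]
        accuracy = sum(p == y for p, y in zip(preds, valid_y)) / len(valid_y)
        results.append((accuracy, name, k))
    return max(results), results


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(200, rng)
valid_x, valid_y = make_data(100, rng)
best, all_results = search(train_x, train_y, valid_x, valid_y)
print("best (accuracy, transform, k):", best)
```

Real tools search far larger spaces and use smarter strategies than brute force, but the loop is the same routine a data scientist would otherwise run by hand.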
Data science can hardly be imagined as an independent product. It’s not something I can do on holidays as a hobby side project.
Every second article claims that the world is going to need e^k data scientists during the next several years.
What I see is that, from the outsourcing point of view, the data science market is not big compared to web and mobile development. And that is primarily the point of view I look from.
While small companies don’t have enough data, big companies keep their data to themselves and set up their own data science departments. So finding your place as an outsourced data scientist might be much trickier than finding one as an outsourced developer.
I’m sure this order of things will change in the not very distant future, but the content of the DS job will change as well.
I believe that 99% of the routine will be more or less automated, so the main challenge will be to build and maintain infrastructure.
DS is at least 1/3 about statistics. And statistics is a great place for deceiving yourself and others. Any result obtained on one data set can fall apart on another for many reasons, ranging from real-world changes to simply poor modeling. The difference between two results may also be too slight to tell apart from noise, which makes the job completion criterion very vague.
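A small illustration of how slight such a difference can be: the sketch below bootstraps the accuracy gap between two models on the same 100 validation examples. The per-example results are invented for the illustration (model A correct on 80 examples, model B on 83), and the function name and parameters are my own assumptions, not a standard API.

```python
import random


def bootstrap_diff_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy(B) - accuracy(A) on paired examples."""
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample example indices with replacement, keeping A/B results paired.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_b[i] - correct_a[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int(alpha / 2 * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up per-example correctness (1 = correct, 0 = wrong):
# 75 both correct, 5 only A correct, 8 only B correct, 12 both wrong.
model_a = [1] * 75 + [1] * 5 + [0] * 8 + [0] * 12
model_b = [1] * 75 + [0] * 5 + [1] * 8 + [0] * 12

observed = (sum(model_b) - sum(model_a)) / len(model_a)
lower, upper = bootstrap_diff_ci(model_a, model_b)
print("observed difference:", observed)
print("bootstrap 95% interval:", (lower, upper))
```

If the interval straddles zero, the "improvement" from A to B cannot be told apart from sampling noise on a validation set of this size, and that is exactly when nobody can say whether the job is done.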
Frankly, a couple of my friends and I had the idea of outsourcing data science jobs. To be more specific: supercharging civilized western countries with the preterhuman powers of Russian mathematicians. But after some discussions with developers and owners of outsourcing companies, I came to the conclusion that there are better options.
And one more thing.
I’m sure that almost any experienced developer with a math background is able to more or less succeed in solving a forecasting problem with a machine learning stack. It will be just like picking up yet another framework.
But I doubt that a data scientist can easily switch to software engineering and start building production-ready systems.