Why should you read this?
“The shortage of data scientists is becoming a serious constraint in some sectors.” (From “Data Scientist: The Sexiest Job of the 21st Century”, Harvard Business Review)
Nowadays, big data is one of the most important resources for gaining a competitive advantage in business, especially for IT companies. All of the GAFA companies (Google, Amazon, Facebook, Apple) accelerate their business by applying data science techniques. Let me explain a little more.
For instance, Google’s YouTube uses a recommendation engine that never lets you go, because it suggests content that closely follows your taste. Amazon increases its Gross Merchandise Volume (GMV) by matching users and products very efficiently. Facebook shows advertisements with a higher conversion rate (CVR) by extracting critical insights from your demographic and behavioral data. (I used their ads in my previous job and was surprised at how effectively they performed.)
So far I’ve only mentioned the internet giants. However, Masayoshi Son, the Japanese CEO of SoftBank, whose Vision Fund is one of the biggest funds in the world with investments hovering near $45 billion in 2018, left an insightful comment:
“The impact of the Internet is somehow limited to particular domains or industries, like advertising and retail (e-commerce), but AI is different.” (Masayoshi Son)
The “next big waves”, which apply data science methods to unstructured data such as images, natural language, and sound, are coming to other industries. For example, in mobility you can see what Elon Musk is doing, in robotics you can see what’s happening in Amazon’s warehouses, and the same goes for finance, healthcare, and so on.
If you are young and want to be a big winner in your career, the data science or machine learning field could be the most promising choice.
From my marketing perspective, which is just a simple supply-and-demand analysis, I think well-paid job titles such as finance professional, classic software engineer, or business development with an MBA are already taken by our elders. Those skills and that knowledge have become commoditized. In other words, these fields are very competitive.
Compared to them, the data science field is way messier and still in the mist. So why not invest your career, passion, and intelligence in this challenging field? Welcome to the fantastic world of ML technology, full of hope!
In this article, I will focus on how to become a Data Scientist or Machine Learning Engineer that real industries need, or in other words, one who gets through the job-hunting process with less pain. Let’s get started!
— — —
After you read this, you’ll get:
- The Shortest Path to Becoming a Real Data Scientist
- The Best Learning Resources everyone trusts in this area
- The Realistic Possibilities for your career with ML
— — —
Menu
- What is Data Science for Business?
- Data Scientist vs Machine Learning Engineer
- Required Skill for A Full-Stack Data Scientist
- A Golden DS Learning path for a newbie
- The Secret Bibles towards A Full-Stack Data Scientist
— — —
1. What is Data Science for Business?
“Data science itself is just a method, not a goal, in our business. So its definition should be explained in a much simpler way.” (an unknown data scientist)
Data science itself can never be the purpose; if it becomes one, you have already failed. Instead, I would say, “We have simply become able to create additional or novel value in real business that we couldn’t create 10 years ago.”
I highly recommend listening to this podcast, SDS 131: The One Purpose to Data Science and The Truth about Analytics from SuperDataScience.
To define data science, I want to start from its goals and purpose. The fundamental goals of data science in business are pretty clear. I think they can be described simply as follows:
- To make more profit in your business by using data (as a result)
- To understand and satisfy your customers through more efficient, better matching
- To create a new business or startup by using machine learning
The Background Behind This Game Change
In addition, I think it’s very important to understand why the significance of data science has gradually been recognized in business in recent years. I see the following three reasons:
- An explosion in the amount of data, driven by the internet and smartphones
- Improved computational power thanks to GPUs and TPUs
- Deep learning, which makes it possible to process unstructured data
OK, now we can take the next step and understand how data science jobs are divided in real industries. Let’s figure out where to start your career, depending on what you have right now, and what you need to acquire to reach an ideal position over a 5–10 year career plan.
— — —
2. Data Scientist vs Machine Learning Engineer
In the summer of 2018, I decided to move from a general Data Analyst job into a machine learning career, because I was pretty sure this technological innovation would reproduce the kind of drastic change we saw with the internet and the smartphone.
I assume you have already heard about the average salary of a data scientist (117,000 USD/year!) or understand the potential of ML innovation well enough. Here, I am only talking to people who want to switch their domain of expertise to ML (data science) over several years.
After I built my data science portfolio and started job hunting, I realized there are mainly two job titles that work closely with machine learning technology:
One is Data Scientist. The other is Machine Learning Engineer.
As you can see in the image above, the difference between these two and the classic Data Analyst and Data Engineer (Big Data Engineer) jobs is relatively easy to describe, because the latter have existed for more than 10 years and their required skills are very clear.
However, I found that the skills and knowledge required of Data Scientists and Machine Learning Engineers overlap considerably.
The reason is simple. Machine learning is the most impactful recent innovation in the data science field, one that can even create new businesses and companies such as chatbot or drone startups, so the machine learning specialist has become an undoubtedly respected job title in its own right.
On the other hand, nowadays we cannot talk about data science without taking machine learning into account, so Data Scientists must know ML theory as well, because most companies are currently interested in accelerating their business with ML technology.
In fact, Data Scientists are expected not only to have theoretical knowledge of machine learning but also, at the very least, to be able to implement several machine learning algorithms such as SVM or Random Forest for a classification task using scikit-learn, or to build a neural network using Keras.
Ironically, or interestingly, unless you already understand math and statistics deeply, an efficient (or even necessary) way to understand ML theory and deep learning is to write code and implement the models with TensorFlow, scikit-learn, or similar high-level APIs.
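To make this concrete, here is a minimal sketch of what implementing SVM and Random Forest for a classification task with scikit-learn can look like. The Iris dataset, the 70/30 split, the hyperparameters, and the `compare_classifiers` helper name are illustrative assumptions, not something taken from a particular job description.

```python
# Minimal sketch: train an SVM and a Random Forest on the same toy task.
# Dataset, split ratio, and hyperparameters are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(random_state=42):
    """Fit both models on one train/test split and return their test accuracies."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )
    models = {
        # SVMs are sensitive to feature scale, so standardize first.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, acc in compare_classifiers().items():
        print(f"{name}: test accuracy = {acc:.3f}")
```

Swapping in another estimator only means adding one more entry to `models`, which is a big part of why these high-level APIs are a gentle way into the theory.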
2–1. The Different Areas of Expertise of DS and ML Engineers
I know the highest-rated answer to this question on Quora is not enough to clarify your career in this field.
Eventually, I found the wall between these two jobs in real industries that is hard to climb over. It looks like this:
A professional Machine Learning Engineer can build an “end-to-end software product” that has a machine learning algorithm as one of its parts.
A professional Data Scientist can define “the problem that should (or should not) be solved” at scale using machine learning, and the “how” as well.
I hope you get the picture. Please don’t forget that the final goal and responsibility of an ML engineer is, I think, to ship working software. To put it more plainly, unless you have backend software engineering experience, it seems hard to get the kind of comprehensive ML engineering position we often find in job descriptions (JDs).
For newbies from a non-engineering background, such as data analysts, I want to be clear about the reality: some serious tech companies write neural networks from scratch, meaning they don’t even rely on Keras or TensorFlow.
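To give a feel for what “from scratch” means, here is a tiny sketch of a neural network written with only the Python standard library. The XOR task, the hidden-layer size, the learning rate, and the `TinyMLP` name are all illustrative assumptions. Real in-house frameworks are far more involved, but the forward pass and backpropagation follow the same idea.

```python
import math
import random

# Toy dataset (assumption for illustration): the XOR truth table.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """2 inputs -> tanh hidden layer -> 1 sigmoid output, no ML library."""

    def __init__(self, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return (hidden activations, output probability) for one example."""
        hidden = [
            math.tanh(w[0] * x[0] + w[1] * x[1] + b)
            for w, b in zip(self.w1, self.b1)
        ]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def loss(self, data):
        """Mean binary cross-entropy over the dataset."""
        total = 0.0
        for x, y in data:
            _, p = self.forward(x)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        return total / len(data)

    def train_step(self, data, lr):
        """One epoch of per-example gradient descent using backpropagation."""
        for x, y in data:
            hidden, p = self.forward(x)
            d_out = p - y  # gradient of the loss w.r.t. the pre-sigmoid output
            for j, h in enumerate(hidden):
                # Backpropagate through tanh before touching w2[j].
                d_hidden = d_out * self.w2[j] * (1.0 - h * h)
                self.w2[j] -= lr * d_out * h
                self.w1[j][0] -= lr * d_hidden * x[0]
                self.w1[j][1] -= lr * d_hidden * x[1]
                self.b1[j] -= lr * d_hidden
            self.b2 -= lr * d_out


def train(net, data, epochs=3000, lr=0.1):
    """Train in place and return (loss before, loss after)."""
    before = net.loss(data)
    for _ in range(epochs):
        net.train_step(data, lr)
    return before, net.loss(data)


if __name__ == "__main__":
    net = TinyMLP()
    before, after = train(net, XOR)
    print(f"loss before: {before:.4f}, after: {after:.4f}")
    for x, y in XOR:
        print(x, "target", y, "predicted", round(net.forward(x)[1], 3))
```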
On the other hand, a Data Scientist needs outstanding business understanding, which is vaguer and harder to prove (life is hard). But it is very important, because in some cases a classical statistical method like multiple regression analysis is enough and ML is not even required. Applying ML in software also takes an enormous amount of time and human resources, so it’s necessary to weigh cost against performance before making a huge investment decision.
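As a rough illustration of how far a classical method can go, the sketch below fits a multiple regression by ordinary least squares using nothing but the Python standard library. The sales figures, the two features (ad spend and store count), and the function names are made up for the example. The data is deliberately noise-free so the recovered coefficients are easy to check.

```python
def fit_multiple_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column
    n_params = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in X) for j in range(n_params)]
        + [sum(r[i] * y for r, y in zip(X, targets))]
        for i in range(n_params)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_params):
        pivot = max(range(col, n_params), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n_params):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n_params] for i in range(n_params)]


def predict(coefs, row):
    """Apply the fitted linear model to one row of features."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical data: sales = 10 + 2 * ad_spend + 3 * store_count (no noise).
ROWS = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
TARGETS = [18, 17, 28, 27, 35]

if __name__ == "__main__":
    coefs = fit_multiple_regression(ROWS, TARGETS)
    print("intercept and coefficients:", [round(c, 6) for c in coefs])
    print("forecast for (6, 2):", round(predict(coefs, (6, 2)), 6))
```

If a model this simple already explains the business question, the cost of a full ML project is hard to justify.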
2–2. So is it impossible to get an ML engineering job?
Well, for enthusiastic ML freshers like us, I found a suitable position in ML projects: the Data Preprocessing and Feature Engineering role.
In a real machine learning application, we repeat the following main steps again and again until we reach good enough accuracy:
- Data Preprocessing and Feature Engineering
- Modeling ML/DL architecture and Training
- Model Validation and Hyperparameter Tuning
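The three steps above can be sketched as a small loop. This is only a toy version under stated assumptions: the synthetic two-class dataset, the choice of k-nearest neighbours as the model, the candidate values of k, and the 0.95 accuracy target are all hypothetical, and a real project would normally lean on a library such as scikit-learn instead of hand-written helpers.

```python
import math
import random
from collections import Counter


def make_toy_data(n=120, seed=0):
    """Hypothetical dataset: two classes, two features on very different scales."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        features = [rng.gauss(2.0 * label, 0.6), rng.gauss(300.0 * label, 100.0)]
        rows.append((features, label))
    rng.shuffle(rows)
    return rows


def fit_scaler(rows):
    """Step 1 (preprocessing): learn per-feature mean and std on training data only."""
    means, stds = [], []
    for j in range(len(rows[0][0])):
        col = [x[j] for x, _ in rows]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        means.append(mean)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features
    return means, stds


def scale(rows, scaler):
    """Standardize every feature with the statistics learned by fit_scaler."""
    means, stds = scaler
    return [([(v - m) / s for v, m, s in zip(x, means, stds)], y) for x, y in rows]


def knn_predict(train, x, k):
    """Step 2 (modeling): majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, val, k):
    """Step 3 (validation): share of validation rows predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in val) / len(val)


def tune(train_raw, val_raw, candidate_ks=(1, 3, 5, 7), target=0.95):
    """Try hyperparameters one by one and stop once accuracy is good enough."""
    scaler = fit_scaler(train_raw)
    train, val = scale(train_raw, scaler), scale(val_raw, scaler)
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        acc = accuracy(train, val, k)
        if acc > best_acc:
            best_k, best_acc = k, acc
        if best_acc >= target:
            break
    return best_k, best_acc


if __name__ == "__main__":
    data = make_toy_data()
    best_k, best_acc = tune(data[:80], data[80:])
    print(f"best k = {best_k}, validation accuracy = {best_acc:.3f}")
```

Note that the scaler is fitted on the training rows only and then reused for validation, so no information leaks from the validation set into preprocessing.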
Finally, ML engineers integrate the model into existing software, or define the whole architecture at the same time if they are building a new product. This step requires more development experience and knowledge.
— — —
3. Required Skill for A Full-Stack Data Scientist
Conclusion: In my definition, a Full-Stack Data Scientist is a perfect mix of Data Scientist and Machine Learning Engineer who can design and build an “End-to-End Machine Learning Project and Software”. (by me)
After analyzing over 200 job descriptions for machine learning positions in Japan, India, and Singapore, including companies like Google, Facebook, and IBM, I identified the must-have skills for a Full-Stack Data Scientist career.
I will divide them into two categories: one is visible, more practical skills (more important in terms of getting a job!), and the other is theoretical skills that are relatively difficult to prove.
You can use the following checklist before you start making a learning plan. It’s flexible, too, depending on the JDs you want to apply for.
3–1. Practical Skill (Easy to prove and visualize)
- Basic Statistical Language: Python, R, Julia
- Data Science Library: NumPy, pandas, SciPy, seaborn
- ML/DL Library Experience: TensorFlow, Torch, scikit-learn
- Unstructured Data Processing: Image, Text, Sounds
- Relational Database: MySQL, PostgreSQL, SQLite
- Distributed Data Processing and Storage: Hadoop, Spark, AWS, MongoDB
- Container-type virtual environment: Docker
- Version control system: Git (GitHub)
- Web Framework: Django, Flask, Ruby on Rails