
Welcome to our Coding with Python page! Here you will find various code for PHP, Python, AI, Cyber, etc ... Electricity, Energy, Nuclear Power

Tuesday 19 October 2021

Solving Machine Learning Problems On Kaggle Vs Real Life


According to Kaggle, the most commonly used algorithms were linear and logistic regression, followed closely by decision trees and random forests.

In the era of data science and machine learning, hackathon platforms like Kaggle, MachineHack, etc., have emerged as testbeds for many ML and data science professionals, while also helping companies hire the best talent using the hackathon model.

According to Kaggle’s 2020 edition of the State of Machine Learning and Data Science report — which includes insights gathered from a survey of 20,036 Kaggle members — more than 55 per cent of data scientists have less than three years of experience, and six per cent of professionals pursuing data science have been using machine learning for more than a decade.  

The study further revealed that machine learning has become more rooted in the companies where Kaggle data scientists work. Nearly 31% of data scientists said their employers have well-established machine learning methods, up from 28% in 2019 and 25% in 2018.

Kaggle vs real life 

Kaggle competitions are great for practising data science skills, but how different are they from real-world data science and machine learning work? This article outlines the differences between the two, especially when solving machine learning problems on Kaggle versus in real life.

Problem definition

While some dispute Kaggle's real-world relevance and question its effectiveness, the problem-solving aspect is common to both real life and hackathons.

On Kaggle, the problem is well defined, and you are given clear instructions on how to solve it and how your work will be evaluated.

A typical problem-solving cycle (Source: Humor That Works)

However, in real life, the problem is often not clearly defined, and you will have to derive inputs from the data that can be tied to concrete KPIs in the business environment. You will also have to attend many meetings to get a better understanding of your problem statement.

Solution 

According to Kaggle, the most commonly used algorithms were linear and logistic regression, followed closely by decision trees and random forests. For more complex techniques, gradient boosting machines and convolutional neural networks (CNN) were the most popular approaches.

(Source: Kaggle)
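As a rough illustration of the simplest algorithm named above, here is a minimal sketch of one-feature linear regression fitted by ordinary least squares. It uses only the Python standard library; the function names and the toy data are assumptions made up for this example, not something taken from the Kaggle report.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: y ~ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x around its mean.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    # Sum of co-deviations of x and y.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to a new input."""
    return slope * x + intercept


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = predict(5.0, slope, intercept)
```

In practice most practitioners would reach for a library implementation rather than hand-rolled code; the sketch is only meant to show how little machinery the most widely used baseline needs.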

But, in real life, there are no shortcuts. Aakash Nand, Software Engineer (Data Science) at NTT Communications, said many Kagglers use a few ‘sneaky’ methods to boost the performance of their model, which in the real world should be avoided.

“For instance, some perform transformation or imputation on both train and test set combined instead of splitting them and preprocessing them separately to avoid data leakage. This increases performance but might make your model less generalisable to new, unseen data,” said Nand.
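The leakage Nand describes can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library, and the helper names (`fit_scaler`, `transform`) and the toy feature column are hypothetical. It learns the imputation value and scaling statistics from the training split alone, applies them to both splits, and then computes the "sneaky" combined statistics purely for contrast.

```python
from statistics import mean, pstdev


def fit_scaler(train_values):
    """Learn imputation and scaling statistics from the given values only."""
    observed = [v for v in train_values if v is not None]
    center = mean(observed)
    # Fall back to 1.0 so a constant column does not cause division by zero.
    spread = pstdev(observed) or 1.0
    return center, spread


def transform(values, center, spread):
    """Impute missing values with the learned mean, then standardise."""
    return [((center if v is None else v) - center) / spread for v in values]


# Hypothetical toy feature column; None marks a missing value.
train = [1.0, 2.0, None, 3.0]
test = [10.0, None]

# Leakage-free: statistics come from the training split alone.
center, spread = fit_scaler(train)
train_scaled = transform(train, center, spread)
test_scaled = transform(test, center, spread)

# Leaky variant, shown only for contrast: statistics computed on train + test,
# so information about the test set seeps into the preprocessing.
leaky_center, leaky_spread = fit_scaler(train + test)
```

Here the training-only mean is 2.0, whereas the combined mean is pulled to 4.0 by the test value 10.0. A model preprocessed with the combined statistics has effectively seen the test set, which is why its leaderboard score can look better than its performance on genuinely unseen data.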

Machine learning 

On Kaggle, almost every dataset is framed as a machine learning problem. The platform is best known for hosting machine learning competitions, which train you to improve your score by 0.0001, fine-tune parameters, and make an algorithm work.

In the real world, not every company uses machine learning and not every data scientist deals with machine learning in their daily work, so exposure to it can be minimal.

Data

On Kaggle, you can access the datasets with minimal effort. You are also given a platform where you can discuss the data with domain experts to understand the features. The datasets provided are usually ready for analysis and require minimal cleaning effort or skill.

For instance, a Kaggle alternative, MachineHack, offers various such platforms like MocksPractice and Bootcamps, making it easier for participants to experiment with an array of datasets and ace data science hackathons.


