A DATA SCIENTIST
– NEHA KARAN
Data scientists are a new breed of analytical data experts who have the technical skills to solve complex problems. Data science is an ever-growing field that in recent years has come into the spotlight as a source of employment. In this article I will shed light on the various roles of the data scientist, and on how data science overlaps with related fields such as machine learning, AI, data mining, operations research, Six Sigma and data analytics.
Many people mistakenly believe that to become a data scientist one needs to master all of the above. In fact, there are nine types of data scientist. Each of them excels in one area rather than in all of them, and each serves the industry with that individual skill while still being known as a data scientist.
- Statistics: The first type are those who are strong in statistics. They sometimes develop new statistical theories for big data that even traditional statisticians are unaware of. They are experts in statistical modelling, experimental design, sampling, clustering, data reduction, confidence intervals and so on.
- Mathematics: The second type are those strong in mathematics. They often work for the NSA or in the defense/military sector, on big data and operations research, or do analytic business optimization, collecting, analysing and extracting value from data.
- Data engineering: The third type are the data engineers. They have commendable skills in Hadoop, database/memory/file system optimization and architecture, APIs, Analytics as a Service, optimization of data flows, and data plumbing.
- Machine learning: The fourth type are those strong in machine learning. They are highly paid and have a good knowledge of algorithms and computational complexity.
- Business: The fifth type are individuals with good business sense. They are skilled in ROI optimization and decision sciences, and in bigger companies are sometimes involved in tasks traditionally performed by business analysts.
- Production code development: The sixth type are those involved in production code development. These are mainly software engineers who are fluent in a few programming languages.
- Visualization: The seventh type are those strong in visualization. Visualization is the art of making data beautiful and is a major field in its own right. It involves data art, infographics and data dashboards.
- GIS: The eighth type are those strong in GIS (Geographic Information Systems), which are used to store and manipulate geographical data.
- Experience: The ninth type are those who are experienced and strong in several of the above-mentioned skills. They are highly paid. After several years of experience across many industries, and a lot of training, such a person is strong in statistics, machine learning, business and mathematics, and more than just familiar with visualization and data engineering.
A few disciplines that overlap with data science:
- Machine learning: Machine learning is the heart of data science. It is basically about teaching a computer how to think and solve its problems, in the end, without user assistance. Python and R are heavily used in ML. Regression and classification are the two pillars of supervised machine learning: most prediction problems are framed as one or the other, depending on whether the target is a number or a category.
- Data mining: Data mining is the art of extracting useful information or knowledge from a bulk of data using suitable algorithms. It is a process of discovering patterns in large data sets, and the data has to be pre-processed before it can be mined. Big data analytics builds on data mining. The data mining task itself is semi-automatic.
- Artificial intelligence: AI intersects heavily with data science. The intersection involves pattern recognition and the design of smart, automated systems that perform tasks in machine-to-machine communication, such as smart search or identifying the right keywords (and the right bid) on Google.
- Operations research: Abbreviated as OR. OR and statistics are very closely related. OR is about decision sciences and optimizing traditional business projects: inventory management, supply chain, pricing. It has many applications in the army and the defense sector. One real-life solution it has provided is car traffic optimization using simulations, commuter surveys, sensor data and statistical modelling.
- Data analysis: This is the new term for business statistics. It covers a large spectrum of applications including fraud detection, advertising mix modelling, attribution modelling, sales forecasts and so on. Except in big companies, data analyst is a junior role. These practitioners typically have less knowledge and experience than a data scientist, and less business vision.
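
To make the regression and classification distinction from the machine learning item above concrete, here is a minimal sketch in plain Python. The data (study hours, exam scores, pass/fail labels) and the function names are hypothetical, chosen only for illustration. The two methods shown, a least-squares line and a nearest-centroid rule, are simple stand-ins for the many techniques used in practice.

```python
from statistics import mean


def fit_line(xs, ys):
    """Regression: least-squares line for one feature, returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_centroids(xs, labels):
    """Classification: store the mean feature value of each label."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: mean(values) for label, values in groups.items()}


def classify(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Hypothetical toy data: hours studied, exam score, and pass/fail outcome.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]
outcomes = ["fail", "fail", "fail", "pass", "pass"]

# Regression predicts a number (a score for 6 hours of study).
slope, intercept = fit_line(hours, scores)
predicted_score = slope * 6 + intercept

# Classification predicts a category (pass or fail for 4.5 hours of study).
centroids = fit_centroids(hours, outcomes)
predicted_label = classify(centroids, 4.5)

print(predicted_score, predicted_label)
```

The same input feature feeds both models. What changes is the kind of answer expected: a quantity in the first case, a label in the second.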