Data Science: An Introduction

In the last post, you saw how Python is useful for data science, but one main question was left out:

"What is data science?" This post is a short introduction that shows you what data science is and answers a few other questions, like:

  • Is it useful in real life, or is it just another Pythagorean theorem?
  • What do data scientists do?
  • What math is behind data science?

What is Data Science?

Well, in my terms, "Data science is the process of turning data into answers to real-world problems."
It involves collecting data, finding patterns in it, and using those patterns to make decisions, make predictions, or simply understand something better.

Is Data Science really useful?

We all might've memorized the Pythagorean theorem in school to pass exams and never used it again (unless you are an architect, an engineer, or someone who is really into triangles for some reason). Data science is different: it is everywhere, from the recommendation systems on YouTube and Netflix to the Instagram reels we scroll through like there's no tomorrow.

What do Data Scientists do?

So yeah, now that we know what data science is, you might be wondering who actually does this stuff. That's where data scientists come into the picture, which leads us to the question: what do they do?
Data scientists turn data into answers. This doesn't happen in a single step; they typically follow a structured process that looks something like this:
  1. Understanding the problem
  2. Gathering the necessary data 
  3. Cleaning data
  4. Exploratory Data Analysis
  5. Model building
  6. Evaluating the model
  7. Deploying the model
Let me briefly explain these 7 steps:

  1. Understanding the problem: We figure out what problem we are going to solve.
  2. Gathering the data: We need resources to answer anything, and data is our resource for solving the problem. Data can be gathered by web scraping, or from files, databases, surveys, etc.
  3. Cleaning the data: The data we gather in step 2 is raw, and raw data tends to have missing values, duplicates, and errors. We fix these issues to make the data usable. This step often takes up around 70-80% of the time on a project.
  4. Exploratory Data Analysis: Now that our data is clean, we visualize it with heatmaps, scatter plots, histograms, box plots, line graphs, etc. to find patterns, relationships, and outliers (unusual values) in the data.
  5. Model building: Now that we understand the patterns in the data, we make the machine understand them too. This is done by training machine learning algorithms on the data: the model learns the patterns from the data, just like we did (or even better), and uses them to make predictions.
  6. Evaluating the model: This is where we test the model's performance on new data that it did not see during training. It's like the exams we had in school or college: we learned from textbooks, and the exam showed how well we understood the concepts. The same goes for the machine.
  7. Deploying the model: If our model performs well and gives correct predictions on new data, it is time to put it to real use. We can deploy it in a website, an app, or an API so that others can use it.

Math for Data Science

Now you might be wondering, "Do I need to go full Einstein mode to do all of this?"
Well, the simple answer is no, not that much. But you do need some knowledge of probability and statistics, linear algebra, and calculus. Math is what we use to make sense of the numbers and build better models.
  • Probability and statistics are used to spot correlations, understand how the data is distributed, and test hypotheses.
  • Linear algebra is all about matrices and vectors. Most data in data science is stored as numbers in matrix form, and linear algebra helps us work with them efficiently.
  • Calculus helps the machine learn. It is used to optimize models, for example to find the best-fit line in linear regression.
And that concludes my blog post. We went from "What is data science?" to "Whoa, it's in my reels too!" I hope this post helped you understand how data science works, what data scientists really do, and of course, the math that powers the whole thing behind the scenes.
