Data Science for Business Fundamentals
The Basic Concepts You Need to Know About Data Science and Analytics
The term "big data" gets thrown around a lot, but what are data science and analytics?
The title of data scientist is only about a decade old, and as a young profession, it is still defining itself. The same is true of the concept of a data science project.
So what is a data scientist? The term refers to anyone who specializes in turning vast amounts of data into useful information. Data science is the process of extracting value from raw data. Data scientists are the diamond miners of the tech world: to find the few valuable pieces in a cavernous mine, they dig, then cut and polish only the best pieces.
Imagine you are the CEO of Bank of America. You need to know how much interest people are paying on their mortgages. Knowing how much income you're making off mortgages can help you figure out how many loan officers to hire. How can a data scientist help you figure this out?
Data scientists use the data science life cycle to turn data into meaningful information. To understand what data science is, we need to understand the data science life cycle.
Data Science and Analytics Life Cycle
The first step in the cycle is understanding your data objectives. All aspects of business produce data, from window shoppers looking at your site on their lunch break to wholesale accounts that buy every month.
If you want to increase the time customers spend browsing, you'll need data on what they spend the most time on. Figuring this out is a data scientist's job. They do this by assigning numbers and names, or categories and labels, to all the data they are looking over.
The five questions data experts begin to answer are:
- Is this A or B? Classification algorithms choose between two or more options. For example, which works better, a $10 coupon or buy one get one free?
- Is this weird? Anomaly detection algorithms flag outlying data points, like unauthorized purchases on a credit card.
- How much or how many? Regression algorithms predict future numbers based on previous values, like a sales forecast.
- How is this organized? Clustering algorithms group similar data points, like customers who are buying the same product.
- What should I do next? Reinforcement learning algorithms help determine what action to take next, like a smart thermostat deciding to change the temperature.
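To make the "Is this weird?" question concrete, here is a minimal sketch of one simple way to flag outlying purchases. It uses the modified z-score, which is built on the median, and the 3.5 cutoff is a common rule of thumb rather than anything prescribed above. The function name and the purchase amounts are hypothetical, made up for illustration.

```python
from statistics import median


def flag_unusual_purchases(amounts, threshold=3.5):
    """Return the purchases whose modified z-score exceeds the threshold."""
    center = median(amounts)
    # Median absolute deviation: a measure of spread that outliers barely affect.
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        return []  # every purchase looks alike, so nothing stands out
    return [a for a in amounts if 0.6745 * abs(a - center) / mad > threshold]


# Hypothetical card history in dollars; the last purchase is out of pattern.
purchases = [12.50, 40.00, 23.75, 18.20, 35.10, 27.40, 1450.00]
print(flag_unusual_purchases(purchases))
```

Real fraud detection weighs far more than the purchase amount, but the idea is the same: measure what is typical, then flag what sits far away from it.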
Algorithms Are like Recipes
Data science and analytics is like cooking a big meal. The recipe you follow is your algorithm. Your raw data sets are the ingredients. Computing power is the flame on the stove, changing your raw ingredients into something you can eat.
But how do you know if your ingredients will taste good together? By sorting them out.
Data can only give you an answer to your question if it's relevant. You can't answer what tomorrow's weather will be like with data about driving speed in your state. But you can answer what Amazon's stock is likely to look like a month from now if you have relevant data about their business and the market.
If your data is incomplete, you may not be able to make anything useful out of it. It also needs to be accurate. Inaccurate data will give you an answer, but it will be wrong.
You're also going to need enough data to make something meaningful. This varies a lot depending on what the question is. The more general the question, the more data we're going to need.
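A quick audit can tell you how much data you have and how complete it is before you start cooking. The sketch below is one possible way to do that, assuming your records are simple dictionaries. The function name, the field names, and the sample sales records are all hypothetical.

```python
def audit_records(records, required_fields):
    """Count complete records and how often each required field is missing."""
    missing = {field: 0 for field in required_fields}
    complete = 0
    for record in records:
        # Treat an absent key, None, and an empty string as missing.
        absent = [f for f in required_fields if record.get(f) in (None, "")]
        for f in absent:
            missing[f] += 1
        if not absent:
            complete += 1
    return {"total": len(records), "complete": complete, "missing": missing}


# Hypothetical sales records; two of the three have a gap.
sales = [
    {"product": "mug", "price": 8.0, "quantity": 3},
    {"product": "shirt", "price": None, "quantity": 1},
    {"product": "", "price": 15.0, "quantity": 2},
]
report = audit_records(sales, ["product", "price", "quantity"])
print(report)
```

This only covers quantity and completeness. Checking accuracy usually means comparing the data against a trusted source, which a count like this cannot do.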
Sharp Questions
How do we ask productive questions about our data? By making sure our questions are sharp, meaning they leave no room for vague answers. Asking sharp questions can help you outside of data science as well.
Getting an accurate answer relies on examples within our data. To answer which of your products is the best selling, you need data on previous sales.
You can also reword your questions to be more effective. Rather than asking which ad was most interesting to your viewers, you can ask, "How interesting was each of these ads to the viewer?" This will rank your ads by the number of plays and duration watched. Now you can see how each of your ads performed and why.
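The ad question above can be sketched as a simple ranking. This is only an illustration: the ad names and numbers are made up, and sorting by plays first with average watch time as the tiebreaker is an assumption, since you might weigh the two measures differently.

```python
def rank_ads(ad_stats):
    """Sort ads from most to least engaging: by plays, then by average seconds watched."""
    return sorted(
        ad_stats,
        key=lambda ad: (ad["plays"], ad["avg_seconds_watched"]),
        reverse=True,
    )


# Hypothetical viewing data for three ads.
ads = [
    {"name": "spring_sale", "plays": 1200, "avg_seconds_watched": 14.0},
    {"name": "new_arrivals", "plays": 950, "avg_seconds_watched": 22.5},
    {"name": "clearance", "plays": 1200, "avg_seconds_watched": 9.5},
]
for position, ad in enumerate(rank_ads(ads), start=1):
    print(position, ad["name"], ad["plays"], ad["avg_seconds_watched"])
```

Because every ad gets a position and its numbers, you get an answer for each ad instead of a single winner.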
How Models Answer Questions
A model is a simplified summary of the pattern in our data points. If we were tracking the price of gasoline over 10 years, the line we could draw through all the price points would be our model. This lets us predict what future prices could look like.
Because a model averages over our data, its answers won't be 100% correct. Each of our data points has noise, or variance, associated with it. There is a relationship in the data, but the real world differs from what the model predicts.
We can express that uncertainty with a confidence interval. Instead of a single number, a confidence interval gives a range of values that is expected to contain the true value at a stated level of confidence, such as 95%.
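Drawing a line through the price points can be sketched with ordinary least squares, one common way to fit a straight line. The prices below are invented for illustration and are not real gasoline data, and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented yearly average prices in dollars per gallon (not real data).
years = list(range(10))
prices = [2.10, 2.25, 2.20, 2.40, 2.55, 2.50, 2.70, 2.85, 2.80, 3.00]

slope, intercept = fit_line(years, prices)
next_year = slope * 10 + intercept  # extend the line one year past the data
print(f"Price trend: {slope:+.3f} per year, next year about {next_year:.2f}")
```

The fitted line is the model. Extending it past the last year is the prediction, and it is only as good as the assumption that the trend continues.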
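As a rough sketch of how a confidence interval can be computed, the example below estimates a 95% interval for an average. It assumes a normal approximation, which is a simplification: with a small sample like this one, a t-distribution would give a somewhat wider range. The function name and the payment amounts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    # z is about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = mean(values)
    margin = z * stdev(values) / sqrt(len(values))
    return center - margin, center + margin


# Hypothetical monthly interest payments in dollars from a sample of customers.
payments = [812.40, 790.15, 845.00, 801.30, 828.75, 779.90, 836.20, 815.55]
low, high = mean_confidence_interval(payments)
print(f"Average payment is likely between {low:.2f} and {high:.2f}")
```

More data or less noisy data narrows the interval, which is another way of saying the answer gets more trustworthy.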
Data Science Careers
Now we know how data scientists are using all that raw data. Let's talk about the different roles data experts play in finding the answers to our questions.
Data Engineer
Data science all starts with engineers. Data engineers are responsible for the infrastructure that brings in data. Engineers begin the long process of turning big data into something useful.
Data Scientist
Data scientists focus on finding the data sets needed to begin answering questions. They identify the most relevant data and begin whittling it down.
A data scientist will answer your question with solid facts and statistics that may not make sense to a nontechnical worker. That's where data analysts come in.
Data Analyst
Data analysts are the bridge between business analysts and data scientists. Data analysts take these complex data-driven answers and make them understandable for laymen. They help add meaning to the findings of data scientists.
Data Science in the Real World
Let's revisit the example of needing to figure out how much money the bank makes off mortgages. You bring the question down to your data analyst. The analyst works with the scientist who has gathered data from the engineer.
By looking at past data on how fast customers pay back their mortgages, we can get an estimate of future earnings from mortgage interest. Data science and analytics help businesses hire people, manage their finances, and prioritize their resources.
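Here is a minimal sketch of what such an estimate might look like. It assumes fixed-rate, fully amortizing loans that are paid exactly on schedule, with no early payoffs or defaults. A real estimate would adjust for how fast customers actually pay, which is where the past data comes in. The function name and the loan portfolio are hypothetical.

```python
def interest_next_year(balance, annual_rate, months_remaining):
    """Interest paid over the next 12 months of a fixed-rate, fully amortizing loan."""
    r = annual_rate / 12  # monthly interest rate
    if r == 0:
        return 0.0
    # Standard fixed monthly payment for an amortizing loan.
    payment = balance * r / (1 - (1 + r) ** -months_remaining)
    total_interest = 0.0
    for _ in range(min(12, months_remaining)):
        interest = balance * r
        total_interest += interest
        balance -= payment - interest  # the rest of the payment reduces the balance
    return total_interest


# Hypothetical loans: (remaining balance, annual rate, months remaining).
portfolio = [
    (250_000, 0.045, 300),
    (180_000, 0.0375, 240),
    (320_000, 0.05, 360),
]
estimate = sum(interest_next_year(*loan) for loan in portfolio)
print(f"Estimated interest income next year: ${estimate:,.2f}")
```

Summing over every loan in the portfolio turns thousands of individual data points into one number you can plan hiring around.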
From thousands of data points, we are able to get a well-rounded answer to our question. You can make a sound decision on how many people to hire thanks to big data.