1.1 Data, Databases, & SQL

Welcome to the World of Data!

Whether the following overview information is new, a review, or somewhere in between, this course is built for you to succeed! We'll begin by learning exactly what data is and how data analysts interact with it.

Watch the video to get started.

Data

Many of us have worked with data in the form of a table or spreadsheet, which uses rows (horizontal) and columns (vertical).

  • rows contain data entries
  • columns organize that data into categories

For example, in the table below, the rows (or data entries) contain information about individual employees at a fake company called Tech517, and the columns represent categories of data about these employees, like their name, employee ID, where they live, and the year they were hired.

We'll call this table employees.

employees
employees
name employee_id city hire_year
Brian Hernandez 290308 Novi 2021
Tamara Hobbs 250199 Lansing 2014
Chloe Kim 305204 Okemos 2018
Priya Shepherd 195074 Mason 2022
Jesse Wright 305427 Lansing 2017

When a table is small like this one, spreadsheet software like Excel or Google Sheets can help us manage and analyze information. However, at the levels that real companies manage data, we'll need relational databases.

Relational Databases

A relational database is a way of storing information using tables that relate to one another.

Why is this helpful?

In a large dataset, a single table can't hold all of a data entry's information – there would be too many columns in just one table.

Instead, datasets are made of multiple tables that share common columns. Having a column in common allows the tables to relate to each other, that way we can still get information we need even when it's split in between two tables.

For example, look at the two tables below. Can you identify the name of the employee will be bringing a key lime pie to the potluck?

employees
employees
name employee_id city hire_year
Brian Hernandez 290308 Novi 2021
Tamara Hobbs 250199 Lansing 2014
Chloe Kim 305204 Okemos 2018
Priya Shepherd 195074 Mason 2022
Jesse Wright 305427 Lansing 2017
potluck_dishes
potluck_dishes
employee_id category dish
195074 side mac n cheese
250199 main lasagna
290308 side
305204 dessert key lime pie
305427 side caesar salad
Answer

Answer

How'd we get to that answer?

We looked for “key lime pie" in the potluck table and saw that it was linked to a specific employee id. Checked against the second table, we saw who the employee id was linked to.

In other words, even though the name and dish information weren't in the same table, the tables related to each other through the column employee_id. As a result, we could find the information we needed for our analysis.

SQL enables us to do this type of relational analysis on a much larger scale, connecting multiple tables and tons of information.

Why the separate tables?

Imagine if the potluck data was stored in the same table as the employee information. Who is bringing what to this year's potluck is not very useful for day-to-day business operations, so it doesn't make sense to store it there. Instead, the potluck data is stored in its own table that's related to the employee table.

SQL

SQL stands for "Structured Query Language" and is pronounced either "S, Q, L" or "sequel.”

To work with a database, an analyst inputs SQL statements, or queries, and the output is a table. In other words, we write code to ask questions about the data and SQL generates a table in response.

We'll start writing SQL soon, but let's see an example of how SQL could help us answer a question first. To do so, let's return to the table called "employees", which has four columns: name, employee_id, city and hire_year.

employees
employees
name employee_id city hire_year
Brian Hernandez 290308 Novi 2021
Tamara Hobbs 250199 Lansing 2014
Chloe Kim 305204 Okemos 2018
Priya Shepherd 195074 Mason 2022
Jesse Wright 305427 Lansing 2017

Let's say we want to know the names and hire year for employees who were hired after 2019. We would want:

  1. to SELECT the columns name and hire_year
  2. FROM the table employees
  3. However, we only want to see the data WHERE the hire_year is after 2019

As a SQL query, the input might look like this:

and here’s the output:

employees
employees
name hire_year
Brian Hernandez 2021
Priya Shepherd 2022

But it's not just about getting the information - SQL can also help us sort information in helpful ways. Let’s look at a different example in this video:

This may look like a lot right now, but we’re going to learn how to do this (and so much more) step by step. More importantly, we’re going to learn to work confidently with data and use SQL to answer questions that are useful in industry, whether in tech, healthcare, educational companies, or pretty much any field you can think of.

Welcome to the start of your new career!

Review

  • Data:
    • Data is typically organized in tables with rows representing individual data entries and columns representing categories.
  • Relational Databases:
    • Large datasets require relational databases composed of multiple tables linked by a common column.
  • SQL (Structured Query Language):
    • SQL enables data analysts to work with vast amounts of data by writing code-based queries to retrieve and manipulate information.

Preview

  • First, we'll learn the SQL basics – no previous experience needed!
  • As we gain experience, we'll put our work into portfolio projects that show prospective employers our skills.
  • As we get more advanced, we'll download real SQL software to complete complicated analyses.


Head to the next lecture to learn more about how to write SQL queries!

Complete and Continue