SQL or Structured Query Language is a critical tool for data professionals. It is undoubtedly the most important language for getting a job in the field of data analysis or data sciences. Millions of data points are being generated every minute. And raw data does not have any story to tell. After all this data gets stored in databases and professionals use SQL to extract this data for further analysis. In this blog post, we will cover what is SQL and how does it work. SQL is fairly simple, thanks to easy syntaxes of this language. Let’s dig into understanding this Structured Query Language by asking this simple question…
What is SQL And How Does it Work?
SQL is the most common language for extracting and organising data that is stored in a relational database. A database is a table that consists of rows and columns. SQL is the language of databases. It facilitates retrieving specific information from databases that are further used for analysis. Even when the analysis is being done on another platform like Python or R, SQL would be needed to extract the data that you need from a company’s database.
SQL manages a large amount of data, especially if there is a lot of data that is being written simultaneously and there are too many data transactions.
There are different versions and frameworks for SQL, the most commonly used is MySQL. MySQL is an open-source solution that helps facilitate SQL’s role in managing back-end data for web applications. Companies such as Facebook, Instagram, WhatsApp, etc. all use SQL for back-end data storage and data processing solutions. When an SQL query is written & run (or parsed), it is processed by a query optimiser. The query reaches SQL server, where it compiles in three phases; Parsing, Binding and Optimisation.
- Parsing – A process to check the syntax
- Binding – A process to check the query semantics
- Optimisation – A process to generate the query execution plan
In the third step, all possible permutations and combinations are generated to find the most effective query execution plan in a reasonable time. The shorter the query takes, the better it is.
What is SQL Used for?
Now that we understand what is SQL and how does it work, let’s try to see what SQL can do. This programming language has various uses for data analysts & data science professionals. It is particularly helpful because it can:
- Execute queries against a database
- Retrieve data from a database
- Insert records into a database
- Update records in a database
- Delete records from a database
- Create new databases, or new tables in a database
- Create stored procedures & views in a database
- Set permissions on tables, procedures, and views
Imagine how difficult life would be if we did not have a way to control the database in this format. After getting a sense of what SQL is used for, let’s try our hands at learning SQL on an actual database.
What Is SQL And How Does It Work: Learning SQL With Some Basic SQL Commands
We have learnt above what SQL is used for, so let’s try out each of these things on a real dataset. As an example, we have chosen a database of customers at an e-commerce store. This is what it looks like this:
Here are some commands that we will be learning about, and later practising as well.
|CREATE||Creates a new table, a view of a table, or another object in the database.|
|ALTER||Modifies an existing database object, such as a table.|
|DROP||Deletes an entire table, a view of a table or other objects in the database.|
|SELECT||Retrieves certain records from one or more tables.|
|INSERT||Creates a record.|
|UPDATE||Modifies a record.|
|DELETE||Deletes a record.|
|GRANT||Gives a privilege to users.|
|REVOKE||Takes back privileges granted from users.|
If you notice, the commands ‘create, alter & drop’ define the complete structure of data. They come under the ‘Data Definition Language’ set of commands. Similarly, ‘select, insert, update, delete’ fall under Data Manipulation Language & ‘grant, revoke’ come under Data Control Language set of commands.
Let’s write some SQL commands and understand what the outcomes would be like:
SELECT * FROM Customers
SELECT * FROM Customers
WHERE Country='Germany' AND (City='Berlin' OR City='München');
Here is how you can write the syntax for other functions:
How SQL Works When Joining Tables
An SQL join clause is like a join operation in relational algebra. It combines the columns from one or more tables in a relational database to create a set that can be saved as a table or used as it is. A JOIN is a means for combining columns from one or more tables by using values common to each. A critical skill, and a frequently asked interview question, JOINS help you get a lot of work done with complex databases. Having the ability to manipulate JOIN queries with precision will give you an added advantage.
There are 4 major JIONS to combine data or rows from two or more tables based on a common field between them. Different types of Joins are:
- INNER JOIN
- LEFT JOIN
- RIGHT JOIN
- FULL JOIN
A JOIN is a SQL instruction in the FROM clause of your query that is used to identify the tables you are querying and how they should be combined. But before we actually start joining tables, let’s learn about primary key & foreign key.
- Primary Keys
A primary key is a column (or sometimes set of columns) in a table that is a unique identifier for each row. It is very common for databases to have a column named id (customerID, emailID, EmployeeID) as a primary key for each table.
- Foreign Keys
Foreign keys are columns in a table that specify a link to a primary key in another table. For instance, in the image above, the artist_id column in the albums table is a foreign key to id in the artists table.
Coming back to JOINS, let’s look at the below datasets to understand how they would work:
- For instance, you want to have a table that contains all your user and event table data together
For this, you will use an Outer Join. An Outer Join will combine the columns from all tables on one or more common dimensions when possible, and includes all data from all tables.
- What if you want to have a table that contains only users that have done an action?
This is where an Inner Join comes in play. An Inner Join would combine the columns on a common dimension (the first N columns) when possible, and only include the data for the columns that share the same values in the common N column(s). In the above example, the User_ID would be the common dimension used for the inner join.
- Now, what if you want to have a table that contains all the users’ data and only actions that those users have done? Actions performed by other users, not in the user table should not be included?
You would use a Left Join to join the tables together. A Left Join combines the columns on a common dimension (the first N columns) when possible, returning all rows from the first table with the matching rows in the consecutive tables. The result is NULL in the consecutive tables when there is no match. In this case, we would make the User Table the first (left table) to use for the Left Join.
A Right Join works exactly like a Left Join, with the only distinction being in the base table. In the Left Join, table 1 (left table) was considered to be the base, whereas, in the Right Join, table 2 (right table) will be considered as a base. A Right Join combines the columns on a common dimension (the first N columns) when possible, returning all rows from the second/right table with the matching rows in the first/left table.
Seems complicated? It really isn’t when you start joining the tables in real-time data. SQL being as simple as it is, it can be integrated & used within multiple languages like R, Python, Scala, and Hadoop. This makes data science & big data management an easy breezy tasks to do!
In this blog, we have summarized what is SQL and how does it work, but there is a lot more to learn in it to become an expert. SQL is a must-have skill for any data analytics or data science professionals. This is, in fact, a very commonly covered topic in Data Analytics Interviews. In a world where data is the new oil, finding out what is SQL and how does it work, and understanding what is sql used for can take you a long way ahead. To learn this and much more, join Springboard’s Data Analytics Career Track program. With 1:1 mentoring-led, project-driven approach, detailed curriculum, and real-life examples, you will become an expert at SQL & data analytics in 6 months! Also, did we mention that the career track program offers a job guarantee? Apply Now!