Member-only story

Data Cleaning & Transformation using SQL Only!

Dhaval Thakur
3 min readMar 10, 2022

--

When we talk about data cleaning, python or R are the only programming languages that come to our mind. True! these both languages have awesome libraries to clean the data before we can start out analysis or build machine learning models . But I feel that SQL is still underrated and we can do normal data cleaning directly using some of the uncommon SQL functions!

Data Cleaning using SQL (image credits: kdnuggets)

TRIM

The TRIM function is used to remove characters from the beginning and end of a string. Here's an example:

SELECT column,
TRIM(position 'characters' FROM column)
FROM Tablename

The TRIM function takes 3 arguments. First, the position, whether you want to remove characters from the beginning , the end , or both . Next ,we have to specify all characters to be trimmed.The last argument is the column name.

Parsing Dates

If the date is written as a string we can surely convert it to date data type using CAST. But this only works when it is in an SQL identified format. But what if it is other formats like MM/DD/YYYY? We first need to convert it to a string in an acceptable format and then cast it. For example,

(SUBSTR(date, 7, 4) || ‘-’ || LEFT(date, 2) || ‘-’ || SUBSTR(date, 4, 2))::date AS cleaned_date

--

--

Dhaval Thakur
Dhaval Thakur

Written by Dhaval Thakur

Data Enthusiast, Geek, part — time blogger. Every week 1 new Data Science/ Product Management story 🖥 I also write on Python, scripting & blockchain

No responses yet