How are HIVE and Pig different?

Dhaval Thakur
2 min read · Dec 18, 2022

If you work on data pipelines or are looking to move into data engineering, you have probably heard of HIVE and Pig. If not, don’t worry.

In this story I will briefly explain what HIVE and Pig are, and then move on to the crux of the story: the differences between them.

HIVE vs PIG (Image Credits: Wizlabs)

What is HIVE?

Hive, also known as Apache Hive™, is data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and a JDBC driver are provided to connect users to Hive.

Long story short, Hive lets you skip the traditional approach of writing complex MapReduce programs by hand. Hive supports Data Definition Language (DDL), Data Manipulation Language (DML), and User Defined Functions (UDFs).
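
To make the contrast concrete, here is a minimal Python sketch (not Hive itself). It counts page views per user twice: once with hand-written map and reduce steps, and once with a single SQL statement. The table name `page_views`, its columns, and the sample rows are hypothetical, and Python's built-in `sqlite3` module is used only as a local stand-in for a SQL engine. Hive would accept a very similar query, but would run it over data in distributed storage.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (user, page) pairs.
ROWS = [("alice", "/home"), ("bob", "/home"), ("alice", "/cart"), ("alice", "/pay")]


def count_with_map_reduce(rows):
    """Hand-written map and reduce steps, the style that SQL lets you skip."""
    # Map: emit a (key, 1) pair for every record.
    mapped = [(user, 1) for user, _page in rows]
    # Shuffle: group the emitted values by key.
    grouped = defaultdict(list)
    for user, one in mapped:
        grouped[user].append(one)
    # Reduce: sum the values for each key.
    return {user: sum(ones) for user, ones in grouped.items()}


def count_with_sql(rows):
    """The same aggregation written as one declarative SQL statement."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not Hive
    # DDL: define the table structure.
    conn.execute("CREATE TABLE page_views (user_name TEXT, page TEXT)")
    # DML: load the rows.
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT user_name, COUNT(*) FROM page_views GROUP BY user_name"
    ).fetchall()
    conn.close()
    return dict(result)
```

Both functions return the same per-user counts. The difference is that the SQL version states what result is wanted and leaves the execution plan to the engine.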

What is Pig?

In short, Apache Pig lets people focus on analyzing bulk datasets and spend less time writing MapReduce programs.

Pig in Hadoop has two execution modes:

  1. Local mode: In this mode, Pig runs in a single JVM and uses the local file system. It is suitable only for analyzing small datasets.
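
As a rough illustration of local mode, the Python sketch below only builds the command line that starts Pig with its `-x local` option; it does not run Pig. The helper name `pig_local_command` and the script name `wordcount.pig` are hypothetical, and the sketch assumes a `pig` executable is available on the PATH of the machine where the command is eventually run.

```python
import shlex


def pig_local_command(script_path):
    """Build the argument list that runs a Pig script in local mode."""
    # -x selects the execution type; "local" means a single JVM
    # working against the local file system.
    return ["pig", "-x", "local", script_path]


command = pig_local_command("wordcount.pig")  # hypothetical script name
print(shlex.join(command))
```

On a machine with Pig installed, the resulting list could be passed to `subprocess.run` to launch the script.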
