Demo Group5 Hive Vs Impala

Team Members - Group-5

Name: Sushmita Rudra

GitHub profile: Sushmita-Rudra

Subtopics:

Load local data file into table using Hive commands.
Perform HiveQL queries for data retrieval and insertion of data manually.

Prerequisites:

Install Oracle VirtualBox
Install and set up the Cloudera VM

Step-by-step process and commands:

1. How to insert into a table manually:

Open a terminal in the Cloudera VM and type hive, which opens the Hive CLI.

Create a table 'user_details' inside the database by using the following command.


  create table if not exists user_details (
     id int,
     age int,
     gender string,
     profession string,
     reviews int
  );

Now, insert a row into that table using the command below. String values such as the profession must be quoted.

  insert into table user_details values (1,24,'F','Doctor',234455);

This is how to insert a row of data into a table manually with a Hive command. Verify the insertion with the command below.

  select * from user_details;

2. How to load data from a local file into a table:

More often than not, we have large data files that need to be loaded into a table, and Hive can do that directly.

In the same Hive session, create another table 'user_info' inside the database by using the following command.

  create table if not exists user_info (
     id int,
     profession string,
     age int,
     gender string,
     reviews int
  )
  row format delimited
  fields terminated by '|'
  lines terminated by '\n'
  stored as textfile;

Save the data file that you want to load on your local machine. In my case, the local data file is at "/home/cloudera/Documents/u.user"; the "u.user" data file is included in the repo files. Replace that path with the location of the file on your machine in the command below to load the data into the table 'user_info'.
```
load data local inpath '/home/cloudera/Documents/u.user' into table user_info;
```
Verify that the data loaded successfully by querying the table.

select * from user_info;
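Before running the load command, it can help to confirm that every line of the file has the five '|'-separated fields that 'user_info' expects, because Hive fills missing or unparseable fields with NULL instead of raising an error. The following is a minimal shell sketch under stated assumptions: the sample file and its rows are hypothetical and are not taken from the real "u.user" file.

```shell
# Hypothetical sample rows in the column order of user_info:
# id|profession|age|gender|reviews
sample_file="$(mktemp)"
printf '%s\n' \
  '1|Doctor|24|F|234455' \
  '2|Engineer|31|M|1200' \
  '3|Teacher|45|F' > "$sample_file"

# Count the lines that do not have exactly 5 pipe-delimited fields.
# The third sample row is deliberately one field short.
bad_lines="$(awk -F'|' 'NF != 5 { bad++ } END { print bad + 0 }' "$sample_file")"
echo "lines with wrong field count: $bad_lines"
```

To check the real file, point awk at '/home/cloudera/Documents/u.user' (or wherever the file is saved) instead of the sample file.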

References:

Name: Manisha Mengani

GitHub profile: Manisha-Mengani

Subtopics:

Load data from the local file system to HDFS, then create tables and update them using Hive

Prerequisites:

Install the Cloudera virtual machine locally. [Cloudera](https://www.cloudera.com/)
Install Oracle VirtualBox. [Virtual Box](https://www.virtualbox.org/)

Process and Commands:

1. How to load data from Local to HDFS.

In the Cloudera virtual machine, follow the steps below.
Create a file named 'sales_demo' on the local file system.
Once the file is created, fill it with content as per your requirement, with the fields of each row delimited by a special symbol. Example:
```
1-Anu-boxes-18
```
Once the file with data is ready, let's move it from the local file system to HDFS.
To check the file content first, run the command cat sales_demo. This displays all the data present in the file.
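The file-creation steps above can be sketched in the shell as follows. This is only an illustration: the scratch directory is an assumption to avoid overwriting an existing file, and every row except the first is hypothetical sample data.

```shell
# Work in a scratch directory so an existing sales_demo file is not overwritten.
workdir="$(mktemp -d)"
cd "$workdir"

# Each row is id-name-product-num_sales, delimited with '-'.
# Only the first row comes from the example above; the others are hypothetical.
printf '%s\n' \
  '1-Anu-boxes-18' \
  '2-Ravi-tape-7' \
  '3-Mia-labels-25' > sales_demo

# Display the file content.
cat sales_demo
```

Pick a delimiter that never occurs inside a value. With '-', a product name such as t-shirt would be split into two fields.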

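To put the file into HDFS

hadoop fs -put sales_demo /user/cloudera/sales_demo

To confirm the upload, print the file from HDFS

hadoop fs -cat  /user/cloudera/sales_demo

The file is now in HDFS.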

2. Use the loaded data in Hive tables

In the Hive terminal, list the available databases:
```
show databases;
```
Select a database with the use command, then create a table.

To create the table: define columns that match the data loaded in HDFS

create table sales_demo(id int, 
                 name string,
                 product string,
                 num_sales int)
                 row format delimited fields terminated by '-';

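To load the data into the sales_demo table, run the command below. Because the path is in HDFS (there is no local keyword), Hive moves the file into the table's directory, so it no longer appears at /user/cloudera/sales_demo afterwards.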

load data inpath '/user/cloudera/sales_demo' overwrite into table sales_demo;

View the content of the table
```
select * from sales_demo;
```

Demo link :

https://app.vidgrid.com/view/1BRoY0Z9ygas

References:

Name: Shravani Jaidi

GitHub profile: Sravani Jaidi

subtopic

Interacting with Hive and Impala

Prerequisites:

Download and install Oracle VirtualBox
Download and install the Cloudera VM

Process and commands

Steps for creating and loading data using Impala

Create a data file on the desktop in order to load it into Hadoop.
Open a Cloudera terminal and run the impala-shell command to log in to the Impala CLI, where commands can be executed.
Create a database using create database hive_impala;
Verify the database using show databases;
Create a table using create table customer_orders (store String,category String,cost double,paymenttype String)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
Try to load the data using LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/RawData.txt' OVERWRITE INTO TABLE customer_orders;
The above command fails with an AnalysisException because Impala's LOAD DATA statement does not support the LOCAL keyword; it can only load files that are already in HDFS. Impala does not keep its own metadata structure; it uses the Hive metastore, so data loaded through Hive goes into the same table.
Now let's load the data through the Hive CLI instead: open a new terminal and enter hive.
Load the data using the same command LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/RawData.txt' OVERWRITE INTO TABLE customer_orders;
Verify that the data is loaded with a basic query such as select * from customer_orders limit 10;
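After a load through Hive, an Impala session that is already open may not see the new data files until the table's metadata is refreshed. The sketch below shows one way to check the table from both tools on the command line. It assumes the table was created in the hive_impala database (set db to default if it was created there) and that the hive and impala-shell clients are on the PATH, as they are in the Cloudera VM.

```shell
# Assumption: customer_orders lives in this database; change to "default" if needed.
db="hive_impala"

if command -v hive >/dev/null 2>&1 && command -v impala-shell >/dev/null 2>&1; then
  # Confirm the rows are visible from Hive.
  hive -e "SELECT * FROM ${db}.customer_orders LIMIT 5;"
  # Ask Impala to reload the file metadata for the table, then count the rows.
  impala-shell -q "REFRESH ${db}.customer_orders"
  impala-shell -q "SELECT COUNT(*) FROM ${db}.customer_orders"
  status="ran"
else
  # Outside the VM the clients are missing, so nothing is executed.
  status="skipped"
  echo "hive or impala-shell not found; run this inside the Cloudera VM"
fi
```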

Demo Video Link:

https://use.vg/gHSjlX

References

Name: Anil Bomma

GitHub profile: Anil Bomma

subtopic:

Implementing aggregate queries using Impala and exploring the web interface for Impala and Hive

Prerequisites:

  - Install Oracle VirtualBox. [Virtual Box](https://www.virtualbox.org/)
  - Install the Cloudera virtual machine locally. [Cloudera](https://www.cloudera.com/)

TODO list:

Let's verify the database, tables and data that are already loaded, using the Impala shell.
Let's calculate the maximum cost for each category using a group by clause on the loaded data:
select category, max(cost) from customer_orders group by category;
Let's execute the above query in both Hive and Impala and compare the execution times using Hue.
Impala runs queries with its own daemons that read data directly from HDFS, whereas Hive translates each query into MapReduce jobs that run on top of HDFS. Because it avoids the job start-up overhead, Impala is generally faster than Hive for interactive queries like this one.
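To sanity-check the result of the group by query, the same maximum per category can be computed from the raw comma-delimited file with awk. This is a minimal sketch under stated assumptions: the rows below are hypothetical sample data in the column order of customer_orders, not the contents of the real RawData.txt.

```shell
# Hypothetical rows in the column order of customer_orders:
# store,category,cost,paymenttype
raw_file="$(mktemp)"
printf '%s\n' \
  'StoreA,Books,12.50,Cash' \
  'StoreB,Books,30.00,Card' \
  'StoreA,Toys,8.25,Card' \
  'StoreC,Toys,5.00,Cash' > "$raw_file"

# Keep the largest cost seen for each category (field 2),
# then print one "category,max" line per category, sorted by name.
max_by_category="$(awk -F',' '
  !($2 in max) || $3 + 0 > max[$2] { max[$2] = $3 + 0 }
  END { for (c in max) printf "%s,%.2f\n", c, max[c] }
' "$raw_file" | sort)"
echo "$max_by_category"
```

Running the same awk command against '/home/cloudera/Desktop/RawData.txt' should give the same categories and maxima as the Hive and Impala queries, provided the file has no header row.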

Demo link:

https://use.vg/yGGrUI
