Hive Basics

Common Hive Operations

beeline client commands:

$beeline --hiveconf hive.server2.logging.operation.level=NONE
$beeline -f "test.sql"
$beeline -e 'show databases'
$beeline -n username -p password -u jdbc:hive2://hs2.local:10012
$beeline -u jdbc:hive2://192.168.0.76:10000 -n $user
$beeline --hiveconf 'hive.server2.tez.initialize.default.sessions=true' --hiveconf 'hive.cli.print.header=false' -e 'select * from $table_name'
>describe formatted $table_name;
>show create database $db_name;
>show create table $table_name;
>drop database $db_name cascade;
>TRUNCATE TABLE $table_name;
>describe database $db_name;

Common Hive Syntax

Change the execution engine:

>set hive.execution.engine=spark;

Change the Hive job submission queue:

>set mapreduce.job.queuename=default;

Change the owner of a Hive table:

>ALTER TABLE $table_name SET OWNER user $username;

Change the table's SerDe (serialization):

>ALTER TABLE $table_name SET SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe";  -- RCFile

Change the table storage format:

>ALTER TABLE $table_name SET FILEFORMAT
INPUTFORMAT "org.apache.hadoop.hive.ql.io.RCFileInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.RCFileOutputFormat";

Change the storage location:

https://www.cnblogs.com/weijiqian/p/14335772.html 

>ALTER TABLE $table_name SET location '/hive/warehouse/test.db/tb_table2';

MR Engine

    set mapred.min.split.size=1;
    set mapred.max.split.size=128000000; -- lower this to increase the number of map tasks
    set mapred.map.tasks=2; -- tries to set the number of map tasks directly; ineffective (only a hint)!!
    set mapred.reduce.tasks=1000; -- increase the number of reduce tasks
    mapreduce.map.memory.mb -- maximum memory (MB) for each map task
    mapreduce.reduce.memory.mb -- maximum memory (MB) for each reduce task
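How many map tasks the split-size settings above actually yield can be estimated with the standard Hadoop FileInputFormat rule, splitSize = max(minSize, min(maxSize, blockSize)), one map task per split. A rough sketch (function names and the 10 GB input are hypothetical, and small-file effects are ignored):

```python
import math

def split_size(min_split, max_split, block_size):
    """Hadoop FileInputFormat split size: max(minSize, min(maxSize, blockSize))."""
    return max(min_split, min(max_split, block_size))

def estimate_map_tasks(total_input_bytes, min_split, max_split,
                       block_size=128 * 1024 * 1024):
    """Rough map-task count: one task per input split."""
    return math.ceil(total_input_bytes / split_size(min_split, max_split, block_size))

# With the settings above (min=1, max=128000000) on a 10 GiB input:
print(estimate_map_tasks(10 * 1024**3, 1, 128_000_000))  # → 84
# Halving mapred.max.split.size roughly doubles the map-task count:
print(estimate_map_tasks(10 * 1024**3, 1, 64_000_000))
```

This is why tuning `mapred.max.split.size` changes the map count while `mapred.map.tasks` does not: the split math, not the hint, decides how many tasks are created.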

Tez Engine

   1. set hive.tez.cpu.vcores=3 -- set the CPU vcores used by each task
   2. If Tez runs in container mode, tune the number of containers: with more containers, more tasks run inside them concurrently, so parallelism goes up.
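The Tez parallelism point can be sanity-checked with back-of-the-envelope arithmetic: the queue's memory and vcores cap how many containers (and hence concurrent tasks) can run at once, so smaller containers mean more of them. A sketch with hypothetical queue and container sizes:

```python
def max_concurrent_containers(queue_mem_mb, queue_vcores,
                              container_mem_mb, container_vcores):
    """Upper bound on simultaneously running containers in a YARN queue:
    whichever resource (memory or vcores) runs out first limits parallelism."""
    return min(queue_mem_mb // container_mem_mb,
               queue_vcores // container_vcores)

# A 100 GiB / 60-vcore queue with 4 GiB containers at hive.tez.cpu.vcores=3:
big = max_concurrent_containers(100 * 1024, 60, 4 * 1024, 3)
# Smaller containers (2 GiB / 2 vcores) let more run at once:
small = max_concurrent_containers(100 * 1024, 60, 2 * 1024, 2)
print(big, small)  # → 20 30
```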

Spark Engine

>set hive.execution.engine=spark;

Hive in Practice

1. Create a managed (internal) table

>create table bigdata.eimrs(id string, name string, worker string, happies string, age int);
>insert into table bigdata.eimrs values("0","Zhang San","programmer","soccer",30);
>insert into table bigdata.eimrs values("1","Li Si","artist","basketball",28);
>insert into table bigdata.eimrs values("2","Wang Wu","programmer","table tennis",32);
>insert into table bigdata.eimrs values("3","Cheng Ge","programmer","shot put",32);

2. Create an external table

>create external table student (s_id string, s_name string) row format delimited fields terminated by '\t';
>create external table bigdata.eimrs0(id string comment "id", name string comment "name", teacher string comment "teacher info")
comment "test external table in JSON format"
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile;
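With the JsonSerDe, each line of the table's underlying text file must be a single JSON object whose keys match the column names. A sketch of producing such a file (the file name and sample rows are made up):

```python
import json

# One JSON object per line; keys match the columns of bigdata.eimrs0.
rows = [
    {"id": "0", "name": "math", "teacher": "Mr. Zhang"},
    {"id": "1", "name": "art",  "teacher": "Ms. Li"},
]
with open("eimrs0.json", "w", encoding="utf-8") as f:
    for r in rows:
        f.write(json.dumps(r, ensure_ascii=False) + "\n")
```

A file like this can then be placed in the table's location (or loaded with `load data`) and queried directly.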

3. Create a partitioned table

>create table score2 (s_id string, s_score int) partitioned by (year string,month string,day string);
>insert into table score2 partition(year='2021',month='01',day='01') values("1",20);
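On disk, Hive turns each partition into a subdirectory named col=value under the table path, which is why queries filtering on year/month/day can skip whole directories. A sketch of the layout (the default warehouse path and `default` database are assumptions):

```python
def partition_path(warehouse, db, table, **parts):
    """Hive lays out one subdirectory per partition column, named col=value."""
    subdirs = "/".join(f"{k}={v}" for k, v in parts.items())
    return f"{warehouse}/{db}.db/{table}/{subdirs}"

print(partition_path("/user/hive/warehouse", "default", "score2",
                     year="2021", month="01", day="01"))
# → /user/hive/warehouse/default.db/score2/year=2021/month=01/day=01
```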

4. Create a table at a specified OBS path (Huawei Cloud example; Alibaba Cloud is similar)

>create database if not exists $database_name location "obs://bucket/dir/";
>create table test(name string) location "obs://bucket/user/hive/warehouse/test";

5. Load local data

Append / overwrite:

>load data local inpath '/export/servers/hivedatas/student.csv' into table student;

>load data local inpath '/export/servers/hivedatas/student.csv' overwrite  into table student;

6. Load HDFS data

>load data inpath '/hivedatas/techer.csv' into table techer;
>load data inpath '/hivedatas/techer.csv' into table techer partition(cur_date=20201210);  -- load into a specified partition
