前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >0922-7.1.9-使用Spark和Hive访问Ozone

0922-7.1.9-使用Spark和Hive访问Ozone

作者头像
Fayson
发布2024-05-09 14:22:24
820
发布2024-05-09 14:22:24
举报
文章被收录于专栏:Hadoop实操Hadoop实操

1 使用Spark访问Ozone

1.下载数据并上传到Ozone的bucket中

代码语言:javascript
复制
hdfs dfs -mkdir -p ofs://ozone1/data/vehicles
wget -qO - https://www.fueleconomy.gov/feg/epadata/vehicles.csv | hdfs dfs -copyFromLocal - ofs://ozone1/data/vehicles/vehicles.csv 

2.使用Spark拆分数据并保存到Hive外表中。

代码语言:javascript
复制
spark-shell --conf "spark.debug.maxToStringFields=90" --conf spark.yarn.access.hadoopFileSystems="ofs://ozone1/" << EOF
val df = spark.read.format("csv").option("header", "true").load("ofs://ozone1/data/vehicles/vehicles.csv")
df.createOrReplaceTempView("tempvehicle")
spark.sql("create table vehicles stored as parquet location 'ofs://ozone1/data/vehicles/vehicles' as select * from tempvehicle");
EOF

3.运行以下Spark SQL代码:

代码语言:javascript
复制
spark-shell --conf "spark.sql.debug.maxToStringFields=90" --conf spark.yarn.access.hadoopFileSystems="ofs://ozone1/" << EOF
var df=spark.sql("select count(*) from vehicles")
df.show()
EOF

4.运行以下Spark SQL代码:

代码语言:javascript
复制
spark-shell --conf "spark.sql.debug.maxToStringFields=90" --conf spark.yarn.access.hadoopFileSystems="ofs://ozone1/" << EOF
var df=spark.sql("select make, count(*) from vehicles group by make order by 2 desc limit 10")
df.show()
EOF

5.运行以下Spark SQL代码:

代码语言:javascript
复制
spark-shell --conf "spark.sql.debug.maxToStringFields=90" --conf spark.yarn.access.hadoopFileSystems="ofs://ozone1/" << EOF
var df=spark.sql("select make,model, count(*) from vehicles group by make,model order by 3 desc limit 10")
df.show()
EOF

6.运行以下Spark SQL代码:

代码语言:javascript
复制
spark-shell --conf "spark.sql.debug.maxToStringFields=90" --conf spark.yarn.access.hadoopFileSystems="ofs://ozone1/" << EOF
var df=spark.sql("select make,model, max(cast(combe as float)) from vehicles group by make,model order by 3 desc, 1,2 limit 10")
df.show()
EOF

2 使用Hive访问Ozone

1.在Hive中建表

代码语言:javascript
复制
CREATE EXTERNAL TABLE `hive_vehicles`(
`barrels08` string,`barrelsa08` string,`charge120` string,`charge240` string,`city08` string,`city08u` string,`citya08` string,`citya08u` string,`citycd` string,`citye` string,`cityuf` string,`co2` string,`co2a` string,`co2tailpipeagpm` string,`co2tailpipegpm` string,`comb08` string,`comb08u` string,`comba08` string,`comba08u` string,`combe` string,`combinedcd` string,`combineduf` string,`cylinders` string,`displ` string,`drive` string,`engid` string,`eng_dscr` string,`fescore` string,`fuelcost08` string,`fuelcosta08` string,`fueltype` string,`fueltype1` string,`ghgscore` string,`ghgscorea` string,`highway08` string,`highway08u` string,`highwaya08` string,`highwaya08u` string,`highwaycd` string,`highwaye` string,`highwayuf` string,`hlv` string,`hpv` string,`id` string,`lv2` string,`lv4` string,`make` string,`model` string,`mpgdata` string,`phevblended` string,`pv2` string,`pv4` string,`range` string,`rangecity` string,`rangecitya` string,`rangehwy` string,`rangehwya` string,`trany` string,`ucity` string,`ucitya` string,`uhighway` string,`uhighwaya` string,`vclass` string,`year` string,`yousavespend` string,`guzzler` string,`trans_dscr` string,`tcharger` string,`scharger` string,`atvtype` string,`fueltype2` string,`rangea` string,`evmotor` string,`mfrcode` string,`c240dscr` string,`charge240b` string,`c240bdscr` string,`createdon` string,`modifiedon` string,`startstop` string,`phevcity` string,`phevhwy` string,`phevcomb` string) 
row format delimited 
fields terminated by ','
location 'ofs://ozone1/hive/warehouse/distcp/vehicles';

2.在Hive中执行以下SQL

代码语言:javascript
复制
alter table hive_vehicles set tblproperties('skip.header.line.count'='1');
select count(*) from hive_vehicles;

3.在Hive中执行以下SQL

代码语言:javascript
复制
select make, count(*) from hive_vehicles group by make order by 2 desc limit 10;
select make,model, count(*) from hive_vehicles group by make,model order by 3 desc limit 10;
select make,model, max(cast(combe as float)) from hive_vehicles group by make,model order by 3 desc, 1,2 limit 10;

4.在Ozone下创建warehouse

代码语言:javascript
复制
CREATE DATABASE ozone_wh 
LOCATION 'ofs://ozone1/hive/warehouse/external' 
MANAGEDLOCATION 'ofs://ozone1/hive/warehouse/managed';

5.创建内表,并插入数据

代码语言:javascript
复制
create table ozone_wh.test_managed (name string, value string);
show create table ozone_wh.test_managed;
insert into ozone_wh.test_managed values ('foo1', 'bar1');

6.创建外表并插入数据:

代码语言:javascript
复制
create external table ozone_wh.test_external (name string, value string);
show create table ozone_wh.test_external;
insert into ozone_wh.test_external values ('foo1', 'bar1');

7.查看一下内外两个表在Ozone下的文件:

代码语言:javascript
复制
ozone fs -ls -R ofs://ozone1/hive/warehouse/managed
ozone fs -ls -R ofs://ozone1/hive/warehouse/external
本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2024-05-01,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 Hadoop实操 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 1 使用Spark访问Ozone
  • 2 使用Hive访问Ozone
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档