Deploying Hadoop 2.7 + Hive 2.1 on a personal cluster
Published: 2019-06-25


Environment: CentOS 6.6 x64 (3 nodes, for learning)

Software: JDK 1.7 + Hadoop 2.7.3 + Hive 2.1.1

Environment preparation:

1. Install the necessary tools

yum -y install openssh wget curl tree screen nano lftp htop mysql mysql-server    # on CentOS 6 the MySQL client package is named "mysql"

2. Use the 163 yum mirror:

cd /etc/yum.repos.d/
# fetch the 163 repo file (pick the file matching your CentOS version; for the CentOS 6.6 used here that would be CentOS6-Base-163.repo)
wget http://mirrors.163.com/.help/CentOS7-Base-163.repo
# back up the original repo and swap in the new one
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
mv CentOS7-Base-163.repo CentOS-Base.repo
# rebuild the cache
yum clean all
yum makecache

3. Disable the graphical interface (boot to runlevel 3):

vim /etc/inittab    # change the default runlevel from 5 to 3 (boot to the character console)

4. Set the static IP, hostname, and hosts entries

(1) Plan

192.168.235.138 node1
192.168.235.139 node2
192.168.235.140 node3

On each node, set the IP, hostname, and hosts entries according to this plan.

(2) Static IP (on every node)

# Option 1: configure with the setup text UI
# setup

# Option 2: edit the network configuration file; a complete example:
# cat /etc/sysconfig/network-scripts/ifcfg-Auto_eth1
HWADDR=00:0C:29:2C:9F:4A
TYPE=Ethernet
BOOTPROTO=none
IPADDR=192.168.235.139
PREFIX=24
GATEWAY=192.168.235.1
DNS1=192.168.235.1
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="Auto eth1"
UUID=2753c781-4222-47bd-85e7-44877cde27dd
ONBOOT=yes
LAST_CONNECT=1491415778

(3) Hostname (on every node)

# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node1      # set HOSTNAME to this node's name

(4) hosts (on every node)

# cat /etc/hosts
# append the following at the end of the file
192.168.235.138 node1
192.168.235.139 node2
192.168.235.140 node3

5. Disable the firewall

# service iptables stop
# service iptables status
# chkconfig iptables off

6. Create a normal user

# useradd hadoop
# passwd hadoop
# visudo    # below the line "root    ALL=(ALL)       ALL", add:
hadoop  ALL=(ALL)       ALL

7. Set up passwordless SSH login

Option 1: automated deployment script

# cat ssh.sh
SERVERS="node1 node2 node3"
PASSWORD=123456
BASE_SERVER=192.168.235.138

yum -y install expect

auto_ssh_copy_id() {
    expect -c "set timeout -1;
        spawn ssh-copy-id $1;
        expect {
            *(yes/no)* {send -- yes\r;exp_continue;}
            *assword:* {send -- $2\r;exp_continue;}
            eof        {exit 0;}
        }"
}

ssh_copy_id_to_all() {
    for SERVER in $SERVERS
    do
        auto_ssh_copy_id $SERVER $PASSWORD
    done
}

ssh_copy_id_to_all

Option 2: manual setup

ssh-keygen -t rsa                                    # generate the key pair
scp ~/.ssh/id_rsa.pub hadoop@192.168.235.139:~/      # distribute the public key with scp (or use ssh-copy-id)
# on the receiving node, append the key and fix the permissions:
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys
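To confirm passwordless login works before going further, a quick check (a sketch, assuming the hostnames from the plan above and the hadoop user on every node):

for h in node1 node2 node3; do ssh "$h" hostname; done    # should print each hostname without prompting for a password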

Cluster planning and installation

1. Node planning

node1: NameNode, DataNode, NodeManager
node2: ResourceManager, DataNode, NodeManager, JobHistory
node3: SecondaryNameNode, DataNode, NodeManager

Note the division of roles: DataNodes store the data and NodeManagers process it, so the two should sit on the same nodes to avoid pushing large amounts of data across the network.

This layout is only for a personal machine and for learning. A typical production layout looks more like this:

Reference layout for 7 nodes, hadoop 2.x (HA: high availability)

Hostname   IP address      Processes
cloud01    192.168.2.31    namenode   zkfc
cloud02    192.168.2.32    namenode   zkfc
cloud03    192.168.2.33    resourcemanager
cloud04    192.168.2.34    resourcemanager
cloud05    192.168.2.35    journalnode   datanode   nodemanager   QuorumPeerMain
cloud06    192.168.2.36    journalnode   datanode   nodemanager   QuorumPeerMain
cloud07    192.168.2.37    journalnode   datanode   nodemanager   QuorumPeerMain

Notes:
    namenode: manages metadata
    resourcemanager: resource scheduling and control
    datanode: stores data
    nodemanager: runs the computation
    journalnode: shared storage for NameNode metadata
    zkfc: ZooKeeper failover controller, switches NameNodes on failure
    QuorumPeerMain: the ZooKeeper server process

HA plus ZooKeeper effectively removes the single point of failure and provides automatic failover.

Source:

2. Install the JDK and Hadoop

(1) Install the JDK and Hadoop

Upload the packages to the server and, in the directory containing them, write and run the following script:

#!/bin/bash
tar -zxvf jdk-7u79-linux-x64.tar.gz -C /opt/
tar -zxvf hadoop-2.7.3.tar.gz -C /usr/local/
# quote EOF so the variables are written literally into /etc/profile
cat >> /etc/profile << 'EOF'
export JAVA_HOME=/opt/jdk1.7.0_79/
export HADOOP_HOME=/usr/local/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
source /etc/profile

(2) Pre-create the Hadoop working directories

# these paths must match hadoop.tmp.dir / dfs.*.dir in the XML configuration below
mkdir -p /usr/hadoop/tmp
mkdir -p /usr/hadoop/dfs/data
mkdir -p /usr/hadoop/dfs/name
mkdir -p /usr/hadoop/namesecondary

3. Configure Hadoop

The basic configuration files are:

# cd /usr/local/hadoop-2.7.3/etc/hadoop/
# ls -l | awk '{print $9}'
core-site.xml
hadoop-env.sh
hdfs-site.xml
mapred-site.xml
slaves
yarn-site.xml

The settings are as follows:

(1) core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
</configuration>

(2) hadoop-env.sh

export JAVA_HOME=/opt/jdk1.7.0_79/

(3) hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node3:9001</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///usr/hadoop/namesecondary</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <description>replication</description>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.datanode.max.transfer.threads</name>
        <value>4096</value>
    </property>
</configuration>

(4) mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node2:10020</value>
        <description>MapReduce JobHistory Server host:port. Default port is 10020.</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node2:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port. Default port is 19888.</description>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/history</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
    </property>
    <property>
        <name>mapreduce.map.log.level</name>
        <value>DEBUG</value>
    </property>
    <property>
        <name>mapreduce.reduce.log.level</name>
        <value>DEBUG</value>
    </property>
</configuration>

(5) slaves

node1
node2
node3

(6) yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>node2:8032</value>
        <description>ResourceManager host:port for clients to submit jobs. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>node2:8030</value>
        <description>ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>node2:8031</value>
        <description>ResourceManager host:port for NodeManagers. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>node2:8033</value>
        <description>ResourceManager host:port for administrative commands. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>node2:8088</value>
        <description>ResourceManager web UI host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node2</value>
        <description>Single hostname that can be set in place of setting all yarn.resourcemanager*.address properties. Results in default ports for ResourceManager components.</description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>Shuffle service that needs to be set for MapReduce applications.</description>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://node2:19888/jobhistory/logs</value>
    </property>
</configuration>

Note: try not to put Chinese text in the actual configuration files.

This configuration is for reference only; remove any comments before using it, and add or drop settings to suit your own environment.
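Everything so far (JDK, Hadoop, the working directories, and the files under etc/hadoop) has been set up on a single node. Before starting the cluster, the same layout has to exist on node2 and node3 as well. A minimal sketch, assuming identical paths on every node and the passwordless SSH configured earlier:

# run on node1: copy the installation and configuration to the other nodes
for h in node2 node3; do
    scp -r /opt/jdk1.7.0_79 $h:/opt/
    scp -r /usr/local/hadoop-2.7.3 $h:/usr/local/
    ssh $h "mkdir -p /usr/hadoop/tmp /usr/hadoop/dfs/data /usr/hadoop/dfs/name /usr/hadoop/namesecondary"
    scp /etc/profile $h:/etc/profile    # or add the JAVA_HOME/HADOOP_HOME exports on each node by hand
done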

4. Start the cluster

(1) Format the NameNode

hadoop namenode -format    # under ${HADOOP_HOME}/bin; on 2.x the equivalent command is: hdfs namenode -format

(2) Start and stop the cluster

Commonly used scripts:

# ls -l | awk '{print $9}'
start-all.sh / stop-all.sh              # start/stop all daemons
start-dfs.sh / stop-dfs.sh              # start/stop HDFS
start-yarn.sh / stop-yarn.sh            # start/stop YARN
mr-jobhistory-daemon.sh                 # job history server
hadoop-daemon.sh / hadoop-daemons.sh
yarn-daemon.sh / yarn-daemons.sh
start-balancer.sh / stop-balancer.sh    # rebalance the file block distribution across datanodes

Three ways to start the cluster:

# Option 1: start the daemons one by one (the usual approach in production)
hadoop-daemon.sh start|stop namenode|datanode|journalnode
yarn-daemon.sh   start|stop resourcemanager|nodemanager

# Option 2: start HDFS and YARN separately
start-dfs.sh
start-yarn.sh

# Option 3: start everything at once
start-all.sh

# Job history service:
mr-jobhistory-daemon.sh start historyserver
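Whichever way the daemons are started, a quick check that the running processes match the plan from section 1 is to run jps on every node (a sketch, assuming the JDK path used in the install script above): node1 should show NameNode, DataNode, and NodeManager; node2 ResourceManager, DataNode, NodeManager, and JobHistoryServer; node3 SecondaryNameNode, DataNode, and NodeManager.

for h in node1 node2 node3; do
    echo "==== $h ===="
    ssh $h /opt/jdk1.7.0_79/bin/jps
done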

Deploying Hive

Start the Hadoop cluster first. Note that Hive only needs to be installed on one node; there is no such thing as a Hive cluster.

1. Start and initialize MySQL

# start the mysql service
service mysqld start
# start it at boot
chkconfig mysqld on
# run the initial secure configuration
/usr/bin/mysql_secure_installation

Note:

Problem: Host '192.168.235.138' is not allowed to connect to this MySQL server

Solution:
mysql> grant all privileges on *.* to 'root'@'%' identified by 'root';
mysql> flush privileges;

2. Install and configure Hive

(1) Install

tar -zxvf apache-hive-2.1.1-bin.tar.gz -C /usr/local/
cd /usr/local/
mv apache-hive-2.1.1-bin/ hive-2.1.1
find hive-2.1.1/ -name "*.cmd" -exec rm -rf {} \;    # drop the Windows .cmd scripts

(2) Add the MySQL JDBC driver

cp /usr/share/java/mysql-connector-java-commercial-5.1.25-bin.jar /usr/local/hive-2.1.1/lib/

Note: if the jar is not there yet, install it first: yum -y install mysql-connector-java

(3) Create the HDFS storage directories

hdfs dfs -mkdir -p /usr/hive/warehouse
hdfs dfs -mkdir -p /usr/hive/tmp
hdfs dfs -mkdir -p /usr/hive/log
hdfs dfs -chmod g+w /usr/hive/warehouse
hdfs dfs -chmod g+w /usr/hive/tmp
hdfs dfs -chmod g+w /usr/hive/log

Also create the scratch directories:

hadoop dfs -ls /usr/tmp
hadoop dfs -mkdir -p /usr/tmp/hive/local
hadoop dfs -mkdir -p /usr/tmp/hive/resources

(4) Configure

Copy the template files:

# pwd
/usr/local/hive-2.1.1/conf
# cp hive-env.sh.template hive-env.sh
# cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
# cp hive-log4j2.properties.template hive-log4j2.properties
# cp hive-default.xml.template hive-site.xml

Edit hive-site.xml:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://node1:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
</property>
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/usr/hive/warehouse</value>
</property>
<property>
    <name>hive.exec.scratchdir</name>
    <value>/usr/hive/tmp</value>
</property>
<property>
    <name>hive.querylog.location</name>
    <value>/usr/hive/log</value>
</property>

Also add the following settings:

<property>
    <name>hive.exec.scratchdir</name>
    <value>/usr/tmp/hive</value>
</property>
<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/tmp/hive/local</value>
</property>
<property>
    <name>hive.downloaded.resources.dir</name>
    <value>/usr/tmp/hive/resources</value>
</property>

Otherwise you will hit this error:

Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
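A complementary fix (a sketch) is to substitute the ${system:...} placeholders that remain elsewhere in the copied hive-site.xml with fixed paths; the "hive" user name below is an arbitrary choice:

sed -i 's#${system:java.io.tmpdir}#/usr/tmp/hive#g' /usr/local/hive-2.1.1/conf/hive-site.xml
sed -i 's#${system:user.name}#hive#g' /usr/local/hive-2.1.1/conf/hive-site.xml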

Edit hive-env.sh:

HADOOP_HOME=/usr/local/hadoop-2.7.3
export HIVE_CONF_DIR=/usr/local/hive-2.1.1/conf

(5) Initialize the metastore

# bin/schematool --help
# bin/schematool -dbType mysql -initSchema    # initialize the metastore schema

Note: Hive 2 requires this metastore initialization; otherwise startup fails with an error like:

# hive-2.1.1/bin/hive
which: no hbase in ...
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/usr/local/hive-2.1.1/conf/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    ...
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate
    ...
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    ...
Caused by: java.lang.reflect.InvocationTargetException
    ...
Caused by: MetaException(message:Version information not found in metastore. )
    ...
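If initSchema succeeded, the metastore tables now exist in the hive database that the JDBC URL points to. A quick check, a sketch using the connection settings from hive-site.xml above:

mysql -h node1 -uroot -p123456 -e "USE hive; SHOW TABLES;" | head    # expect tables such as VERSION, DBS, TBLS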

(6) Start Hive

Option 1: the CLI

# /usr/local/hive-2.1.1/bin/hive
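A simple smoke test (a sketch) is to run a statement non-interactively and confirm Hive can reach both the metastore and HDFS; smoke_test below is just a throwaway table name:

/usr/local/hive-2.1.1/bin/hive -e "SHOW DATABASES;"
/usr/local/hive-2.1.1/bin/hive -e "CREATE TABLE IF NOT EXISTS smoke_test (id INT); SHOW TABLES;"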

Option 2: the web UI (HWI)

i. Build the war file and copy it into Hive's lib directory

# download and unpack the source package
wget http://mirror.bit.edu.cn/apache/hive/stable-2/apache-hive-2.1.1-src.tar.gz
tar -zxvf apache-hive-2.1.1-src.tar.gz
# package the jsp files into a war file
cd apache-hive-2.1.1-src/hwi/
jar cfM hive-hwi-2.1.1.war -C web .
# copy the war (plus tools.jar from the JDK) into hive
cp hive-hwi-2.1.1.war /usr/local/hive-2.1.1/lib/
cp /opt/jdk1.7.0_79/lib/tools.jar /usr/local/hive-2.1.1/lib/

Note: when building the war, if jar is run without the "-C" option to point at the web directory, it fails with:

adding: session_kill.jsp
java.util.zip.ZipException: duplicate entry: session_kill.jsp

ii. Edit hive-site.xml

<property>
    <name>hive.hwi.listen.host</name>
    <value>0.0.0.0</value>
    <description>This is the host address the Hive Web Interface will listen on</description>
</property>
<property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
    <description>This is the port the Hive Web Interface will listen on</description>
</property>
<property>
    <name>hive.hwi.war.file</name>
    <value>lib/hive-hwi-2.1.1.war</value>
    <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}.</description>
</property>

iii. Replace the ant jars

wget http://124.205.69.164/files/823800000544EA17/mirror.bit.edu.cn/apache//ant/binaries/apache-ant-1.9.9-bin.tar.gz
tar -zxvf apache-ant-1.9.9-bin.tar.gz -C /opt/
mv /opt/apache-ant-1.9.9 /opt/ant-1.9.9
# replace the ant jars that ship with hive
cp /opt/ant-1.9.9/lib/ant.jar /usr/local/hive-2.1.1/lib/
cp /opt/ant-1.9.9/lib/ant-launcher.jar /usr/local/hive-2.1.1/lib/

If the ant jars are not replaced, HWI returns a 500 error, because Hive ships with ant 1.9.1 and the locally installed ant version is needed:

The following error occurred while executing this line:
jar:file:/usr/local/hive-2.1.1/lib/ant-1.9.1.jar!/org/apache/tools/ant/antlib.xml:37: Could not create task or type of type: componentdef.
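Since the stack trace above shows HWI still resolving the bundled ant-1.9.1.jar, it may also help to move that jar out of the way after copying in the 1.9.9 jars (a sketch, assuming the file name from the error message):

cd /usr/local/hive-2.1.1/lib/
mv ant-1.9.1.jar ant-1.9.1.jar.bak
ls | grep -i '^ant'    # ant.jar and ant-launcher.jar from 1.9.9 should now be the ones picked up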

Reference:

# Deploying HUE (to be continued)

1. Install dependencies

#!/bin/bash
yum -y install asciidoc
yum -y install cyrus-sasl-devel
yum -y install cyrus-sasl-gssapi
yum -y install cyrus-sasl-plain
yum -y install gcc
yum -y install gcc-c++
yum -y install krb5-devel
yum -y install libffi-devel
yum -y install libtidy          # (for unit tests only)
yum -y install libxml2-devel
yum -y install libxslt-devel
yum -y install make
# mysql                         # already installed
yum -y install mysql-devel
yum -y install openldap-devel
yum -y install python-devel
yum -y install sqlite-devel
yum -y install openssl-devel    # (for version 7+)
yum -y install gmp-devel

# install ant
wget http://mirror.bit.edu.cn/apache//ant/binaries/apache-ant-1.9.9-bin.tar.bz2
bzip2 -d apache-ant-1.9.9-bin.tar.bz2
tar xf apache-ant-1.9.9-bin.tar -C /opt/
cd /opt
mv apache-ant-1.9.9/ ant-1.9.9
vim /etc/profile      # add ant to PATH
source /etc/profile

# install maven
wget http://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz
tar -zxvf apache-maven-3.5.0-bin.tar.gz -C /opt/
cd /opt/
mv apache-maven-3.5.0/ maven-3.5.0/
vim /etc/profile      # add maven to PATH
source /etc/profile

2. Download and build Hue

Download URL: (link missing in the original). Do not use the source from GitHub; building it fails with the following error:

error: can't copy 'lib/Crypto/SelfTest/Random/OSRNG/test_posix.py': doesn't exist or not a regular file
make[2]: *** [/usr/local/hue/desktop/core/build/pycrypto-2.6.1/egg.stamp] Error 1
make[2]: Leaving directory `/usr/local/hue/desktop/core'
make[1]: *** [.recursive-env-install/core] Error 2
make[1]: Leaving directory `/usr/local/hue/desktop'
make: *** [desktop] Error 2

Build:

wget https://dl.dropboxusercontent.com/u/730827/hue/releases/3.12.0/hue-3.12.0.tgz
tar -zxvf hue-3.12.0.tgz -C /opt
cd /opt/hue-3.12.0/
make apps

After a successful build, two new entries appear in the hue directory: app.reg and build.

3. Start the test server

./build/env/bin/hue runserver

Open 127.0.0.1:8000 in a browser; if the page loads, the build succeeded. The configuration still has to be adjusted before real use.
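If no browser is available on the server, the same check can be done from the shell, and the dev server can be bound to all interfaces so it is reachable from another machine (a sketch, assuming curl is installed):

curl -I http://127.0.0.1:8000/                    # expect an HTTP status line from the dev server
./build/env/bin/hue runserver 0.0.0.0:8000        # optional: listen on all interfaces instead of only 127.0.0.1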

4. Modify the configuration

(1) Global configuration

# vim /opt/hue-3.12.0/desktop/conf/hue.ini
# changed entries (the numbers are line numbers inside hue.ini):
 21   secret_key=c!@#$%^&*yy{146}[]<>?un`~:.   # any random string will do; if left empty Hue shows a warning. The secret_key is used to sign data kept in the session store.
 29   http_host=192.168.235.140
 34   time_zone=Asia/Shanghai                  # change the time zone to Asia/Shanghai

(2) Use MySQL as the metadata database

Hue uses SQLite as its metadata database by default, which is not recommended for production; it regularly runs into "database is locked" errors.

i. Edit hue.ini and fill in the MySQL connection:

[[database]]
    name=hue
    engine=mysql
    host=192.168.235.140
    port=3306
    user=root
    password=root

ii. Create and initialize the MySQL metadata database

Connect to MySQL and create a database named hue.

Initialize it:

#  ./build/env/bin/hue help
#  ./build/env/bin/hue syncdb
#  ./build/env/bin/hue migrate

After the initialization finishes you can see the created tables in the hue database. Start the service and it can be reached from a browser.

5. Start the service

# ./build/env/bin/hue runserver

Access URL: <hostname>:8888

 

# Summary of possible errors and fixes

1. After starting the service, the browser cannot access it: OperationalError: (1045, "Access denied for user 'root'@'node3' (using password: YES)")

Fix: connect to MySQL and run:

mysql> grant all privileges on *.* to 'root'@'%' identified by 'root';
mysql> flush privileges;

2. After starting the service, the browser cannot access it: OperationalError: (1049, "Unknown database '/opt/hue-3.12.0/desktop/desktop.db'")

Fix: the MySQL (or other metadata database) settings are probably wrong. For the setup in this article, create the hue database in MySQL and configure hue.ini as follows:

[[database]]
    engine=mysql
    host=node3
    port=3306
    user=root
    password=123456
    name=hue

Restart the service and the problem is resolved.

3. ProgrammingError: (1146, "Table 'hue.desktop_settings' doesn't exist")

Likely cause: MySQL (or another database) was configured as the metadata store but never initialized. The fix is described in the section "Use MySQL as the metadata database" above.
Hue references:

A summary of problems encountered installing, configuring, and using Hue -

 

Reposted from: https://www.cnblogs.com/chinas/p/6684423.html
