Installing Hadoop 2.7.7 on CentOS 7.5

Service layout

Host          NameNode  zkfc  JournalNode  ResourceManager  DataNode  NodeManager
data-repo-01  ✓         ✓     ✓            ✓                ✓         ✓
data-repo-02  ✓         ✓     ✓            ×                ✓         ✓
data-repo-03  ×         ×     ✓            ×                ✓         ✓

Ideally, management and worker roles are kept apart: the NameNode, standby NameNode, and ResourceManager each get a dedicated machine, while worker nodes run only the DataNode and NodeManager processes.

Installing Hadoop

1. Unpack the Hadoop tarball

tar -zxvf hadoop-2.7.7.tar.gz

2. Create directories

cd /home/hadoop/soft/hadoop-2.7.7
mkdir -p {hdfs/namenode,hdfs/datanode,hdfs/journalnode,tmp,logs,pid}
# create the excludes file
touch hdfs/excludes

3. Set environment variables

vim /etc/profile
# append the following
export HADOOP_HOME=/home/hadoop/soft/hadoop-2.7.7
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH
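
After saving, reload the profile so the variables take effect in the current shell. A quick sanity check (not part of the original steps) is to ask Hadoop for its version:

source /etc/profile
# should report "Hadoop 2.7.7" if HADOOP_HOME and PATH are set correctly
hadoop version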

4. Edit core-site.xml

vim etc/hadoop/core-site.xml

Configuration:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/soft/hadoop-2.7.7/tmp</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop-cluster</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec
        </value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>data-repo-01:2181,data-repo-02:2181,data-repo-03:2181</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hue.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hue.groups</name>
      <value>*</value>
    </property>
</configuration>
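
To confirm that the default filesystem from this file is actually picked up, you can query the effective configuration. This is a quick check rather than part of the original walkthrough; fs.default.name is the deprecated alias of fs.defaultFS, so both resolve to the same setting:

bin/hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://hadoop-cluster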

5. Edit hdfs-site.xml

vim etc/hadoop/hdfs-site.xml

Configuration:

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>data-repo-01:9001</value>
</property>
<property> 
    <name>dfs.namenode.name.dir</name>  
    <value>file:/home/hadoop/soft/hadoop-2.7.7/hdfs/namenode</value>  
</property>  
<property> 
    <name>dfs.datanode.data.dir</name>  
    <value>file:/home/hadoop/soft/hadoop-2.7.7/hdfs/datanode</value>  
</property>  
<property> 
    <name>dfs.replication</name>  
    <value>3</value>  
</property>
<property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster</value>
</property>
<property> 
    <name>dfs.webhdfs.enabled</name>  
    <value>true</value>  
</property>
<property>
    <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
    <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
 <property>
     <name>dfs.hosts.exclude</name>
     <value>/home/hadoop/soft/hadoop-2.7.7/hdfs/excludes</value>
 </property>
<property> 
    <name>dfs.ha.namenodes.hadoop-cluster</name>  
    <value>nn01,nn02</value>  
</property>
<property> 
    <name>dfs.namenode.rpc-address.hadoop-cluster.nn01</name>  
    <value>data-repo-01:9000</value> 
</property>  
<property> 
    <name>dfs.namenode.rpc-address.hadoop-cluster.nn02</name>  
    <value>data-repo-02:9000</value>  
</property>
<property> 
    <name>dfs.namenode.http-address.hadoop-cluster.nn01</name>  
    <value>data-repo-01:50070</value>
</property>  
<property> 
    <name>dfs.namenode.http-address.hadoop-cluster.nn02</name>  
    <value>data-repo-02:50070</value>  
</property> 
<property> 
    <name>dfs.namenode.shared.edits.dir</name>  
    <value>qjournal://data-repo-01:8485;data-repo-02:8485;data-repo-03:8485/hadoop-cluster</value>
</property>
<property> 
    <name>dfs.client.failover.proxy.provider.hadoop-cluster</name>  
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>  
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<property> 
    <name>dfs.ha.fencing.ssh.private-key-files</name>  
    <value>/home/hadoop/.ssh/id_rsa</value>  
</property>
<property> 
    <name>dfs.journalnode.edits.dir</name>  
    <value>/home/hadoop/soft/hadoop-2.7.7/hdfs/journalnode</value> 
</property>  
<property> 
    <name>dfs.ha.automatic-failover.enabled</name>  
    <value>true</value>  
</property>
<property>
    <name>dfs.datanode.max.xcievers</name>
    <value>65530</value>
</property>
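
A quick way to verify that the nameservice and the two NameNode IDs are wired up consistently is to ask HDFS which NameNodes it knows about (a verification sketch, run on a node where this config is already in place):

bin/hdfs getconf -namenodes
# expected output: data-repo-01 data-repo-02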

6. Edit mapred-site.xml

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vim etc/hadoop/mapred-site.xml

Configuration:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>data-repo-01:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>data-repo-01:19888</value>
</property>
<property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
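
The two jobhistory addresses above are only served once the JobHistoryServer is running, and start-all.sh does not start it. On data-repo-01 it can be started with the bundled Hadoop 2.x script:

sbin/mr-jobhistory-daemon.sh start historyserver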

7. Edit yarn-site.xml

vim etc/hadoop/yarn-site.xml

Configuration:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>data-repo-01:8032</value>
    </property>
    
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>data-repo-01:8030</value>
    </property>
    
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>data-repo-01:8031</value>
    </property>
    
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>data-repo-01:8033</value>
    </property>
    
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>data-repo-01:8088</value>
    </property>
</configuration>

8. Edit httpfs-site.xml

vim etc/hadoop/httpfs-site.xml

Configuration:

<property>
    <name>httpfs.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>httpfs.proxyuser.hue.groups</name>
    <value>*</value>
</property>

<property>
    <name>httpfs.proxyuser.hadoop.hosts</name>
    <value>*</value>
</property>
<property>
    <name>httpfs.proxyuser.hadoop.groups</name>
    <value>*</value>
</property>
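
These proxyuser settings only take effect once the HttpFS gateway itself is running; it is not started by start-all.sh either. In the Hadoop 2.x distribution it has its own start script and listens on port 14000 by default (mentioned here as a pointer, not as one of the original steps):

sbin/httpfs.sh start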

9. Edit the slaves file

# on Hadoop 3.x, edit etc/hadoop/workers instead
vim etc/hadoop/slaves

Configuration (at least three DataNode hosts must be listed, matching the replication factor of 3; an SSH check follows the list):

data-repo-01
data-repo-02
data-repo-03
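
start-dfs.sh and start-yarn.sh ssh into every host listed here, so passwordless SSH from data-repo-01 to all three hosts (including itself) must already be in place. A minimal check, assuming the hadoop user and the hostnames above:

for h in data-repo-01 data-repo-02 data-repo-03; do ssh hadoop@$h hostname; done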

10. Configure start-dfs.sh and stop-dfs.sh

vim sbin/start-dfs.sh

Append:

HDFS_DATANODE_USER=hadoop
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_ZKFC_USER=hadoop
HDFS_JOURNALNODE_USER=hadoop
HDFS_NAMENODE_USER=hadoop
HDFS_SECONDARYNAMENODE_USER=hadoop

11. Configure start-yarn.sh and stop-yarn.sh

vim sbin/start-yarn.sh

Append:

YARN_RESOURCEMANAGER_USER=hadoop
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=hadoop

12. Configure hadoop-env.sh

vim etc/hadoop/hadoop-env.sh

Configuration:

export JAVA_HOME=/usr/local/jdk1.8.0_261

# add -Djava.library.path=${HADOOP_HOME}/lib/native to HADOOP_OPTS
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=${HADOOP_HOME}/lib/native"

# change these settings
export HADOOP_PID_DIR=/home/hadoop/soft/hadoop-2.7.7/pid
export HADOOP_SECURE_DN_PID_DIR=/home/hadoop/soft/hadoop-2.7.7/pid

13. Modify yarn-daemon.sh and hadoop-daemon.sh

Do not keep the pid files under /tmp: the system cleans that directory automatically, after which the stop scripts can no longer find the running daemons.

Modify yarn-daemon.sh

vim sbin/yarn-daemon.sh

# add
YARN_PID_DIR=/home/hadoop/soft/hadoop-2.7.7/pid

# also append the following to the end of etc/hadoop/yarn-env.sh
YARN_PID_DIR=/home/hadoop/soft/hadoop-2.7.7/pid

Modify hadoop-daemon.sh

vim sbin/hadoop-daemon.sh

# change
YARN_PID_DIR=/home/hadoop/soft/hadoop-2.7.7/pid
HADOOP_PID_DIR=/home/hadoop/soft/hadoop-2.7.7/pid
HADOOP_SECURE_DN_PID_DIR=/home/hadoop/soft/hadoop-2.7.7/pid

14. Copy the hadoop-2.7.7 directory to the other hosts and set the environment variables on them

scp -r hadoop-2.7.7 hadoop@data-repo-02:/home/hadoop/soft/
scp -r hadoop-2.7.7 hadoop@data-repo-03:/home/hadoop/soft/
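
The /etc/profile changes from step 3 are not carried over by scp, so repeat them on each target node (a sketch, assuming you can edit /etc/profile there):

ssh hadoop@data-repo-02
vim /etc/profile     # append the same export lines as in step 3
source /etc/profile
# repeat on data-repo-03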

15. Start the JournalNodes

Start all JournalNodes at once:

sbin/hadoop-daemons.sh start journalnode

Or start them one by one (run on data-repo-01, data-repo-02, and data-repo-03):

sbin/hadoop-daemon.sh start journalnode
# on Hadoop 3.x you can use
#bin/hdfs --daemon start journalnode
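
Before formatting the NameNode, make sure a JournalNode process is actually running on each of the three hosts; a quick check on any of them:

jps | grep JournalNode
# a JournalNode pid should be listed; the daemon listens on port 8485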

16. Format and start the NameNodes

  • Run on data-repo-01
# create the ZooKeeper namespace for HA
bin/hdfs zkfc -formatZK
# format the namenode
hdfs namenode -format
# start the namenode
sbin/hadoop-daemon.sh start namenode
  • Run on data-repo-02
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode

At this point zkfc has not been started yet, so both nn01 and nn02 are in standby state.

17. Start the remaining Hadoop services (run on data-repo-01)

# Steps 15 and 16 only need to be followed on the very first installation; after that, restarting the whole cluster just requires start-all.sh plus step 18
sbin/start-all.sh

After startup, check whether zkfc is running on data-repo-01 and data-repo-02; if not, start it manually:

sbin/hadoop-daemon.sh start zkfc

18. Start the standby NameNode on data-repo-02

The standby namenode may already come up as part of sbin/start-all.sh; only perform this step if it did not (the cluster can have only one active namenode at a time).

# bring up the standby; -bootstrapStandby is only required the first time, otherwise the node complains that it has not been formatted
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode

# or
sbin/hadoop-daemon.sh start secondarynamenode

You can check which node currently holds the active namenode with:

# active = primary, standby = hot standby
hdfs haadmin -getServiceState nn01

19. Process check

Check the processes with jps on each node in turn (expected processes are sketched after the list):

data-repo-01
data-repo-02
data-repo-03
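
Based on the role table at the top, the processes reported by jps on each host should look roughly like this (a sketch; ZooKeeper's QuorumPeerMain and the JobHistoryServer only appear on hosts where those services run):

# data-repo-01: NameNode, DFSZKFailoverController, JournalNode, ResourceManager, DataNode, NodeManager
# data-repo-02: NameNode, DFSZKFailoverController, JournalNode, DataNode, NodeManager
# data-repo-03: JournalNode, DataNode, NodeManager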

20. Web UIs

YARN web UI: http://data-repo-01:8088/cluster

NameNode web UI: http://data-repo-01:50070/dfshealth.html#tab-overview

21. Verify MapReduce with a smoke test

  • Create a test file
vim test.txt

# file contents
hello world
hello hadoop
  • Create the input directory
hadoop fs -mkdir /input
  • Upload the test file to HDFS
hadoop fs -put test.txt /input
  • Run the job (an output check follows below)
yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /input /output
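
If the job finishes successfully, the word counts are written to /output; checking them (the part file name below is the usual single-reducer default):

hadoop fs -cat /output/part-r-00000
# expected:
# hadoop  1
# hello   2
# world   1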

Common problems

1. The namenode fails to start, complaining that it cannot communicate with the journalnodes on other hosts

Symptoms:

telnet from the local host:

telnet localhost 8485         # works
telnet data-repo-01 8485      # works
telnet 192.168.20.131 8485    # fails

telnet to the other hosts:

telnet data-repo-02 8485      # fails
telnet 192.168.20.132 8485    # fails

Fix (reboot the machine after the change):

vim /etc/hosts
# comment out the following lines
#127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

2. If the DFSZKFailoverController process is missing, something probably went wrong while the journalnodes were starting; check the logs to track it down.

3. After zkfc starts, it keeps reporting connection failures until the namenode is up; this is normal, just start the namenode.

More Hadoop documentation:
http://hadoop.apache.org/docs/r2.9.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
