Running the Hadoop platform on a single Linux machine: installation walkthroughs


This post collects complete, personally verified single-machine installation records for three Hadoop releases: Hadoop 0.19.2 on RHEL 5, and Hadoop 1.0.0 (tarball) and Hadoop 1.0.3 (RPM) on CentOS 6.2.

Part 1: Hadoop 0.19.2 on RHEL 5

Now, let us begin by installing Hadoop on RHEL 5.

The Hadoop 0.19.2 release can be downloaded from Apache. The Linux machine used here runs RHEL 5 with Java 1.6.0_16 installed and JAVA_HOME=/usr/java/jdk1.6.0_16.

Walkthrough


1. Passwordless SSH login to localhost
Make sure the ssh service on your Linux system is running and that you can log in to the local machine via SSH without a password. If you cannot, do the following:
(1) Open a terminal and run:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
(2) SSH to localhost:
$ ssh localhost
On the first login you will be warned that the authenticity of 127.0.0.1 cannot be established and asked whether to continue connecting; type yes. A successful passwordless login looks like this:
[root@localhost hadoop-0.19.2]# ssh localhost
Last login: Sun Aug  1 18:35:37 2010 from 192.168.0.104
[root@localhost ~]#
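
If ssh localhost still asks for a password after the key has been appended, the usual culprit is permissions: sshd ignores authorized_keys when ~/.ssh or the file itself is group- or world-writable. A quick fix, assuming the default sshd settings, is:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys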


2. Hadoop 0.19.2 configuration
Download hadoop-0.19.2.tar.gz (about 40.3 MB) and unpack it to a directory of your choice; here it lives in /root/hadoop-0.19.2.
The configuration steps, in order:
(1) Edit hadoop-env.sh
Point the Java setting at your JDK and remove the leading "#" so the line is no longer commented out; the modified line reads:
export JAVA_HOME=/usr/java/jdk1.6.0_16
(2) Edit hadoop-site.xml
Add the following three properties between <configuration> and </configuration>; the file then looks like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>


3. Run the wordcount example
wordcount is one of the examples shipped with the Hadoop distribution; running it is a good way to get a feel for how Hadoop executes a MapReduce job. Following the official "Hadoop Quick Start" guide it is easy to get working; my run went as follows.
Change into the Hadoop directory, in my case /root/hadoop-0.19.2.
(1) Format HDFS
Run the format command:
[root@localhost hadoop-0.19.2]# bin/hadoop namenode -format
The format output looks like this:
10/08/01 19:04:02 INFO namenode.NameNode: STARTUP_MSG:
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y
Format aborted in /tmp/hadoop-root/dfs/name
10/08/01 19:04:05 INFO namenode.NameNode: SHUTDOWN_MSG:

Note the "Format aborted" line: when a previous format already exists under /tmp/hadoop-root/dfs/name, the NameNode asks for confirmation, and in this version it appears to accept only an uppercase "Y"; answering "y" aborts the re-format, as happened here.

(2) Start the Hadoop daemons
Run:
[root@localhost hadoop-0.19.2]# bin/start-all.sh
The startup output looks like this:
starting namenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-namenode-localhost.out
localhost: starting datanode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-datanode-localhost.out
localhost: starting secondarynamenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-secondarynamenode-localhost.out
starting jobtracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-jobtracker-localhost.out
localhost: starting tasktracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-tasktracker-localhost.out
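
At this point it is worth confirming that the daemons actually came up; one simple check is to ask the NameNode for a filesystem report (any capacity figures in the output mean HDFS is answering at hdfs://localhost:9000):

[root@localhost hadoop-0.19.2]# bin/hadoop dfsadmin -report
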
(3) Prepare the data for the wordcount job
First, create a local data directory named input and copy a few files into it:
[root@localhost hadoop-0.19.2]# mkdir input
[root@localhost hadoop-0.19.2]# cp CHANGES.txt LICENSE.txt NOTICE.txt README.txt input/
Then upload the local input directory to HDFS:
[root@localhost hadoop-0.19.2]# bin/hadoop fs -put input/ input
(4) Launch the wordcount job
Run:
[root@localhost hadoop-0.19.2]# bin/hadoop jar hadoop-0.19.2-examples.jar wordcount input output
The input directory is input and the output directory is output.
The job output looks like this:
10/08/01 19:06:15 INFO mapred.FileInputFormat: Total input paths to process : 4
10/08/01 19:06:15 INFO mapred.JobClient: Running job: job_201008011904_0002
10/08/01 19:06:16 INFO mapred.JobClient:  map 0% reduce 0%
10/08/01 19:06:22 INFO mapred.JobClient:  map 20% reduce 0%
10/08/01 19:06:24 INFO mapred.JobClient:  map 40% reduce 0%
10/08/01 19:06:25 INFO mapred.JobClient:  map 60% reduce 0%
10/08/01 19:06:27 INFO mapred.JobClient:  map 80% reduce 0%
10/08/01 19:06:28 INFO mapred.JobClient:  map 100% reduce 0%
10/08/01 19:06:38 INFO mapred.JobClient:  map 100% reduce 26%
10/08/01 19:06:40 INFO mapred.JobClient:  map 100% reduce 100%
10/08/01 19:06:41 INFO mapred.JobClient: Job complete: job_201008011904_0002
10/08/01 19:06:41 INFO mapred.JobClient: Counters: 16
10/08/01 19:06:41 INFO mapred.JobClient:   File Systems
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes read=301489
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes written=113098
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes read=174004
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes written=348172
10/08/01 19:06:41 INFO mapred.JobClient:   Job Counters
10/08/01 19:06:41 INFO mapred.JobClient:     Launched reduce tasks=1
10/08/01 19:06:41 INFO mapred.JobClient:     Launched map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:     Data-local map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:   Map-Reduce Framework
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input groups=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Combine output records=10860
10/08/01 19:06:41 INFO mapred.JobClient:     Map input records=7363
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce output records=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Map output bytes=434077
10/08/01 19:06:41 INFO mapred.JobClient:     Map input bytes=299871
10/08/01 19:06:41 INFO mapred.JobClient:     Combine input records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Map output records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input records=10860
(5) View the job results
Use:
bin/hadoop fs -cat output/*
Part of the output is shown below:
vijayarenu      20
violations.     1
virtual 3
vis-a-vis       1
visible 1
visit   1
volume  1
volume, 1
volumes 2
volumes.        1
w.r.t   2
wait    9
waiting 6
waiting.        1
waits   3
want    1
warning 7
warning,        1
warnings        12
warnings.       3
warranties      1
warranty        1
warranty,       1
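
Rather than dumping every count, you can also just list the files the job wrote to the output directory, for example:

[root@localhost hadoop-0.19.2]# bin/hadoop fs -ls output
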
(6) Stop the Hadoop daemons
Run:
[root@localhost hadoop-0.19.2]# bin/stop-all.sh
The output looks like this:
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
This shuts down the five daemons listed above: jobtracker, tasktracker, namenode, datanode, and secondarynamenode.
Troubleshooting
While working through the steps above you may run into exceptions; two common cases are analyzed below.
1. "Call to localhost/127.0.0.1:9000 failed on local exception"
(1) Symptom
This may appear when you run the following command:
[root@localhost hadoop-0.19.2]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output
The error output looks like this:
10/08/01 19:50:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
10/08/01 19:50:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
10/08/01 19:50:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
10/08/01 19:50:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
10/08/01 19:50:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
10/08/01 19:51:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
10/08/01 19:51:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
10/08/01 19:51:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
10/08/01 19:51:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
10/08/01 19:51:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
java.lang.RuntimeException: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:323)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:295)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:268)
        at org.apache.hadoop.examples.WordCount.run(WordCount.java:146)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
        at org.apache.hadoop.ipc.Client.call(Client.java:699)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:319)
        ... 21 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
        at org.apache.hadoop.ipc.Client.call(Client.java:685)
        ... 33 more
(2) Analysis
The key line in the output above is:
Retrying connect to server: localhost/127.0.0.1:9000.
All ten attempts to connect to the server failed, so the communication path to the server is simply not there. We configured the namenode address in hadoop-site.xml as follows:
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
So most likely the namenode process was never started, in which case there is no way to run a job.
How this exception was reproduced here:
HDFS was formatted, but bin/start-all.sh was never run before launching the wordcount job, which produced the exception above.
The fix is to run bin/start-all.sh first and only then launch the wordcount job. A condensed recap of the working order is shown below.
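
The same commands used in section 3, in the order that works:

bin/hadoop namenode -format                                        # format HDFS (first run only)
bin/start-all.sh                                                   # start the namenode, datanode, jobtracker and tasktracker daemons
bin/hadoop fs -put input/ input                                    # upload the input data to HDFS
bin/hadoop jar hadoop-0.19.2-examples.jar wordcount input output   # run the job
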
2. "Input path does not exist"
(1) Symptom
You create an input directory under the current Hadoop directory, cp a few files into it, and then run:
[root@localhost hadoop-0.19.2]# bin/hadoop namenode -format
[root@localhost hadoop-0.19.2]# bin/start-all.sh 
At this point you assume input exists and that the wordcount job can be run:
[root@localhost hadoop-0.19.2]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output
Instead it throws a pile of exceptions:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:782)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
        at org.apache.hadoop.examples.WordCount.run(WordCount.java:149)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
How this exception was reproduced here:
[root@localhost hadoop-0.19.2]# bin/hadoop fs -rmr input
Deleted hdfs://localhost:9000/user/root/input
[root@localhost hadoop-0.19.2]# bin/hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/root/output
(2) Analysis
The local input directory was never uploaded to HDFS, hence org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input.
All that is needed is to run the upload command:
[root@localhost hadoop-0.19.2]# bin/hadoop fs -put input/ input
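
To double-check that the upload landed where the job will look for it (hdfs://localhost:9000/user/root/input), you can list the path before resubmitting:

[root@localhost hadoop-0.19.2]# bin/hadoop fs -ls input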

Part 2: Hadoop 1.0.0 and Hadoop 1.0.3 on CentOS 6.2

Two installs are recorded below: Hadoop 1.0.3 from the RPM package and Hadoop 1.0.0 from the tarball, both on a single CentOS 6.2 machine. Both need a running ssh service and key-based login, so set that up first.

// Install SSH

First, start the ssh service inside your Linux:

[root@localhost /]# sudo yum install ssh

[root@localhost conf]# service sshd start
Starting sshd:                                             [  OK  ]

// Generate keys

[root@localhost /]# ssh-keygen

(you can simply press Enter at every prompt)

This generates the following two files:

/root/.ssh/id_rsa
/root/.ssh/id_rsa.pub

Append the public key to authorized_keys:

[root@localhost .ssh]# cat ./id_rsa.pub>>./authorized_keys

A. Hadoop 1.0.3 from the RPM package

// Configure the JDK environment variables

[root@localhost opt]# vi /etc/profile

export JAVA_HOME=/opt/jdk1.6.0_31
export PATH=$JAVA_HOME/bin:$PATH:.

// Make the settings take effect

[root@localhost opt]# source /etc/profile

// Install Hadoop 1.0.3

[root@localhost opt]# rpm -i hadoop-1.0.3-1.x86_64.rpm

// Check the installed Hadoop version

[root@localhost opt]# hadoop version

Edit the Hadoop configuration files (in /etc/hadoop):

[root@localhost hadoop]# vi hadoop-env.sh

export JAVA_HOME=/opt/jdk1.6.0_31

[root@localhost hadoop]# vi core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.101:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop</value>
</property>
</configuration>

[root@localhost hadoop]# vi hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

[root@localhost hadoop]# vi mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.1.101:9001</value>
</property>
</configuration>

// Format the filesystem

[root@localhost opt]# hadoop namenode -format

// Start all Hadoop services

[root@localhost sbin]# start-all.sh

(If the scripts cannot be executed, set the execute permission on the relevant .sh files under /usr/sbin.)

Notes: the RPM install provides the following scripts:

start-all.sh
stop-all.sh
start-dfs.sh
stop-dfs.sh
start-mapred.sh
stop-mapred.sh

// Use jps to check the running service processes

[root@localhost hadoop]# jps
5131 NameNode
5242 DataNode
5361 SecondaryNameNode
5583 TaskTracker
5463 JobTracker
6714 Jps

(visit … )

[root@localhost hadoop]# hadoop dfsadmin -report

Prepare input data for the wordcount example:

[root@localhost opt]# hadoop fs -mkdir input
[root@localhost opt]# echo "Hello World Bye World" > file01
[root@localhost opt]# echo "Hello Hadoop Goodbye Hadoop" > file02
[root@localhost opt]# hadoop fs -copyFromLocal ./file0* input

Run the wordcount example:

[root@localhost opt]# hadoop jar /usr/share/hadoop/hadoop-examples-1.0.3.jar wordcount input output
12/08/11 12:00:30 INFO input.FileInputFormat: Total input paths to process : 2
12/08/11 12:00:30 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/08/11 12:00:30 WARN snappy.LoadSnappy: Snappy native library not loaded
12/08/11 12:00:31 INFO mapred.JobClient: Running job: job_201208111137_0001
12/08/11 12:00:32 INFO mapred.JobClient:  map 0% reduce 0%
12/08/11 12:01:05 INFO mapred.JobClient:  map 100% reduce 0%
12/08/11 12:01:20 INFO mapred.JobClient:  map 100% reduce 100%
12/08/11 12:01:25 INFO mapred.JobClient: Job complete: job_201208111137_0001
12/08/11 12:01:25 INFO mapred.JobClient: Counters: 29
12/08/11 12:01:25 INFO mapred.JobClient:   Job Counters 
12/08/11 12:01:25 INFO mapred.JobClient:     Launched reduce tasks=1
12/08/11 12:01:25 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=49499
12/08/11 12:01:25 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/11 12:01:25 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/08/11 12:01:25 INFO mapred.JobClient:     Launched map tasks=2
12/08/11 12:01:25 INFO mapred.JobClient:     Data-local map tasks=2
12/08/11 12:01:25 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=12839
12/08/11 12:01:25 INFO mapred.JobClient:   File Output Format Counters 
12/08/11 12:01:25 INFO mapred.JobClient:     Bytes Written=41
12/08/11 12:01:25 INFO mapred.JobClient:   FileSystemCounters
12/08/11 12:01:25 INFO mapred.JobClient:     FILE_BYTES_READ=79
12/08/11 12:01:25 INFO mapred.JobClient:     HDFS_BYTES_READ=276
12/08/11 12:01:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=64705
12/08/11 12:01:25 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
12/08/11 12:01:25 INFO mapred.JobClient:   File Input Format Counters 
12/08/11 12:01:25 INFO mapred.JobClient:     Bytes Read=50
12/08/11 12:01:25 INFO mapred.JobClient:   Map-Reduce Framework
12/08/11 12:01:25 INFO mapred.JobClient:     Map output materialized bytes=85
12/08/11 12:01:25 INFO mapred.JobClient:     Map input records=2
12/08/11 12:01:25 INFO mapred.JobClient:     Reduce shuffle bytes=85
12/08/11 12:01:25 INFO mapred.JobClient:     Spilled Records=12
12/08/11 12:01:25 INFO mapred.JobClient:     Map output bytes=82
12/08/11 12:01:25 INFO mapred.JobClient:     CPU time spent (ms)=4770
12/08/11 12:01:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=246751232
12/08/11 12:01:25 INFO mapred.JobClient:     Combine input records=8
12/08/11 12:01:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=226
12/08/11 12:01:25 INFO mapred.JobClient:     Reduce input records=6
12/08/11 12:01:25 INFO mapred.JobClient:     Reduce input groups=5
12/08/11 12:01:25 INFO mapred.JobClient:     Combine output records=6
12/08/11 12:01:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=391634944
12/08/11 12:01:25 INFO mapred.JobClient:     Reduce output records=5
12/08/11 12:01:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3159781376
12/08/11 12:01:25 INFO mapred.JobClient:     Map output records=8

// View the word counts

[root@localhost opt]# hadoop fs -cat output/part-r-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2

B. Hadoop 1.0.0 from the tarball (pseudo-distributed)

First, create a folder under /usr:

[root@localhost usr]# mkdir hadoop

then put the essential sources into the folder hadoop:

[root@localhost .ssh]# cd /home

hadoop-1.0.0.tar.gz, you can download it from …

then change directory to /usr/hadoop and extract the tarball there:

[root@localhost hadoop]# tar -zxvf hadoop-1.0.0.tar.gz

then you will see another folder:

[root@localhost hadoop]# cd hadoop-1.0.0
[root@localhost hadoop-1.0.0]# cd conf
[root@localhost conf]# ls
capacity-scheduler.xml      hadoop-policy.xml      slaves
configuration.xsl           hdfs-site.xml          ssl-client.xml.example
core-site.xml               log4j.properties       ssl-server.xml.example
fair-scheduler.xml          mapred-queue-acls.xml  taskcontroller.cfg
hadoop-env.sh               mapred-site.xml
hadoop-metrics2.properties  masters

[root@localhost conf]# vi hadoop-env.sh

we will see and modify the file; pay attention to the export lines that have been modified:

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
  export JAVA_HOME=/usr/java/jdk1.6.0_20

# Extra Java CLASSPATH elements.  Optional.
  export HADOOP_CLASSPATH=/usr/hadoop/hadoop-1.0.0
  export PATH=$PATH:/usr/hadoop/hadoop-1.0.0/

# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000

# Extra Java runtime options.  Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"

[root@localhost conf]# source hadoop-env.sh
[root@localhost conf]#
[root@localhost conf]# cd ..
[root@localhost hadoop-1.0.0]# bin/hadoop

we can use this to check whether the file is modified correctly…

Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  fetchdt              fetch a delegation token from the NameNode
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  historyserver        run job history servers as a standalone daemon
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
[root@localhost hadoop-1.0.0]#

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

then modify the configuration:

[root@localhost conf]# vi core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

[root@localhost conf]# vi hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

[root@localhost conf]# vi mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>

create the namenode (format the DFS filesystem):

[root@localhost hadoop-1.0.0]# bin/hadoop namenode -format
12/04/08 12:54:48 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.0
STARTUP_MSG:   build = -r 1214675; compiled by 'hortonfo' on Thu Dec 15 16:36:35 UTC 2011
************************************************************/
12/04/08 12:54:49 INFO util.GSet: VM type       = 32-bit
12/04/08 12:54:49 INFO util.GSet: 2% max memory = 19.33375 MB
12/04/08 12:54:49 INFO util.GSet: capacity      = 2^22 = 4194304 entries
12/04/08 12:54:49 INFO util.GSet: recommended=4194304, actual=4194304
12/04/08 12:54:52 INFO namenode.FSNamesystem: fsOwner=root
12/04/08 12:54:53 INFO namenode.FSNamesystem: supergroup=supergroup
12/04/08 12:54:53 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/04/08 12:54:53 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/04/08 12:54:53 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/04/08 12:54:53 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/04/08 12:54:54 INFO common.Storage: Image file of size 110 saved in 0 seconds.
12/04/08 12:54:54 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
12/04/08 12:54:54 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
[root@localhost hadoop-1.0.0]#

go to the explorer to check the log
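
One step that does not appear in the listing above: the HDFS and MapReduce daemons have to be running before files can be copied into HDFS. With the tarball layout used here that would presumably be:

[root@localhost hadoop-1.0.0]# bin/start-all.sh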

Copy the input files into the distributed filesystem:

[root@localhost hadoop-1.0.0]# bin/hadoop fs -put conf input

 

Run some of the examples provided:

[root@localhost hadoop-1.0.0]# bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

12/04/08 13:53:16 INFO mapred.FileInputFormat: Total input paths to process : 16

12/04/08 13:53:17 INFO mapred.JobClient: Running job: job_201204081256_0001

12/04/08 13:53:18 INFO mapred.JobClient:  map 0% reduce 0%

12/04/08 13:53:58 INFO mapred.JobClient:  map 6% reduce 0%

12/04/08 13:54:01 INFO mapred.JobClient:  map 12% reduce 0%

12/04/08 13:54:17 INFO mapred.JobClient:  map 25% reduce 0%

12/04/08 13:54:36 INFO mapred.JobClient:  map 31% reduce 0%

12/04/08 13:54:41 INFO mapred.JobClient:  map 37% reduce 8%

12/04/08 13:54:45 INFO mapred.JobClient:  map 43% reduce 8%

12/04/08 13:54:49 INFO mapred.JobClient:  map 50% reduce 12%

12/04/08 13:54:53 INFO mapred.JobClient:  map 56% reduce 12%

12/04/08 13:54:56 INFO mapred.JobClient:  map 62% reduce 12%

12/04/08 13:54:59 INFO mapred.JobClient:  map 68% reduce 16%

12/04/08 13:55:03 INFO mapred.JobClient:  map 75% reduce 16%

12/04/08 13:55:06 INFO mapred.JobClient:  map 81% reduce 20%

12/04/08 13:55:09 INFO mapred.JobClient:  map 87% reduce 20%

12/04/08 13:55:13 INFO mapred.JobClient:  map 93% reduce 27%

12/04/08 13:55:16 INFO mapred.JobClient:  map 100% reduce 27%

12/04/08 13:55:22 INFO mapred.JobClient:  map 100% reduce 31%

12/04/08 13:55:28 INFO mapred.JobClient:  map 100% reduce 100%

12/04/08 13:55:35 INFO mapred.JobClient: Job complete: job_201204081256_0001

12/04/08 13:55:35 INFO mapred.JobClient: Counters: 30

12/04/08 13:55:35 INFO mapred.JobClient:   Job Counters

12/04/08 13:55:35 INFO mapred.JobClient:     Launched reduce tasks=1

12/04/08 13:55:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=175469

12/04/08 13:55:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

12/04/08 13:55:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

12/04/08 13:55:35 INFO mapred.JobClient:     Launched map tasks=16

12/04/08 13:55:35 INFO mapred.JobClient:     Data-local map tasks=16

12/04/08 13:55:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=87301

12/04/08 13:55:35 INFO mapred.JobClient:   File Input Format Counters

12/04/08 13:55:35 INFO mapred.JobClient:     Bytes Read=26846

12/04/08 13:55:35 INFO mapred.JobClient:   File Output Format Counters

12/04/08 13:55:35 INFO mapred.JobClient:     Bytes Written=180

12/04/08 13:55:35 INFO mapred.JobClient:   FileSystemCounters

12/04/08 13:55:35 INFO mapred.JobClient:     FILE_BYTES_READ=82

12/04/08 13:55:35 INFO mapred.JobClient:     HDFS_BYTES_READ=28568

12/04/08 13:55:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=367514

12/04/08 13:55:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=180

12/04/08 13:55:35 INFO mapred.JobClient:   Map-Reduce Framework

12/04/08 13:55:35 INFO mapred.JobClient:     Map output materialized bytes=172

12/04/08 13:55:35 INFO mapred.JobClient:     Map input records=760

12/04/08 13:55:35 INFO mapred.JobClient:     Reduce shuffle bytes=172

12/04/08 13:55:35 INFO mapred.JobClient:     Spilled Records=6

12/04/08 13:55:35 INFO mapred.JobClient:     Map output bytes=70

12/04/08 13:55:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=3252289536

12/04/08 13:55:35 INFO mapred.JobClient:     CPU time spent (ms)=22930

12/04/08 13:55:35 INFO mapred.JobClient:     Map input bytes=26846

12/04/08 13:55:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1722

12/04/08 13:55:35 INFO mapred.JobClient:     Combine input records=3

12/04/08 13:55:35 INFO mapred.JobClient:     Reduce input records=3

12/04/08 13:55:35 INFO mapred.JobClient:     Reduce input groups=3

12/04/08 13:55:35 INFO mapred.JobClient:     Combine output records=3

12/04/08 13:55:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2292494336

12/04/08 13:55:35 INFO mapred.JobClient:     Reduce output records=3

12/04/08 13:55:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=6338478080

12/04/08 13:55:35 INFO mapred.JobClient:     Map output records=3

12/04/08 13:55:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

12/04/08 13:55:37 INFO mapred.FileInputFormat: Total input paths to process : 1

12/04/08 13:55:38 INFO mapred.JobClient: Running job: job_201204081256_0002

12/04/08 13:55:39 INFO mapred.JobClient:  map 0% reduce 0%

12/04/08 13:55:55 INFO mapred.JobClient:  map 100% reduce 0%

12/04/08 13:56:11 INFO mapred.JobClient:  map 100% reduce 100%

12/04/08 13:56:16 INFO mapred.JobClient: Job complete: job_201204081256_0002

12/04/08 13:56:16 INFO mapred.JobClient: Counters: 30

12/04/08 13:56:16 INFO mapred.JobClient:   Job Counters

12/04/08 13:56:16 INFO mapred.JobClient:     Launched reduce tasks=1

12/04/08 13:56:16 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=13535

12/04/08 13:56:16 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

12/04/08 13:56:16 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

12/04/08 13:56:16 INFO mapred.JobClient:     Launched map tasks=1

12/04/08 13:56:16 INFO mapred.JobClient:     Data-local map tasks=1

12/04/08 13:56:16 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14047

12/04/08 13:56:16 INFO mapred.JobClient:   File Input Format Counters

12/04/08 13:56:16 INFO mapred.JobClient:     Bytes Read=180

12/04/08 13:56:16 INFO mapred.JobClient:   File Output Format Counters

12/04/08 13:56:16 INFO mapred.JobClient:     Bytes Written=52

12/04/08 13:56:16 INFO mapred.JobClient:   FileSystemCounters

12/04/08 13:56:16 INFO mapred.JobClient:     FILE_BYTES_READ=82

12/04/08 13:56:16 INFO mapred.JobClient:     HDFS_BYTES_READ=295

12/04/08 13:56:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=42409

12/04/08 13:56:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=52

12/04/08 13:56:16 INFO mapred.JobClient:   Map-Reduce Framework

12/04/08 13:56:16 INFO mapred.JobClient:     Map output materialized bytes=82

12/04/08 13:56:16 INFO mapred.JobClient:     Map input records=3

12/04/08 13:56:16 INFO mapred.JobClient:     Reduce shuffle bytes=82

12/04/08 13:56:16 INFO mapred.JobClient:     Spilled Records=6

12/04/08 13:56:16 INFO mapred.JobClient:     Map output bytes=70

12/04/08 13:56:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=210763776

12/04/08 13:56:16 INFO mapred.JobClient:     CPU time spent (ms)=2270

12/04/08 13:56:16 INFO mapred.JobClient:     Map input bytes=94

12/04/08 13:56:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=115

12/04/08 13:56:16 INFO mapred.JobClient:     Combine input records=0

12/04/08 13:56:16 INFO mapred.JobClient:     Reduce input records=3

12/04/08 13:56:16 INFO mapred.JobClient:     Reduce input groups=1

12/04/08 13:56:16 INFO mapred.JobClient:     Combine output records=0

12/04/08 13:56:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=179437568

12/04/08 13:56:16 INFO mapred.JobClient:     Reduce output records=3

12/04/08 13:56:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=749031424

12/04/08 13:56:16 INFO mapred.JobClient:     Map output records=3

[root@localhost hadoop-1.0.0]#

Examine the output files:

 

Copy the output files from the distributed filesystem to the local filesystem and examine them:

 

[root@localhost hadoop-1.0.0]# bin/hadoop fs -get output output

[root@localhost hadoop-1.0.0]# cat output/*

cat: output/_logs: Is a directory

1       dfs.replication

1       dfs.server.namenode.

1       dfsadmin

[root@localhost hadoop-1.0.0]#

OK

The pseudo-distributed installation has completed successfully.
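
When you are finished experimenting, the daemons can be shut down with the matching stop script, for example:

[root@localhost hadoop-1.0.0]# bin/stop-all.sh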
