Hadoop can run on a single node in so-called pseudo-distributed mode, in which each Hadoop daemon runs as a separate Java process.
Configuration
Use the following conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.0.101:9000</value>
</property>
</configuration>
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.0.101:9001</value>
</property>
</configuration>
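As a quick sanity check before starting the daemons, a configuration file like the ones above can be read back programmatically. The following is a minimal sketch and not part of Hadoop: the helper name `read_hadoop_conf` is hypothetical, and the XML is inlined so the example is self-contained, whereas in practice you would pass in the contents of a file under conf/.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the core-site.xml shown above (assumption: in a real
# check you would read the file from the conf/ directory instead).
CORE_SITE = """
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.0.101:9000</value>
  </property>
</configuration>
"""


def read_hadoop_conf(xml_text):
    """Return a dict mapping each <name> to its <value>."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            conf[name.strip()] = (value or "").strip()
    return conf


if __name__ == "__main__":
    for key, val in read_hadoop_conf(CORE_SITE).items():
        print(f"{key} = {val}")
```

A misspelled property name shows up immediately in the printed keys, which is easier to spot than a daemon silently falling back to a default.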
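First, ask the namenode to format the DFS file system. This step was already done as part of the installation, but it is useful to know in case you ever need to generate a clean file system.
bin/hadoop namenode -format
Output: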
11/11/30 09:53:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu1/192.168.0.101
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/11/30 09:53:56 INFO namenode.FSNamesystem: fsOwner=root,root
11/11/30 09:53:56 INFO namenode.FSNamesystem: supergroup=supergroup
11/11/30 09:53:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/11/30 09:53:56 INFO common.Storage: Image file of size 94 saved in 0 seconds.
11/11/30 09:53:57 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
11/11/30 09:53:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu1/192.168.0.101
************************************************************/
Run: bin/start-all.sh
starting namenode, logging to /usr/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-ubuntu1.out
localhost: starting datanode, logging to /usr/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-ubuntu1.out
localhost: starting secondarynamenode, logging to /usr/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-ubuntu1.out
starting jobtracker, logging to /usr/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-ubuntu1.out
localhost: starting tasktracker, logging to /usr/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-ubuntu1.out
Check HDFS: bin/hadoop fs -ls /
If the directory listing is printed, HDFS is working.
Hadoop file system operations:
bin/hadoop fs -mkdir test
bin/hadoop fs -ls test
bin/hadoop fs -rmr test
Test Hadoop:
bin/hadoop fs -mkdir input
Create two text files, file1 and file2, and place them under /opt/hadoop/sourcedata.
Run: bin/hadoop fs -put /opt/hadoop/sourcedata/file* input
Run: bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
Output:
11/11/30 10:15:38 INFO input.FileInputFormat: Total input paths to process : 2
11/11/30 10:15:52 INFO mapred.JobClient: Running job: job_201111301005_0001
11/11/30 10:15:53 INFO mapred.JobClient: map 0% reduce 0%
11/11/30 10:19:07 INFO mapred.JobClient: map 50% reduce 0%
11/11/30 10:19:14 INFO mapred.JobClient: map 100% reduce 0%
11/11/30 10:19:46 INFO mapred.JobClient: map 100% reduce 100%
11/11/30 10:19:54 INFO mapred.JobClient: Job complete: job_201111301005_0001
11/11/30 10:19:59 INFO mapred.JobClient: Counters: 17
11/11/30 10:19:59 INFO mapred.JobClient: Job Counters
11/11/30 10:19:59 INFO mapred.JobClient: Launched reduce tasks=1
11/11/30 10:19:59 INFO mapred.JobClient: Launched map tasks=2
11/11/30 10:19:59 INFO mapred.JobClient: Data-local map tasks=2
11/11/30 10:19:59 INFO mapred.JobClient: FileSystemCounters
11/11/30 10:19:59 INFO mapred.JobClient: FILE_BYTES_READ=146
11/11/30 10:19:59 INFO mapred.JobClient: HDFS_BYTES_READ=64
11/11/30 10:19:59 INFO mapred.JobClient: FILE_BYTES_WRITTEN=362
11/11/30 10:19:59 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=60
11/11/30 10:19:59 INFO mapred.JobClient: Map-Reduce Framework
11/11/30 10:19:59 INFO mapred.JobClient: Reduce input groups=9
11/11/30 10:19:59 INFO mapred.JobClient: Combine output records=13
11/11/30 10:19:59 INFO mapred.JobClient: Map input records=2
11/11/30 10:19:59 INFO mapred.JobClient: Reduce shuffle bytes=102
11/11/30 10:19:59 INFO mapred.JobClient: Reduce output records=9
11/11/30 10:19:59 INFO mapred.JobClient: Spilled Records=26
11/11/30 10:19:59 INFO mapred.JobClient: Map output bytes=120
11/11/30 10:19:59 INFO mapred.JobClient: Combine input records=14
11/11/30 10:19:59 INFO mapred.JobClient: Map output records=14
11/11/30 10:19:59 INFO mapred.JobClient: Reduce input records=13
The job ran successfully!
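For readers new to MapReduce, what the wordcount example computes can be imitated locally in a few lines. This is only an illustrative sketch in plain Python, not the code inside hadoop-0.20.2-examples.jar. The function names and the sample lines are made up for the illustration, and words are assumed to be separated by whitespace.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def reduce_phase(pairs):
    """Sum the counts of all pairs that share the same word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map phase over every line, then reduce the combined pairs."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    return reduce_phase(pairs)


if __name__ == "__main__":
    # Hypothetical stand-ins for the contents of file1 and file2.
    sample = ["hello hadoop", "hello world hadoop"]
    for word, count in sorted(word_count(sample).items()):
        print(f"{word}\t{count}")
```

On the cluster the same two phases run as separate map and reduce tasks, which is what the "Launched map tasks" and "Launched reduce tasks" counters in the log refer to.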
Other commands for viewing the results:
bin/hadoop fs -ls /user/root/output
bin/hadoop fs -cat output/part-r-00000
bin/hadoop fs -cat output/part-r-00000 | head -13
bin/hadoop fs -get output/part-r-00000 output.txt
cat output.txt | head -5
bin/hadoop fs -rmr output
The results can also be viewed in a browser at:
http://192.168.0.101:50030 (MapReduce web page)
http://192.168.0.101:50070 (HDFS web page)
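If a page does not load, it can help to check whether the daemon is listening at all. Below is a small sketch using only the Python standard library. `port_open` is a hypothetical helper, and the host and ports are the ones used in this post (9000 and 9001 from the configuration, 50030 and 50070 for the web pages).

```python
import socket

HOST = "192.168.0.101"  # address used throughout this post
PORTS = {
    9000: "namenode (fs.default.name)",
    9001: "jobtracker (mapred.job.tracker)",
    50030: "MapReduce web page",
    50070: "HDFS web page",
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for port, label in PORTS.items():
        state = "open" if port_open(HOST, port) else "closed"
        print(f"{port:>5}  {label}: {state}")
```

A closed port usually means the corresponding daemon did not start, in which case the .out and .log files under the logs directory named in the start-all.sh output are the next place to look.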
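Next, run the grep MapReduce job. Remove the previous output directory first if it still exists, because the job will not write into an existing directory:
Run: bin/hadoop fs -rmr output
Run: bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'hadoop'
Output: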
11/11/30 10:28:37 INFO mapred.FileInputFormat: Total input paths to process : 2
11/11/30 10:28:40 INFO mapred.JobClient: Running job: job_201111301005_0002
11/11/30 10:28:41 INFO mapred.JobClient: map 0% reduce 0%
11/11/30 10:34:16 INFO mapred.JobClient: map 66% reduce 0%
11/11/30 10:37:40 INFO mapred.JobClient: map 100% reduce 11%
11/11/30 10:37:50 INFO mapred.JobClient: map 100% reduce 22%
11/11/30 10:37:54 INFO mapred.JobClient: map 100% reduce 66%
11/11/30 10:38:15 INFO mapred.JobClient: map 100% reduce 100%
11/11/30 10:38:30 INFO mapred.JobClient: Job complete: job_201111301005_0002
11/11/30 10:38:32 INFO mapred.JobClient: Counters: 18
11/11/30 10:38:32 INFO mapred.JobClient: Job Counters
11/11/30 10:38:32 INFO mapred.JobClient: Launched reduce tasks=1
11/11/30 10:38:32 INFO mapred.JobClient: Launched map tasks=3
11/11/30 10:38:32 INFO mapred.JobClient: Data-local map tasks=3
11/11/30 10:38:32 INFO mapred.JobClient: FileSystemCounters
11/11/30 10:38:32 INFO mapred.JobClient: FILE_BYTES_READ=40
11/11/30 10:38:32 INFO mapred.JobClient: HDFS_BYTES_READ=77
11/11/30 10:38:32 INFO mapred.JobClient: FILE_BYTES_WRITTEN=188
11/11/30 10:38:32 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=109
11/11/30 10:38:32 INFO mapred.JobClient: Map-Reduce Framework
11/11/30 10:38:32 INFO mapred.JobClient: Reduce input groups=1
11/11/30 10:38:32 INFO mapred.JobClient: Combine output records=2
11/11/30 10:38:32 INFO mapred.JobClient: Map input records=2
11/11/30 10:38:32 INFO mapred.JobClient: Reduce shuffle bytes=46
11/11/30 10:38:32 INFO mapred.JobClient: Reduce output records=1
11/11/30 10:38:32 INFO mapred.JobClient: Spilled Records=4
11/11/30 10:38:32 INFO mapred.JobClient: Map output bytes=30
11/11/30 10:38:32 INFO mapred.JobClient: Map input bytes=64
11/11/30 10:38:32 INFO mapred.JobClient: Combine input records=2
11/11/30 10:38:32 INFO mapred.JobClient: Map output records=2
11/11/30 10:38:32 INFO mapred.JobClient: Reduce input records=2
11/11/30 10:38:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Run: bin/hadoop fs -cat output/part-00000
Output: 2	hadoop
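The result reads as a count followed by the matched string. A rough local imitation of what the grep example reports is sketched below, assuming it counts every match of the regular expression in the input and lists matches by descending count. The function name and sample lines are made up, and this is not the code that ships in the examples jar.

```python
import re
from collections import Counter


def grep_count(lines, pattern):
    """Count every regex match across all lines, most frequent first."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        # finditer + group(0) yields the full matched text even when the
        # pattern contains capture groups.
        for match in regex.finditer(line):
            counts[match.group(0)] += 1
    # Sort by descending count, then alphabetically to break ties.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical input lines; the real file1 and file2 are not shown.
    sample = ["hello hadoop", "hello world hadoop"]
    for text, count in grep_count(sample, "hadoop"):
        print(f"{count}\t{text}")
```

With these sample lines the script prints the count 2 and the string hadoop separated by a tab, the same shape as the job output above.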
The pseudo-distributed deployment and test are now complete. If you run into problems, please leave a comment!