Distributed data processing with Hadoop, Part 2: Going further

2010-07-06 Source: WEB开发网
Listing 13. Generating the input data

root@master:~# hadoop-0.20 fs -mkdir input 
root@master:~# hadoop-0.20 fs -put \ 
 /usr/src/linux-source-2.6.27/Doc*/memory-barriers.txt input 
root@master:~# hadoop-0.20 fs -put \ 
 /usr/src/linux-source-2.6.27/Doc*/rt-mutex-design.txt input 
root@master:~# hadoop-0.20 fs -ls input 
Found 2 items 
-rw-r--r-- 2 root supergroup 78031 2010-05-12 14:16 /user/root/input/memory-barriers.txt 
-rw-r--r-- 2 root supergroup 33567 2010-05-12 14:16 /user/root/input/rt-mutex-design.txt 
root@master:~# 
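You can also check the upload from a script by parsing the `fs -ls` listing. The following is a minimal Python sketch. It assumes the eight-column layout shown in Listing 13: permissions, replication factor, owner, group, size in bytes, modification date, modification time, and path. The function name `parse_ls` and the sample string are illustrative only and are not part of Hadoop. The two file sizes add up to 111598 bytes, which is the same figure the job later reports as HDFS_BYTES_READ.

```python
# Sample text copied from the `hadoop-0.20 fs -ls input` output in Listing 13.
LS_OUTPUT = """\
Found 2 items
-rw-r--r-- 2 root supergroup 78031 2010-05-12 14:16 /user/root/input/memory-barriers.txt
-rw-r--r-- 2 root supergroup 33567 2010-05-12 14:16 /user/root/input/rt-mutex-design.txt
"""


def parse_ls(text):
    """Parse file entries from `fs -ls` output (hypothetical helper).

    Assumes the 8-column layout seen in Listing 13: permissions,
    replication, owner, group, size, date, time, path.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the "Found N items" header and any line without 8 columns.
        if len(fields) != 8:
            continue
        perms, repl, owner, group, size, date, time, path = fields
        # Skip directory entries; only files are collected.
        if perms.startswith("d"):
            continue
        entries.append({"path": path, "replication": int(repl), "size": int(size)})
    return entries


if __name__ == "__main__":
    files = parse_ls(LS_OUTPUT)
    for f in files:
        print(f["path"], f["size"], "bytes, replication", f["replication"])
    print("total input bytes:", sum(f["size"] for f in files))  # 111598
```

The second column is the per-file replication factor. Both files show 2 here, so each block is stored on two DataNodes.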

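Next, start the wordcount MapReduce job. As in pseudo-distributed mode, you specify the input subdirectory (which contains the input files) and the output directory. The output directory does not exist yet; it will be created by the NameNode and populated with the result data: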

Listing 14. Running the MapReduce wordcount job on the cluster

root@master:~# hadoop-0.20 jar \ 
 /usr/lib/hadoop-0.20/hadoop-0.20.2+228-examples.jar wordcount input output 
10/05/12 19:04:37 INFO input.FileInputFormat: Total input paths to process : 2 
10/05/12 19:04:38 INFO mapred.JobClient: Running job: job_201005121900_0001 
10/05/12 19:04:39 INFO mapred.JobClient: map 0% reduce 0% 
10/05/12 19:04:59 INFO mapred.JobClient: map 50% reduce 0% 
10/05/12 19:05:08 INFO mapred.JobClient: map 100% reduce 16% 
10/05/12 19:05:17 INFO mapred.JobClient: map 100% reduce 100% 
10/05/12 19:05:19 INFO mapred.JobClient: Job complete: job_201005121900_0001 
10/05/12 19:05:19 INFO mapred.JobClient: Counters: 17 
10/05/12 19:05:19 INFO mapred.JobClient:  Job Counters 
10/05/12 19:05:19 INFO mapred.JobClient:   Launched reduce tasks=1 
10/05/12 19:05:19 INFO mapred.JobClient:   Launched map tasks=2 
10/05/12 19:05:19 INFO mapred.JobClient:   Data-local map tasks=2 
10/05/12 19:05:19 INFO mapred.JobClient:  FileSystemCounters 
10/05/12 19:05:19 INFO mapred.JobClient:   FILE_BYTES_READ=47556 
10/05/12 19:05:19 INFO mapred.JobClient:   HDFS_BYTES_READ=111598 
10/05/12 19:05:19 INFO mapred.JobClient:   FILE_BYTES_WRITTEN=95182 
10/05/12 19:05:19 INFO mapred.JobClient:   HDFS_BYTES_WRITTEN=30949 
10/05/12 19:05:19 INFO mapred.JobClient:  Map-Reduce Framework 
10/05/12 19:05:19 INFO mapred.JobClient:   Reduce input groups=2974 
10/05/12 19:05:19 INFO mapred.JobClient:   Combine output records=3381 
10/05/12 19:05:19 INFO mapred.JobClient:   Map input records=2937 
10/05/12 19:05:19 INFO mapred.JobClient:   Reduce shuffle bytes=47562 
10/05/12 19:05:19 INFO mapred.JobClient:   Reduce output records=2974 
10/05/12 19:05:19 INFO mapred.JobClient:   Spilled Records=6762 
10/05/12 19:05:19 INFO mapred.JobClient:   Map output bytes=168718 
10/05/12 19:05:19 INFO mapred.JobClient:   Combine input records=17457 
10/05/12 19:05:19 INFO mapred.JobClient:   Map output records=17457 
10/05/12 19:05:19 INFO mapred.JobClient:   Reduce input records=3381 
root@master:~# 
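The counters in Listing 14 trace the data through each phase of the job. Map output records (17457) equals Combine input records. The combiner collapses that output to 3381 records, and these are exactly the Reduce input records. Reduce input groups and Reduce output records (both 2974) are the number of distinct words. The pure-Python sketch below reproduces those relationships on a tiny hypothetical input. It is a simplified model, not Hadoop's API, and all function names in it are illustrative. It assumes one map task per input string, a whitespace tokenizer, and a combiner that runs exactly once per map task. Real Hadoop may run the combiner zero or more times.

```python
from collections import Counter


def map_phase(text):
    """Emit (word, 1) for every whitespace-separated token."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Sum counts per word within a single map task's output."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def reduce_phase(pairs):
    """Group all (word, count) pairs by word and sum the counts."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return dict(totals)


def run_wordcount(splits):
    """Simulate one map task per split; return (word counts, counters)."""
    counters = Counter()
    shuffled = []
    for split in splits:
        mapped = map_phase(split)
        counters["Map output records"] += len(mapped)
        counters["Combine input records"] += len(mapped)
        combined = combine(mapped)
        counters["Combine output records"] += len(combined)
        shuffled.extend(combined)
    counters["Reduce input records"] = len(shuffled)
    result = reduce_phase(shuffled)
    counters["Reduce input groups"] = len(result)
    counters["Reduce output records"] = len(result)
    return result, counters


if __name__ == "__main__":
    result, counters = run_wordcount(["the quick fox the end", "the lazy dog"])
    for name in ("Map output records", "Combine output records",
                 "Reduce input records", "Reduce output records"):
        print(name, counters[name])
    print("the =", result["the"])
```

With these two inputs, the sketch produces 8 map output records. The combiner shrinks them to 7, because "the" appears twice in the first split. The reducer then emits 6 distinct words. This is the same shrink-then-group pattern that the real counters show at a much larger scale.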
