Distributed data processing with Hadoop, Part 3: Application development

2010-08-11 00:00:00 Source: WEB开发网

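Next, look at the reduce application. Although it is slightly more complex, using a Ruby hash (associative array) simplifies the reduce operation (see Listing 5). This script again reads its input from stdin (passed in by the streaming utility) and splits each line into a word and a value. It then checks the hash for the word: if the word is found, the count is added to the existing element; otherwise, a new hash entry is created for the word and loaded with the count (which should be 1 from the mapper process). Once all input has been processed, the script simply iterates through the hash and emits the key-value pairs to stdout.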

Listing 5. Ruby reduce script (reduce.rb)

#!/usr/bin/env ruby

# Create an empty word hash
wordhash = {}

# Our input comes from STDIN, operating on each line
STDIN.each_line do |line|
  # Each line will represent a word and count
  word, count = line.strip.split

  # If we have the word in the hash, add the count to it, otherwise
  # create a new one.
  if wordhash.has_key?(word)
    wordhash[word] += count.to_i
  else
    wordhash[word] = count.to_i
  end
end

# Iterate through and emit the word counters
wordhash.each {|record, count| puts "#{record}\t#{count}"}
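
To try the same reduce logic without a Hadoop cluster, it can be wrapped in a method and fed in-memory input. The following is only a sketch under stated assumptions: the method name `reduce_counts` and the sample data are hypothetical and do not come from the original article, and `Hash.new(0)` is swapped in for the explicit `has_key?` check so that missing words start at zero.

```ruby
require "stringio"

# Hypothetical helper (not part of the original listing): sums the counts
# for each word read from any IO-like object yielding "word count" lines.
def reduce_counts(io)
  counts = Hash.new(0)        # missing keys default to 0
  io.each_line do |line|
    word, count = line.strip.split
    next if word.nil?         # skip blank lines
    counts[word] += count.to_i
  end
  counts
end

# Sample mapper output, made up for illustration.
sample = StringIO.new("foo\t1\nbar\t1\nfoo\t1\n")
reduce_counts(sample).each { |word, count| puts "#{word}\t#{count}" }
```

Because the method accepts any object that responds to `each_line`, passing `STDIN` instead of the `StringIO` should give the same behavior as Listing 5 when the script is run by the streaming utility.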