Hadoop-3 MapReduce and WordCount
1. Scenario
Suppose we have text made up of words such as
diyishuai hello hi hadoop
spark kafka flume zookeeper
...
We want to split the words on whitespace and count how many times each word appears.
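Locally this count is a one-liner; MapReduce exists to perform the same computation distributed across a cluster. A minimal non-distributed sketch of the target behavior (the sample text is the one above):

```python
from collections import Counter

text = """diyishuai hello hi hadoop
spark kafka flume zookeeper"""

# split() with no argument splits on any run of whitespace,
# which is exactly the tokenization wordcount needs
counts = Counter(text.split())
print(counts)  # each word mapped to its occurrence count
```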
2. Coding
- Mapper
- Reducer
Full code: https://github.com/BestBurning/myworld/tree/master/hadoop/src/main/java/com/diyishuai/hadoop/mr/wcdemo
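The actual Java Mapper and Reducer classes live in the repo linked above. Purely as an illustration of what each phase does, the map / shuffle / reduce flow can be simulated in plain Python (function names here are hypothetical, not the repo's API):

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # emit (word, 1) for every whitespace-separated token,
    # like the wordcount Mapper
    for word in line.split():
        yield (word, 1)

def reducer(word, ones):
    # sum the 1s collected for one key, like the wordcount Reducer
    return (word, sum(ones))

def run_wordcount(lines):
    # map phase: apply mapper to every input line
    pairs = [kv for line in lines for kv in mapper(line)]
    # shuffle/sort phase: bring identical keys together
    pairs.sort(key=itemgetter(0))
    # reduce phase: one reducer call per distinct key
    return dict(
        reducer(word, (v for _, v in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )

print(run_wordcount(["hello hi hadoop", "hadoop spark"]))
```

In the real job, the shuffle/sort step is what Hadoop performs between the map and reduce tasks across the cluster.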
3. Package the jar and upload it to a datanode client
4. Start HDFS and YARN (skip if already running)
```shell
start-dfs.sh
start-yarn.sh
```
Create the target directory in HDFS and upload the files to analyze:
```shell
hadoop fs -mkdir -p /wordcount/input
hadoop fs -put LICENSE.txt NOTICE.txt README.txt /wordcount/input
```
Check the uploaded files at http://server01:50070; the wordcount job run next will be visible at http://server01:8088.
Run wordcount:
```shell
hadoop jar wordcount.jar com.diyishuai.hadoop.mr.wcdemo.WordcountDriver /wordcount/input /wordcount/output
```
The results can be viewed under /wordcount/output in HDFS.
Troubleshooting
If you hit this error:
```
Container [pid=3058,containerID=container_1515314973658_0001_01_000005] is running beyond virtual memory limits. Current usage: 107.9 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.
```
add the appropriate `<property>` configuration to hadoop-2.x.x/etc/hadoop/mapred-site.xml on every node, then restart YARN.
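One commonly used mapred-site.xml fix for this "beyond virtual memory limits" error is to raise the per-task container memory. The property names below are standard Hadoop 2.x keys, but the specific values are assumptions that you should tune for your cluster:

```xml
<!-- mapred-site.xml: raise per-task container memory (values are examples) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
```

Alternatively, setting `yarn.nodemanager.vmem-check-enabled` to `false` in yarn-site.xml disables the virtual-memory check entirely; raising `yarn.nodemanager.vmem-pmem-ratio` there (default 2.1, which matches the "2.1 GB" in the error above) is another option.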
Title: Hadoop-3 MapReduce and WordCount
Author: Shea
Original link: https://di1shuai.com/hadoop-3Mapreduce和WordcountDemo.html
Copyright: Unless otherwise noted, all posts on this blog are licensed under CC BY-NC-SA 3.0 CN. Please credit the source when reposting!