Weekend Find: A Good Book, "Hadoop数据分析" (Hadoop Data Analysis), Offered as a Giveaway

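Follow me, like and share this post, then send me a private message (be sure to include the article title) to get the e-book edition, for personal study only. Let's discuss big data technology together.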


Preface

Part I: Introduction to Distributed Computing

Chapter 1: The Age of Data Products

1.1 What Is a Data Product

1.2 Building Large-Scale Data Products with Hadoop

1.2.1 Leveraging Large Datasets

1.2.2 Hadoop in Data Products

1.3 The Data Science Pipeline and the Hadoop Ecosystem

1.4 Summary

Chapter 2: An Operating System for Big Data

2.1 Basic Concepts

2.2 Hadoop Architecture

2.2.1 Hadoop Clusters

2.2.2 HDFS

2.2.3 YARN

2.3 Working with a Distributed File System

2.3.1 Basic File System Operations

2.3.2 HDFS File Permissions

2.3.3 Other HDFS Interfaces

2.4 Working with Distributed Computation

2.4.1 MapReduce: A Functional Programming Model

2.4.2 MapReduce: Implementation on a Cluster

2.4.3 Beyond a Single MapReduce: Job Chaining

2.5 Submitting a MapReduce Job to YARN

2.6 Summary

Chapter 3: A Python Framework and Hadoop Streaming

3.1 Hadoop Streaming

3.1.1 Running Computations on CSV Data with Streaming

3.1.2 Executing Streaming Jobs

3.2 A MapReduce Framework for Python

3.2.1 Phrase Counting

3.2.2 Other Frameworks

3.3 Advanced MapReduce

3.3.1 combiner

3.3.2 partitioner

3.3.3 Job Chaining

3.4 Summary

Chapter 4: In-Memory Computing with Spark

4.1 Spark Basics

4.1.1 The Spark Stack

4.1.2 RDD

4.1.3 Programming with RDDs

4.2 Interactive Spark with PySpark

4.3 Writing Spark Applications

4.4 Summary

Chapter 5: Distributed Analysis and Patterns

5.1 Computing with Keys

5.1.1 Compound Keys

5.1.2 Keyspace Patterns

5.1.3 pair versus stripe

5.2 Design Patterns

5.2.1 Summarization

5.2.2 Indexing

5.2.3 Filtering

5.3 Toward Last-Mile Analytics

5.3.1 Model Fitting

5.3.2 Model Validation

5.4 Summary

Part II: Workflows and Tools for Big Data Science

Chapter 6: Data Mining and Data Warehousing

6.1 Structured Data Queries with Hive

6.1.1 The Hive Command-Line Interface (CLI)

6.1.2 The Hive Query Language

6.1.3 Data Analysis with Hive

6.2 HBase

6.2.1 NoSQL and Column-Oriented Databases

6.2.2 Real-Time Analytics with HBase

6.3 Summary

Chapter 7: Data Ingestion

7.1 Importing Relational Data with Sqoop

7.1.1 Importing from MySQL into HDFS

7.1.2 Importing from MySQL into Hive

7.1.3 Importing from MySQL into HBase

7.2 Ingesting Streaming Data with Flume

7.2.1 Flume Data Flows

7.2.2 Ingesting Product Impression Data with Flume

7.3 Summary

Chapter 8: Analytics with Higher-Level APIs

8.1 Pig

8.1.1 Pig Latin

8.1.2 Data Types

8.1.3 Relational Operators

8.1.4 User-Defined Functions

8.1.5 Pig Summary

8.2 Spark's Higher-Level APIs

8.2.1 Spark SQL

8.2.2 DataFrame

8.3 Summary

Chapter 9: Machine Learning

9.1 Scalable Machine Learning with Spark

9.1.1 Collaborative Filtering

9.1.2 Classification

9.1.3 Clustering

9.2 Summary

Chapter 10: Summary: Distributed Data Science in Practice

10.1 The Data Product Lifecycle

10.1.1 Data Lakes

10.1.2 Data Ingestion

10.1.3 Computational Data Stores

10.2 The Machine Learning Lifecycle

10.3 Summary

Page updated: 2024-05-04
