tanxinz / SparkStreamingDemo

Near-real-time distributed stream processing framework

SparkStreamingDemo

A near-real-time distributed stream processing framework, with latency measured in minutes.
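
A minimal sketch of what "near-real-time, measured in minutes" looks like in code. Spark Streaming cuts the input into micro-batches of a fixed batch interval and processes each batch as a small job. The one-minute interval, the socket source, and the host/port values are assumptions for illustration, not this repo's actual settings.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Duration, Minutes, StreamingContext}

object StreamingDemo {
  // Each micro-batch covers one minute of input (assumed value).
  val batchInterval: Duration = Minutes(1)

  // Builds a context that counts words arriving on a text socket.
  // Use at least "local[2]" locally: the socket receiver occupies one core.
  def buildContext(master: String, host: String, port: Int): StreamingContext = {
    val conf = new SparkConf().setMaster(master).setAppName("SparkStreamingDemo")
    val ssc = new StreamingContext(conf, batchInterval)
    ssc.socketTextStream(host, port)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .print() // prints the first counts of every batch on the driver
    ssc
  }
}

// Usage (needs a running text source, e.g. `nc -lk 9999`):
// val ssc = StreamingDemo.buildContext("local[2]", "localhost", 9999)
// ssc.start()
// ssc.awaitTermination()
```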

Getting started with Spark:
http://www.cnblogs.com/shishanyuan/p/4699644.html
https://www.cnblogs.com/shishanyuan/p/4747735.html
Official documentation (English):
http://spark.apache.org/docs/latest/streaming-programming-guide.html

SSH error "The authenticity of host xxx.xxx.xxx.xxx can't be established.": the permissions on authorized_keys must be 600, not 400:
http://blog.csdn.net/lina791211/article/details/11818825

Fixing the Hadoop error "namenode running as process 18472. Stop it first.":
http://www.linuxidc.com/Linux/2014-07/104315.htm

How to change the permissions of the files inside a directory on Linux:
https://zhidao.baidu.com/question/139006065.html

Warning "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources":
http://blog.csdn.net/qyl445/article/details/50684691

How to increase the memory of Spark workers and HDFS DataNodes:
http://blog.csdn.net/wonder4/article/details/52476008
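
A minimal sketch of the application side of worker memory. The executor memory and cores an application requests in `SparkConf` have to fit within what the standalone workers offer (`SPARK_WORKER_MEMORY` and `SPARK_WORKER_CORES` in `conf/spark-env.sh`). Requesting more is one common cause of the "Initial job has not accepted any resources" warning listed above. The values are examples, not recommendations.

```scala
import org.apache.spark.SparkConf

object ResourceConf {
  // Per-executor memory and total cores requested from the standalone master.
  // If no worker can satisfy the request, the job waits for resources.
  def build(appName: String): SparkConf = new SparkConf()
    .setAppName(appName)
    .set("spark.executor.memory", "1g") // example value
    .set("spark.cores.max", "2")        // example value (standalone mode)
}
```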

scala.MatchError exceptions and setting up a StructType schema:
http://blog.csdn.net/gangchengzhong/article/details/70153932
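
A minimal sketch, assuming comma-separated `id,name,age` text records (a hypothetical format). It builds the `StructType` explicitly and converts every field to the Scala type its column declares before creating the `Row`, so the data matches the schema. The pattern match also shows one direct way to get `scala.MatchError`: without the fallback case, a line with the wrong number of fields would throw it.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object PersonSchema {
  // Schema for the hypothetical "id,name,age" records.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true)
  ))

  // IntegerType columns get Int values and StringType columns get String values.
  // The second case replaces the scala.MatchError a malformed line would otherwise raise.
  def toRow(line: String): Row = line.split(",", -1).map(_.trim) match {
    case Array(id, name, age) => Row(id.toInt, name, age.toInt)
    case other => throw new IllegalArgumentException(
      s"expected 3 fields, got ${other.length}: $line")
  }
}

// Usage with an existing SparkSession `spark` and an RDD[String] `lines`:
// val df = spark.createDataFrame(lines.map(PersonSchema.toRow), PersonSchema.schema)
```
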
Problems inserting dates into Oracle from Spark Streaming:
http://www.aboutyun.com/thread-18282-1-1.html
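
One common way to avoid date-format problems when writing to Oracle (not necessarily the approach in the linked thread): parse the date string on the Spark side and bind it as a `java.sql.Timestamp` parameter instead of splicing a date string into the SQL. The input pattern `yyyy-MM-dd HH:mm:ss` is an assumption.

```scala
import java.sql.{PreparedStatement, Timestamp}
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object OracleDates {
  // Assumed format of the date strings arriving in the stream.
  private val inputFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Parses the text into a java.sql.Timestamp.
  def toTimestamp(text: String): Timestamp =
    Timestamp.valueOf(LocalDateTime.parse(text.trim, inputFormat))

  // Binds the value as a parameter; the JDBC driver performs the conversion,
  // so no TO_DATE format string in the SQL has to match the data.
  def bindTimestamp(ps: PreparedStatement, index: Int, text: String): Unit =
    ps.setTimestamp(index, toTimestamp(text))
}
```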

Overview of Spark operators:
https://www.cnblogs.com/zlslch/p/5723857.html

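Inserting data by iterating with foreachPartition, using a connection pool written in Java (the one written in Scala has problems):
The Java connection pool in this post works, but its Spark Streaming program has problems; for that part, see the third link: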
http://blog.csdn.net/erfucun/article/details/52312682
The Scala connection pool in this post has problems, but its Spark Streaming program works:
http://blog.csdn.net/legotime/article/details/51836039
Fixing the Oracle error "ORA-01000: maximum open cursors exceeded":
https://www.cnblogs.com/qinjunli/p/4588089.html

Batch inserts into Oracle:
http://blog.csdn.net/yuanzexi/article/details/50912630
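
A minimal sketch tying the last three items together. Each partition borrows one connection, reuses one `PreparedStatement`, sends rows with JDBC batching (`addBatch`/`executeBatch`), and closes the statement and connection in `finally` blocks. Leaking statements is a typical cause of ORA-01000, since every open statement holds a cursor. The table `word_counts`, its columns, the chunk size of 500, and the use of `DriverManager` in place of a real pool are all assumptions.

```scala
import java.sql.{Connection, DriverManager}
import org.apache.spark.streaming.dstream.DStream

object PartitionWriter {
  // Hypothetical target table; adjust to the real schema.
  val InsertSql = "INSERT INTO word_counts (word, cnt) VALUES (?, ?)"
  // Rows sent per executeBatch() call (assumed value).
  val BatchSize = 500

  // Writes one partition: one connection, one PreparedStatement, batched inserts.
  // `getConnection` stands in for a pool lookup; it is only called for non-empty partitions.
  def writePartition(records: Iterator[(String, Int)], getConnection: () => Connection): Unit =
    if (records.hasNext) {
      val conn = getConnection()
      try {
        val ps = conn.prepareStatement(InsertSql)
        try {
          records.grouped(BatchSize).foreach { chunk =>
            chunk.foreach { case (word, cnt) =>
              ps.setString(1, word)
              ps.setInt(2, cnt)
              ps.addBatch()
            }
            ps.executeBatch()
          }
        } finally ps.close() // releases the statement's cursor
      } finally conn.close() // with a pool, close() hands the connection back
    }

  // The connection is opened inside foreachPartition, i.e. on the executor,
  // so no Connection object is serialized from the driver.
  def save(counts: DStream[(String, Int)], url: String, user: String, password: String): Unit =
    counts.foreachRDD { rdd =>
      rdd.foreachPartition { part =>
        writePartition(part, () => DriverManager.getConnection(url, user, password))
      }
    }
}
```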

Using the c3p0 connection pool with Spark:
http://www.jianshu.com/p/65c1b319b70a
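
A minimal sketch of a c3p0 pool held in a Scala `object`, assuming the c3p0 and Oracle JDBC jars are on the executors' classpath. Because an `object` is initialized once per JVM, each executor builds its own pool on first use, and nothing pool-related has to be serialized into the task closure. The connection settings are placeholders.

```scala
import java.sql.Connection
import com.mchange.v2.c3p0.ComboPooledDataSource

object OraclePool {
  // Placeholder settings; in practice load them from a config file.
  final case class Settings(url: String, user: String, password: String,
                            minPoolSize: Int = 2, maxPoolSize: Int = 10)

  // Configures the data source; c3p0 acquires connections only when one is first requested.
  def newDataSource(s: Settings): ComboPooledDataSource = {
    val ds = new ComboPooledDataSource()
    ds.setDriverClass("oracle.jdbc.OracleDriver")
    ds.setJdbcUrl(s.url)
    ds.setUser(s.user)
    ds.setPassword(s.password)
    ds.setMinPoolSize(s.minPoolSize)
    ds.setMaxPoolSize(s.maxPoolSize)
    ds
  }

  // Built lazily, on the first getConnection() call in this JVM.
  private lazy val dataSource =
    newDataSource(Settings("jdbc:oracle:thin:@//db-host:1521/ORCL", "demo", "demo"))

  // Borrows a connection; calling close() on it returns it to the pool.
  def getConnection(): Connection = dataSource.getConnection()
}
```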

The difference between object and class in Scala:
http://blog.csdn.net/wangxiaotongfan/article/details/48242029
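
A small illustration in plain Scala (no Spark needed). A `class` is a template from which `new` creates independent instances, while an `object` is a singleton. An `object` with the same name as a class is its companion and typically holds factory methods and constants, the role Java's `static` members play. `Counter` is a made-up example.

```scala
// Each `new Counter(...)` has its own `current` value.
class Counter(start: Int) {
  private var current = start
  def increment(): Int = { current += 1; current }
}

// Companion object: exactly one instance, created on first access.
object Counter {
  val DefaultStart = 0
  // Factory method, so callers can write Counter() instead of new Counter(0).
  def apply(): Counter = new Counter(DefaultStart)
}
```

Two counters created through the companion's `apply` do not share state, while `Counter.DefaultStart` is the same value everywhere.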

Scala basic syntax tutorial:
http://www.yiibai.com/scala/scala_overview.html

Reading settings from a configuration file in Spark:
http://blog.csdn.net/u012307002/article/details/53308937
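
A minimal sketch using `java.util.Properties`. This is one common approach and not necessarily the one the linked post uses. It reads `key=value` pairs from a stream, so the same code works for a file bundled in the jar (`src/main/resources`) or any other `InputStream`. The key names in the usage comment are hypothetical.

```scala
import java.io.{InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.Properties

object AppConfig {
  // Reads key=value lines; the stream is closed afterwards.
  def load(in: InputStream): Properties = {
    val props = new Properties()
    try props.load(new InputStreamReader(in, StandardCharsets.UTF_8))
    finally in.close()
    props
  }

  // Loads a file packaged in the application jar, e.g. src/main/resources/app.properties.
  def fromClasspath(resource: String): Properties = {
    val in = getClass.getClassLoader.getResourceAsStream(resource)
    require(in != null, s"config resource not found on classpath: $resource")
    load(in)
  }

  // Fails fast with a clear message instead of passing null into Spark settings.
  def required(props: Properties, key: String): String =
    Option(props.getProperty(key)).map(_.trim)
      .getOrElse(throw new IllegalArgumentException(s"missing config key: $key"))
}

// Usage (hypothetical keys):
// val conf    = AppConfig.fromClasspath("app.properties")
// val brokers = AppConfig.required(conf, "kafka.brokers")
// val seconds = AppConfig.required(conf, "streaming.batch.seconds").toInt
```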

Guaranteeing zero data loss in Spark Streaming with Kafka:
https://www.cnblogs.com/jacksu-tencent/p/5135869.html
https://www.jianshu.com/p/8603ba4be007
http://blog.csdn.net/lsshlsw/article/details/51133217
https://www.iteblog.com/archives/1591.html

Implementing zero data loss when Spark Streaming consumes Kafka with the Direct approach:
https://www.cnblogs.com/hd-zg/p/6841249.html
When Spark Streaming receives data from Kafka, offsets can be tracked with checkpoints, in a database or a file, or by writing them back to ZooKeeper:
https://www.cnblogs.com/xlturing/p/6246538.html
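
The database/file option can be sketched in plain Scala, with an in-memory store standing in for the real table or file. The `OffsetStore` trait and its methods are hypothetical. The important part is the ordering: write the batch's output first, then save the offsets the batch ended at. A crash in between means the batch is read again after a restart, so no data is lost, but its output may be written twice (at-least-once). For that reason the write should be idempotent, e.g. an upsert by key.

```scala
// Hypothetical offset store: partition -> next offset to read, per topic.
// A real implementation would use a database table, a file, or ZooKeeper.
trait OffsetStore {
  def load(topic: String): Map[Int, Long]
  def save(topic: String, offsets: Map[Int, Long]): Unit
}

final class InMemoryOffsetStore extends OffsetStore {
  private var data = Map.empty[(String, Int), Long]
  def load(topic: String): Map[Int, Long] = synchronized {
    data.collect { case ((t, p), o) if t == topic => p -> o }
  }
  def save(topic: String, offsets: Map[Int, Long]): Unit = synchronized {
    data ++= offsets.map { case (p, o) => (topic, p) -> o }
  }
}

object AtLeastOnce {
  // Runs the output first; offsets are saved only if `write` completes without throwing.
  def processBatch(store: OffsetStore, topic: String, untilOffsets: Map[Int, Long])(write: => Unit): Unit = {
    write
    store.save(topic, untilOffsets)
  }
}
```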

Saving Spark Direct offsets in ZooKeeper:
https://github.com/xlturing/spark-journey/blob/master/SparkStreamingKafka/src/main/scala/com/sparkstreaming/main/KafkaManager.scala
http://blog.csdn.net/lw_ghy/article/details/50926855

Spark Streaming - 2. Kafka integration guide (Kafka 0.10.0 or later)
http://blog.csdn.net/zhongguozhichuang/article/details/53282858
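
A minimal sketch of the Kafka 0.10+ direct stream (`spark-streaming-kafka-0-10`), following the pattern in the official integration guide. It disables auto-commit, reads the offset ranges from each batch's RDD, writes the output, then commits the offsets back to Kafka with `commitAsync`. This keeps offsets in Kafka itself rather than in ZooKeeper. Because the commit is asynchronous and not part of the output transaction, the guarantee is at-least-once. The broker list, group id, `earliest` reset policy, and the `write` function are assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaDirect {
  def kafkaParams(brokers: String, groupId: String): Map[String, Object] = Map[String, Object](
    "bootstrap.servers" -> brokers,
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> groupId,
    "auto.offset.reset" -> "earliest",                 // used when the group has no committed offset
    "enable.auto.commit" -> (false: java.lang.Boolean) // commit manually, after the output
  )

  // `write` is a placeholder for the real sink; it runs on the executors.
  def run(ssc: StreamingContext, brokers: String, groupId: String, topics: Seq[String],
          write: Iterator[String] => Unit): Unit = {
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams(brokers, groupId)))

    stream.foreachRDD { rdd =>
      // Only the RDD produced directly by the stream carries offset ranges.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.map(_.value).foreachPartition(write)
      // Reached only if the output action above succeeded.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }
  }
}
```
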
Spark 2.x study notes: 1. Spark 2.2 quick start (local mode)
http://blog.csdn.net/chengyuqiang/article/details/77671748?locationNum=4&fps=1
Spark2.11: two kinds of streaming operations + Kafka
http://blog.csdn.net/zeroder/article/details/73650731

The difference between foreachPartition and mapPartitions: mapPartitions is a transformation, while foreachPartition is an action:
http://blog.csdn.net/u010454030/article/details/78897150
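
A minimal sketch of the difference, assuming Spark 2.x (for `longAccumulator`) and a local `SparkContext`. `mapPartitions` returns a new RDD lazily, so nothing runs until an action such as `collect`. `foreachPartition` runs a job immediately and returns `Unit`, which is why per-partition side effects such as database writes belong there.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object PartitionOps {
  // Transformation: only describes a new RDD with one sum per partition.
  def partitionSums(numbers: RDD[Int]): RDD[Int] =
    numbers.mapPartitions(it => Iterator(it.sum))

  def demo(sc: SparkContext): (Array[Int], Long) = {
    val numbers = sc.parallelize(1 to 10, numSlices = 2)
    val sums = partitionSums(numbers) // no job submitted yet
    val collected = sums.collect()    // action: the job runs here

    // Action: runs at once; the accumulator stands in for a real side effect.
    val seen = sc.longAccumulator("records seen")
    numbers.foreachPartition(it => it.foreach(_ => seen.add(1)))
    (collected, seen.sum)
  }
}
```

With `1 to 10` split into two slices, the partitions hold 1 to 5 and 6 to 10, so the collected sums are 15 and 40.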

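Spark heartbeats depend on the clocks of the cluster machines: if the system time differs between nodes, the heartbeats get out of step and Spark reports errors.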
