tknishh / Twitter-Spark-Streaming

Streaming Twitter data related to a certain topic with Spark Streaming.

Twitter-Spark-Streaming

Apache Spark Streaming is a widely used stream-processing framework. Because it is built on the Spark engine, batch and streaming workloads can share the same APIs, which gives it a more unified approach than earlier standalone streaming systems.

Python and Spark Streaming work well together and are used by large companies. Netflix is a good example: the engineers behind the popular streaming platform have published multiple articles about how they use the technique to improve the service. Let’s get started with the basics.

Structure

[Figure: flow diagram]
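
As a rough illustration of such a flow, here is a minimal sketch that counts hashtags per micro-batch. It is not the notebook's actual code. It assumes tweet text arrives one tweet per line on a local TCP socket (the host `localhost` and port `9009` are placeholders), and it uses the legacy DStream API from `pyspark.streaming`. The names `extract_hashtags`, `build_hashtag_counts`, and `main` are hypothetical.

```python
import re

# Matches a '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return the lowercased hashtags found in one line of tweet text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def build_hashtag_counts(ssc, host="localhost", port=9009):
    """Wire up a DStream that counts hashtags in each micro-batch.

    host and port are assumptions: a separate process must write
    one tweet text per line to that socket.
    """
    lines = ssc.socketTextStream(host, port)
    counts = (
        lines.flatMap(extract_hashtags)
        .map(lambda tag: (tag, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    counts.pprint()  # print the first few counts of every batch
    return counts


def main():
    # Imported here so the helper above works without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # local[2]: one thread for the socket receiver, one for processing.
    sc = SparkContext("local[2]", "TwitterHashtagCount")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches
    build_hashtag_counts(ssc)
    ssc.start()
    ssc.awaitTermination()
```

Calling `main()` starts the job. Something has to feed the socket first, for example a small script that forwards tweets for the chosen topic, or `nc -lk 9009` for manual testing.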

Now that we have gone through building a real-life Spark Streaming pipeline, let’s list some pros and cons of this approach.

Pros

  • It offers high speed on complex jobs.
  • Fault tolerance.
  • It is simple to run on cloud platforms.
  • Support for multiple languages.
  • Integration with major frameworks.
  • The ability to connect to databases of various types.

Cons

  • It requires large amounts of storage.
  • It’s difficult to use, debug, and master.
  • Documentation and instructional resources are lacking.
  • Data visualization support is weak.
  • It handles small amounts of data inefficiently.
  • Only a few machine learning algorithms are available.

Conclusion

Spark Streaming is a technology for collecting and analyzing large amounts of data. Streaming data is likely to become more popular in the near future, so now is a good time to start learning about it. Remember that data science is more than just building models; it also involves managing a full pipeline.

This post covered the basics of Spark Streaming and how to use it on a real-world dataset. We suggest working with another sample or with real-time data to put everything we’ve learned into practice.

Languages

Jupyter Notebook 98.8%, Python 1.2%