uannabi / PySparkExercise

Basic PySpark problems solved using Jupyter Notebook

PySparkExercise: Walmart Stock (2012-2017)

Welcome to the Spark Analysis repository, focusing on Walmart's stock data from 2012 to 2017. This project is designed to provide a practical, hands-on experience with Apache Spark DataFrames, exploring various arithmetic and logical operations through a series of questions and exercises.

About This Repository

In this repository, we dive into the world of big data analysis using Apache Spark, a leading platform for large-scale SQL, batch processing, stream processing, and machine learning. Using Walmart stock data spanning five years, we'll explore fundamental DataFrame operations, data manipulation techniques, and basic analytics.

Features

  • Data Exploration: Understand the structure and characteristics of the dataset.
  • Arithmetic Operations: Perform calculations and aggregations to derive insights.
  • Logical Operations: Apply logical operations to filter and refine the data analysis.
  • Question-Based Learning: Each exercise is framed as a question to guide your analysis.

Dataset

The dataset consists of Walmart's stock prices from 2012 to 2017. It includes columns such as Date, Open, High, Low, Close, Volume, and Adjusted Close.

Getting Started

Prerequisites

  • Apache Spark (preferably the latest version)
  • Basic knowledge of Python and SQL

Installation and Setup

Clone the repository:
git clone https://github.com/uannabi/PySparkExercise.git

Navigate to the Project Directory

cd PySparkExercise

Running the Exercises

The exercises are designed as Jupyter notebooks that you can run in your Spark environment. Ensure you have Jupyter installed and configured for use with Spark.

Contributing

This repository aims to provide a foundational understanding of Spark DataFrames through practical exercises. Contributions, suggestions, and improvements are welcome: feel free to fork the repository and submit a pull request.
