uannabi / PySparkExercise

Basic PySpark problems solved using Jupyter Notebook

PySparkExercise: Walmart Stock (2012-2017)

Welcome to the Spark Analysis repository, focusing on Walmart's stock data from 2012 to 2017. This project is designed to provide a practical, hands-on experience with Apache Spark DataFrames, exploring various arithmetic and logical operations through a series of questions and exercises.

About This Repository

In this repository, we dive into the world of big data analysis using Apache Spark, a leading platform for large-scale SQL, batch processing, stream processing, and machine learning. Using Walmart stock data spanning five years, we'll explore fundamental DataFrame operations, data manipulation techniques, and basic analytics.

Features

  • Data Exploration: Understand the structure and characteristics of the dataset.
  • Arithmetic Operations: Perform calculations and aggregations to derive insights.
  • Logical Operations: Apply logical operations to filter and refine the data analysis.
  • Question-Based Learning: Each exercise is framed as a question to guide your analysis.

Dataset

The dataset consists of Walmart's stock prices from 2012 to 2017. It includes columns such as Date, Open, High, Low, Close, Volume, and Adjusted Close.

Getting Started

Prerequisites

  • Apache Spark (preferably the latest version)
  • Basic knowledge of Python and SQL

Installation and Setup

Clone the repository:
git clone https://github.com/uannabi/PySparkExercise.git

Navigate to the Project Directory

cd PySparkExercise

Running the Exercises

The exercises are designed as Jupyter notebooks that you can run in your Spark environment. Ensure you have Jupyter installed and configured for use with Spark.

Contributing

This repository aims to provide a foundational understanding of Spark DataFrames through practical exercises. Contributions, suggestions, and improvements are welcome: feel free to fork the repository and submit a pull request.
