kevinknights29 / Concurrent-and-Parallel-Programming-in-Python

This project implements the code examples from the course `Concurrent and Parallel Programming in Python` by Maximilian Schallwig.

Concurrent-and-Parallel-Programming-in-Python

Topics

  1. Overview
  2. Goals
  3. Scope and Context
  4. System Design
  5. Alternatives Considered
  6. Learning Logs
  7. Resources

Overview

The project builds a system that fetches the list of S&P 500 companies and retrieves stock information for each of them from Yahoo Finance.

The stock data is then inserted into a PostgreSQL database. The system leverages concurrent and parallel programming in Python to efficiently manage the flow of data between different components: fetching the list of companies, retrieving stock prices, and storing the data in the database.
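
The flow between the three components can be sketched with two queues and one thread per downstream stage. This is a minimal illustration, not the repository's code: the function names are hypothetical, the ticker list is hard-coded instead of scraped, the "price" is a dummy value instead of a Yahoo Finance lookup, and a list stands in for the PostgreSQL table. It also relies on `Queue.join()` and daemon threads to finish, which is a simplification chosen for brevity.

```python
import queue
import threading


def wiki_stage(tickers_q):
    # Stand-in for fetching the company list; symbols are placeholders.
    for symbol in ["AAPL", "MSFT", "GOOG"]:
        tickers_q.put(symbol)


def price_stage(tickers_q, prices_q):
    while True:
        symbol = tickers_q.get()
        # Stand-in for a price lookup: a dummy price derived from the symbol.
        prices_q.put((symbol, float(len(symbol))))
        tickers_q.task_done()


def db_stage(prices_q, stored):
    while True:
        row = prices_q.get()
        stored.append(row)  # stand-in for an INSERT into the database
        prices_q.task_done()


def run_pipeline():
    tickers_q = queue.Queue()
    prices_q = queue.Queue()
    stored = []
    # Daemon threads are abandoned at interpreter exit once the queues are drained.
    threading.Thread(target=price_stage, args=(tickers_q, prices_q), daemon=True).start()
    threading.Thread(target=db_stage, args=(prices_q, stored), daemon=True).start()
    wiki_stage(tickers_q)
    tickers_q.join()  # every ticker has been priced and handed on
    prices_q.join()   # every price row has been stored
    return stored
```

Because each price is placed on the second queue before `task_done()` is called on the first, waiting on the queues in order is enough to know that every row has been stored.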

Pipeline Feature

The project includes a pipeline feature that lets the whole process be defined in a configuration file.

The pipeline executor initializes and manages the queues, workers, and schedulers described in that configuration, so the stages and their connections can be changed by editing the configuration.
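
As a rough sketch of the idea, the snippet below builds queues and worker threads from a configuration. Everything here is an assumption made for illustration: the schema (`queues`, `workers`, `input`, `output`, `kwargs`), the `WORKER_REGISTRY` lookup, and the two toy workers are hypothetical, and a Python dict stands in for the configuration file.

```python
import queue
import threading


def produce(out_q, items):
    for item in items:
        out_q.put(item)
    out_q.put(None)  # None marks the end of the stream


def double(in_q, out_q):
    while (item := in_q.get()) is not None:
        out_q.put(item * 2)
    out_q.put(None)  # pass the end marker downstream


# Hypothetical registry mapping names used in the config to callables.
WORKER_REGISTRY = {"produce": produce, "double": double}

# Hypothetical configuration; a real one would be loaded from a file.
CONFIG = {
    "queues": ["raw", "doubled"],
    "workers": [
        {"name": "produce", "output": "raw", "kwargs": {"items": [1, 2, 3]}},
        {"name": "double", "input": "raw", "output": "doubled"},
    ],
}


def run_from_config(config):
    queues = {name: queue.Queue() for name in config["queues"]}
    threads = []
    for spec in config["workers"]:
        args = []
        if "input" in spec:
            args.append(queues[spec["input"]])
        if "output" in spec:
            args.append(queues[spec["output"]])
        thread = threading.Thread(
            target=WORKER_REGISTRY[spec["name"]],
            args=args,
            kwargs=spec.get("kwargs", {}),
        )
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return queues


def drain(q):
    # Collect everything currently in the queue, in order.
    items = []
    while not q.empty():
        items.append(q.get())
    return items
```

With this layout, adding a stage means adding one queue name and one worker entry to the configuration.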

Processing Logs

[Image: processing logs]

Results from DB

select * from public.prices;

[Image: query results from public.prices]

Goals

The primary goal of this project is to learn and demonstrate the concepts of concurrent and parallel programming in Python.

By building a simple yet practical application, we aim to understand how to manage multiple tasks simultaneously, efficiently handle inter-process communication, and effectively utilize system resources.

The project showcases the use of Python's standard-library `queue`, `multiprocessing`, and `logging` modules to build an application whose fetching and storage stages run concurrently.

Scope and Context

The scope of this project includes:

  1. Fetching the list of S&P 500 companies from Wikipedia.
  2. Using multiple worker instances to retrieve stock price information concurrently from Yahoo Finance.
  3. Storing the retrieved stock data in a PostgreSQL database using multiple database worker instances.
  4. Implementing a logging system to monitor and debug the application.
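
For the logging item, one common approach is to include the thread name in every record so that output from concurrent workers can be told apart. The sketch below assumes that approach; the logger name, format string, and `worker` function are illustrative choices and are not taken from the repository.

```python
import logging
import threading


def build_logger(name="pipeline", stream=None):
    # stream=None makes StreamHandler write to sys.stderr.
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s [%(threadName)s] %(message)s")
    )
    logger.handlers.clear()  # avoid duplicate output if called more than once
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def worker(logger, symbol):
    # Hypothetical worker body; a real one would fetch data for the symbol.
    logger.info("fetched %s", symbol)


if __name__ == "__main__":
    log = build_logger()
    threads = [
        threading.Thread(target=worker, args=(log, s), name=f"YahooWorker-{i}")
        for i, s in enumerate(["AAPL", "MSFT"], start=1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The `logging` module's handlers are thread-safe, so workers can share one logger without extra locking.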

The project is designed to be a learning exercise, focusing on the practical application of concurrent and parallel programming concepts. It provides a hands-on approach to understanding how to build and manage a system that performs multiple tasks simultaneously, highlighting the challenges and solutions associated with such an approach. The context of this project is educational, aimed at enhancing the developer's skills in Python and system design.

System Design

High Level Process

graph TD;
    A[Wikipedia] -->|Fetches tickers| B[WikiWorker]
    B -->|Puts tickers into| C[Tickers Queue]
    C -->|Fetches tickers| D[YahooFinancePriceScheduler]
    D -->|Puts stock data into| E[Postgres Queue]
    E -->|Fetches stock data| F[PostgresScheduler]
    D -->|STOP_SIGNAL| C

    subgraph System Design
        direction TB
        B --> C
        D --> E
        E --> F
    end

    subgraph Scheduler Instances
        direction LR
        D[YahooFinancePriceScheduler]
        F[PostgresScheduler]
    end

    subgraph Queues
        direction TB
        C[Tickers Queue]
        E[Postgres Queue]
    end
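
The diagram names a `STOP_SIGNAL` on the tickers queue. How the repository emits and handles it is not shown here, so the following is only a sketch of the general sentinel pattern with several workers reading one queue. The sentinel value, function names, and dummy price are assumptions.

```python
import queue
import threading

STOP_SIGNAL = "DONE"  # hypothetical sentinel value


def price_worker(tickers_q, prices_q):
    while True:
        symbol = tickers_q.get()
        if symbol == STOP_SIGNAL:
            break  # this worker is finished; the others get their own signal
        # Stand-in for a price lookup.
        prices_q.put((symbol, len(symbol)))


def run(symbols, num_workers=3):
    tickers_q = queue.Queue()
    prices_q = queue.Queue()
    workers = [
        threading.Thread(target=price_worker, args=(tickers_q, prices_q))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for symbol in symbols:
        tickers_q.put(symbol)
    # One sentinel per worker: each worker consumes exactly one and exits.
    for _ in workers:
        tickers_q.put(STOP_SIGNAL)
    for w in workers:
        w.join()
    results = []
    while not prices_q.empty():
        results.append(prices_q.get())
    return sorted(results)  # arrival order depends on thread scheduling
```

A single sentinel would stop only the worker that happens to read it, which is why the count of signals matches the count of workers.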

Pipeline Design

graph TD;
    A[PipelineExecutor Initialization] --> B[Initialize Queues]
    B --> C[Create Queues from Config]
    C --> D[Assign Queue Instances]

    A --> E[Initialize Workers]
    E --> F[Create Workers from Config]
    F --> G[Import Worker Classes]
    G --> H[Assign Input/Output Queues]
    H --> I[Instantiate Worker Classes]

    A --> J[Initialize Schedulers]
    J --> K[Create Schedulers from Config]
    K --> L[Import Scheduler Classes]
    L --> M[Assign Input/Output Queues]
    M --> N[Instantiate Scheduler Instances]

    A --> O[Join Schedulers]
    O --> P[Collect Schedulers]
    P --> Q[Join Scheduler Instances]

    A --> R[Setup Pipeline]
    R --> B
    R --> E
    R --> J
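
The "Import Worker Classes" and "Import Scheduler Classes" steps suggest that classes are resolved from names in the configuration. One way to do that, assuming the configuration stores a dotted path such as `package.module.ClassName`, is `importlib.import_module` plus `getattr`. The helper name `load_class` and the dotted-path convention are assumptions, not the repository's actual scheme.

```python
import importlib


def load_class(dotted_path):
    # Split "package.module.ClassName" into module path and attribute name.
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


if __name__ == "__main__":
    # Using a standard-library class so the example runs anywhere.
    queue_cls = load_class("queue.Queue")
    q = queue_cls(maxsize=2)
    print(type(q).__name__, q.maxsize)
```

Resolving classes this way keeps the executor free of hard-coded imports for each worker type.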

Alternatives Considered

Learning Logs

Date        Learning
06-08-2024  Psycopg 3 (`psycopg`) has been released and offers optimizations over psycopg2.
06-09-2024  `DB_HOST` can be set to the database service name (`db`) from docker-compose.yaml so that the app can connect to the database.
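
To illustrate the `DB_HOST` note, the sketch below builds a libpq-style connection string from environment variables. Only `DB_HOST` comes from the log above; the other variable names (`DB_PORT`, `DB_NAME`, `DB_USER`) and all default values are assumptions. The password is deliberately left out of the string.

```python
import os


def build_conninfo(env=None):
    # Accept a mapping for testing; fall back to the real environment.
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "localhost")  # "db" when run under docker-compose
    port = env.get("DB_PORT", "5432")       # hypothetical variable name
    name = env.get("DB_NAME", "postgres")   # hypothetical variable name
    user = env.get("DB_USER", "postgres")   # hypothetical variable name
    return f"host={host} port={port} dbname={name} user={user}"


if __name__ == "__main__":
    print(build_conninfo({"DB_HOST": "db"}))
```

A string in this keyword/value form can be passed to `psycopg.connect()` or `psycopg2.connect()`.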

Resources

License: MIT License


Languages

Python 88.1%, Dockerfile 10.0%, Shell 1.8%