joxborrow / jto_py_env

This repo contains the environment file for the conda environment that I use for ad hoc analysis.

MY-PY-ENV

A long time ago in a galaxy far, far away, there were statistical and programming environments whose packages did not need to be installed and loaded one by one. This environment file collects common packages for analysis and data science. This README is meant to be the manual, linking to the documentation for each key package.

This “manual” is organized by topic. If you feel something should be added, please feel free to submit a pull request.

Environment Documentation

Programming Modules

Module Description
argparse* Makes it easy to write user-friendly command-line interfaces
re* Provides regular expression matching operations similar to those found in Perl
smtplib* Defines an SMTP client session object that can be used to send mail to any internet machine with an SMTP or ESMTP listener daemon
datetime* Supplies classes for manipulating dates and times
collections* Implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers, dict, list, set, and tuple.
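As a minimal sketch of how several of these standard-library modules fit together in an ad hoc script, the example below counts ISO-formatted dates in a piece of text. The `--text` option, its default value, and the helper names are hypothetical and chosen only for illustration.

```python
import argparse
import re
from collections import Counter
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: the option name and default text are illustrative only.
    parser = argparse.ArgumentParser(description="Count ISO dates in a text")
    parser.add_argument(
        "--text",
        default="2024-01-15 and 2024-01-15, then 2024-02-01",
        help="text to scan for YYYY-MM-DD dates",
    )
    return parser


def count_dates(text: str) -> Counter:
    # Find ISO-style dates, parse each into a date object, and tally them.
    matches = re.findall(r"\d{4}-\d{2}-\d{2}", text)
    return Counter(date.fromisoformat(m) for m in matches)


# Passing an empty list makes argparse use the defaults instead of sys.argv.
args = build_parser().parse_args([])
date_counts = count_dates(args.text)
```

In a real script, `parse_args()` would be called without arguments so that the text comes from the command line.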

Data compression

Module Description
zlib* Functions in this module allow compression and decompression, using the zlib library.
gzip* Provides a simple interface to compress and decompress files just like the GNU programs gzip and gunzip.
bz2* Provides a comprehensive interface for compressing and decompressing data using the bzip2 compression algorithm.
lzma* Provides classes and convenience functions for compressing and decompressing data using the LZMA compression algorithm
zipfile* Provides tools to create, read, write, append, and list a ZIP file.
tarfile* Makes it possible to read and write tar archives, including those using gzip, bz2 and lzma compression.
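The four byte-oriented modules above share a similar `compress`/`decompress` interface, so they are easy to compare on the same data. The sketch below is only an illustration: the payload is made up, and the resulting sizes depend on the input, so no particular ranking is implied.

```python
import bz2
import gzip
import lzma
import zlib

# Made-up, highly repetitive payload; real data will compress differently.
payload = b"ad hoc analysis " * 100

codecs = {
    "zlib": (zlib.compress, zlib.decompress),
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

sizes = {}
for name, (compress, decompress) in codecs.items():
    packed = compress(payload)
    # Each codec must restore the original bytes exactly.
    if decompress(packed) != payload:
        raise ValueError(f"{name} round trip failed")
    sizes[name] = len(packed)
```

`zipfile` and `tarfile` work at the archive level (multiple named files) rather than on a single byte string, so they are not part of this comparison.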

Performance Management & Profiling

Module Description
timeit* A simple way to time small bits of Python code.
profile* A pure Python module whose interface is imitated by cProfile, but which adds significant overhead to profiled programs. If you’re trying to extend the profiler in some way, the task might be easier with this module.
cProfile* A C extension with reasonable overhead that makes it suitable for profiling long-running programs. Based on lsprof.
pstats* Formats reports from the profile and cProfile modules.
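A rough sketch of how these modules are typically combined: `timeit` for a quick wall-clock measurement, then `cProfile` plus `pstats` for a per-function breakdown. The function `sum_of_squares` is a hypothetical stand-in for whatever code is being measured.

```python
import cProfile
import io
import pstats
import timeit


def sum_of_squares(n: int) -> int:
    # Hypothetical workload used only to have something to measure.
    return sum(i * i for i in range(n))


# timeit: total seconds for 100 calls of the small workload.
elapsed = timeit.timeit(lambda: sum_of_squares(1_000), number=100)

# cProfile: collect per-function statistics for one larger call.
profiler = cProfile.Profile()
profiler.enable()
sum_of_squares(100_000)
profiler.disable()

# pstats: format the collected statistics into a text report.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
```

Writing the report to a `StringIO` buffer keeps it available as a string; omitting `stream` prints it to standard output instead.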

Quality Control

Module Description
doctest* Searches for pieces of text that look like interactive Python sessions, and then executes those sessions to verify that they work exactly as shown. A form of unit testing written in docstrings.
unittest* Unit testing framework
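To make the docstring-testing idea concrete, here is a small sketch that runs the examples embedded in one function's docstring. The function `mean` is hypothetical; in a script, the more common entry point is simply `doctest.testmod()`, and the parser/runner pair is used here only so the result can be captured for a single function.

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence.

    >>> mean([1, 2, 3, 4])
    2.5
    >>> mean([10])
    10.0
    """
    return sum(values) / len(values)


# Extract the interactive examples from the docstring and execute them.
parser = doctest.DocTestParser()
test = parser.get_doctest(mean.__doc__, {"mean": mean}, "mean", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # reports how many examples failed and were attempted
```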

System & Environment

Package Description
os* Provides a portable way of using operating system dependent functionality
sys* Provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter.
shutil* Offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal.
glob* Finds all the pathnames matching a specified pattern according to the rules used by the Unix shell
pip The package installer for Python
jupyter Interactive environment for computing
python-dotenv Reads key-value pairs from a .env file and can set them as environment variables
pylint Analyzes your code without actually running it. It checks for errors, enforces a coding standard, looks for code smells, and can make suggestions about how the code could be refactored
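The standard-library entries in this table are often used together for file housekeeping. The sketch below, under made-up file and directory names, creates a few files in a temporary directory, finds the CSVs with `glob`, and copies them with `shutil`. It also uses `tempfile`, which is part of the standard library but not listed above.

```python
import glob
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical layout: a "raw" folder holding a mix of file types.
    raw_dir = os.path.join(workdir, "raw")
    os.makedirs(raw_dir)
    for name in ("a.csv", "b.csv", "notes.txt"):
        with open(os.path.join(raw_dir, name), "w") as handle:
            handle.write("placeholder\n")

    # Copy only the CSV files into a separate "backup" folder.
    backup_dir = os.path.join(workdir, "backup")
    os.makedirs(backup_dir)
    for path in glob.glob(os.path.join(raw_dir, "*.csv")):
        shutil.copy(path, backup_dir)

    copied = sorted(os.listdir(backup_dir))
```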

Data Input and Output

Package Description
requests an elegant and simple HTTP library for Python, built for human beings
urllib.request* Defines functions and classes which help in opening URLs
pyodbc an open source Python module that makes accessing ODBC databases simple
sqlalchemy Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL
deltalake provides the capability to read, write, and manage Delta Lake tables
pyarrow A cross-language development platform for in-memory analytics
xlsxwriter Module that can be used to write text, numbers, formulas and hyperlinks to multiple worksheets in an Excel 2007+ XLSX file. It supports features such as formatting and much more.
openpyxl Library to read/write Excel 2010 xlsx/xlsm/xltx/xltm files.
json* Encodes and decodes JSON data
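As a small illustration of the `json` entry, the sketch below serializes a dictionary to a JSON string and reads it back. The record contents are invented for the example.

```python
import json

# Invented record, used only to demonstrate a round trip.
record = {"package": "polars", "tags": ["data", "munging"], "stdlib": False}

# dumps/loads work on strings; dump/load are the file-object equivalents.
text = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(text)
```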

Data Munging & Validation

Package Description
polars[all] Embarrassingly parallel data munging
pandas data munging
snowflake-snowpark-python data munging in Snowflake
pandera data validation
miceforest Impute missing values

Exploratory Data Analysis (EDA)

Package Description
pyskim A package for EDA at the command line
sweetviz Open-source Python library that generates beautiful, high-density visualizations to kickstart EDA.

Math, Statistics, ML, & Data Analytics

Module Submodule Description
math* Provides access to the mathematical functions defined by the C standard. Part of the python standard library.
sympy Library for symbolic mathematics
statistics* Provides functions for calculating mathematical statistics of numeric real valued data. Part of the python standard library
random* Implements pseudo-random number generators for various distributions. Part of the python standard library.
statsmodels Estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration
scipy SciPy is a collection of mathematical algorithms and convenience functions
scipy.stats This module contains a large number of probability distributions, summary and frequency statistics, correlation functions and statistical tests, masked statistics, kernel density estimation, quasi-Monte Carlo functionality, and more. It supplements the standard-library statistics module.
scipy.linalg linear algebra library
patsy Package for describing statistical models and building design matrices
numpy NumPy is the fundamental package for scientific computing in Python
scikit-learn Simple and efficient tools for predictive data analysis, including ML
sktime A unified framework for machine learning with time series
igraph A fast open source python library to analyze graphs/networks
networkx A python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
nltk A leading platform for building Python programs to work with human language data
quantecon Open source python library for economic modeling
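For quick checks that do not need the third-party packages, the standard-library `math`, `statistics`, and `random` modules already cover basic summaries. The sketch below draws a simulated sample and computes a normal-approximation 95% interval for its mean; the distribution parameters, the seed, and the 1.96 multiplier are assumptions chosen for illustration.

```python
import math
import random
import statistics

# Seeded generator so the simulated sample is reproducible.
rng = random.Random(42)
sample = [rng.gauss(100, 15) for _ in range(1_000)]

sample_mean = statistics.mean(sample)
sample_stdev = statistics.stdev(sample)  # sample (n - 1) standard deviation

# Normal-approximation 95% confidence interval for the mean.
half_width = 1.96 * sample_stdev / math.sqrt(len(sample))
interval = (sample_mean - half_width, sample_mean + half_width)
```

For anything beyond this, such as exact t-based intervals or hypothesis tests, `scipy.stats` and `statsmodels` from the table above are the more appropriate tools.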

Visualization & Reporting

Module Description
altair A declarative statistical visualization library for Python, based on Vega and Vega-Lite.
matplotlib Comprehensive library for creating static, animated, and interactive visualizations
seaborn Provides a high-level interface for drawing attractive and informative statistical graphics.
plotnine An implementation of a grammar of graphics in Python based on ggplot2.
patchworklib A universal composer of matplotlib-related plots (simple matplotlib plots, Seaborn plots (both axis-level and figure-level), and plotnine plots).
itables Display your tables as interactive DataTables that you can sort, paginate, scroll or filter.
missingno A small toolset of flexible and easy-to-use missing data visualizations and utilities
quarto An open-source scientific and technical publishing system

Diagrams and Technical Drawings

Modules Description
mermaid-py an interface for the famous mermaid-js library that uses scripts to create diagrams
graphviz Facilitates the creation and rendering of graph descriptions in the DOT language of the Graphviz graph drawing software
diagrams Draw the cloud system architecture in Python code.

Image & Video

Modules Description
pillow adds image processing capabilities to your Python interpreter.
ffmpeg-python Use ffmpeg from python.

* Part of the Python standard library
