ananas-des / Python_BI_2022

Repository for Python Course Homeworks in Bioinformatics Institute 2022-2023

Repository for Python Course Homeworks in Bioinformatics Institute

Python NumPy Pandas Matplotlib Shell Script

We acquired many basic and advanced programming skills in the Python course at the Bioinformatics Institute. During the course, we learned how to work with many Python libraries, such as:

  • sys, os;
  • numpy, pandas;
  • matplotlib, seaborn;
  • re, io, argparse;
  • dataclasses, functools;
  • concurrent, multiprocessing, threading;
  • requests, bs4, python-dotenv, etc.

Here are brief descriptions of the tasks that we performed to master these topics. Each folder contains the task solutions, their descriptions, and the required system and Python packages.

Homework12. Parallel programming 🚥

Here are some solutions for the Python parallel programming tasks using sklearn, numpy, and some built-in packages for parallel programming in Python, such as concurrent, multiprocessing, and threading. During the tasks, we parallelized the fit(), predict_proba(), and predict() methods of a custom implementation of Random Forest Classifier, created a decorator function memory_usage() that limits the memory usage of a target function, and wrote a class ParallelMap that parallelizes the map function using multiprocessing.
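To illustrate the ParallelMap idea, here is a minimal sketch. It is not the repository's implementation: the n_jobs parameter is an assumed name, and the thread-backed multiprocessing.pool.ThreadPool is used instead of a process pool so that the example runs without pickling concerns or a __main__ guard. A process-based version would swap in multiprocessing.Pool and require picklable, module-level functions.

```python
from multiprocessing.pool import ThreadPool


class ParallelMap:
    """Hypothetical sketch of a map() replacement backed by a worker pool."""

    def __init__(self, n_jobs=2):
        # n_jobs is an assumed parameter name: the number of workers.
        self.n_jobs = n_jobs

    def map(self, func, *iterables):
        # Like the built-in map(), accept one or more iterables and stop at
        # the shortest one; results are returned as a list in input order.
        with ThreadPool(self.n_jobs) as pool:
            return pool.starmap(func, zip(*iterables))


squares = ParallelMap(n_jobs=4).map(pow, range(5), [2] * 5)
print(squares)  # [0, 1, 4, 9, 16]
```

Threads mainly help with I/O-bound work; for CPU-bound functions a process pool is usually the better fit.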

Homework11. Internet and API 🍜

Here are some solutions for the Python Internet tasks using requests, bs4, re, io, python-dotenv, and other modules. For this homework, we parsed the IMDb Top 250 film list to obtain certain information from the web page using the bs4 module, created a decorator telegram_logger that measures function execution time, creates a .log file, and sends it to a Telegram bot, and created a Python API for the GENSCAN web tool.
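The timing part of a decorator such as telegram_logger could look roughly like the sketch below. The name timed_logger and its stream argument are hypothetical, and nothing is sent to Telegram here: the log line is only written to the stream that is passed in.

```python
import functools
import io
import time


def timed_logger(stream):
    """Hypothetical decorator factory: log a function's run time to `stream`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                # A real telegram_logger would send this text (or a .log file)
                # to a bot; this sketch only writes it to the given stream.
                stream.write(f"{func.__name__} finished in {elapsed:.6f} s\n")

        return wrapper

    return decorator


log = io.StringIO()


@timed_logger(log)
def add(a, b):
    return a + b


result = add(2, 3)
print(result, log.getvalue().strip())
```

The try/finally block makes sure the time is logged even when the wrapped function raises.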

Homework10. Decorators and iterators 🎭 ➿

Here are some solutions for the Python decorators and iterators tasks using the os, dataclasses, and functools modules. During the tasks, we created a subclass of the regular dictionary whose iteration yields both keys and values, a function that appends a new element to a list iterator and returns an iterator, a decorator for class methods that makes them return objects of the current class, and a context manager for reading .fasta files and storing them in the dataclass FastaRecord() format. We also swapped all public methods to private, and vice versa, using the decorator switch_privacy(), and wrote a script that finds all possible (non-unique) genotypes when crossbreeding two organisms and calculates the probability of a certain genotype (its expected share in the offspring).
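As an example of the context manager task, a FASTA reader built around a FastaRecord dataclass might be sketched as follows. Only the name FastaRecord comes from the description above; its fields, the OpenFasta class, and the read_records method are assumptions made for illustration.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FastaRecord:
    # Field names are assumed; the original dataclass may differ.
    id: str
    description: str
    seq: str


class OpenFasta:
    """Hypothetical context manager that parses a .fasta file into records."""

    def __init__(self, path):
        self.path = path
        self._handle = None

    def __enter__(self):
        self._handle = open(self.path)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._handle.close()
        return False  # do not suppress exceptions

    def read_records(self):
        records, header, chunks = [], None, []
        for line in self._handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append(self._build(header, chunks))
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
        if header is not None:
            records.append(self._build(header, chunks))
        return records

    @staticmethod
    def _build(header, chunks):
        # Everything before the first space is the id, the rest a description.
        seq_id, _, description = header.partition(" ")
        return FastaRecord(seq_id, description, "".join(chunks))


# Write a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">seq1 first record\nACGT\nTTGA\n>seq2\nGGCC\n")

with OpenFasta(tmp.name) as fasta:
    records = fasta.read_records()
os.remove(tmp.name)
print(records)
```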

Homework9. Object oriented programming 🐍

Here are some solutions for the Python object-oriented programming (OOP) tasks using the datetime, abc, and numpy modules. For these tasks, we created classes Chat, Message, and User for sending messages to a chat and retrieving their information, a class Args for calling a function via a specific syntax, a subclass StrangeFloat derived from float, and some classes for dealing with biological sequences. Also, in some functions we replaced as much basic Python syntax as possible with equivalent dunder methods, attributes, and variables, just for fun :)

Homework8. UNIX command copycats 🚂

These scripts are equivalents of the eponymous bash commands and can also be combined into a pipeline. They were written using standard Python modules, such as os, sys, argparse, and shutil. Here are some guidelines on how to use them and how they work.
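To give an idea of what such a copycat can look like, here is a minimal wc-like sketch. It is an illustration only and is not taken from the repository; the function names and the command-line interface are assumptions, and characters are counted instead of bytes.

```python
import argparse
import io
import sys


def count(stream):
    """Return (newlines, words, characters) for a text stream, like `wc`."""
    lines = words = chars = 0
    for line in stream:
        lines += line.count("\n")
        words += len(line.split())
        chars += len(line)
    return lines, words, chars


def main(argv=None):
    parser = argparse.ArgumentParser(description="Minimal wc-like counter")
    parser.add_argument("files", nargs="*", help="input files (default: stdin)")
    args = parser.parse_args(argv)
    if not args.files:
        # Reading stdin is what lets the script sit in the middle of a pipe.
        print(*count(sys.stdin))
        return
    for path in args.files:
        with open(path) as handle:
            print(*count(handle), path)


# Demonstration on an in-memory stream instead of a real pipe.
print(count(io.StringIO("one two\nthree\n")))  # (2, 3, 14)
```

A script version would call main() under the usual `if __name__ == "__main__":` guard.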

Homework7. Functional programming 🔀

The utility functional.py provides some functions for functional programming tasks, such as sequential_map, consensus_filter, conditional_reduce, func_chain, multiple_partial, and nothing_to_print. These functions are based on plain Python and the built-in sys package.
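The sketch below shows one plausible reading of four of these functions. The names come from functional.py, but the signatures and behavior are guessed from the names alone and may differ from the actual utility.

```python
from functools import reduce


def func_chain(*funcs):
    """Compose functions left to right: func_chain(f, g)(x) == g(f(x))."""

    def chained(value):
        for func in funcs:
            value = func(value)
        return value

    return chained


def sequential_map(*args):
    """Apply every function in turn to each element of the last argument."""
    *funcs, container = args
    return list(map(func_chain(*funcs), container))


def consensus_filter(*args):
    """Keep the elements for which every predicate returns a truthy value."""
    *funcs, container = args
    return [item for item in container if all(f(item) for f in funcs)]


def conditional_reduce(condition, reducer, container):
    """Reduce only the elements that satisfy `condition`."""
    return reduce(reducer, filter(condition, container))


print(sequential_map(lambda x: x + 1, lambda x: x * 2, [1, 2, 3]))  # [4, 6, 8]
print(consensus_filter(lambda x: x > 0, lambda x: x % 2 == 0, [-2, 1, 2, 4]))  # [2, 4]
print(conditional_reduce(lambda x: x < 5, lambda a, b: a + b, [1, 3, 5, 10]))  # 4
```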

Homework6. Regular expressions 📎

Here are some solutions for the Python regular expressions tasks using the re module. During the tasks, we parsed FTP links, analyzed the text of the short story "2430 A.D." by Isaac Asimov, wrote a script that translates Russian into "brick language", etc.
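For instance, extracting FTP links from free text with re could be sketched like this. The pattern and the sample text are assumptions: the input format of the actual task is not described here, so links are simply taken to start with "ftp://" or "ftp." and to end at whitespace, a semicolon, a comma, or a quote.

```python
import re

# Assumed link shape: "ftp://..." or "ftp.host/...", terminated by
# whitespace, ";", ",", or a quote character.
FTP_LINK = re.compile(r"(?:ftp://|\bftp\.)[^\s;,\"']+")

text = "reads: ftp.example.org/pub/reads_1.fastq.gz;ftp://ftp.example.org/pub/notes.txt end"
links = FTP_LINK.findall(text)
print(links)
```

A link at the end of a sentence would keep its trailing period with this pattern, so real data may need a stricter expression.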

Homework5. Pandas and plots 📈

Here are some solutions for the Python pandas and plot customization tasks using matplotlib and seaborn. During the tasks, we created functions for reading .gff and .bed files and converting them into pandas.DataFrame, reimplemented bedtools intersect using pandas, created and customized a volcano plot based on differential expression data, and created and customized a bar of pie chart, where the first slice of the pie is "exploded" into a bar chart with a further breakdown of that slice's characteristics.

Homework4. Numpy 🔒 @ 🔑

The utility numpy_challenge.py provides some functions for dealing with one- and two-dimensional arrays given in numpy.array format, such as matrix multiplication, multidimensional distance computation, and distance matrix generation. These functions are based on the Python NumPy package. You may use the numpy_challenge.py functions in another program after importing the utility as a module: import numpy_challenge as {name}. If you run numpy_challenge.py as the main program, it will also generate three different numpy arrays and print them to standard output.
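A distance matrix of the kind mentioned above can be computed with NumPy broadcasting, as in this minimal sketch. The function name distance_matrix and the choice of Euclidean distance are assumptions, not necessarily what numpy_challenge.py does.

```python
import numpy as np


def distance_matrix(points):
    """Pairwise Euclidean distances between the rows of a 2-D array."""
    points = np.asarray(points, dtype=float)
    # Broadcasting: (n, 1, d) - (1, n, d) gives an (n, n, d) array of differences.
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


points = np.array([[0, 0], [3, 4], [6, 8]])
distances = distance_matrix(points)
print(distances)

# Matrix multiplication uses the @ operator: (3, 2) @ (2, 3) -> (3, 3).
product = points @ points.T
print(product.shape)
```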

Homework3. Virtual environments 🔧

Programmer Mikhail published a comprehensive study focused on Python modules and virtual environments. In the Supplementary he attached a link to the GitHub repository with his code. Unfortunately, the utility ultraviolence.py was not adapted for widespread usage. Here are some guidelines with a requirements.txt for properly launching Mikhail's script on Ubuntu 22.04.1 LTS with Python v3.11.0a7.

Homework2. FastQ filtrator 🛂

The utility fastq_filtrator.py deals with .fastq files. It opens the input .fastq file and calculates the Q-score, GC-content, and length for every read in the file. Based on these values, it performs three types of read filtering using the values of the _bound variables. As a result, fastq_filtrator.py creates a _passed.fastq file with the passed reads using your output_file_prefix. If you pass boolean True to the save_filtered variable, it also generates a _failed.fastq file with the failed reads. This utility is convenient for filtering the same input .fastq file multiple times.
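The three per-read checks can be sketched as below. This is not the code of fastq_filtrator.py: the function names, the parameter names gc_bounds, length_bounds, and quality_threshold, their defaults, and the Phred+33 quality encoding are all assumptions, and file reading and writing are left out.

```python
def gc_content(seq):
    """Percentage of G and C bases in a read."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def mean_quality(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filters(seq, quality, gc_bounds=(0, 100),
                   length_bounds=(0, 2 ** 32), quality_threshold=0):
    """True if the read satisfies the GC, length, and quality filters."""
    return (
        gc_bounds[0] <= gc_content(seq) <= gc_bounds[1]
        and length_bounds[0] <= len(seq) <= length_bounds[1]
        and mean_quality(quality) >= quality_threshold
    )


# Each read is a (sequence, quality string) pair; values are made up.
reads = [("GGCC", "IIII"), ("ATAT", "!!!!"), ("GCGCAT", "5555")]
passed = [r for r in reads
          if passes_filters(*r, gc_bounds=(40, 100), quality_threshold=20)]
failed = [r for r in reads if r not in passed]
print(passed, failed)
```

Keeping the passed and failed reads in separate lists mirrors the _passed.fastq and _failed.fastq outputs described above.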

Homework1. Collections 🐳

The utility hw1_Collections.py provides some basic tools for working with nucleic acid sequences. Based on Python dictionaries and the complementarity principle, this script makes complement sequences for DNA and RNA, reverses them, and transcribes an RNA sequence from a DNA template. For your convenience, hw1_Collections.py is not case-sensitive, and all characters in the output sequences keep the case of the input.
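The dictionary-based approach with case preservation can be sketched as follows. The names are hypothetical, and transcription is taken here as the simple T to U substitution, which may differ from what hw1_Collections.py does.

```python
# Both cases are listed in the dictionaries so that output keeps the input case.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G",
                  "a": "t", "t": "a", "g": "c", "c": "g"}
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G",
                  "a": "u", "u": "a", "g": "c", "c": "g"}
TRANSCRIPTION = {"T": "U", "t": "u"}


def complement(seq, mapping=DNA_COMPLEMENT):
    """Complement a sequence; raises KeyError on an unknown character."""
    return "".join(mapping[base] for base in seq)


def reverse(seq):
    return seq[::-1]


def transcribe(dna):
    """Assumed convention: replace T with U and keep every other base."""
    return "".join(TRANSCRIPTION.get(base, base) for base in dna)


print(complement("AtGc"))                  # TaCg
print(reverse("AtGc"))                     # cGtA
print(transcribe("AtGc"))                  # AuGc
print(complement("AuGc", RNA_COMPLEMENT))  # UaCg
```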
