We have acquired many basic and advanced programming skills in the Python course at the Bioinformatics Institute. During the course, we learned how to work with many Python libraries, such as sys, os, numpy, pandas, matplotlib, seaborn, re, io, argparse, dataclasses, functools, concurrent, multiprocessing, threading, requests, bs4, python-dotenv, etc.
Here are brief descriptions of the tasks that we performed to master the topics. Each folder contains the task solutions, their descriptions, and the required system and Python packages.
Homework12. Parallel programming
Here are some solutions for Python parallel programming tasks using sklearn, numpy, and some built-in packages for parallel programming in Python, such as concurrent, multiprocessing, and threading. During the tasks, we parallelized the fit(), predict_proba(), and predict() methods of a custom implementation of the Random Forest Classifier, created a decorator function memory_usage() that limits the memory usage of a target function, and wrote a class ParallelMap that parallelizes the map function using multiprocessing.
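To illustrate the idea of a memory-limiting decorator, here is a minimal sketch. It is not the memory_usage() implementation from the homework folder: the name memory_limit, the max_bytes parameter, and the use of tracemalloc are assumptions made for this example. Note that this version checks the peak only after the function has returned, so it reports an overrun rather than interrupting the call.

```python
import functools
import tracemalloc


def memory_limit(max_bytes):
    """Hypothetical decorator: fail if the peak traced memory exceeds max_bytes."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            tracemalloc.start()
            try:
                result = func(*args, **kwargs)
                # get_traced_memory() returns (current, peak) in bytes
                _, peak = tracemalloc.get_traced_memory()
            finally:
                tracemalloc.stop()
            if peak > max_bytes:
                raise MemoryError(
                    f"{func.__name__} used {peak} bytes, limit is {max_bytes}"
                )
            return result

        return wrapper

    return decorator


@memory_limit(max_bytes=1_000_000)
def build_buffer(size):
    # Allocates roughly `size` bytes
    return bytearray(size)
```

Calling build_buffer(1000) returns normally, while build_buffer(5_000_000) should raise MemoryError under this sketch.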
Homework11. Internet and API
Here are some solutions for Python Internet tasks using requests, bs4, re, io, python-dotenv, and other modules.
For this homework, we parsed the IMDb Top 250 film list to obtain certain information from the web page using the bs4 module, created a decorator telegram_logger that measures function execution time, creates a .log file, and sends it to a Telegram bot, and created a Python API for the GENSCAN web tool.
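The timing and logging part of such a decorator could look roughly like the sketch below. It deliberately leaves out the Telegram step, and the names timed_logger and log_buffer are hypothetical, chosen only for this example; the actual telegram_logger in the homework folder may be organized differently.

```python
import functools
import io
import time


def timed_logger(log_buffer):
    """Hypothetical decorator: write the execution time of a call to log_buffer."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the call succeeded or raised
                elapsed = time.perf_counter() - start
                log_buffer.write(f"{func.__name__} finished in {elapsed:.6f} s\n")

        return wrapper

    return decorator


log = io.StringIO()  # in-memory stand-in for a .log file


@timed_logger(log)
def slow_sum(n):
    return sum(range(n))
```

After a call such as slow_sum(1000), log.getvalue() holds one line per call, which could then be saved or sent elsewhere.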
Homework10. Decorators and iterators
Here are some solutions for Python decorator and iterator tasks using the os, dataclasses, and functools modules.
During the tasks, we created a subclass of the regular dictionary whose iteration yields both keys and values, a function that appends a new element to a list iterator and returns an iterator, a decorator for class methods that makes them return objects of the current class, and a context manager for reading .fasta files and storing the records in the dataclass FastaRecord() format. We also swapped all public methods to private, and vice versa, using the decorator switch_privacy(), and wrote a script that finds all possible (non-unique) genotypes when crossbreeding two organisms and calculates the probability of a certain genotype (its expected share in the offspring).
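The genotype task can be sketched as follows. This is an illustration under stated assumptions, not the script from the homework folder: genotypes are assumed to be strings of two-letter genes such as "AaBb", genes are assumed to be inherited independently, and the function names cross and genotype_probability are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Return all offspring genotypes, with repeats, for parents like 'AaBb'."""
    # Split each genotype into genes: 'AaBb' -> ['Aa', 'Bb']
    genes1 = [parent1[i:i + 2] for i in range(0, len(parent1), 2)]
    genes2 = [parent2[i:i + 2] for i in range(0, len(parent2), 2)]
    per_gene = []
    for gene1, gene2 in zip(genes1, genes2):
        # One allele from each parent; sorting puts the uppercase allele first
        per_gene.append(["".join(sorted(a + b)) for a, b in product(gene1, gene2)])
    # Combine the per-gene outcomes into full genotypes
    return ["".join(combo) for combo in product(*per_gene)]


def genotype_probability(parent1, parent2, target):
    """Expected share of `target` among the offspring, as an exact fraction."""
    offspring = cross(parent1, parent2)
    return Fraction(Counter(offspring)[target], len(offspring))
```

For a monohybrid cross, cross("Aa", "Aa") gives ["AA", "Aa", "Aa", "aa"], and for the dihybrid cross genotype_probability("AaBb", "AaBb", "AaBb") evaluates to Fraction(1, 4).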
Homework9. Object oriented programming
Here are some solutions for Python object-oriented programming (OOP) tasks using the datetime, abc, and numpy modules.
For these tasks, we created the classes Chat, Message, and User for sending messages to a chat and retrieving their information, a class Args for calling a function via a specific syntax, a subclass StrangeFloat derived from float, and some classes for dealing with biological sequences. Also, in one of the functions we replaced as much basic Python syntax as possible with synonymous dunder methods, attributes, and variables, just for fun :)
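As an illustration of how the abc module can be used for biological sequence classes, here is a minimal sketch. The class names BiologicalSequence and DNASequence and the methods is_valid() and gc_content() are assumptions made for this example and may not match the classes in the homework folder.

```python
from abc import ABC, abstractmethod


class BiologicalSequence(ABC):
    """Hypothetical abstract base class for sequences over a fixed alphabet."""

    def __init__(self, sequence):
        self.sequence = sequence

    def __len__(self):
        return len(self.sequence)

    @property
    @abstractmethod
    def alphabet(self):
        """Set of characters allowed in the sequence."""

    def is_valid(self):
        # Valid if every character belongs to the subclass alphabet
        return set(self.sequence.upper()) <= self.alphabet


class DNASequence(BiologicalSequence):
    # Overriding the abstract property makes this class instantiable
    alphabet = frozenset("ACGT")

    def gc_content(self):
        seq = self.sequence.upper()
        return 100 * (seq.count("G") + seq.count("C")) / len(seq)
```

With this layout, BiologicalSequence itself cannot be instantiated, while DNASequence("ATGC") supports len(), is_valid(), and gc_content().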
Homework8. UNIX command copycats
These scripts are equivalents of the eponymous bash commands and can also be combined into a pipeline. They were written using standard Python modules, such as os, sys, argparse, and shutil. Here are some guidelines on how to use them and how they work.
Homework7. Functional programming
The utility functional.py provides some functions for functional programming tasks, such as sequential_map, consensus_filter, conditional_reduce, func_chain, multiple_partial, and nothing_to_print. These functions are based on pure Python and the built-in sys package.
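Two of these functions could be sketched as below. The signatures are assumptions, since they are not described here: func_chain is assumed to compose functions from left to right, and sequential_map is assumed to take any number of functions followed by a container as its last argument. The code in functional.py may differ.

```python
from functools import reduce


def func_chain(*funcs):
    """Return one function that applies funcs from left to right."""

    def chained(value):
        return reduce(lambda acc, func: func(acc), funcs, value)

    return chained


def sequential_map(*args):
    """Apply all functions, in order, to every element of the last argument."""
    *funcs, container = args
    return list(map(func_chain(*funcs), container))
```

For example, sequential_map(lambda x: x + 1, lambda x: x * 2, [1, 2, 3]) returns [4, 6, 8] under these assumptions.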
Homework6. Regular expressions
Here are some solutions for Python regular expression tasks using the re module. During the tasks, we parsed FTP links, analyzed the text of the short story 2430 A.D. by Isaac Asimov, wrote a script that translates from Russian to "brick language", etc.
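Extracting FTP links with re could look like the sketch below. The pattern is an assumption about what the links look like (either starting with "ftp://" or with a bare "ftp." host name), and the sample text uses the placeholder host example.org rather than real data from the task.

```python
import re

# Assumed link shape: 'ftp://...' or 'ftp.host/...', ending at whitespace or ';'
FTP_LINK = re.compile(r"\bftp(?:://|\.)[\w./-]+")


def find_ftp_links(text):
    """Return all substrings of text that look like FTP links."""
    return FTP_LINK.findall(text)


sample = (
    "run1\tftp.example.org/data/a.fastq.gz;ftp.example.org/data/b.fastq.gz\n"
    "run2 ftp://example.org/file.txt done"
)
links = find_ftp_links(sample)
```

Here links contains the three link-like substrings from sample, split correctly at the semicolon and at whitespace.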
Homework5. Pandas and plots
Here are some solutions for Python pandas and plot customization tasks using matplotlib and seaborn.
During the tasks, we created a function for reading .gff and .bed files and converting them into a pandas.DataFrame, reconstructed bedtools intersect using pandas, created and customized a volcano plot based on differential expression data, and created and customized a bar of pie chart, where the first slice of the pie is "exploded" into a bar chart with a further breakdown of that slice's characteristics.
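A simplified version of reading a .bed file and intersecting two interval tables with pandas might look like this. It is only a sketch: the function names read_bed3 and intersect and the column names are hypothetical, only the first three BED columns are read, and the real bedtools intersect has many options that are not reproduced here.

```python
import pandas as pd

BED_COLUMNS = ["chrom", "start", "end"]


def read_bed3(path_or_buffer):
    """Read the first three tab-separated columns of a headerless BED file."""
    return pd.read_csv(
        path_or_buffer, sep="\t", header=None, usecols=[0, 1, 2], names=BED_COLUMNS
    )


def intersect(a, b):
    """Return the overlapping parts of intervals in a and b (half-open coordinates)."""
    # Pair every interval in a with every interval in b on the same chromosome
    merged = a.merge(b, on="chrom", suffixes=("_a", "_b"))
    overlap = merged[
        (merged["start_a"] < merged["end_b"]) & (merged["start_b"] < merged["end_a"])
    ]
    return pd.DataFrame(
        {
            "chrom": overlap["chrom"],
            "start": overlap[["start_a", "start_b"]].max(axis=1),
            "end": overlap[["end_a", "end_b"]].min(axis=1),
        }
    ).reset_index(drop=True)
```

The merge on chrom builds all same-chromosome pairs, so this approach is simple but can be memory-hungry on large files.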
Homework4. Numpy
The utility numpy_challenge.py provides some functions for dealing with one- and two-dimensional matrices given in numpy.array format, such as matrix multiplication, multidimensional distance computation, and distance matrix generation. These functions are based on the Python NumPy package. You may use the numpy_challenge.py functions in another program after importing the utility as a module: import numpy_challenge as {name}. If you run numpy_challenge.py as the main program, it will also generate three different numpy.arrays and print them to standard output.
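Distance matrix generation can be sketched with NumPy broadcasting as shown below. The function name distance_matrix and its two-argument signature are assumptions for this example, and Euclidean distance is assumed as the metric; the functions in numpy_challenge.py may be defined differently.

```python
import numpy as np


def distance_matrix(a, b):
    """Pairwise Euclidean distances between the rows of a (n, d) and b (m, d)."""
    # Broadcasting gives an (n, m, d) array of coordinate differences
    diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


points = np.array([[0.0, 0.0], [3.0, 4.0]])
distances = distance_matrix(points, points)
```

For the two points above, distances is a 2 x 2 matrix with zeros on the diagonal and 5.0 off the diagonal.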
Homework3. Virtual environments
Programmer Mikhail published a comprehensive study focused on Python modules and virtual environments. In the Supplementary he attached a link to the GitHub repository with his code. Unfortunately, the utility ultraviolence.py was not adapted for widespread usage. Here are some guidelines with a requirements.txt for properly launching Mikhail's script on Ubuntu 22.04.1 LTS with Python v3.11.0a7.
Homework2. FastQ filtrator
The utility fastq_filtrator.py deals with .fastq files. It opens the input .fastq file and calculates the Q-score, GC-content, and length of every read in the file. Based on these values, it performs three types of read filtering using the values of the _bound variables.
As a result, fastq_filtrator.py creates a _passed.fastq file with the passed reads, named using your output_file_prefix. If you set the save_filtered variable to True, it also generates a _failed.fastq file with the failed reads. This utility is convenient for repeated filtering of the same input .fastq file.
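The three per-read metrics and the filtering decision could be sketched as follows. This is not the code of fastq_filtrator.py: the function names, the parameter names gc_bounds, length_bounds, and quality_threshold, and the Phred+33 quality encoding are all assumptions made for this example.

```python
def gc_content(read):
    """Percentage of G and C bases in a read."""
    if not read:
        return 0.0
    upper = read.upper()
    return 100 * (upper.count("G") + upper.count("C")) / len(upper)


def mean_quality(quality_line, offset=33):
    """Mean Q-score of a read, assuming Phred+33 encoding."""
    return sum(ord(char) - offset for char in quality_line) / len(quality_line)


def passes_filters(read, quality_line, gc_bounds=(0, 100),
                   length_bounds=(0, 2 ** 32), quality_threshold=0):
    """Return True if the read satisfies all three (hypothetical) filters."""
    return (
        gc_bounds[0] <= gc_content(read) <= gc_bounds[1]
        and length_bounds[0] <= len(read) <= length_bounds[1]
        and mean_quality(quality_line) >= quality_threshold
    )
```

A read that passes would go to the _passed.fastq file, and a read that fails any filter would go to the _failed.fastq file when saving failed reads is enabled.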
Homework1. Collections
The utility hw1_Collections.py provides some basic tools for working with nucleic acid sequences. Based on Python dictionaries and the complementarity principle, this script makes complement sequences for DNA and RNA, reverses them, and transcribes an RNA sequence from a DNA template. For your convenience, hw1_Collections.py is not case-sensitive, and all characters in the output sequences keep the case of the input.
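The dictionary-based approach with case preservation can be sketched for DNA as below. The names DNA_COMPLEMENT, complement, and reverse_complement are hypothetical and chosen for this example; hw1_Collections.py may structure its functions differently and also covers RNA and transcription.

```python
# Complementarity rules for uppercase DNA bases
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Add lowercase pairs so the output keeps the case of the input
DNA_COMPLEMENT.update(
    {base.lower(): pair.lower() for base, pair in list(DNA_COMPLEMENT.items())}
)


def complement(sequence):
    """Return the complement of a DNA sequence, preserving letter case."""
    return "".join(DNA_COMPLEMENT[base] for base in sequence)


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return complement(sequence)[::-1]
```

For example, complement("ATgc") gives "TAcg" and reverse_complement("ATgc") gives "gcAT".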