Introduction

This repo contains all the graded work I (Shafiq al-Shaar) completed for the "Bioinformatics Specialization" offered by the University of California San Diego.

Tools used

Python3 - language used to implement the coursework and code challenges.
Sublime - text editor.
OneNote - for note taking.
AnkiApp - for creating flash cards.
OriFinder - http://tubic.tju.edu.cn/Ori-Finder/

Work

Week 1 - Finding Hidden Messages in DNA (Bioinformatics I)

`genome_pattern.py`

Hurdles

I wasn't able to get the correct answer on large datasets because I pasted the dataset into the terminal.

I got 117 because I copy-pasted the input from the browser into the terminal, and the terminal reads at most 4096 bytes per line. If I instead run the program like this:

	`xclip -o | python3 pattern_count.py`

Then I get the correct answer.
I copy-pasted the input because the format of the extra dataset differs from the input of the actual test sets, which is unfortunate and should be changed. If I download the extra dataset instead, I need to delete the first line and strip the carriage returns. There is no such need with the data challenge test sets.

See: https://stepik.org/lesson/2/step/7?discussion=291620&reply=291822&unit=8231 by Arkadiusz Olek

This led me to load the dataset from a file instead of pasting it.
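As an illustration of that workaround, here is a minimal sketch that reads the whole dataset from a file (or from piped stdin) and then counts overlapping occurrences of a pattern. It is not the code in `genome_pattern.py`; the names `read_raw`, `parse_dataset` and `pattern_count` are hypothetical, and the input layout (an optional `Input` header line, then the text, then the pattern) is an assumption based on the description above.

```python
import sys


def read_raw(path=None):
    """Read the whole dataset from a file, or from stdin when no path is given."""
    if path:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    return sys.stdin.read()


def parse_dataset(raw):
    """Return (text, pattern), ignoring carriage returns and blank lines."""
    lines = [line.strip() for line in raw.replace("\r", "").split("\n")]
    lines = [line for line in lines if line]
    # Assumption: the extra dataset starts with an "Input" header line.
    if lines and lines[0].lower().startswith("input"):
        lines = lines[1:]
    return lines[0], lines[1]


def pattern_count(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    k = len(pattern)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


# Small in-memory demo with Windows-style line endings and a header line.
sample = "Input\r\nGCGCG\r\nGCG\r\n"
print(pattern_count(*parse_dataset(sample)))  # prints 2
```

With a real dataset, `parse_dataset(read_raw("dataset.txt"))` would replace the in-memory sample; `dataset.txt` is a placeholder file name.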

`frequent_words.py`

Hurdles

This one was a bit complicated.

The pseudocode may throw you off a little, so it is worth rereading the problem statement and the comments to get a better understanding of what is being asked.
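For comparison, the most-frequent k-mers problem can also be sketched with a plain dictionary counter. This is not the implementation in `frequent_words.py`, and it does not follow the course pseudocode step by step; the function name and signature are assumptions made for illustration.

```python
from collections import Counter


def frequent_words(text, k):
    """Return all k-mers that occur most often in text, sorted alphabetically."""
    # Count every k-mer, sliding one position at a time so overlaps are included.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        return []
    best = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == best)


print(frequent_words("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))  # ['CATG', 'GCAT']
```

Both `CATG` and `GCAT` appear three times in this string, so both are reported.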

`reverse_complement.py`

`pattern_occurence.py`

`computing_frequencies.py`

test_computing_frequencies.py
computing_frequencies.py

`pattern_to_number.py`

test_pattern_to_number.py
pattern_to_number.py

`number_to_pattern.py`

test_number_to_pattern.py
number_to_pattern.py
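The two conversions are inverses of each other, which is what makes them convenient to test together. A minimal sketch, assuming the usual base-4 encoding with A=0, C=1, G=2, T=3 (the repo's own files may be written differently):

```python
SYMBOLS = "ACGT"  # assumed ordering: A=0, C=1, G=2, T=3


def pattern_to_number(pattern):
    """Read the pattern as a base-4 number, most significant symbol first."""
    number = 0
    for symbol in pattern:
        number = 4 * number + SYMBOLS.index(symbol)
    return number


def number_to_pattern(number, k):
    """Convert a number back into the k-mer it encodes."""
    symbols = []
    for _ in range(k):
        number, remainder = divmod(number, 4)
        symbols.append(SYMBOLS[remainder])
    # Remainders come out least significant first, so reverse them.
    return "".join(reversed(symbols))


print(pattern_to_number("AGT"))   # 11
print(number_to_pattern(11, 3))   # AGT
```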

Why am I doing it?

The replication of cells and the intricate maintenance that all of our cells go through have always bewildered me.

Bioinformatics, I believe, is a gateway to understanding this mechanism.

My hope and end goal is to contribute to cancer research one day.

My favorite bit about the course has been:

An error during replication can lead to various diseases, including cancer. To understand how replication initiation works and what causes it to malfunction, we must first know where to look for replication origins. For this reason, we must accurately locate ori sites in the genome to study their replication initiation. Things are made even more difficult when we move from bacteria to more complex organisms; the human genome has thousands of origins of replication.

My background

My qualifications are 6 IGCSEs and TOEFL.
I couldn't continue with A-Levels or university because of the responsibility of taking care of my siblings.
I'm a software developer who has worked on fairly large projects.
I have been fascinated by bioinformatics since I first heard of it in 2012 as an emerging sector.
I'm hoping to go back to university after taking the Bioinformatics Specialization course.

About me

I'm Shafiq, born in 1994, and I have been a software developer since 2009.

You can reach me via Twitter @spacemudd.

You can also find me on LinkedIn at Shafiq al-Shaar.

And for any web dev related jobs, you can reach me on Upwork.
