Cathy272272272's repositories
HKEX-web-scraping
Code that scrapes data from the official website of the Hong Kong Stock Exchange (HKEX).
Dijkstra
Solves the all-source shortest path problem by running Dijkstra's algorithm n = |V| times (once per source vertex), in both a serial and an OpenMP version.

In serial.c and OpenMP.c, I wrote a void printArr(int dist[]) function that prints the shortest-path distances from each source to every vertex. It is called at line 174 (serial.c) and line 177 (OpenMP.c). Call this function when testing the tiny graph (tiny.txt) to check correctness, and skip it when testing test_graph_small.txt and test_graph.txt, because printing is very time-consuming. You can also set num_of_thread for OpenMP.c at line 225.

My OpenMP version is slower than the serial one. The serial version creates only one minHeap and keeps resetting it for each source vertex, whereas the OpenMP version creates a separate minHeap for each source vertex to avoid crashes caused by threads sharing one heap; the extra heap creation makes it slower.

Timing:
serial.c
  test_graph_small.txt: 4.410999 s
  test_graph.txt: 5319.476639 s
OpenMP.c (test_graph_small.txt)
  3 threads: 8.666312 s
  5 threads: 7.423817 s
  10 threads: 7.758672 s
  100 threads: 9.605392 s
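The all-source loop can be sketched in Python (the repository itself is C with OpenMP). This sketch assumes non-negative weights and an adjacency-list graph, and it uses heapq with lazy deletion (stale heap entries are skipped) instead of a decrease-key minHeap; that heap choice is an assumption, not the repository's approach. The names dijkstra, all_sources, and adj are hypothetical.

```python
import heapq
from typing import List, Tuple

INF = float("inf")


def dijkstra(adj: List[List[Tuple[int, int]]], src: int) -> List[float]:
    """Single-source shortest paths on non-negative weights using a binary heap."""
    dist = [INF] * len(adj)
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a shorter distance
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def all_sources(adj: List[List[Tuple[int, int]]]) -> List[List[float]]:
    """Run Dijkstra once per source vertex (n = |V| runs).

    Each run builds its own heap and dist list, so the runs are independent.
    """
    return [dijkstra(adj, s) for s in range(len(adj))]


# Directed graph: adj[u] lists (v, weight) pairs.
adj = [[(1, 4), (2, 1)], [(3, 1)], [(1, 2), (3, 5)], []]
print(all_sources(adj)[0])  # [0, 3, 1, 4]
```

Because every source owns its heap and distance array, the per-source loop is the natural unit to parallelize. This matches the per-source minHeap trade-off described for the OpenMP version.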
Implementation-of-async-and-packaged_task-in-C-
Implements async on top of packaged_task, and packaged_task on top of promise.
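The same layering can be illustrated with a minimal Python analogue, assuming C++-style semantics. Promise is the write end of a one-shot channel and Future the read end, PackagedTask wraps a callable and fulfills a Promise, and async_ runs a PackagedTask on a new thread. All of these names are hypothetical stand-ins for std::promise, std::future, std::packaged_task, and std::async; this is not the repository's C++ code.

```python
import threading


class Future:
    """Read end of a one-shot channel: get() blocks until a value or exception arrives."""

    def __init__(self):
        self._ready = threading.Event()
        self._value = None
        self._exc = None

    def get(self, timeout=None):
        if not self._ready.wait(timeout):
            raise TimeoutError("result not ready")
        if self._exc is not None:
            raise self._exc
        return self._value


class Promise:
    """Write end: stores a value or an exception exactly once, then wakes waiters."""

    def __init__(self):
        self._future = Future()
        self._lock = threading.Lock()
        self._satisfied = False

    def get_future(self):
        return self._future

    def _store(self, value, exc):
        with self._lock:
            if self._satisfied:
                raise RuntimeError("promise already satisfied")
            self._satisfied = True
        self._future._value, self._future._exc = value, exc
        self._future._ready.set()

    def set_value(self, value):
        self._store(value, None)

    def set_exception(self, exc):
        self._store(None, exc)


class PackagedTask:
    """Wraps a callable; invoking the task fulfills its Promise with the outcome."""

    def __init__(self, fn):
        self._fn = fn
        self._promise = Promise()

    def get_future(self):
        return self._promise.get_future()

    def __call__(self, *args, **kwargs):
        try:
            result = self._fn(*args, **kwargs)
        except Exception as exc:
            self._promise.set_exception(exc)
        else:
            self._promise.set_value(result)


def async_(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) on a new thread through a PackagedTask; return its Future."""
    task = PackagedTask(fn)
    future = task.get_future()
    threading.Thread(target=task, args=args, kwargs=kwargs).start()
    return future


print(async_(pow, 2, 10).get())  # waits for the worker thread, then prints 1024
```

Each layer only uses the one below it: async_ never touches Promise directly, and PackagedTask never creates threads.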
Morris-Traversal
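Inorder, preorder, and postorder traversal. Inorder and preorder differ only in when a node is added to the result. Inorder: add root when left.right == root, i.e., the rightmost node of the left subtree already points back to root. Preorder: add root when left.right == null, i.e., that rightmost node has not been threaded yet. Both also add root when left == null. Postorder: symmetric to inorder and preorder (left and right swapped), using addFirst instead of add. A node is added with addFirst when right.left == null (the leftmost node of the right subtree has not been threaded yet) and when right == null. We visit nodes from the root toward the leaves; because each node is added with addFirst, the right child must be visited before the left child, and the resulting list is then in postorder.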
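A minimal Python sketch of the three traversals follows, assuming a hypothetical Node class with val, left, and right fields. list.append plays the role of add, and collections.deque.appendleft plays the role of addFirst. Postorder is written as the mirror image of preorder, following the description.

```python
from collections import deque


class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right


def morris_inorder(root):
    out, cur = [], root
    while cur:
        if cur.left is None:
            out.append(cur.val)          # left == null: add
            cur = cur.right
            continue
        pre = cur.left                   # rightmost node of the left subtree
        while pre.right and pre.right is not cur:
            pre = pre.right
        if pre.right is None:
            pre.right = cur              # first visit: thread back to cur
            cur = cur.left
        else:
            pre.right = None             # second visit: remove the thread
            out.append(cur.val)          # inorder adds when pre.right == cur
            cur = cur.right
    return out


def morris_preorder(root):
    out, cur = [], root
    while cur:
        if cur.left is None:
            out.append(cur.val)          # left == null: add
            cur = cur.right
            continue
        pre = cur.left
        while pre.right and pre.right is not cur:
            pre = pre.right
        if pre.right is None:
            out.append(cur.val)          # preorder adds when pre.right == null
            pre.right = cur
            cur = cur.left
        else:
            pre.right = None
            cur = cur.right
    return out


def morris_postorder(root):
    """Mirror of preorder (left/right swapped) with appendleft instead of append."""
    out, cur = deque(), root
    while cur:
        if cur.right is None:
            out.appendleft(cur.val)      # right == null: addFirst
            cur = cur.left
            continue
        pre = cur.right                  # leftmost node of the right subtree
        while pre.left and pre.left is not cur:
            pre = pre.left
        if pre.left is None:
            out.appendleft(cur.val)      # addFirst when pre.left == null
            pre.left = cur
            cur = cur.right
        else:
            pre.left = None
            cur = cur.left
    return list(out)


tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(morris_postorder(tree))  # [4, 5, 2, 3, 1]
```

All three functions remove every temporary thread before returning, so the tree is left unchanged and uses O(1) extra space beyond the output.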
Graph-Period
For an irreducible (strongly connected) graph G, we can calculate its period as follows:
1. Perform a depth-first search of G.
2. For each edge e in G that connects a vertex on level i of the depth-first search tree to a vertex on level j, let k_e = j − i − 1.
3. Compute the greatest common divisor of the set of numbers k_e.
To find the graph period, we take the smallest gcd over all pairs of vertices.
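The three steps can be sketched in Python, assuming the graph is given as adjacency lists over vertices 0..n-1 and is strongly connected. graph_period is a hypothetical name, and the sketch computes the gcd from a single search rooted at vertex 0. It uses an iterative stack-based search. The exact search order does not matter, because each level is the length of a real path from the root: every k_e is then a multiple of the period, and the k_e along any cycle sum to minus that cycle's length. Since the gcd ignores sign, absolute values are used.

```python
from math import gcd


def graph_period(adj):
    """Period of a strongly connected digraph; adj[u] lists the successors of u."""
    n = len(adj)
    level = [-1] * n
    level[0] = 0
    stack = [0]
    while stack:  # iterative search from vertex 0 builds a spanning tree
        u = stack.pop()
        for v in adj[u]:
            if level[v] == -1:
                level[v] = level[u] + 1  # tree edge: v is one level below u
                stack.append(v)
    period = 0
    for u in range(n):
        for v in adj[u]:
            k_e = level[v] - level[u] - 1  # 0 for tree edges
            period = gcd(period, abs(k_e))
    return period


print(graph_period([[1], [2], [0]]))  # a single 3-cycle: prints 3
```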
Image-Processing-System
Implements a mini image processing system for PPM images.
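The Python sketch below shows the kind of operation such a system performs: it reads and writes the plain-text P3 variant of PPM and applies a color-inversion filter. The function names are hypothetical. Only P3 is handled (not binary P6), and comments are assumed to start with #. This is not the repository's code.

```python
def parse_ppm_p3(text):
    """Parse a plain (ASCII, 'P3') PPM image into (width, height, maxval, pixels)."""
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())  # drop comments
    if not tokens or tokens[0] != "P3":
        raise ValueError("not a plain P3 PPM")
    width, height, maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:4 + 3 * width * height]]
    if len(values) != 3 * width * height:
        raise ValueError("truncated pixel data")
    pixels = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return width, height, maxval, pixels


def invert(image):
    """Negative filter: each channel value c becomes maxval - c."""
    width, height, maxval, pixels = image
    return width, height, maxval, [tuple(maxval - c for c in p) for p in pixels]


def to_ppm_p3(image):
    """Serialize back to P3 text, one image row per line."""
    width, height, maxval, pixels = image
    lines = ["P3", f"{width} {height}", str(maxval)]
    for row in range(height):
        row_px = pixels[row * width:(row + 1) * width]
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row_px))
    return "\n".join(lines) + "\n"


image = parse_ppm_p3("P3\n# red, blue\n2 1\n255\n255 0 0  0 0 255\n")
print(to_ppm_p3(invert(image)), end="")
```

Keeping pixels as a flat list of (r, g, b) tuples makes per-pixel filters such as invert a single comprehension.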
Speaker-Recognition-System
I implemented a speaker recognition system in Haskell. Given speech from two speakers, the system builds a Markov model for each of them; then, given speech from an unknown speaker, it reports which of the two the unknown speaker is most likely to be.
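The Python sketch below relies on assumptions the description does not state. Each speech sample is treated as plain text, and each speaker is modeled as an order-k character Markov model (k = 2 here). Unseen transitions get add-one (Laplace) smoothing, and the unknown speech is assigned to whichever model gives it the higher log-likelihood. Samples are assumed to be longer than k characters. The function names are hypothetical, and the repository itself is written in Haskell.

```python
import math
from collections import Counter


def build_model(text, k=2):
    """Order-k character Markov model: counts of contexts and (context, next char) pairs."""
    padded = text + text[:k]  # wrap around so every position has a k-char context
    contexts, transitions = Counter(), Counter()
    for i in range(len(text)):
        ctx, nxt = padded[i:i + k], padded[i + k]
        contexts[ctx] += 1
        transitions[(ctx, nxt)] += 1
    return contexts, transitions, k


def log_likelihood(model, text, alphabet_size):
    """Log-probability of text under the model, with add-one (Laplace) smoothing."""
    contexts, transitions, k = model
    padded = text + text[:k]
    total = 0.0
    for i in range(len(text)):
        ctx, nxt = padded[i:i + k], padded[i + k]
        p = (transitions[(ctx, nxt)] + 1) / (contexts[ctx] + alphabet_size)
        total += math.log(p)
    return total


def identify(speech_a, speech_b, unknown, k=2):
    """Return 'A' or 'B', whichever speaker's model better explains the unknown speech."""
    alphabet_size = len(set(speech_a) | set(speech_b) | set(unknown))
    score_a = log_likelihood(build_model(speech_a, k), unknown, alphabet_size)
    score_b = log_likelihood(build_model(speech_b, k), unknown, alphabet_size)
    return "A" if score_a >= score_b else "B"


print(identify("ababababab", "cdcdcdcdcd", "abab"))  # A
```

Comparing log-likelihoods rather than raw probabilities avoids floating-point underflow on long samples.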
Universal-Hashing
I implemented a universal family of hash functions to keep performance good in an adversarial setting, where someone may know in advance which hash function you use. For instance, someone who knows you are using a specific hash function with a given table size could construct a pathological set of keys that all hash to the same value, so that the hash table fills only a single bin and lookup performance degrades from O(1) to O(n). I provide two ways to implement universal hashing:
1. Using a fixed hashing vector. The vector selects a specific hash function from the family, and it is updated only when the hash table meets its rehashing condition.
2. Without a fixed hashing vector. Instead, a hashing seed is fixed, and the hashing vector is computed from that seed on the fly each time a key is hashed. When the hash table meets its rehashing condition, the hashing seed is updated.
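Both approaches are sketched below in Python. The sketch assumes byte-string keys and a dot-product family h(x) = ((Σ a_i·x_i) mod P) mod m, with random coefficients a_i, a large prime P, and table size m. The final reduction mod m makes this a common practical variant rather than a textbook-exact universal family. The class names, P, and rehash are hypothetical.

```python
import random

P = (1 << 61) - 1  # Mersenne prime, much larger than any byte value


def dot_hash(key: bytes, vector, m: int) -> int:
    """((sum of a_i * x_i) mod P) mod m, where x_i are the key's bytes."""
    return (sum(a * x for a, x in zip(vector, key)) % P) % m


class FixedVectorHasher:
    """Approach 1: keep the random vector; draw a new one only on rehash."""

    def __init__(self, m: int, rng=None):
        self.m = m
        self.rng = rng or random.Random()
        self.vector = []

    def hash(self, key: bytes) -> int:
        while len(self.vector) < len(key):  # extend lazily for longer keys
            self.vector.append(self.rng.randrange(P))
        return dot_hash(key, self.vector, self.m)

    def rehash(self, new_m: int) -> None:
        self.m = new_m
        self.vector = []  # forget the old function; a new one is drawn lazily


class SeededHasher:
    """Approach 2: keep only a seed; rebuild the vector from it on every hash."""

    def __init__(self, m: int, seed=None):
        self.m = m
        self.seed = seed if seed is not None else random.randrange(1 << 32)

    def hash(self, key: bytes) -> int:
        rng = random.Random(self.seed)  # same seed -> same vector every time
        vector = [rng.randrange(P) for _ in key]
        return dot_hash(key, vector, self.m)

    def rehash(self, new_m: int) -> None:
        self.m = new_m
        self.seed = random.randrange(1 << 32)  # new seed selects a new function


h = SeededHasher(101, seed=42)
print(h.hash(b"hello"), h.hash(b"world"))
```

The trade-off mirrors the description: approach 1 spends memory on the stored vector, while approach 2 stores one integer but regenerates the coefficients on every hash.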
Comparision-among-serial-pthread-and-OpenMP-via-Julia-Set
We use Julia set computation to compare the performance of serial, pthread, and OpenMP implementations. The pthread and OpenMP versions are compared by plotting running time against the number of threads (see Julia Set Comparison.pdf).
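The computation being timed can be sketched in Python. The sketch assumes the usual escape-time iteration z -> z*z + c over the square [-1.5, 1.5] x [-1.5, 1.5] and a cyclic row decomposition across threads. The names and parameters are hypothetical. Under CPython's global interpreter lock, threads will not speed up this CPU-bound loop, so the sketch only illustrates how the work is divided and does not reproduce the repository's pthread or OpenMP timings.

```python
from concurrent.futures import ThreadPoolExecutor


def escape_time(z: complex, c: complex, max_iter: int) -> int:
    """Number of iterations of z -> z*z + c before |z| exceeds 2 (capped at max_iter)."""
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def julia_rows(width, height, c, rows, max_iter=200):
    """Escape times for the given rows, mapping pixels onto [-1.5, 1.5] x [-1.5, 1.5]."""
    out = {}
    for y in rows:
        im = -1.5 + 3.0 * y / (height - 1)
        out[y] = [escape_time(complex(-1.5 + 3.0 * x / (width - 1), im), c, max_iter)
                  for x in range(width)]
    return out


def julia_serial(width, height, c, max_iter=200):
    rows = julia_rows(width, height, c, range(height), max_iter)
    return [rows[y] for y in range(height)]


def julia_threaded(width, height, c, num_threads, max_iter=200):
    """Cyclic row split: thread t handles rows t, t + T, t + 2T, ... (T = num_threads)."""
    merged = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parts = [pool.submit(julia_rows, width, height, c,
                             range(t, height, num_threads), max_iter)
                 for t in range(num_threads)]
        for part in parts:
            merged.update(part.result())
    return [merged[y] for y in range(height)]


c = complex(-0.8, 0.156)
print(julia_serial(40, 30, c) == julia_threaded(40, 30, c, 4))  # True
```

A cyclic split is one common way to balance load, since escape times, and therefore per-row cost, vary across the image.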
Dictionary
Implements a dictionary in two ways: a naive BST dictionary and a self-balancing BST dictionary (red-black tree). To compare the performance of the two implementations, I use an alphabetized dictionary (alphabetized_dictionary.txt) and record the total time to import the whole dictionary and the average time to find each word in it.
Naive BST dictionary: time to import 2.72051 s, avg. time per find 47.52343 µs
Self-balancing BST dictionary: time to import 0.11404 s, avg. time per find 0.89738 µs
The difference arises because the dictionary is in alphabetical order: each new word becomes the right child of the previous one, so the naive tree degenerates into a chain with one level per word, while the red-black tree keeps its height logarithmic. Both importing and finding therefore take much longer in the naive tree.
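A small Python demonstration of that explanation follows. It assumes a plain unbalanced BST (hypothetical BSTNode, insert, and height) and omits the red-black tree. Inserting keys in sorted order produces one level per key, while the same keys in random order give a far shallower tree.

```python
import random


class BSTNode:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Iterative unbalanced BST insert (no rebalancing); returns the root."""
    node = BSTNode(key)
    if root is None:
        return node
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = node
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = node
                return root
            cur = cur.right


def height(root):
    """Number of levels, computed iteratively to avoid deep recursion."""
    best = 0
    stack = [(root, 1)] if root else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child:
                stack.append((child, depth + 1))
    return best


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


words = [f"w{i:05d}" for i in range(2000)]  # already alphabetized
print(height(build(words)))  # 2000: sorted input gives one level per word
shuffled = words[:]
random.Random(0).shuffle(shuffled)
print(height(build(shuffled)))  # far fewer levels for the same words
```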
Fit-Plan-Database
This database is designed for people who want to keep fit by exercising and controlling their daily calorie intake. A user records what they eat, how many calories they consume, and what exercise they did each day. They can also record their initial basic and health information, including fixed information such as name, address, blood type, and gender, and variable information such as weight and height. After recording this information, they can start a health plan with a specified start date; each health plan has its own category, such as reducing fat or increasing muscle. A meal plan is embedded in each health plan, so users can quantify how many calories they take in each day (eating) and how many they burn each day (exercise). Users can also periodically record changes in their weight and other health information to check whether they are making progress. To make full use of this database, a careful user should browse the foods within each food category to keep a balanced diet.

For further analysis, I could use the database to study the relationship between people's health information and their basic information. For example, the database records users' basic information (click "User Basic Information"), including their address, and basic health information (click "User Health Information"), such as blood type and age. With enough data, we could relate the two, e.g., are people in Chicago heavier than people in New York? We can also include users' progress (click "Progress") and analyze it with respect to their basic and health information. To support such analyses, progress must be quantified, so more variables need to be recorded: for example, BMI should be recorded in addition to weight and height.
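One way to answer the Chicago versus New York question with the suggested BMI is sketched below using Python's built-in sqlite3 module. The tables user_basic and progress and their columns are hypothetical, not this database's actual schema. BMI is computed as weight in kilograms divided by the square of height in meters, using each user's most recent progress record.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user_basic (
    user_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    city       TEXT,
    blood_type TEXT,
    gender     TEXT
);
CREATE TABLE progress (
    user_id     INTEGER NOT NULL REFERENCES user_basic(user_id),
    recorded_on TEXT NOT NULL,  -- ISO date such as '2017-03-01'
    weight_kg   REAL,
    height_m    REAL,
    PRIMARY KEY (user_id, recorded_on)
);
"""

# Average BMI (kg / m^2) per city, using each user's most recent progress record.
AVG_BMI_BY_CITY = """
SELECT u.city, AVG(p.weight_kg / (p.height_m * p.height_m)) AS avg_bmi
FROM user_basic AS u
JOIN progress AS p ON p.user_id = u.user_id
WHERE p.recorded_on = (SELECT MAX(recorded_on) FROM progress
                       WHERE progress.user_id = u.user_id)
GROUP BY u.city
ORDER BY u.city
"""


def avg_bmi_by_city(conn: sqlite3.Connection):
    """Return a list of (city, avg_bmi) rows."""
    return conn.execute(AVG_BMI_BY_CITY).fetchall()
```

For example, after conn.executescript(SCHEMA) and a few inserts, avg_bmi_by_city(conn) returns one (city, avg_bmi) row per city. Because ISO date strings sort chronologically, MAX(recorded_on) picks the latest record.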
We can also approach the analysis from another angle: which plans are most effective? We can relate health plans (click "Plan"), meal plans (click "Meal"), and users' progress (click "Progress"). Specifically, we could analyze by type of health plan (increase muscle or reduce fat?), by duration (end date minus start date), and by the meal plans included, to better establish the relationship. To go further, however, we would need to categorize health plans into more detailed types, and categorize meal plans in more detail as well. For example, meal plans could be categorized by the nutrition facts of the foods they include, e.g., zero-fat or gluten-free meal plans.
KeJiang
An app that tests how closely your destiny is bound to the famous soccer player Diego Costa (Sheng Shi Mei Yan).
nand2tetris
Implementation of a general-purpose computer and OS built from first principles.
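The "first principles" idea can be illustrated with a Python sketch that derives the basic gates and a 16-bit ripple-carry adder from a single NAND function. The project itself builds these chips in the course's HDL. This Python model and its function names are only an illustration.

```python
def nand(a: int, b: int) -> int:
    """The only primitive: 0 when both inputs are 1, otherwise 1."""
    return 0 if (a and b) else 1


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))  # De Morgan: a OR b = NOT(NOT a AND NOT b)


def xor(a, b):
    n = nand(a, b)  # the classic four-NAND XOR
    return nand(nand(a, n), nand(b, n))


def full_adder(a, b, carry_in):
    """Returns (sum, carry_out), built from two half adders and an OR."""
    s1, c1 = xor(a, b), and_(a, b)
    s2, c2 = xor(s1, carry_in), and_(s1, carry_in)
    return s2, or_(c1, c2)


def add16(x: int, y: int) -> int:
    """16-bit ripple-carry adder; like the Hack Add16 chip, the final carry is discarded."""
    out, carry = 0, 0
    for i in range(16):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        out |= bit << i
    return out


print(add16(12345, 54321))  # (12345 + 54321) mod 2**16 = 1130
```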
ThreadPool
A simple C++11 Thread Pool implementation
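A Python analogue of a simple thread pool follows. It assumes the common design of a fixed set of worker threads pulling tasks from a shared queue, with queue.Queue standing in for the mutex and condition variable that a C++11 version would typically use. ThreadPool, submit, shutdown, and the _Result handle are hypothetical names, not the repository's API.

```python
import queue
import threading


class _Result:
    """Handle returned by submit(); get() blocks until the task has run."""

    def __init__(self):
        self._done = threading.Event()
        self._value = None
        self._exc = None

    def get(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError("task not finished")
        if self._exc is not None:
            raise self._exc
        return self._value


class ThreadPool:
    """Fixed number of worker threads pulling (fn, args, result) items from a queue."""

    def __init__(self, num_threads):
        self._tasks = queue.Queue()
        self._workers = [threading.Thread(target=self._run) for _ in range(num_threads)]
        for w in self._workers:
            w.start()

    def _run(self):
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel posted by shutdown()
                return
            fn, args, res = item
            try:
                res._value = fn(*args)
            except Exception as exc:
                res._exc = exc
            finally:
                res._done.set()

    def submit(self, fn, *args):
        res = _Result()
        self._tasks.put((fn, args, res))
        return res

    def shutdown(self):
        """Let already-queued tasks finish, then stop and join every worker."""
        for _ in self._workers:
            self._tasks.put(None)
        for w in self._workers:
            w.join()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.shutdown()


with ThreadPool(4) as pool:
    handles = [pool.submit(pow, i, 2) for i in range(5)]
    print([h.get() for h in handles])  # [0, 1, 4, 9, 16]
```

Because the queue is FIFO, the shutdown sentinels are only consumed after every task submitted before them, so no queued work is dropped.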
Viola-Jones
Python implementation of the face detection algorithm by Paul Viola and Michael J. Jones
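The building block that makes Viola-Jones fast is the integral image, which lets the sum over any rectangle be computed with four lookups. Below is a minimal sketch of the integral image and a two-rectangle Haar-like feature, with hypothetical function names. AdaBoost training and the attentional cascade are not shown.

```python
def integral_image(img):
    """ii[y][x] = sum of img[r][c] for r < y and c < x (extra zero row and column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


ii = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

After one O(width × height) pass, every feature evaluation costs a constant number of lookups regardless of rectangle size, which is what makes scanning many windows and feature scales affordable.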