This is the initial branch of the main pipeline for data analysis at the Hussaini Lab. It will eventually be forked to Hussaini Lab's GitHub account when it's ready to be used. This repository is designed to be collaborateively developed according to the guidelines laid out below.
There are four main software modules called core
, library
, scripts
and widgets
. There are also several additional bridge / firmware modules that are all named by the convention x_{}
(e.g. x_io
, x_cli
, etc.).
In the io layer, there are submodules for reading and writing different data formats, and tools for getting data from users. Additionally, there are two modules for creating the data classes Session and Study. These classes serve as a bridge and formatting bottleneck between the io layer and the core data types. They store data in a in a standardized format that is easy to convert into the core data types. This format is isomorphic to dictionaries with keys and contents as outlined below (i.e. every io module must generate a subset of the following keys and contents).
- animal
- animal_id
- species
- sex
- age
- weight
- genotype
- animal_notes
- devices
- implants
- implant_id
- implant_type
- tetrode
- tetrode_array_{int}
- sillicone_shank
- ...
- implant_geometry
- channel_{int}: (x, y, z)
- ...
- implant_data
- sample_rate OR irregular_sampling: True
- event_times: [float]
- event_labels: [string]
- channel_{int}: [float]
- units: string
- axona_led_tracker
- led_tracker_id
- led_location
- led_position_data: {
time
:float,x
:float,y
:float}
- implants
The core
module contains the canonical data types and functions for managing them. This module has no dependencies other than the version of python being used and numpy. Some of the data structures are inspired by the Neurodata Without Borders project, though the code is not directly based on that project.
The library
module contains data types, classes and functions for exploring, analyzing and visualizing the core data types. Each of these functions does only one thing.
The scripts
module contains use cases where settings are configured up front and then several functions are called in sequence to perform the desired task. There are three main types of scripts. (1) Batch processing scripts that run on a large number of files. (2) Parameter exploration / optimization scripts that run on a single file. (3) Automations for smaller tasks that require multiple library functions, are repeated often, and should be performed the same way everywhere. The third type of script may be called by the first two.
The widgets
module contains framework-agnostic widgets for interactive analysis. Each widget is constituted of two parts. The first part is a View
class that contains all the states of the widget, as well as any text that would be displayed to the user. The second part is a Controller
class that contains all the logic for the widget. The Controller
class is responsible for updating the View
class when the user interacts with the widget and for calling the appropriate functions in the library
or scripts
module. Each widget is structured as a microservice, so there are no dependencies between the widgets. Any information that is needed by a widget is passed as a parameter to the Controller
class. Communication between two widgets (if ever necessary) would be done through separate bridge submodules in the widgets
module.
The x_{}
modules contain interfaces for frameworks and APIs. Moreover, they contain adapters, gateways and other glue code for the core and library modules.
The x_cli
module contains the command line interfaces for running scripts and launching widgets.
The x_gui
module bridges GUI framework(s) for the widgets.
The x_io
module contains the functions for reading and writing data from various formats in various frameworks (e.g. databases, os file systems, server APIs, etc.).
If you would like to contribute to the project, please read all the guidelines first.
- Generally follow the PEP 8 style conventions.
- Additionally, please name functions and classes in such a way that there are no names that are substrings of other names. For example,
get_waveform
andget_waveform_from_file
are not allowed. Instead, use something likeget_waveform
andwaveform_from_file_path
. This is to allow global search and replace when changing the name of a function or class. - When importing modules:
- Group all the internal dependencies together at the top under the header "# Internal Dependencies".
- Group all the external dependencies together below header the internal dependencies.
- Divide them into subgroups under the headers "# A+ Grade External Dependencies," "# A Grade External Dependencies" and "# Other External Dependencies".
- See the External Dependencies section for more details on the lists.
- TDD is required for contributors.
- Use
pytest
for unit tests. - Store test files for a given module (top folder level) in a folder of the form
{module}_tests
. - If there are folder level submodules, create a subfolder named with the pattern
{submodule}_tests
. - For each submodule in the module, create a test file and name it
test_{submodule}.py
. This convention is very important because it allowspytest
to find the tests. - Write your test(s) before writing your new function or class.
- A function or class should either have an automated test or contain only code that requires manual testing.
- e.g.
View
andController
classes in thewidgets
module are segregated from theWindow
classes in thex_gui
module; - the
Window
classes do exactly and only two things---display the state of aView
object and feed user input to theController
.
- e.g.
- Dependencies between layers flow only one way:
core
<library
<scripts
<widgets
<x_{}
.- e.g.
library
calls onlycore
,scripts
can directly calllibrary
orcore
, ect.
- e.g.
- Limit interdependency among modules in the same layer. Segregate large scripts or widgets into microservices that are independent of each other (no common databases or global states).
- For function modules, create roughly one public function per module. Name helper functions using an underscore as the first character (e.g.
_helper_function
). - For class modules, use only a handfull of abstract classes per module (preferably no more than one).
core
should not depend on any external libraries except numpy.library
should depend only oncore
and a few A+ safe libraries (see below).scripts
andwidgets
should depend onlibrary
,core
and A safe libraries.- The
x_{}
modules can have dependencies on any framework or firmware that is reasonably well maintained.- Whenever possible, a submodule in a given
x_{}
module (e.g.x_gui.pyside6
) should depend only on the framework listed. - Exceptions exist. For example, the
x_cli
module contains submodules that launch widgets using dependencies onx_gui
submodules. When doing this, try to keep the dependencies as minimal as possible.
- Whenever possible, a submodule in a given
- numpy
- pandas
- matplotlib
- scipy
- statsmodels
- plotly
- PIL / pillow
- Qt / PySide6
- Axona / TINT file formats (.set, .bin, .X, .cut, .pos, .eeg, .egf)
- .X file extensions denote files where there is an integer after the dot (e.g. .1, .2, .10, etc.)
- Intan file formats (.rhd, .rdh?)
If you'd like to propose adding a library to the A Safe or A+ Safe list, please contact the current maintainer Oliver Shetler at cos2112@cumc.columbia.edu.
- atomic commits---commit every change to every function whenever the relevant test is passing
- only commit the relevant module (and test)
- frequent commits---DO NOT build an entire feature and then commit. This can lead to merge conflicts
- frequent pulls---pull every time before committing and notify partner if there is a merge conflict (tell them to pause on that module)
- separate hard to test parts of a class or function from the easy parts
core
is the most fundamental module. It should not depend on any other module. It should only contain classes and functions that pertain to basic data types that could conceivably have been read from a file.
When defining new class, please consider the following:
- Metadata: Are the data in the class metadata for an object that was used in the experiment?
- If yes, then the class should be in
core
. - If no, then the class might belong in
library
.
- If yes, then the class should be in
- Unit Homogenaity: Does the class contain an attriute that contains one principle data structure? Are all the data within the principle attribute of the same unit type?
- If yes to both, then the class should be in
core
. - If no, then the class might belong in
library
.
- If yes to both, then the class should be in
When defining a new function, please consider the following:
- Does the function operate on a single data type with homogeneous units?
- If yes, then the function should be in
core
. It might belong inside the pertainant class. - If no, then the function might belong in
library
.
- If yes, then the function should be in
library
is the module that contains the bulk of the code. It should depend only on core
and a few A and A+ safe libraries (see above). It contains the data structures and functions that are used to analyze, visualize and derive new data types from the data objects in the core
module.
When defining new classes, please consider the following:
- Does the class combine data from multiple core
classes?
- If yes, then the class should be in library
.
- If no, then the class might belong in core
.
- Does the class or function perform a non-invertable transformation on a single data type and/or does the operation change the units? (e.g. converting locations to speeds or velocities)
- If yes, then the class should be in library
.
- If no, then the class might belong in core
.
- Does the class store a single object for a single task?
- If yes, then the class should be in library
.
- If no, then the class might belong in scripts
or widgets
.
- Does the class contain only methods that do not cause side effects? (e.g. no direct manipulation of data; only creation and deletion of data attributes; e.g. no read/write or display methods)
- If yes, then the class should be in library
.
- If no, then the class might belong in scripts
or widgets
.
When defining new functions, please consider the following:
- Does the function perform a single analysis, reformatting or visualization task (without displaying)?
- If yes, then the function should be in library
.
- If no, then the function might belong in scripts
or widgets
.
Architecture rules:
- Horizontal Segregation: Scripts should not depend on each other.
- Vertical Segregation: Scripts should not depend on
widgets
or anyx_{}
modules. - No Global States: Scripts should not have any global states.
- No Interfaces: Scripts should not contain any adapters or interfaces.
- One Configuration Dictionary: Scripts should always take in exactly ONE configuration dictionary.
- Data Structures: Aside from one configuration dictionary, scripts should only take in data structures from the
core
andlibrary
modules.
When defining new classes or functios in the scripts
module, please consider the following:
- Does the class or function perform a sequence of steps that automate a use case?
- If yes, then the class or function should be in scripts
.
- If no, then the class or function might belong in library
or widgets
.
- Does the class or function require user input only at the beginning of the use case?
- If yes, then the class or function should be in scripts
.
- If no, then the class or function might belong in widgets
.
When defining new classes in the widgets
module, please consider the following:
- Does the class store part of an abstract user interface (e.g. a View
or Controller
)?
- If yes, then the class should be in widgets
.
- If no, then the class might belong in scripts
or library
.
- Does the class contain