bids-standard / pyedf

Python package to read from and write EEG data to European Data Format files.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Python 3.x conversion for this module

bsandeepan opened this issue · comments

@robertoostenveld
I checked the code repo and it seems to be compatible with python 2.x. Since Python 3.x is getting more welcomed, I forked the repo and made a few edits to get it working with Python 3.x. If you have any similar plans, maybe I can collaborate with you for this project?
check out - https://github.com/bsandeepan95/pyedf

Without having looked at your fork -- you can write code that works with Py2+3 simultaneously with six and a few other tricks. If you do this, we can integrate it with pull request.

Without having looked at your fork -- you can write code that works with Py2+3 simultaneously with six and a few other tricks. If you do this, we can integrate it with pull request.

Did Py2+3 compatible coding before with "tricks". Okay, I will check out "six" module.

Hi @palday
Few queries -

  1. You asked me to look into six module but the lines of code requiring backward compatibility support is few. Also, I am proposing to support only Python 2.7, not older deprecated versions (2.6 or lower). Py2.7 will be officially discontinued from 2020. Therefore, it seems only logical to use "tricks" to write few lines of backward compatible code rather than using a large dedicated module, given that this library may only read data from and write data to EDF/EDF+ files. That, however, is my personal opinion. I might be missing some aspects here. If so, please inform me about them.
  2. In the latest version, the author is not yet sure about the order of Calibration and Offset. I would be thankful if this confusion is explained to me. I am a newcomer in medical S/W field so things are a bit confusing to me as well.

Requesting for your help in these issues.
Thank you. 😃

Regarding (1): I have no strong preference for how to make it compatible with both simultaneous. Py2.7 + Py3 with just a few try blocks or the like is perfectly fine with me.

Regarding (2): I don't know off the top of my head and probably won't have time to take a closer look this week.

Why would you make an effort to support Python 2.7, instead of focusing all efforts on a good Python 3 package?

Because it's already Python 2.7 code, so it's trivial in terms of effort, and for better or worse, Python 2.7 is still around in a lot of places.

Hi @palday
I made the changes to have Py2.7+3 support in EDF.py (literally one try block).
Should be working fine. Kindly check and confirm. Thanks.
https://github.com/bsandeepan95/pyedf

It has been 12 days and I made more changes -

  1. Added detailed documentation of the classes to the best of my ability. I may have missed something. If so, kindly check and confirm.
  2. Formatted code according to PEP 8 Guidelines.
  3. Added a test EDF file from https://www.teuniz.net/edf_bdf_testfiles/index.html
  4. Edited ReadMe.md
  5. Changed a irregular variable name in code. meas_info['magic'] is now meas_info['file_ver'] pointing to the file version which, according to EDF Website, is solved as such -

The only incompatibility with EDF is, that signals may be recorded discontinuously. Therefore, we have decided that the EDF+ 'version' field must still read '0' like in EDF. In this way, old EDF viewers will still work and display EDF+ files (be they continuous or discontinuous) as continuous EDF files. EDF+ software will know the difference between continuous and discontinuous files from the mentioned 'reserved' field.

I tested the code in py2.7 and it seems to be running with no issues. Let me know your results. Then we can go ahead with pull requests.

Thanks for putting in so much effort.

This is multiple pull requests. If you separate out the addition of the doc strings (for existing functions, not for new ones) into a separate feature branch, that would be a relatively straightforward pull request you could make now.

I'm not sure how I feel about re-organizing the README.

I'm not sure changing the magic field to file_ver is necessary. "Magic numbers" are a concept in computer science for indicating file type (e.g. 0xDEADBEE) and it's not really a file version -- after all EDF and EDF+, which are different not entirely compatible formats, both use a zero in that field.

I don't know how I feel about adding the test file to the repository -- even though it's small for EEG, it's big for code. An alternative is to set up the configuration for the test suite that the file is downloaded as necessary. @robertoostenveld Thoughts?

For some of the PEP8 stuff with linebreaks: you were a bit zealous with parentheses. Binary operators (such as or) imply parentheses and having an open parenthesis right before a line break is not nice. If you need to add parentheses for something of that sort, then put it around the entire conditional.

For write_byte: I'm not sure assuming UTF-8 is the right decision there. I think the far better thing to do would be to treat everything as bytes and not as strings and use the associated bytes methods in Python 3, instead of posthoc converting things. Your solution will probably work most of the time if you read in bytes, represent them as UTF-8 strings, and then write them back out, but it's misleading about what's actually going on and will cause some problems when you have bytes sequences that correspond to non-existent Unicode codepoints or 0x00.

I think I can make a pull request for the docstrings but I think a better one is when it's done as expected.

I did not know the concept of magic numbers but now that I have read about it, I understand the purpose and will revert this change asap. Also, I will leave a note about it in docstring so that anyone in future do not get confused over this like me.

I included the test file for ease, but I think we can add the download link in ReadMe and ask the devs to download it or any other test file, and change the file name in code for testing (or create a new .py file for this as I did once).

I was not particularly having obsession for parentheses as much as I was trying to maintain the 80-char line limit and represent the code in a cleaner and readable way. Open Parentheses is kind of similar to having open curly braces before writing any operations in loop or conditional statements in other languages. I will see how much I can do about styling the binary operators.

about write_byte, I do also think that ASCII conversion is the better path since EDF file format reeks of it along with the Y2K bug it has due to being a format of pre-2000 era. I used UTF-8 absentmindedly and since it did not raise any issues (I tested the newly created file with EDFBrowser and it seem identical), I was not bothered about it. So, do we only change it to ASCII chars, or have any Exception handling for possible use of Non-ASCII chars?

The docstrings are a separate issue from the other things and really should be a separate pull request. That will allow us to review/comment on the docstrings as a purely documentation issue and not on a Python 2/3 issue. There is no need to comment on the 'magic' part of the magic number, but generally adding references to the various fields in the EDF header would not be a bad idea.

Although I'm not entirely onboard with the PEP8 obsession of 79 chars ("consistency is the hobgoblin of little minds" and all that), I understand why you had parentheses there and instead wanted to suggest a different solution. The more common one is to put the whole condition in parentheses (i.e everything between if and :).

I wouldn't call the dates in EDF a Y2K bug because they don't suffer from the same ambiguity as say financial accounts because there were no digitally recorded EEG signals before the 1970s and it's a fair assumption that no recording in EDF format existed more than a few years before the standard was approved in 1992. Note that EDF+ addresses this in various ways: the birthdate field in the patient info uses a full 4-digit number for year and there is an additional field for fully specifying the recording year.

By design, UTF-8 will pass undetected throw systems expecting but not interpreting ASCII (such as various email systems). The problem is that Python 3's String type isn't required to use any particular coding scheme internally to represent Unicode codepoints. So now the bytes are being interpreted and converted to codepoints and then re converted later on and it's this application of 'semantics' that causes problems. The correct thing to do here is to read it as Bytes and only decode the actual text fields (which are specified to use a very small subset of ASCII related to the usual way of representing dates) from ASCII and then likewise for writing encode ASCII strings to bytes. Any UnicodeErrors or the like I would treat as fatal.

Apologies for such delay. I was occupied in office projects. Will make a branch to add the documentation separately soon.

Hey @palday ,

I hope you are still online. Apologies for the EXTREME delay. I have removed the docstrings for the time being (They are backed up elsewhere). The PEP8 formatting is there but I am hoping it will help you to check codes with clarity. Do let me know if the latest commit is okay to merge. Afterwards, I will create a separate pull request for the docstrings.

The project link - https://github.com/bsandeepan95/pyedf

Why not move to Python 3?

As of January 1st, 2020 Python 2 is no longer supported. Most Python libraries out there have moved to Python 3 and have dropped Python 2 support.

@DimitriPapadopoulos

As pointed out in the README, there is a Python 3 version available:

An experimental Python 3 compatible version is available from the `py3k` branch.

Notably, that work began before Python 2 support EOL.

But as to why not make the Python 3 branch the default?

  • The support EOL for Python 2 does not magically stop all existing Python 2 interpreters from running, nor does it upgrade all machines, especially older machines that are not connected to the internet. In my experience, such machines are surprisingly common in medical settings, where a lot of EEG tends to be collected.
  • The present codebase doesn't really have any tests or CI, so it's really hard for me to be sure that the Python 3 migration didn't break anything. I would like to add tests and set up CI, but....
  • I have a day job that keeps me pretty busy and I'm not compensated in any way for working on this codebase.
  • If you really want to use Python 3 to analyze EEG data stored as EDF, then MNE-Python has a well-supported EDF reader. While it would be nice to have a simpler tool than MNE for reading and writing EDF data, Python 3 users nonetheless have at least two options for reading and writing EDFs (the py3k branch here and MNE-Python).

I would be happy to consider pull requests to provide better Python 3 support for this package but only once there are tests. This means that potential contributors either need to wait for me to write tests and set up CI or they need to contribute tests first. (In the latter case, I would be happy to help set up CI -- that's the easy part!)

The very vast majority of Python libraries such as NumPy or SciPy have dropped support for Python 2 and moved to Python3. They cannot be collectively wrong.

Applications still in need of Python2 may use old releases of pyedf that run with Python 2. You could tag a last Python2 compatible release for that purpose, perhaps one with very basic fixes I could provide, such as PEP8 compliance. Besides, why care so much about machines that run Python 2 if you're not compensated for working on this codebase?

That said, I agree tests would be great. Is Python 2 still available for CI tests?

I'l try MNE in the short term.

I believe this package is in need of new maintainers; I have little interest in Python and EDF, and I think that @palday has moved on to mainly working on/with Julia. @DimitriPapadopoulos and @bsandeepan would you be interested?

@robertoostenveld Sure. I could help with basic maintenance: prepare a cleaned up Python 2 release, move to Python 3, fix bugs and adapt to new releases of dependencies. I probably won't have time for large changes myself, but I can review contributions.