dipanjanS / text-analytics-with-python

Learn how to process, classify, cluster, and summarize text data, and to understand its syntax, semantics, and sentiment, with the power of Python! This repository contains the code and datasets used in my book, "Text Analytics with Python," published by Apress/Springer.

Convert code base for Python 3.x

dipanjanS opened this issue · comments

Python 3 is the future. A lot of legacy code and systems still run on Python 2 (including our applications, which is why I wrote this book in Python 2 in the first place), but we need to slowly start migrating and building our code, apps, and systems on Python 3.

Looking for experts in Python 3.x as well as NLP and text analytics who could help migrate each chapter's codebase to Python 3.x, since I am occupied for a major part of this year with other projects. I do have some parts of it ready for Python 3.x and can offer help and support whenever needed.

Successful codebase migrations will make sure you are mentioned as a contributor in the acknowledgements and contributor list of this repository and project. You will also get a mention in future editions of the book whenever one is in the pipeline.

I think I can help you with this.

If anyone is interested, I have updated almost all of Chapters 1 to 4. Chapters 5 to 7 are showing an 'lfs' error; I will try to resolve that later, but feel free to fork and make pull requests.

Here's the repo: text-analytics-with-python

Thanks, but as I said, we need to follow a structured workflow and approach instead of working in an ad-hoc manner so that merges are hassle-free. Please hold off on further conversions because I need to restructure the current repo and put out a plan. I will do so in a couple of days.

Okay, I will hold off on it. I would like to note that the pattern module does not seem to support Python 3 yet. This will hopefully change in the future, but Chapters 5-7 (and, I think, some of Chapter 4) are blocked for now.
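One possible stopgap for the pattern-dependent chapters (just a sketch of my own, not code from the book) would be to guard the import and fall back to a stand-in, so the rest of a chapter can still run end to end during the port:

```python
# Hypothetical guard, not from the book's codebase: pattern has no Python 3
# release, so degrade gracefully instead of failing at import time.
try:
    from pattern.en import sentiment  # Python 2 only as of this thread
    HAS_PATTERN = True
except ImportError:
    HAS_PATTERN = False

    def sentiment(text):
        # Crude stand-in mimicking pattern's (polarity, subjectivity) shape,
        # only so dependent chapter code can run during the migration.
        positive = {'good', 'great', 'excellent'}
        negative = {'bad', 'poor', 'terrible'}
        words = text.lower().split()
        score = sum((w in positive) - (w in negative) for w in words)
        return (score / max(len(words), 1), 0.0)
```

The stand-in is obviously not a real sentiment model; the point is only to keep imports from crashing on Python 3 until pattern (or a port of it) catches up.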

Sure, and yes, I'm aware of the issue with pattern. There is an unofficial port, but sadly it has been incomplete for the last couple of years. I've thought of some strategies to tackle this. Let me restructure the current repository, then we can get started on this in more detail. I'll update once that is done, and we can port and merge chapter by chapter.

Here is the first phase of the plan; once each step is done, it will be checked off to keep track. I am currently on vacation, so I will update you all as soon as the restructuring is done.

  • Re-structure current repository @dipanjanS

  • Contributors to pull in latest changes

  • Port code for chapters 1-3 and send pull requests for each chapter separately

  • Merge subsequent pull requests to main repository after review @dipanjanS

  • Look into the pattern repository and necessary modules needed @dipanjanS

  • Discuss strategies for porting remaining chapters and post the plan for the same

@dipanjanS
This idea might sound a bit weird, but do you think it makes sense to add type hints to the Python 3.x code examples?

It just might make it a bit easier to read through the code in the book, and it enables code completion and correct jump-to-definition in PyCharm.

@ambientlight Sorry, I'm a bit tied up with work and a couple of other things, so I'm not finding time to look into this. Maybe I will sometime soon. Regarding your query: are you talking about type hints as in specifying the data type per variable in the code? If so, maybe we can look into it once the entire code is ported.

@dipanjanS Got it, thanks a lot!
Method parameters and return types would, I think, be good enough; a variable's type is normally evident from the right-hand side of the expression.

I have ported a few things up to Chapter 4. Something like this:

import re
from typing import Dict, List, Optional, Tuple

import nltk
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer


class Normalizer:

    stopwords: List[str] = nltk.corpus.stopwords.words('english')
    wnl = WordNetLemmatizer()

    @staticmethod
    def tokenize_text(text: str) -> List[str]:
        tokens: List[str] = nltk.word_tokenize(text)
        tokens = [token.strip() for token in tokens]
        return tokens

    @staticmethod
    def expand_contractions(text: str, contraction_mapping: Dict[str, str]) -> str:
        contractions_pattern = re.compile('({})'.format('|'.join(contraction_mapping.keys())),
                                          flags=re.IGNORECASE | re.DOTALL)

        def expand_match(contraction):
            match = contraction.group(0)
            first_char = match[0]
            expanded_contraction = \
                contraction_mapping.get(match) \
                if contraction_mapping.get(match) \
                else contraction_mapping.get(match.lower())

            expanded_contraction = first_char + expanded_contraction[1:]
            return expanded_contraction

        expanded_text = contractions_pattern.sub(expand_match, text)
        expanded_text = re.sub("'", "", expanded_text)
        return expanded_text

    # Annotate text tokens with POS tags
    @staticmethod
    def pos_tag_text(text: str) -> List[Tuple[str, Optional[str]]]:
        # convert Penn treebank tag to wordnet tag
        def penn_to_wn_tags(pos_tag):
            if pos_tag.startswith('J'):
                return wn.ADJ
            elif pos_tag.startswith('V'):
                return wn.VERB
            elif pos_tag.startswith('N'):
                return wn.NOUN
            elif pos_tag.startswith('R'):
                return wn.ADV
            else:
                return None

        tagged_text = nltk.pos_tag(Normalizer.tokenize_text(text))
        tagged_lower_text = [(word.lower(), penn_to_wn_tags(pos_tag)) for word, pos_tag in tagged_text]
        return tagged_lower_text

I can contribute the typing later on if it would be appropriate.

I am not sure whether everything is now ported to Python 3; if not, I can contribute. I will check out the repo and add some tests for Python 3.
Bhushan

@ambientlight @pribond

Sure, thanks for the interest. The code is currently in Python 2. Unfortunately I am a bit preoccupied with several things at work and one of my books. I'm planning to resume this around the end of August, hopefully, or even earlier.

I still need to refactor the repository so that we have the code separate for Python 2 and 3. I will notify all in this thread once we are ready to start porting.

@dipanjanS What's the status of this issue? I'd be happy to help out.

@dipanjanS is there any plan to convert this to Jupyter notebooks?

Sorry folks, I'm a bit tied up with multiple engagements at the moment. The following is what I promise as soon as I can get to it.

  • Code in both Python 2 and 3
  • Jupyter notebooks besides normal code files

Collaborating with some folks from work for better output and ease of communication. In case I need additional help I will update here.

@dipanjanS I can help you with this if this issue is still open. I think creating Jupyter notebooks will make it more interactive. Let me know if you need help on this.

Thanks

Hi,

Can you please help me with the latest code for Python 3.5 on a 64-bit operating system? I am using Visual Studio 2017 to run the code.

I would say, use a Jupyter notebook rather than Visual Studio. Converting Python 2 into Python 3 is simple.

Kindly go through the book to get the details of what has been used. For now the code runs on Python 2.7.x, and you can use the Anaconda distribution; the same is mentioned in the book. Work is in progress to convert the code to Python 3 as well as Jupyter notebooks. Once that is done, it will be updated here.

Can you please share any step-by-step guideline documents on how to convert the code from Python 2.x to 3.x using Jupyter notebooks?

Jupyter notebooks are not used for code conversion; they are a mechanism to run code, document your findings, and share them with others easily if needed. You need to use your own logic and utility libraries like 2to3 or six to convert the code.
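To give a flavour of what those tools target (a hedged, standard-library-only sketch, not code from the book): six mainly helps you write one codebase that behaves the same on both versions, for example by dispatching on the interpreter version the way `six.text_type` does:

```python
# Sketch of the "write once, run on 2 and 3" style that six encourages.
from __future__ import print_function, unicode_literals

import sys

# six.text_type does exactly this dispatch; shown inline for illustration.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 -- Python 2 built-in name

def normalize_token(token):
    """Lowercase and strip a token the same way on Python 2 and 3."""
    return text_type(token).strip().lower()

print(normalize_token('  Hello '))  # hello
```

2to3, by contrast, rewrites Python 2 source into Python 3 source in one shot (e.g. `print` statements into `print()` calls), so it suits a clean break rather than a dual-version codebase.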

Any plans to port the code to Python 3 in 2018?

@peterotool Thanks for bringing this up! Yep, work is already underway on this; we are planning to bring out a new, revised edition of this book with all code in Python 3, adding new examples, use cases, and so on. Stay tuned! The book is going to come back better and with more content!

@dipanjanS Is it possible to create a chatbot using some deep learning architecture?

@samuelxmli Can you please stop spamming the same question everywhere? You have already created two issues/comments. Closing this issue, since I have replied on the other thread and soon we will be doing a revised version of this book in Python 3.x.