ThilinaRajapakse / simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI

Home Page: https://simpletransformers.ai/


Feedback on new documentation hosted on Github Pages

ThilinaRajapakse opened this issue · comments

As Simple Transformers grows, the single page README documentation has gotten quite bloated and difficult to use. Because of this, I've decided that it's time (if not a little late already) to move the documentation to a more user-friendly Github Pages hosted website at the link below.

https://thilinarajapakse.github.io/simpletransformers/

As of now, only the text classification section is live, but it should be enough to give an idea of what the final documentation will look like. If you have any feedback, ideas, concerns, or mistakes/typos to report, I'd love to hear from you. Since the docs are still being written, incorporating feedback and fixing issues will be much easier at this stage!

I think this is great and it was the right move. I would now redo the README to strip out most of its content and keep a clean file that links to the documentation chapters, for people who want in-depth explanations of specific topics.

Great!

Yes, I agree. Once the website is ready, the README should be trimmed down to the basics, with links to the docs. As a rough idea, I think the setup instructions, a clear link to the docs, some of the minimal start examples (not sure about these), the acknowledgements, and the contributors' section should be enough.

It would be helpful to have the sample scripts log something to console that we can verify our results against. Currently not sure if my setup is working because I don't know what values to expect.

Good point. I'll add the expected outputs to the scripts so that users can check against them. The outputs probably won't match exactly, though. I'll also add links to the Medium articles, as they are real-world examples with verifiable results.
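A minimal sketch of what that could look like in a sample script (the `eval_model` call shown in the comment follows the Simple Transformers classification API; the metric names and numbers here are purely illustrative assumptions):

```python
# Hypothetical sketch: end each sample script by printing its final metrics
# so users can sanity-check their setup against reference values in the docs.
# With Simple Transformers, evaluation returns a metrics dict, e.g.:
#   result, model_outputs, wrong_predictions = model.eval_model(eval_df)
result = {"mcc": 0.48, "eval_loss": 0.52}  # illustrative numbers only

# Print metrics in a stable order so runs are easy to compare.
for metric, value in sorted(result.items()):
    print(f"{metric}: {value:.4f}")
```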

Hello sir, I am facing trouble while running the ConvAI code on Google Colab.
I am unable to run model.train_model(); the root cause is CUDA running out of memory.

I think it would be great if you also added a few words about unbalanced datasets. I'm new to this, and I would like to understand whether my dataset for multi-class classification needs to be balanced or not. Thank you!

> Hello sir, I am facing trouble while running the ConvAI code on Google Colab. I am unable to run model.train_model(); the root cause is CUDA running out of memory.

You can try lowering the train_batch_size.

P.S. Please make your comment on a related issue (or a new issue if no related issue exists)
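A hedged sketch of the suggestion above (the ConvAI model name and the gradient accumulation trade-off are assumptions; check the docs for the defaults in your version):

```python
# Hypothetical sketch: shrink train_batch_size to avoid CUDA OOM on Colab,
# and raise gradient_accumulation_steps to keep the effective batch size.
train_args = {
    "train_batch_size": 2,             # smaller batches use less GPU memory
    "gradient_accumulation_steps": 4,  # effective batch size = 2 * 4 = 8
    "max_seq_length": 128,             # shorter sequences also save memory
}

# Requires simpletransformers and the ConvAI data; shown for illustration:
# from simpletransformers.conv_ai import ConvAIModel
# model = ConvAIModel("gpt", "gpt_personachat_cache", args=train_args)
# model.train_model()

effective_batch_size = (
    train_args["train_batch_size"] * train_args["gradient_accumulation_steps"]
)
```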

> I think it would be great if you also added a few words about unbalanced datasets. I'm new to this, and I would like to understand whether my dataset for multi-class classification needs to be balanced or not. Thank you!

Thank you for your suggestion! While I agree that it will be useful, that sort of information is generic to deep learning and not specific to Simple Transformers. Because of this, I feel that adding this kind of information is going to make the whole thing too complicated.

Regarding unbalanced datasets, it really depends on a lot of factors. Generally speaking, if your classes can be clearly differentiated and you don't have too many labels, you can usually get away with unbalanced data. If one or more of the classes only have a handful of samples, the model might not learn to predict those. One way to deal with such issues is to use class weights as described here.
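For illustration, one common recipe is inverse-frequency class weights; the `weight` parameter in the comment follows the Simple Transformers classification API, but treat the exact call as an assumption for your version:

```python
from collections import Counter

# Toy multi-class label list with a heavily under-represented class 2.
labels = [0, 0, 0, 0, 1, 1, 2]
counts = Counter(labels)
n_samples, n_classes = len(labels), len(counts)

# Inverse-frequency weighting: rarer classes get proportionally larger
# weights, so mistakes on them cost more during training.
weights = [n_samples / (n_classes * counts[c]) for c in sorted(counts)]

# Requires simpletransformers; shown for illustration:
# from simpletransformers.classification import ClassificationModel
# model = ClassificationModel("roberta", "roberta-base",
#                             num_labels=n_classes, weight=weights)
```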

@ThilinaRajapakse: Oh, I'm sorry. This is exactly what I needed; I just wasn't searching for the right term! Thank you!!

Nothing to be sorry about, we've all been there! 🤷‍♂️

@ThilinaRajapakse any reason not to use sphinx and readthedoc?

There's no objective reason. But, subjectively and in no particular order,

  • I don't want to deal with sphinx
  • Jekyll seems to have the best support on GitHub pages
  • sphinx + readthedocs looks a little dated 🤷‍♂️

Hey! It'd be helpful for the installation page to specify how to do a minimal isolated install (for a container or GitHub Actions), ideally without using anaconda.

This is specifically for a forward-propagation-only workflow that will run CPU only and only for a handful of inputs, so dealing with GPU/drivers/etc isn't important, and having a quick install is important as the environment needs to be recreated from scratch each go.
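For what it's worth, a minimal CPU-only install along those lines might look like this (the CPU wheel index URL is the one PyTorch publishes; pin versions as needed for reproducible CI runs):

```shell
# Minimal, CPU-only install in an isolated virtual environment (no conda).
python -m venv .venv
. .venv/bin/activate

# Install the CPU-only PyTorch wheels first so pip doesn't pull CUDA builds.
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install simpletransformers
```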

Hey. Not sure if this is the right space or if I should raise an issue elsewhere. In the new docs on the site, the 'Configuring the classification model' section needs a small correction. For the arguments lazy_text_a_column and lazy_text_b_column, the description should read "for lazy loading sentence pair datasets" instead of "single sentence datasets", if I'm not mistaken.

I'm trying to contribute to the docs on the GitHub Pages site, but I'm struggling to figure out how to render them locally to see my changes. I think the final version of the README (with the docs removed) should contain the steps to render the docs locally.


I figured out how to do it. If you want, I can open a PR adding the instructions to the README, @ThilinaRajapakse.

@pablonm3 I would like to contribute to the docs too. How would I do it?

> I'm trying to contribute to the docs on the GitHub Pages site, but I'm struggling to figure out how to render them locally to see my changes. I think the final version of the README (with the docs removed) should contain the steps to render the docs locally.

> I figured out how to do it. If you want, I can open a PR adding the instructions to the README, @ThilinaRajapakse.

That would be great! I agree that it's a little confusing. My web development skills are pretty mediocre so I've had trouble setting it up myself. 😅

We can put it in a proper contributions guideline later on.
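For reference, the local-preview workflow for a Jekyll-based GitHub Pages site typically looks roughly like this (assuming the site sources live in a docs/ folder with a Gemfile; adjust paths to the repo's actual layout):

```shell
# Serve the Jekyll docs locally at http://127.0.0.1:4000 for previewing edits.
cd docs
gem install bundler        # once, if bundler isn't already installed
bundle install             # install the gems pinned in the Gemfile
bundle exec jekyll serve   # rebuilds automatically as files change
```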


@ThilinaRajapakse I plan to write a small guide on how to edit the docs. Do you think it should be included in the repo's README or in the Jekyll docs?

I think it's better to have it in the repo as that's the place where people will look when they want to contribute. I'm open to other suggestions though.

I think the same

> @pablonm3 I would like to contribute to the docs too. How would I do it?

@aakashdusane I just opened a PR with the instructions: #605

@ThilinaRajapakse what are the remaining tasks for getting rid of the docs from the readme?

  • Multi-Modal Classification
  • Language Generation
  • ConvAI

I think that's all the tasks.

Thanks @ThilinaRajapakse, I'll try to work on a PR this week to start moving some of the remaining docs.

Sounds good, thanks. Just a heads up, but I might make my own changes to any submitted docs.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.