uab-cgds-worthey / DITTO

Variant Deleteriousness prediction tool using AI

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Ditto pipeline just after CAGI6 project - [merged]

ManavalanG opened this issue · comments

In GitLab by @tkmamidi on Oct 18, 2021, 17:09

Merges CAGI6-version -> master

This is a pipeline to run Ditto predictions for sample/s. We successfully used this to analyze samples from CAGI6 project.

Note: The commit used for CAGI6 challenge pipeline is be97cf5. It was used along with annotated VCFs and exomiser scores obtained from rgp_cagi6 workflow.

Things to check -

  • Check the ReadMe
  • Check if scripts for VEP annotation are present
  • Check if scripts for VEP annotation parsing are present
  • Check if scripts for Ditto filtering and parsing are present

Expanding on this a bit would be helpful as this is one of the first things typically looked at to learn what the repo is about.

This repo uses git submodules (for example, variant_annotation/configs/snakemake_slurm_profile). In order to retrieve them as well, we need to use flag --recurse-submodules here.

How about moving this file to src dir?

Minor: pip libraries are second class citizens in the conda world and they need to be avoided if and when possible. Since black and lz4 are available from conda-forge, I would recommend switching to them.

Minor: Missing version of gpy

Minor: A quick note on why this get removed would be helpful.

Minor: As testing conda env already contains bcftools, we could just use that and skip use of module loading.

Minor: Including conda env used here would be helpful. Applies to rest of the doc as well.

Referring to cagi6-rgp repo as an example would be helpful here for reference purposes. Or, in the next section Cohort level analysis, you could mention that exomiser scores were utilized.

Please refer to [CAGI6-RGP](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/mana/mini_projects/rgp_cagi6) project for filtering and annotation.

You may want to clarify what filtering refers to here so as to avoid the downstream filtering as part of ditto.

@tkmamidi Ready for your eyes :)

Comments marked as minor are good to be taken care of at this point, but if you prefer to do it after this MR is merged, I'm okay with that as well.

Few questions. When do you plan to adopt the following?

  • Auto-formatting python scripts using black
  • Linting using pylint, eslint, markdown lint, etc.

If you prefer to do it post this MR, please create issues to keep track of them.

Commit used for CAGI6 pipeline - be97cf5

I would recommend clarifying this part in MR description. Perhaps something like:

Note: The commit used for CAGI6 challenge pipeline is be97cf5. It was used along with annotated VCFs and exomiser scores obtained from rgp_cagi6 workflow.

In GitLab by @tkmamidi on Oct 21, 2021, 24:14

Commented on README.md line 20

changed this line in version 2 of the diff

In GitLab by @tkmamidi on Oct 21, 2021, 24:14

Commented on predict_variant_score.sh line 1

changed this line in version 2 of the diff

In GitLab by @tkmamidi on Oct 21, 2021, 24:14

Commented on configs/envs/testing.yaml line 22

changed this line in version 2 of the diff

In GitLab by @tkmamidi on Oct 21, 2021, 24:14

Commented on configs/envs/testing.yaml line 21

changed this line in version 2 of the diff

In GitLab by @tkmamidi on Oct 21, 2021, 24:14

Commented on README.md line 70

changed this line in version 2 of the diff

In GitLab by @tkmamidi on Oct 21, 2021, 24:14

Commented on README.md line 73

changed this line in version 2 of the diff

In GitLab by @tkmamidi on Oct 21, 2021, 24:14

Commented on README.md line 125

changed this line in version 2 of the diff

In GitLab by @tkmamidi on Oct 21, 2021, 24:14

added 1 commit

  • 06e3d2e - updating readme and linting

Compare with previous version

In GitLab by @tkmamidi on Oct 21, 2021, 24:17

Frankly, I don't like formatting the scripts but did it anyway for this MR. It makes the script look so confusing. Maybe I'll get adapted to it eventually? only time will tell :D

In GitLab by @tkmamidi on Oct 21, 2021, 24:20

added 1 commit

Compare with previous version

marked the checklist item Check the ReadMe as completed

I see you added this to Readme.md, which is not a bad idea. I was referring to clarifying it in MR description though.

Btw, I modified the "note" in the parent comment for better clarity.

What was the python code formatter used? I don't see some typical modifications that black would make. Hence the question.

Formatting can be subjective sometimes and there are times I don't like the choices that black makes. But they are rather infrequent and so I chose to live with it. If you don't like black, check to see if you like other code formatters out there. In my experience, have some combination of linters and formatters go a long way in making your codebase manageable over time, irrespective of number of code contributors.

Linting using pylint, eslint, markdown lint, etc.

could you comment on these?

added 1 commit

Compare with previous version

In GitLab by @tkmamidi on Oct 21, 2021, 18:02

marked the checklist item Check if scripts for VEP annotation are present as completed

In GitLab by @tkmamidi on Oct 21, 2021, 18:02

marked the checklist item Check if scripts for VEP annotation parsing are present as completed

In GitLab by @tkmamidi on Oct 21, 2021, 18:02

marked the checklist item Check if scripts for Ditto filtering and parsing are present as completed

In GitLab by @tkmamidi on Oct 25, 2021, 13:55

Commented on README.md line 93

we have "activate conda environment" before these steps. Do we still need to mention for every step?

In GitLab by @tkmamidi on Oct 25, 2021, 13:57

Commented on README.md line 116

mentioned in the next section "Cohort level analysis"

In GitLab by @tkmamidi on Oct 25, 2021, 13:57

I'm using pylint for now.

In GitLab by @tkmamidi on Oct 25, 2021, 13:58

added to MR description

In GitLab by @tkmamidi on Oct 25, 2021, 14:00

Commented on configs/envs/environment.yaml line 26

A lot of changes need to be done with tuning. Can I get back to this later? Also, this Readme doesn't talk about this envi or training pipeline. Thoughts?

So all the commands that follow would be run in that conda environment?

Oh good.. How about eslint and markdown lint?

In GitLab by @tkmamidi on Oct 25, 2021, 14:10

Commented on README.md line 93

Yes sir

In GitLab by @tkmamidi on Oct 25, 2021, 14:10

I'll try them after my Quals?

Sure but please add an issue(s) for them.

Could you update that text to match as that in MR description? The current one is a bit confusing (I know I was the source of that confusion 😆 ).

Please file issues as needed to track them. Also, I would suggest creating an empty section for those topics in the readme doc, with just todo as text in them.

A lot of changes need to be done with tuning. Can I get back to this later?

Sounds good, but please file it as an issue.

In GitLab by @tkmamidi on Oct 25, 2021, 15:17

done

Awesome!

PS- #9

In GitLab by @tkmamidi on Oct 25, 2021, 15:28

Commented on README.md line 116

I have the same note in ReadMe

**Note**: The commit used for CAGI6 challenge pipeline is [be97cf5d](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/ditto/-/merge_requests/3/diffs?commit_id=be97cf5dbfcb099ac82ef28d5d8b0919f28aed99). It was used along with annotated VCFs  and exomiser scores obtained from [rgp_cagi6 workflow](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/mana/mini_projects/rgp_cagi6).

I added a suggestion to highlight what I was referring to.

In GitLab by @tkmamidi on Oct 25, 2021, 15:54

Commented on README.md line 133

changed this line in version 5 of the diff

In GitLab by @tkmamidi on Oct 25, 2021, 15:54

added 1 commit

  • 202afa7 - Apply 1 suggestion(s) to 1 file(s)

Compare with previous version

In GitLab by @tkmamidi on Oct 25, 2021, 15:54

Commented on README.md line 116

applied the suggestion

In GitLab by @tkmamidi on Oct 25, 2021, 16:28

Commented on configs/envs/environment.yaml line 26

I have created an issue #10 but mentioning it in ReadMe confuses people as we're only talking about the test envi for predictions.

resolved all threads

marked this merge request as ready

@tkmamidi MR is ready to be merged now 🎉 🎉

In GitLab by @tkmamidi on Oct 25, 2021, 16:43

Awesome!!

In GitLab by @tkmamidi on Oct 25, 2021, 16:43

approved this merge request

In GitLab by @tkmamidi on Oct 25, 2021, 16:43

mentioned in commit bcf8822