dotmesh-io / dotmesh

dotmesh (dm) is like git for your data volumes (databases, files etc) in Docker and Kubernetes

Home Page:https://dotmesh.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Problems viewing notebook diffs

alaric-dotmesh opened this issue · comments

FuseMachines are reporting multiple issues with the notebook diff view breaking, eg https://cloud.dotscience.com/project/229bf822-d3cb-403e-af6d-fde9a5c07218/runs/file/audio-tag%252Fnotebooks%252FCNN.ipynb/cf2d5227-61a0-4e51-85c3-cfd30f867e12?page=0

Seems to be a problem with the S3 interface used to fetch the notebook files; the FSMs are going into the backoff state and we're not sure why.

Gotcha:

time="2019-10-14T09:19:09Z" level=info msg="<transition> 66429de2-487e-4df3-839e-3e1c7a1f7a66 to active waiting (from active waiting, 0.83s ago)"
time="2019-10-14T09:19:09Z" level=info msg="entering active state for 66429de2-487e-4df3-839e-3e1c7a1f7a66"
time="2019-10-14T09:19:09Z" level=debug msg="[activeState] moving to next state" from=fileOutputIO
time="2019-10-14T09:19:09Z" level=info msg="<transition> 66429de2-487e-4df3-839e-3e1c7a1f7a66 to backoff pausing (from active waiting, 0.00s ago)"
2019/10/14 09:19:09 entering backoff state for 66429de2-487e-4df3-839e-3e1c7a1f7a66

Further logging reveals the next stage in the problem:

time="2019-10-14T15:04:40Z" level=error msg="[readFile] Error statting" error="stat /var/lib/dotmesh/mnt/dmfs/66429de2-487e-4df3-839e-3e1c7a1f7a66@3965e857-246c-4e8a-931a-6860000dd80a/__default__/audio-tag/notebooks/CNN.ipynb: no such file or directory" filename=audio-tag/notebooks/CNN.ipynb sourcePath=/var/lib/dotmesh/mnt/dmfs/66429de2-487e-4df3-839e-3e1c7a1f7a66@3965e857-246c-4e8a-931a-6860000dd80a/__default__/audio-tag/notebooks/CNN.ipynb

The directory is there, but no CNN.ipynb inside it.

...is the problem that we're comparing the snapshot that CREATED CNN.ipynb to the snapshot before, when it didn't exist?

It seems so! There's two snapshots mounted:

  • 6ef5e876-bddf-44ee-bbb7-7ed3cbfeae78 HAS CNN.ipynb
  • 3965e857-246c-4e8a-931a-6860000dd80a does not have CNN.ipynb

The diff URL that's showing the 500 error is:

https://cloud.dotscience.com/diff?
base=%2Fv2%2Fprojects%2F229bf822-d3cb-403e-af6d-fde9a5c07218%2Ffiles%2Fsnapshot%2F3965e857-246c-4e8a-931a-6860000dd80a%2Faudio-tag%252Fnotebooks%252FCNN.ipynb%3FshortLivedToken%3DeyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2NvdW50X2lkIjoiOGE5ODA4MjAtOTM5MS00NTdlLTkyMDktZjczZTRjOGU5MjM3IiwiZW1haWwiOiJrc2hpdGl6Lm1hbmRhbEBmdXNlbWFjaGluZXMuY29tIiwiZXhwIjoxNTcxMTUxODc3LCJpYXQiOjE1NzEwNjU0NzcsInN1YiI6IjhhOTgwODIwLTkzOTEtNDU3ZS05MjA5LWY3M2U0YzhlOTIzNyIsInVzZXJuYW1lIjoia3NoaXRpem1hbmRhbCJ9._fiA2Ym4bxd5OtC0Iy5WFj36tHWZPtD3kjzJnaI_61MRTx0q1BhQIQLSNNXA95vJ36qALjMGFFwxN8c47ASJyg
&remote=%2Fv2%2Fprojects%2F229bf822-d3cb-403e-af6d-fde9a5c07218%2Ffiles%2Fsnapshot%2F6ef5e876-bddf-44ee-bbb7-7ed3cbfeae78%2Faudio-tag%252Fnotebooks%252FCNN.ipynb%3FshortLivedToken%3DeyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2NvdW50X2lkIjoiOGE5ODA4MjAtOTM5MS00NTdlLTkyMDktZjczZTRjOGU5MjM3IiwiZW1haWwiOiJrc2hpdGl6Lm1hbmRhbEBmdXNlbWFjaGluZXMuY29tIiwiZXhwIjoxNTcxMTUxODc3LCJpYXQiOjE1NzEwNjU0NzcsInN1YiI6IjhhOTgwODIwLTkzOTEtNDU3ZS05MjA5LWY3M2U0YzhlOTIzNyIsInVzZXJuYW1lIjoia3NoaXRpem1hbmRhbCJ9._fiA2Ym4bxd5OtC0Iy5WFj36tHWZPtD3kjzJnaI_61MRTx0q1BhQIQLSNNXA95vJ36qALjMGFFwxN8c47ASJyg

...and those are the snapshot IDs embedded in that URL.

Suggestions:

  1. We probably don't need to go into backoff state when a file is not found in this case, but it's probably not doing any harm (now we have readFile logging). I'll remove the moving to next state logging but propose the readFile logging for master, and add the same sort of logging in saveFile.

  2. We also need to return a useful 404 error when we request a nonexistant file via S3, to help spot this kind of problem faster in future. I'll see if I can do that too.

  3. We need to think about the desired UX for looking at a diff that creates (or deletes?) a file and implement that!

PR for point 1: #758

PR for point 2: #759

Point 3 is to be resolved by:

  1. Adding HEAD support to the S3 API so we can cheaply check for nonexistant files
  2. Making the nbdiff proxy in the gateway use the S3 HEAD support to detect when a diff will fail in this case (see https://github.com/dotmesh-io/gateway/compare/dotmesh-757?expand=1 for the code implementing this)
  3. Changing the frontend to detect the 400 status code returned by the nbdime proxy and displaying something else (TBD)