GAE debugging
baidut opened this issue
app.yaml
HTML response:
Error: Server Error
The server encountered an error and could not complete your request.
Please try again in 30 seconds.
Check the logs
[Logging docs](https://cloud.google.com/appengine/docs/flexible/custom-runtimes/build#logging)
[Logs Viewer](https://console.cloud.google.com/logs/viewer?project=paq2piq&folder&organizationId&minLogLevel=0&expandAll=false&timestamp=2020-03-12T02:48:17.580000000Z&customFacets=&limitCustomFacetWidth=true&dateRangeStart=2020-03-12T01:48:17.832Z&dateRangeEnd=2020-03-12T02:48:17.832Z&interval=PT1H&resource=gae_app&logName=projects%2Fpaq2piq%2Flogs%2Fstderr&logName=projects%2Fpaq2piq%2Flogs%2Fappengine.googleapis.com%252Fstdout&logName=projects%2Fpaq2piq%2Flogs%2Fappengine.googleapis.com%252Fstderr&logName=projects%2Fpaq2piq%2Flogs%2Fappengine.googleapis.com%252Fnginx.request&logName=projects%2Fpaq2piq%2Flogs%2Fappengine.googleapis.com%252Frequest_log&scrollTimestamp=2020-03-12T02:45:13.712809000Z)
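Set the right entry point. Without an entrypoint in app.yaml, App Engine starts the app with gunicorn -b :$PORT main:app, so it expects a main.py that defines app; without main.py the instance fails with:
ModuleNotFoundError: No module named 'main'
Fix, in app.yaml: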
entrypoint: gunicorn -b :$PORT main:app.server
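To see the ModuleNotFoundError before deploying, a local check can import the module named in the entrypoint and follow the attribute path after the colon. This is a minimal standard-library sketch; `wsgi_target` and `resolve_target` are hypothetical helpers, and the dotted-path lookup is more permissive than gunicorn's own loader: recent gunicorn releases appear to accept only a plain name (or a call) after the colon, so `main:app.server` may be rejected there, and exposing `server = app.server` in main.py with the target `main:server` sidesteps that.

```python
import importlib
import shlex
from functools import reduce


def wsgi_target(entrypoint: str) -> str:
    """Return the `module:attribute` token of a gunicorn entrypoint line.

    Assumes the target is the last argument, as in
    `gunicorn -b :$PORT main:app.server`.
    """
    return shlex.split(entrypoint)[-1]


def resolve_target(target: str):
    """Import the module before the colon and follow the dotted path after it."""
    module_name, sep, attr_path = target.partition(":")
    if not sep or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {target!r}")
    # Raises ModuleNotFoundError ("No module named 'main'") if main.py is missing.
    module = importlib.import_module(module_name)
    # Raises AttributeError if, for example, `app` has no `server` attribute.
    return reduce(getattr, attr_path.split("."), module)


if __name__ == "__main__":
    # Place this file next to app.yaml and main.py; its directory is on sys.path.
    target = wsgi_target("gunicorn -b :$PORT main:app.server")
    try:
        print(target, "->", resolve_target(target))
    except (ImportError, AttributeError) as exc:
        print("entrypoint would fail:", exc)
```

Run it from the app directory. Note that importing main executes the module, so any model loading at import time runs too.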
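memory_gb in app.yaml seems to have no effect in the standard environment: it is a resources setting of the flexible environment, whereas the standard environment sizes memory through instance_class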
CPU version
pytorch/pytorch#26340 (comment): pin the CPU-only wheel in requirements.txt:
numpy==1.17.2
pandas==0.25.2
-f https://download.pytorch.org/whl/torch_stable.html
torch==1.3.1+cpu
CPU-only pip command from https://pytorch.org/get-started/locally/ (torch 1.4.0 at the time):
pip install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
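Because the +cpu wheels are served from the PyTorch index rather than PyPI, the -f line and the +cpu suffix have to appear together, so a check before deploying can confirm both. Minimal standard-library sketch; `check_cpu_torch` is a hypothetical helper that only understands plain `name==version` pins like the ones above.

```python
import re

# Plain `name==version` pins such as "torch==1.3.1+cpu"; other specifiers are ignored.
PIN = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*==\s*(\S+)")
TORCH_INDEX = "https://download.pytorch.org/whl/torch_stable.html"


def check_cpu_torch(requirements_text: str) -> list:
    """Return a list of problems; an empty list means the CPU-only setup looks right."""
    # Drop comments (a '#' at line start or after whitespace) and surrounding blanks.
    lines = [re.split(r"(?:^|\s)#", line, maxsplit=1)[0].strip()
             for line in requirements_text.splitlines()]
    problems = []
    has_index = any(line.startswith(("-f", "--find-links")) and TORCH_INDEX in line
                    for line in lines)
    if not has_index:
        problems.append(f"missing '-f {TORCH_INDEX}'")
    pins = {}
    for line in lines:
        match = PIN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    for package in ("torch", "torchvision"):
        version = pins.get(package)
        if version is None:
            if package == "torch":
                problems.append("torch is not pinned")
        elif not version.endswith("+cpu"):
            problems.append(f"{package}=={version} is not a +cpu build")
    return problems
```

For example, `print(check_cpu_torch(open("requirements.txt").read()))` prints an empty list for the requirements above.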
Build failure
OSError: [Errno 12] Cannot allocate memory
Fix: use the CPU version above; the default torch wheel bundles the CUDA libraries and is far larger
Set enough memory by picking a larger instance class (https://cloud.google.com/appengine/docs/standard/#instance_classes):
instance_class: F4_1G
"OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k"
Exceeded soft memory limit of 256 MB with 663 MB after servicing 0 requests total. Consider setting a larger instance class in app.yaml.
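The 256 MB in that message is the limit of the default F1 class, and the app had already reached 663 MB before serving a request. One way to pick instance_class before deploying is to run the app's startup code locally and read the process's peak resident memory. Minimal sketch for Linux or macOS (the stdlib `resource` module does not exist on Windows); `load_model` is a hypothetical placeholder for the app's own model loading, and the limits in `INSTANCE_MEMORY_MB` are the values listed for these classes around the time of this issue, so treat them as an assumption and check the instance-classes page, since they may have changed.

```python
import resource
import sys

# Assumed limits in MB for the standard-environment classes around 2020
# (the log above shows 256 MB for the default F1). Verify before relying on them.
INSTANCE_MEMORY_MB = {"F1": 256, "F2": 512, "F4": 1024, "F4_1G": 2048}


def peak_rss_mb() -> float:
    """Peak resident set size of this process so far, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024


def smallest_class(needed_mb: float, headroom: float = 1.2):
    """Smallest class whose limit covers needed_mb times headroom, or None if none does."""
    for name, limit in sorted(INSTANCE_MEMORY_MB.items(), key=lambda item: item[1]):
        if needed_mb * headroom <= limit:
            return name
    return None


if __name__ == "__main__":
    # load_model()  # hypothetical: import torch and load RoIPoolModel.pth here
    used = peak_rss_mb()
    print(f"peak RSS {used:.0f} MB -> instance_class: {smallest_class(used)}")
```

Local RSS is only an estimate: a deployed instance also holds the web server and per-request buffers, hence the 20% margin, which is an arbitrary choice.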
Avoid downloading the pretrained model or the ImageNet-pretrained backbone weights at startup. Log:
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
2020-03-11 22:16:20.853 CST
While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.
"Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /root/.cache/torch/checkpoints/resnet18-5c106cde.pth"
Download the model beforehand and deploy it together with the app:
wget -O RoIPoolModel.pth https://github.com/baidut/PaQ-2-PiQ/releases/download/v1.0/RoIPoolModel-fit.10.bs.120.pth
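The download step can also be scripted so that it is skipped when the file is already there, which keeps repeated deploys from re-fetching it. Minimal standard-library sketch, meant to run on the development machine before `gcloud app deploy`, not on GAE; `fetch_once` is a hypothetical helper and the URL is the one from the wget line.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

MODEL_URL = ("https://github.com/baidut/PaQ-2-PiQ/releases/download/v1.0/"
             "RoIPoolModel-fit.10.bs.120.pth")


def fetch_once(url: str, dest) -> bool:
    """Download url to dest unless a non-empty dest exists; return True if downloaded."""
    dest = Path(dest)
    if dest.is_file() and dest.stat().st_size > 0:
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = None
    try:
        # Stream into a temporary file in the same directory, then rename it into place.
        with urllib.request.urlopen(url) as response, tempfile.NamedTemporaryFile(
            dir=dest.parent, suffix=".part", delete=False
        ) as tmp:
            tmp_path = Path(tmp.name)
            shutil.copyfileobj(response, tmp)
        tmp_path.replace(dest)
    except BaseException:
        # Remove the partial file so a failed download leaves nothing behind.
        if tmp_path is not None:
            tmp_path.unlink(missing_ok=True)
        raise
    return True
```

For example, from the app directory: `fetch_once(MODEL_URL, "RoIPoolModel.pth")`. Writing to a temporary file and renaming it means an interrupted download never leaves a truncated RoIPoolModel.pth that would later be deployed and fail to load.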
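Avoid writing to storage: in the standard environment only /tmp is writable, and it is held in the instance's memory, yet torchvision caches the ImageNet weights under /root/.cache: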
"Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /root/.cache/torch/checkpoints/resnet18-5c106cde.pth"
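If the backbone's ImageNet weights are still needed, another option is to ship them with the app and point torch's cache at that bundled copy: torch only downloads when the file is missing from its cache directory, so nothing is fetched or written at request time. Minimal sketch, assuming the cache layout seen in the log above (`<TORCH_HOME>/checkpoints/<file>`, as in torch 1.3/1.4; newer releases use `hub/checkpoints`) and a hypothetical `torch_home/` directory deployed next to main.py; `use_bundled_torch_home` is a hypothetical helper.

```python
import os
from pathlib import Path


def use_bundled_torch_home(torch_home, required=("resnet18-5c106cde.pth",)) -> Path:
    """Point TORCH_HOME at a read-only directory shipped with the app.

    Assumes the torch 1.3/1.4 layout from the log: <torch_home>/checkpoints/<file>.
    Raises FileNotFoundError instead of letting torch download at request time.
    """
    torch_home = Path(torch_home).resolve()
    checkpoints = torch_home / "checkpoints"
    missing = [name for name in required if not (checkpoints / name).is_file()]
    if missing:
        raise FileNotFoundError(f"not bundled under {checkpoints}: {', '.join(missing)}")
    # torch resolves its cache directory from this variable; set it before importing torch.
    os.environ["TORCH_HOME"] = str(torch_home)
    return checkpoints
```

Call it in main.py before `import torch`, e.g. `use_bundled_torch_home(Path(__file__).parent / "torch_home")`, after copying `~/.cache/torch/checkpoints/resnet18-5c106cde.pth` into `torch_home/checkpoints/` locally. Failing at startup with a clear FileNotFoundError is easier to diagnose than a request-time download like the one in the log.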