karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from GitHub: https://github.com/karpathy/llm.c

hellaswag.py - GitHub "Access to this site has been restricted."

aidando73 opened this issue

Anyone else getting this error?

(myenv) coder@abc-2:~/work/llm.c (master)$ python dev/data/hellaswag.py
Downloading https://raw.githubusercontent.com/rowanz/hellaswag/master/data/hellaswag_val.jsonl to /home/coder/work/llm.c/dev/data/hellaswag/hellaswag_val.jsonl...
/home/coder/work/llm.c/dev/data/hellaswag/hellaswag_val.jsonl: 1.77kiB [00:00, 13.6MiB/s]
Traceback (most recent call last):
  File "/home/coder/work/llm.c/dev/data/hellaswag.py", line 174, in <module>
    evaluate(args.model_type, args.device)
  File "/home/coder/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/coder/work/llm.c/dev/data/hellaswag.py", line 123, in evaluate
    for example in iterate_examples("val"):
  File "/home/coder/work/llm.c/dev/data/hellaswag.py", line 107, in iterate_examples
    example = json.loads(line)
  File "/home/coder/miniconda3/envs/myenv/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/coder/miniconda3/envs/myenv/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/coder/miniconda3/envs/myenv/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

It seems like GitHub returned an error page instead of the dataset. The downloaded file contains HTML rather than JSONL:

/home/coder/work/llm.c/dev/data/hellaswag/hellaswag_val.jsonl

<!DOCTYPE html>
<html>
  <head>
    <meta content="origin" name="referrer">
    <title>Forbidden &middot; GitHub</title>
    <style type="text/css" media="screen">
      body {
        background-color: #f1f1f1;
        margin: 0;
      }
      body,
      input,
      button {
        font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
      }
      .container { margin: 30px auto 40px auto; width: 800px; text-align: center; }
      a { color: #4183c4; text-decoration: none; font-weight: bold; }
      a:hover { text-decoration: underline; }
      h1, h2, h3 { color: #666; }
      ul { list-style: none; padding: 25px 0; }
      li {
        display: inline;
        margin: 10px 50px 10px 0px;
      }
      .logo { display: inline-block; margin-top: 35px; }
      .logo-img-2x { display: none; }
      @media
      only screen and (-webkit-min-device-pixel-ratio: 2),
      only screen and (   min--moz-device-pixel-ratio: 2),
      only screen and (     -o-min-device-pixel-ratio: 2/1),
      only screen and (        min-device-pixel-ratio: 2),
      only screen and (                min-resolution: 192dpi),
      only screen and (                min-resolution: 2dppx) {
        .logo-img-1x { display: none; }
        .logo-img-2x { display: inline-block; }
      }
    </style>
  </head>
  <body>

    <div class="container">
      <h1>Access to this site has been restricted.</h1>

      <p>
        <br>
        If you believe this is an error,
        please contact <a href="https://support.github.com">Support</a>.
      </p>

      <div id="s">
        <a href="https://githubstatus.com">GitHub Status</a> &mdash;
        <a href="https://twitter.com/githubstatus">@githubstatus</a>
      </div>
    </div>
  </body>
</html>
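
A quick way to confirm this without opening the file is to check whether its first non-empty line parses as JSON. This is only a minimal sketch: the helper name `looks_like_jsonl` is my own and is not part of `hellaswag.py`.

```python
import json


def looks_like_jsonl(path):
    """Return True if the first non-empty line of `path` parses as a JSON object.

    Hypothetical helper for illustration; not part of the llm.c repo.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines, such as the one at the top of the HTML page
            try:
                return isinstance(json.loads(line), dict)
            except json.JSONDecodeError:
                return False  # e.g. "<!DOCTYPE html>"
    return False  # empty file
```

Calling it on the `hellaswag_val.jsonl` path from the log above should return `False` for a file like the one shown, since the first non-empty line is `<!DOCTYPE html>`.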

This looks like it could be a bot-control mechanism on GitHub's side. See https://stackoverflow.com/questions/63628053/github-your-access-to-this-site-has-been-restricted-in-go-http-client
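
Whatever the cause of the block, the script fails with a JSON error rather than a download error, presumably because the Forbidden page was saved to disk as if it were the dataset. A downloader that refuses to write error responses would surface the real problem earlier. The sketch below uses only the standard library; `download_checked` is a hypothetical name, and I have not checked how the repo's own download helper is written, so treat it as an illustration rather than a patch. It does not get around the block itself.

```python
import os
import shutil
import urllib.request


def download_checked(url, dest):
    """Download `url` to `dest`, failing loudly on HTTP errors.

    urllib.request.urlopen raises urllib.error.HTTPError for responses such
    as 403 Forbidden, so an error page is never written to `dest`.
    Hypothetical helper for illustration; not part of the llm.c repo.
    """
    tmp = dest + ".part"
    try:
        with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
        os.replace(tmp, dest)  # move into place only after a complete download
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)  # clean up a partial download
```

With this shape, a 403 would show up as an `HTTPError` at download time instead of a `JSONDecodeError` later in `iterate_examples`.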

It works on my local machine but not on my remote box, so it might be something specific to that environment 🤔. I've been able to unblock myself by downloading these files:

hellaswags = {
    "train": "https://raw.githubusercontent.com/rowanz/hellaswag/master/data/hellaswag_train.jsonl",
    "val": "https://raw.githubusercontent.com/rowanz/hellaswag/master/data/hellaswag_val.jsonl",
    "test": "https://raw.githubusercontent.com/rowanz/hellaswag/master/data/hellaswag_test.jsonl",
}

on my local machine and then copying them over to the box.
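
After copying, it may be worth checking that each file really is JSONL before rerunning the script, since a leftover HTML error page at the same path would reproduce the traceback. A rough sketch, assuming the files sit in `dev/data/hellaswag/` as in the log above and that it is run from the repo root; `count_examples` is a hypothetical helper, not something from the repo.

```python
import json
import os

DATA_DIR = "dev/data/hellaswag"  # assumed from the path shown in the log above


def count_examples(path):
    """Parse every non-empty line as JSON and return how many were read.

    Raises json.JSONDecodeError if any line is not valid JSON, for example
    when the file is an HTML error page. Hypothetical helper for illustration.
    """
    n = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)
                n += 1
    return n


if __name__ == "__main__":
    for split in ("train", "val", "test"):
        path = os.path.join(DATA_DIR, f"hellaswag_{split}.jsonl")
        if not os.path.exists(path):
            print(f"{split}: missing ({path})")
            continue
        try:
            print(f"{split}: {count_examples(path)} examples")
        except json.JSONDecodeError as e:
            print(f"{split}: not valid JSONL ({e})")
```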