ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models

Home Page: http://ludwig.ai

Cannot run/install finetuning colab notebook

dotXem opened this issue

Describe the bug

The demo Colab notebook for fine-tuning Llama-2-7b crashes in the third runnable cell when it tries to import torch.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-dac5961b998e> in <cell line: 5>()
      3 import logging
      4 import os
----> 5 import torch
      6 import yaml
      7 

12 frames
/usr/lib/python3.10/_pyio.py in __init__(self, buffer, encoding, errors, newline, line_buffering, write_through)
   2043                 encoding = "utf-8"
   2044             else:
-> 2045                 encoding = locale.getpreferredencoding(False)
   2046 
   2047         if not isinstance(encoding, str):

TypeError: <lambda>() takes 0 positional arguments but 1 was given

To Reproduce

  1. Go to https://colab.research.google.com/drive/1r4oSEwRJpYKBPM0M0RSh0pBEYK_gBKbe
  2. Connect T4 GPU
  3. Run the first three cells
  4. The third cell fails with the error message shown above

Expected behavior
It should work!

Environment (please complete the following information):

(not sure if relevant)

Hi @dotXem! Thanks for reporting the issue - I can confirm that I'm able to repro it with the steps you've provided. Let me get back to you with a root cause and fix soon! Apologies that this didn't work as expected out of the box.

@dotXem I've found the issue and I've updated the notebook(s) on the Ludwig README including the one you're trying - are you able to give it a quick run through to see if the issue is fixed?

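For context, it seems the way we were setting UTF-8 as the default encoding doesn't work with torch 2.1, and it wasn't the recommended way to do it anyway. The lambda we assigned to locale.getpreferredencoding takes no arguments, but the traceback shows it being called as locale.getpreferredencoding(False), hence the TypeError. I've updated the notebook to use the preferred method and it seems to work well.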

This is what I changed

Old:

import locale; locale.getpreferredencoding = lambda: "UTF-8"

New:

import locale; locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
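To illustrate why the old line broke, here is a minimal sketch that reproduces the TypeError from the traceback without touching the real locale module. The names broken_patch, tolerant_patch and set_utf8_locale are hypothetical and used only for illustration, and the fallback to C.UTF-8 is an assumption of mine for machines that don't have en_US.UTF-8 installed. It is not part of the notebook fix.

```python
import locale

# The old notebook patch: a lambda that accepts no arguments.
broken_patch = lambda: "UTF-8"

# The standard library calls getpreferredencoding with one positional
# argument (the traceback shows locale.getpreferredencoding(False)),
# so the zero-argument lambda raises TypeError.
try:
    broken_patch(False)
except TypeError as exc:
    error_message = str(exc)

# A replacement that mirrors the real signature,
# getpreferredencoding(do_setlocale=True), would not raise.
tolerant_patch = lambda do_setlocale=True: "UTF-8"


def set_utf8_locale(candidates=("en_US.UTF-8", "C.UTF-8")):
    """Try each locale name in turn; return the one that was set, or None.

    Hypothetical helper: the candidate list is an assumption, not
    something taken from the Ludwig notebooks.
    """
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            continue  # this locale is not available on the machine
    return None
```

Calling set_utf8_locale() in place of the bare setlocale call avoids a locale.Error on systems where en_US.UTF-8 is missing. On Colab the one-liner above should be enough.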

Let me know how it goes!