huggingface / exporters

Export Hugging Face models to Core ML and TensorFlow Lite


GPTBigCode Support?

JustinMeans opened this issue · comments

Out of sheer curiosity I tried to export bigcode/starcoder to Core ML and got the following error after downloading the weights:
"gpt_bigcode is not supported yet. Only ['bart', 'beit', 'bert', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'convnext', 'ctrl', 'cvt', 'data2vec', 'distilbert', 'ernie', 'gpt2', 'gpt_neo', 'levit', 'm2m_100', 'marian', 'mobilebert', 'mobilevit', 'mvp', 'pegasus', 'plbart', 'roberta', 'roformer', 'segformer', 'splinter', 'squeezebert', 't5', 'vit', 'yolos']"

I understand GPTBigCode is an optimized GPT-2 model with support for Multi-Query Attention.
https://huggingface.co/docs/transformers/model_doc/gpt_bigcode
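
As I understand it, the main architectural difference is that all query heads share a single key/value head, which shrinks the KV cache by a factor of the head count. A quick shape sketch of the broadcast (illustrative only, not the actual modeling code):

    import torch

    batch, n_head, seq, d_head = 1, 12, 16, 64

    q = torch.randn(batch, n_head, seq, d_head)

    # Multi-head attention: every query head has its own key head.
    k_mha = torch.randn(batch, n_head, seq, d_head)

    # Multi-query attention: one shared key head, broadcast across all
    # query heads, so the cached K/V is n_head times smaller.
    k_mqa = torch.randn(batch, 1, seq, d_head)

    print((q @ k_mha.transpose(-1, -2)).shape)  # torch.Size([1, 12, 16, 16])
    print((q @ k_mqa.transpose(-1, -2)).shape)  # same scores shape, via broadcasting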

Python isn't my strong suit, but I just wanted to flag this here. Would running StarCoder on Core ML even be feasible, or is it too large?
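
For reference, I was using the standard CLI entry point; the invocation was roughly the following (exact flags from memory, so treat them as approximate):

    python -m exporters.coreml --model=bigcode/starcoder --feature=text-generation exported/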

I attempted to patch features.py by adding the following entry (just copying the same spec as GPT2), and got pretty far through the conversion process, which ran for around an hour:

    "gpt_bigcode": supported_features_mapping(
        "feature-extraction",
        # "feature-extraction-with-past",
        "text-generation",
        # "text-generation-with-past",
        "text-classification",
        "token-classification",
        coreml_config_cls="models.gpt2.GPT2CoreMLConfig",
    ),
However, toward the end of the conversion, the script failed with the following error:

Some weights of the model checkpoint at bigcode/starcoder were not used when initializing GPTBigCodeModel: ['lm_head.weight']
- This IS expected if you are initializing GPTBigCodeModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTBigCodeModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 1.12.0
Overriding 1 configuration item(s)
	- use_cache -> False
/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:573: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if batch_size <= 0:
Skipping token_type_ids input
Patching PyTorch conversion 'full' with <function GPT2CoreMLConfig.patch_pytorch_ops.<locals>._fill at 0x7fd7ea3915e0>
Converting PyTorch Frontend ==> MIL Ops:   2%|█▏                                                                           | 32/1976 [00:00<00:13, 139.17 ops/s]
Traceback (most recent call last):
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/justinmeans/Documents/JMLLC/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/justinmeans/Documents/JMLLC/exporters/src/exporters/coreml/__main__.py", line 166, in main
    convert_model(
  File "/Users/justinmeans/Documents/JMLLC/exporters/src/exporters/coreml/__main__.py", line 45, in convert_model
    mlmodel = export(
  File "/Users/justinmeans/Documents/JMLLC/exporters/src/exporters/coreml/convert.py", line 680, in export
    return export_pytorch(preprocessor, model, config, quantize, compute_units)
  File "/Users/justinmeans/Documents/JMLLC/exporters/src/exporters/coreml/convert.py", line 553, in export_pytorch
    mlmodel = ct.convert(
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/_converters_entry.py", line 492, in convert
    mlmodel = mil_convert(
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 285, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
    return load(*args, **kwargs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 63, in load
    return _perform_torch_convert(converter, debug)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 102, in _perform_torch_convert
    prog = converter.convert()
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 284, in convert
    convert_nodes(self.context, self.graph)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 88, in convert_nodes
    add_op(context, node)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4130, in masked_fill
    res = mb.select(cond=mask, a=value, b=x, name=node.name)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 182, in add_op
    return cls._add_op(op_cls_to_add, **kwargs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/mil/builder.py", line 166, in _add_op
    new_op = op_cls(**kwargs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/mil/operation.py", line 187, in __init__
    self._validate_and_set_inputs(input_kv)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/mil/operation.py", line 496, in _validate_and_set_inputs
    self.input_spec.validate_inputs(self.name, self.op_type, input_kvs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/mil/input_type.py", line 137, in validate_inputs
    raise ValueError(msg)
ValueError: In op, of type select, named input.1, the named input `b` must have the same data type as the named input `a`. However, b has dtype int32 whereas a has dtype fp32.
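
If this turns out to be a converter limitation rather than an environment issue, one possible workaround might be to override coremltools' masked_fill handler so that both inputs to select are cast to the same dtype. A rough, untested sketch (the input layout is copied from the stock handler in coremltools, so treat the names and order as assumptions):

    from coremltools.converters.mil import Builder as mb
    from coremltools.converters.mil.frontend.torch.ops import _get_inputs
    from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

    # Untested sketch: cast both the tensor and the fill value to fp32 so the
    # `a` and `b` inputs of `select` end up with matching dtypes.
    @register_torch_op(override=True)
    def masked_fill(context, node):
        x, mask, value = _get_inputs(context, node, expected=3)
        x = mb.cast(x=x, dtype="fp32")
        value = mb.cast(x=value, dtype="fp32")
        res = mb.select(cond=mask, a=value, b=x, name=node.name)
        context.add(res)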

Will keep investigating; perhaps it's a PyTorch / Python version issue.

I'm running into the same problem, and I don't think it's a versioning issue. Looking into it :)

Isn't this model going to be way too big to fit into a protobuf file (max size 2 GB)?

Apparently the 2 GB limitation can be worked around on macOS; see for example this section: https://github.com/apple/ml-stable-diffusion#-converting-models-to-core-ml

I've tested a couple of large language models and they seem to work on macOS too.
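
The key seems to be converting to the mlprogram backend, which stores the weights as a separate blob inside the resulting .mlpackage instead of serializing them into the protobuf. A toy sketch of the conversion call (a tiny stand-in model; the real export goes through exporters):

    import torch
    import coremltools as ct

    # Tiny stand-in model; the point is the backend choice, not the network.
    model = torch.nn.Sequential(torch.nn.Linear(128, 128)).eval()
    traced = torch.jit.trace(model, torch.randn(1, 128))

    # The mlprogram backend writes weights to a separate blob inside the
    # .mlpackage rather than into the protobuf itself, which is how
    # checkpoints beyond the 2 GB protobuf cap get saved.
    mlmodel = ct.convert(
        traced,
        convert_to="mlprogram",
        inputs=[ct.TensorType(shape=(1, 128))],
    )
    mlmodel.save("model.mlpackage")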

Fixed by #45. Note that you currently need transformers @ main and coremltools 7.0b1.
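
That is, something like:

    pip install git+https://github.com/huggingface/transformers.git
    pip install coremltools==7.0b1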