huggingface / exporters

Export Hugging Face models to Core ML and TensorFlow Lite



Converting EleutherAI/Pythia Models

kendreaditya opened this issue

I was wondering if it's possible to support converting the Pythia models to Core ML. Naively, I ran `python -m exporters.coreml --model=EleutherAI/pythia-1b-deduped mlmodels/pythia-1b-deduped-exported/`, which gave me this error:

Original Output
python -m exporters.coreml --model=EleutherAI/pythia-1b-deduped mlmodels/pythia-1b-deduped-exported/
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
	- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:269: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:228: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops:   4%|█████▏                                                                                                                                  | 86/2272 [00:00<00:01, 2038.49 ops/s]
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 166, in main
    convert_model(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 45, in convert_model
    mlmodel = export(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/convert.py", line 687, in export
    return export_pytorch(preprocessor, model, config, quantize, compute_units)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/convert.py", line 552, in export_pytorch
    mlmodel = ct.convert(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/_converters_entry.py", line 530, in convert
    mlmodel = mil_convert(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 286, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
    return load(*args, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 63, in load
    return _perform_torch_convert(converter, debug)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 102, in _perform_torch_convert
    prog = converter.convert()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 439, in convert
    convert_nodes(self.context, self.graph)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
    add_op(context, node)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4502, in gather
    res = mb.gather_along_axis(x=inputs[0], indices=inputs[2], axis=inputs[1], name=node.name)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 183, in add_op
    return cls._add_op(op_cls_to_add, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/builder.py", line 182, in _add_op
    new_op.type_value_inference()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py", line 253, in type_value_inference
    output_types = self.type_inference()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/scatter_gather.py", line 312, in type_inference
    assert self.x.shape[i] == self.indices.shape[i]
AssertionError
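To make the `AssertionError` above more concrete, here is a simplified pure-Python re-creation of the shape check in coremltools' `gather_along_axis` type inference. It is a sketch under stated assumptions, not the coremltools source: we assume the check compares every dimension except the gather axis, that symbolic dimensions (such as `is13` and `is11`) only count as equal when they are the same symbol, and that the gather runs along axis 2 (the dimension where 2048 cached positions meet the sequence length). We also assume the shape pairs printed in the bypassed-check output further down are `(x.shape, indices.shape)`. For contrast, PyTorch's `torch.gather` only requires `index.size(d) <= input.size(d)` for `d != dim`, so the traced graph is valid in PyTorch even when the converter cannot prove two symbolic sizes equal. The helper names are hypothetical.

```python
from typing import List, Sequence, Union

# A dimension is either a concrete size or a symbolic name such as "is13".
Dim = Union[int, str]


def mismatched_dims(x_shape: Sequence[Dim], indices_shape: Sequence[Dim], axis: int) -> List[int]:
    """Return the non-gather dimensions where `x` and `indices` disagree.

    Simplified sketch (not the coremltools source): every dimension except
    `axis` must match, and symbolic dims only match if they are the same symbol.
    """
    if len(x_shape) != len(indices_shape):
        raise ValueError("x and indices must have the same rank")
    axis %= len(x_shape)  # accept negative axes
    return [
        i
        for i, (x_dim, idx_dim) in enumerate(zip(x_shape, indices_shape))
        if i != axis and x_dim != idx_dim
    ]


def gather_along_axis_shapes_ok(x_shape: Sequence[Dim], indices_shape: Sequence[Dim], axis: int) -> bool:
    # The check passes only if no non-gather dimension disagrees.
    return not mismatched_dims(x_shape, indices_shape, axis)


# Shapes printed during the PyTorch frontend conversion: batch is still symbolic.
print(mismatched_dims(("is13", 1, 2048, 64), ("is11", 1, "is12", 64), axis=2))  # → [0]
# Shapes printed later in the MIL pipeline, after the batch resolved to 1.
print(gather_along_axis_shapes_ok((1, 1, 2048, 64), (1, 1, "is863", 64), axis=2))  # → True
```

Under these assumptions the failing dimension is the batch (index 0), not the 2048-vs-sequence-length axis, which is allowed to differ.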

I tried bypassing this error by commenting out the failing assertion (`assert self.x.shape[i] == self.indices.shape[i]` in coremltools' `scatter_gather.py`). That sometimes seems to cause a memory leak (my memory usage climbs to 60 GB), but I was able to export the model once; that export, however, fails the performance report in Xcode. With the line commented out, I get this output:

Output with the check bypassed
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
	- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:269: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:228: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                                                                                                                                                                                                                                                         | 0/2272 [00:00<?, ? ops/s](is13, 1, 2048, 64) (is11, 1, is12, 64)
(is14, 1, 2048, 64) (is11, 1, is12, 64)
(is53, 1, 2048, 64) (is51, 1, is52, 64)
(is54, 1, 2048, 64) (is51, 1, is52, 64)
Converting PyTorch Frontend ==> MIL Ops:  11%|███████████████████████████████▎                                                                                                                                                                                                                                                             | 250/2272 [00:00<00:00, 2499.35 ops/s](is107, 1, 2048, 64) (is105, 1, is106, 64)
(is108, 1, 2048, 64) (is105, 1, is106, 64)
(is161, 1, 2048, 64) (is159, 1, is160, 64)
(is162, 1, 2048, 64) (is159, 1, is160, 64)
Converting PyTorch Frontend ==> MIL Ops:  23%|████████████████████████████████████████████████████████████████▎                                                                                                                                                                                                                            | 513/2272 [00:00<00:00, 2575.44 ops/s](is215, 1, 2048, 64) (is213, 1, is214, 64)
(is216, 1, 2048, 64) (is213, 1, is214, 64)
Converting PyTorch Frontend ==> MIL Ops:  34%|████████████████████████████████████████████████████████████████████████████████████████████████▋                                                                                                                                                                                            | 771/2272 [00:00<00:00, 2514.44 ops/s](is269, 1, 2048, 64) (is267, 1, is268, 64)
(is270, 1, 2048, 64) (is267, 1, is268, 64)
(is323, 1, 2048, 64) (is321, 1, is322, 64)
(is324, 1, 2048, 64) (is321, 1, is322, 64)
Converting PyTorch Frontend ==> MIL Ops:  45%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                                                                                                                                                            | 1023/2272 [00:00<00:00, 2458.22 ops/s](is377, 1, 2048, 64) (is375, 1, is376, 64)
(is378, 1, 2048, 64) (is375, 1, is376, 64)
(is431, 1, 2048, 64) (is429, 1, is430, 64)
(is432, 1, 2048, 64) (is429, 1, is430, 64)
Converting PyTorch Frontend ==> MIL Ops:  56%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                                                                                                                            | 1274/2272 [00:00<00:00, 2413.73 ops/s](is485, 1, 2048, 64) (is483, 1, is484, 64)
(is486, 1, 2048, 64) (is483, 1, is484, 64)
(is539, 1, 2048, 64) (is537, 1, is538, 64)
(is540, 1, 2048, 64) (is537, 1, is538, 64)
Converting PyTorch Frontend ==> MIL Ops:  67%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                                                                              | 1516/2272 [00:00<00:00, 2176.52 ops/s](is593, 1, 2048, 64) (is591, 1, is592, 64)
(is594, 1, 2048, 64) (is591, 1, is592, 64)
Converting PyTorch Frontend ==> MIL Ops:  76%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                                                  | 1738/2272 [00:00<00:00, 2144.58 ops/s](is647, 1, 2048, 64) (is645, 1, is646, 64)
(is648, 1, 2048, 64) (is645, 1, is646, 64)
(is701, 1, 2048, 64) (is699, 1, is700, 64)
(is702, 1, 2048, 64) (is699, 1, is700, 64)
Converting PyTorch Frontend ==> MIL Ops:  87%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                                     | 1969/2272 [00:00<00:00, 2149.72 ops/s](is755, 1, 2048, 64) (is753, 1, is754, 64)
(is756, 1, 2048, 64) (is753, 1, is754, 64)
(is809, 1, 2048, 64) (is807, 1, is808, 64)
(is810, 1, 2048, 64) (is807, 1, is808, 64)
Converting PyTorch Frontend ==> MIL Ops: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 2271/2272 [00:01<00:00, 2253.81 ops/s]
Running MIL frontend_pytorch pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 36.95 passes/s]
Running MIL default pipeline:  14%|██████████████████████████████████████████▋                                                                                                                                                                                                                                                                | 9/63 [00:00<00:03, 17.14 passes/s]/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:262: UserWarning: Output, '2680', of the source model, has been renamed to 'var_2680' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline:  38%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                                                                                                                                                                        | 24/63 [00:01<00:01, 28.21 passes/s](1, 1, 2048, 64) (1, 1, is863, 64)
(1, 1, 2048, 64) (1, 1, is863, 64)
(1, 1, 2048, 64) (1, 1, is889, 64)
(1, 1, 2048, 64) (1, 1, is889, 64)
(1, 1, 2048, 64) (1, 1, is915, 64)
(1, 1, 2048, 64) (1, 1, is915, 64)
(1, 1, 2048, 64) (1, 1, is941, 64)
(1, 1, 2048, 64) (1, 1, is941, 64)
(1, 1, 2048, 64) (1, 1, is967, 64)
(1, 1, 2048, 64) (1, 1, is967, 64)
(1, 1, 2048, 64) (1, 1, is993, 64)
(1, 1, 2048, 64) (1, 1, is993, 64)
(1, 1, 2048, 64) (1, 1, is1019, 64)
(1, 1, 2048, 64) (1, 1, is1019, 64)
(1, 1, 2048, 64) (1, 1, is1045, 64)
(1, 1, 2048, 64) (1, 1, is1045, 64)
(1, 1, 2048, 64) (1, 1, is1071, 64)
(1, 1, 2048, 64) (1, 1, is1071, 64)
(1, 1, 2048, 64) (1, 1, is1097, 64)
(1, 1, 2048, 64) (1, 1, is1097, 64)
(1, 1, 2048, 64) (1, 1, is1123, 64)
(1, 1, 2048, 64) (1, 1, is1123, 64)
(1, 1, 2048, 64) (1, 1, is1149, 64)
(1, 1, 2048, 64) (1, 1, is1149, 64)
(1, 1, 2048, 64) (1, 1, is1175, 64)
(1, 1, 2048, 64) (1, 1, is1175, 64)
(1, 1, 2048, 64) (1, 1, is1201, 64)
(1, 1, 2048, 64) (1, 1, is1201, 64)
(1, 1, 2048, 64) (1, 1, is1227, 64)
(1, 1, 2048, 64) (1, 1, is1227, 64)
(1, 1, 2048, 64) (1, 1, is1253, 64)
(1, 1, 2048, 64) (1, 1, is1253, 64)
Running MIL default pipeline:  59%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                                                                                                           | 37/63 [00:01<00:00, 28.56 passes/s](1, 1, 2048, 64) (1, 1, is1289, 64)
(1, 1, 2048, 64) (1, 1, is1289, 64)
(1, 1, 2048, 64) (1, 1, is1315, 64)
(1, 1, 2048, 64) (1, 1, is1315, 64)
(1, 1, 2048, 64) (1, 1, is1341, 64)
(1, 1, 2048, 64) (1, 1, is1341, 64)
(1, 1, 2048, 64) (1, 1, is1367, 64)
(1, 1, 2048, 64) (1, 1, is1367, 64)
(1, 1, 2048, 64) (1, 1, is1393, 64)
(1, 1, 2048, 64) (1, 1, is1393, 64)
(1, 1, 2048, 64) (1, 1, is1419, 64)
(1, 1, 2048, 64) (1, 1, is1419, 64)
(1, 1, 2048, 64) (1, 1, is1445, 64)
(1, 1, 2048, 64) (1, 1, is1445, 64)
(1, 1, 2048, 64) (1, 1, is1471, 64)
(1, 1, 2048, 64) (1, 1, is1471, 64)
(1, 1, 2048, 64) (1, 1, is1497, 64)
(1, 1, 2048, 64) (1, 1, is1497, 64)
(1, 1, 2048, 64) (1, 1, is1523, 64)
(1, 1, 2048, 64) (1, 1, is1523, 64)
(1, 1, 2048, 64) (1, 1, is1549, 64)
(1, 1, 2048, 64) (1, 1, is1549, 64)
(1, 1, 2048, 64) (1, 1, is1575, 64)
(1, 1, 2048, 64) (1, 1, is1575, 64)
(1, 1, 2048, 64) (1, 1, is1601, 64)
(1, 1, 2048, 64) (1, 1, is1601, 64)
(1, 1, 2048, 64) (1, 1, is1627, 64)
(1, 1, 2048, 64) (1, 1, is1627, 64)
(1, 1, 2048, 64) (1, 1, is1653, 64)
(1, 1, 2048, 64) (1, 1, is1653, 64)
(1, 1, 2048, 64) (1, 1, is1679, 64)
(1, 1, 2048, 64) (1, 1, is1679, 64)
Running MIL default pipeline:  92%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                       | 58/63 [00:03<00:00, 12.22 passes/s](1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 63/63 [00:04<00:00, 14.28 passes/s]
Running MIL backend_mlprogram pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 190.00 passes/s]

Any ideas?

`huggingface-cli env`

- huggingface_hub version: 0.15.1
- Platform: macOS-13.4-arm64-arm-64bit
- Python version: 3.10.12
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /Users/kendreaditya/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: osxkeychain
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.0.0
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- gradio: N/A
- numpy: 1.24.2
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /Users/kendreaditya/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /Users/kendreaditya/.cache/huggingface/assets
- HF_TOKEN_PATH: /Users/kendreaditya/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
`pip freeze`
appnope==0.1.3
asttokens==2.2.1
attrs==23.1.0
backcall==0.2.0
cattrs==23.1.2
certifi==2023.5.7
charset-normalizer==3.1.0
comm==0.1.3
coremltools==7.0b1
debugpy==1.6.7
decorator==5.1.1
einops==0.6.1
exceptiongroup==1.1.1
executing==1.2.0
-e git+https://github.com/huggingface/exporters.git@d83cf6268fcaf1c6259511ddbd32dc9dcd79bc03#egg=exporters
fancycompleter==0.9.1
filelock==3.12.2
fsspec==2023.6.0
huggingface-hub==0.15.1
idna==3.4
ipykernel==6.23.2
ipython==8.14.0
jedi==0.18.2
Jinja2==3.1.2
jupyter_client==8.2.0
jupyter_core==5.3.1
MarkupSafe==2.1.3
matplotlib-inline==0.1.6
mpmath==1.3.0
nest-asyncio==1.5.6
networkx==3.1
numpy==1.24.2
packaging==23.1
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
platformdirs==3.6.0
prompt-toolkit==3.0.38
protobuf==3.20.1
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyaml==23.5.9
Pygments==2.15.1
pyrepl==0.9.0
python-dateutil==2.8.2
PyYAML==6.0
pyzmq==25.1.0
regex==2023.6.3
requests==2.31.0
six==1.16.0
stack-data==0.6.2
sympy==1.12
tokenizers==0.13.3
torch==2.0.0
tornado==6.3.2
tqdm==4.65.0
traitlets==5.9.0
transformers==4.29.2
typing_extensions==4.6.3
urllib3==2.0.3
wcwidth==0.2.6
wmctrl==0.4
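Because the rest of this thread comes down to a few package versions, a short targeted report can be handy alongside the full `pip freeze`. Below is a minimal sketch using only the standard library's `importlib.metadata`; the package list is our own choice for this issue, and `collect_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages whose versions mattered in this issue (our selection, not an official list).
RELEVANT_PACKAGES = ("transformers", "coremltools", "torch", "exporters")


def collect_versions(packages=RELEVANT_PACKAGES):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"- {name}: {ver or 'N/A'}")
```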

Hi @kendreaditya! As we discussed via email, conversion worked for me. Thanks for sending your environment details; I'll try to identify where the incompatibility lies.

No problem, thank you for looking into it. It seems I might have gotten it working by downgrading to transformers==4.26.1 (along with coremltools==6.2 and torch==1.13.1, as listed below).
Any thoughts?

certifi==2023.5.7
charset-normalizer==3.1.0
coremltools==6.2
-e git+https://github.com/huggingface/exporters.git@d83cf6268fcaf1c6259511ddbd32dc9dcd79bc03#egg=exporters
filelock==3.12.2
fsspec==2023.6.0
huggingface-hub==0.15.1
idna==3.4
Jinja2==3.1.2
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.1
numpy==1.25.0
packaging==23.1
protobuf==3.20.3
PyYAML==6.0
regex==2023.6.3
requests==2.31.0
safetensors==0.3.1
sympy==1.12
tokenizers==0.13.3
torch==1.13.1
tqdm==4.65.0
transformers==4.26.1
typing_extensions==4.6.3
urllib3==2.0.3
Original Output
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 1.13.1
Overriding 1 configuration item(s)
	- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:488: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:260: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:212: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:219: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 2262/2263 [00:01<00:00, 1731.14 ops/s]
Running MIL Common passes:  10%|███████████████                                                                                                                                        | 4/40 [00:01<00:16,  2.12 passes/s]/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '2655', of the source model, has been renamed to 'var_2655' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:09<00:00,  4.44 passes/s]
Running MIL Clean up passes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:01<00:00,  5.80 passes/s]
Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'last_hidden_state'})
	- Validating Core ML model output "last_hidden_state":
		-[✓] (1, 128, 2048) matches (1, 128, 2048)
		-[x] values not close enough (atol: 0.0001)
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/exporters/src/exporters/coreml/__main__.py", line 166, in main
    convert_model(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/exporters/src/exporters/coreml/__main__.py", line 70, in convert_model
    validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/exporters/src/exporters/coreml/validate.py", line 220, in validate_model_outputs
    raise ValueError(
ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: 0.001491546630859375
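This run gets past the gather error, but numerical validation fails: the maximum absolute difference, 0.001491546630859375, is roughly 15 times the tolerance used here (atol: 0.0001). The sketch below is a simplified, pure-Python version of an absolute-tolerance check of this kind; it is an assumption about how such a comparison works, not the exporters implementation (which may, for example, use NumPy and compare full tensors). The `args.atol` in the traceback suggests the CLI lets you pass a looser tolerance; whether a difference of this size is acceptable depends on your use case.

```python
from typing import Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise |reference - candidate| between two flattened outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def within_atol(reference: Sequence[float], candidate: Sequence[float], atol: float = 1e-4) -> bool:
    # Pass only if no element differs by more than the absolute tolerance.
    return max_abs_diff(reference, candidate) <= atol


# Reproduce the numbers from the log with a one-element toy output.
diff = max_abs_diff([0.0], [0.001491546630859375])
print(diff, within_atol([0.0], [0.001491546630859375]))
```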

Yes, I observed the same: it works with transformers==4.27.3 but not with 4.28.1. We'll check it out! Meanwhile, you can downgrade transformers as you did, or use the conversion Space, which I just upgraded to the latest version of exporters.

Thanks for your report!
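If you want a script to warn about this regression before attempting an export, a minimal sketch of a version gate could rely on the only two data points reported here (4.27.3 works, 4.28.1 does not). It assumes the regression is monotonic, so anything at or below 4.27.3 is treated as before the regression and anything at or above 4.28.1 as after it; versions in between are reported as untested. Version parsing is deliberately simplified (pre-release suffixes are ignored), and the helper names are hypothetical.

```python
import re
from typing import Tuple

# The only two data points reported in this thread.
LAST_KNOWN_GOOD: Tuple[int, int, int] = (4, 27, 3)  # reported to work
FIRST_KNOWN_BAD: Tuple[int, int, int] = (4, 28, 1)  # reported not to work


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse the leading 'X.Y[.Z]' of a version string; pre-release suffixes are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def pythia_export_status(transformers_version: str) -> str:
    """Classify a transformers version relative to the reported regression (assumed monotonic)."""
    v = parse_version(transformers_version)
    if v <= LAST_KNOWN_GOOD:
        return "before regression"
    if v >= FIRST_KNOWN_BAD:
        return "after regression"
    return "untested"  # e.g. 4.27.4 or 4.28.0 lie between the two data points


for candidate in ("4.26.1", "4.27.3", "4.28.0", "4.29.2"):
    print(candidate, pythia_export_status(candidate))
```

To check the installed version, you could pass `importlib.metadata.version("transformers")` to `pythia_export_status`.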

Sounds good, thank you for your help!