Converting EleutherAI/Pythia Models
kendreaditya opened this issue
I was wondering if it's possible to support converting the Pythia models to Core ML. Naively, I ran `python -m exporters.coreml --model=EleutherAI/pythia-1b-deduped mlmodels/pythia-1b-deduped-exported/`
which gave me this error:
Original Output
python -m exporters.coreml --model=EleutherAI/pythia-1b-deduped mlmodels/pythia-1b-deduped-exported/
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:269: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:228: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 4%|█████▏ | 86/2272 [00:00<00:01, 2038.49 ops/s]
Traceback (most recent call last):
File "/opt/homebrew/Cellar/python@3.10/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/homebrew/Cellar/python@3.10/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
main()
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 166, in main
convert_model(
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 45, in convert_model
mlmodel = export(
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/convert.py", line 687, in export
return export_pytorch(preprocessor, model, config, quantize, compute_units)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/convert.py", line 552, in export_pytorch
mlmodel = ct.convert(
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/_converters_entry.py", line 530, in convert
mlmodel = mil_convert(
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
proto, mil_program = mil_convert_to_proto(
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 286, in mil_convert_to_proto
prog = frontend_converter(model, **kwargs)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
return load(*args, **kwargs)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 63, in load
return _perform_torch_convert(converter, debug)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 102, in _perform_torch_convert
prog = converter.convert()
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 439, in convert
convert_nodes(self.context, self.graph)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
add_op(context, node)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4502, in gather
res = mb.gather_along_axis(x=inputs[0], indices=inputs[2], axis=inputs[1], name=node.name)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 183, in add_op
return cls._add_op(op_cls_to_add, **kwargs)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/builder.py", line 182, in _add_op
new_op.type_value_inference()
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py", line 253, in type_value_inference
output_types = self.type_inference()
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/scatter_gather.py", line 312, in type_inference
assert self.x.shape[i] == self.indices.shape[i]
AssertionError
I tried bypassing this error by commenting out the assertion. That sometimes results in what I think is a memory leak (my memory usage goes up to 60 GB). I was able to export the model once, but it fails the performance report in Xcode. With the line commented out, I get this output:
Check bypassed Output
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:269: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:228: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/2272 [00:00<?, ? ops/s](is13, 1, 2048, 64) (is11, 1, is12, 64)
(is14, 1, 2048, 64) (is11, 1, is12, 64)
(is53, 1, 2048, 64) (is51, 1, is52, 64)
(is54, 1, 2048, 64) (is51, 1, is52, 64)
Converting PyTorch Frontend ==> MIL Ops: 11%|███████████████████████████████▎ | 250/2272 [00:00<00:00, 2499.35 ops/s](is107, 1, 2048, 64) (is105, 1, is106, 64)
(is108, 1, 2048, 64) (is105, 1, is106, 64)
(is161, 1, 2048, 64) (is159, 1, is160, 64)
(is162, 1, 2048, 64) (is159, 1, is160, 64)
Converting PyTorch Frontend ==> MIL Ops: 23%|████████████████████████████████████████████████████████████████▎ | 513/2272 [00:00<00:00, 2575.44 ops/s](is215, 1, 2048, 64) (is213, 1, is214, 64)
(is216, 1, 2048, 64) (is213, 1, is214, 64)
Converting PyTorch Frontend ==> MIL Ops: 34%|████████████████████████████████████████████████████████████████████████████████████████████████▋ | 771/2272 [00:00<00:00, 2514.44 ops/s](is269, 1, 2048, 64) (is267, 1, is268, 64)
(is270, 1, 2048, 64) (is267, 1, is268, 64)
(is323, 1, 2048, 64) (is321, 1, is322, 64)
(is324, 1, 2048, 64) (is321, 1, is322, 64)
Converting PyTorch Frontend ==> MIL Ops: 45%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 1023/2272 [00:00<00:00, 2458.22 ops/s](is377, 1, 2048, 64) (is375, 1, is376, 64)
(is378, 1, 2048, 64) (is375, 1, is376, 64)
(is431, 1, 2048, 64) (is429, 1, is430, 64)
(is432, 1, 2048, 64) (is429, 1, is430, 64)
Converting PyTorch Frontend ==> MIL Ops: 56%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 1274/2272 [00:00<00:00, 2413.73 ops/s](is485, 1, 2048, 64) (is483, 1, is484, 64)
(is486, 1, 2048, 64) (is483, 1, is484, 64)
(is539, 1, 2048, 64) (is537, 1, is538, 64)
(is540, 1, 2048, 64) (is537, 1, is538, 64)
Converting PyTorch Frontend ==> MIL Ops: 67%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 1516/2272 [00:00<00:00, 2176.52 ops/s](is593, 1, 2048, 64) (is591, 1, is592, 64)
(is594, 1, 2048, 64) (is591, 1, is592, 64)
Converting PyTorch Frontend ==> MIL Ops: 76%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 1738/2272 [00:00<00:00, 2144.58 ops/s](is647, 1, 2048, 64) (is645, 1, is646, 64)
(is648, 1, 2048, 64) (is645, 1, is646, 64)
(is701, 1, 2048, 64) (is699, 1, is700, 64)
(is702, 1, 2048, 64) (is699, 1, is700, 64)
Converting PyTorch Frontend ==> MIL Ops: 87%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 1969/2272 [00:00<00:00, 2149.72 ops/s](is755, 1, 2048, 64) (is753, 1, is754, 64)
(is756, 1, 2048, 64) (is753, 1, is754, 64)
(is809, 1, 2048, 64) (is807, 1, is808, 64)
(is810, 1, 2048, 64) (is807, 1, is808, 64)
Converting PyTorch Frontend ==> MIL Ops: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 2271/2272 [00:01<00:00, 2253.81 ops/s]
Running MIL frontend_pytorch pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 36.95 passes/s]
Running MIL default pipeline: 14%|██████████████████████████████████████████▋ | 9/63 [00:00<00:03, 17.14 passes/s]/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:262: UserWarning: Output, '2680', of the source model, has been renamed to 'var_2680' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline: 38%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 24/63 [00:01<00:01, 28.21 passes/s](1, 1, 2048, 64) (1, 1, is863, 64)
(1, 1, 2048, 64) (1, 1, is863, 64)
(1, 1, 2048, 64) (1, 1, is889, 64)
(1, 1, 2048, 64) (1, 1, is889, 64)
(1, 1, 2048, 64) (1, 1, is915, 64)
(1, 1, 2048, 64) (1, 1, is915, 64)
(1, 1, 2048, 64) (1, 1, is941, 64)
(1, 1, 2048, 64) (1, 1, is941, 64)
(1, 1, 2048, 64) (1, 1, is967, 64)
(1, 1, 2048, 64) (1, 1, is967, 64)
(1, 1, 2048, 64) (1, 1, is993, 64)
(1, 1, 2048, 64) (1, 1, is993, 64)
(1, 1, 2048, 64) (1, 1, is1019, 64)
(1, 1, 2048, 64) (1, 1, is1019, 64)
(1, 1, 2048, 64) (1, 1, is1045, 64)
(1, 1, 2048, 64) (1, 1, is1045, 64)
(1, 1, 2048, 64) (1, 1, is1071, 64)
(1, 1, 2048, 64) (1, 1, is1071, 64)
(1, 1, 2048, 64) (1, 1, is1097, 64)
(1, 1, 2048, 64) (1, 1, is1097, 64)
(1, 1, 2048, 64) (1, 1, is1123, 64)
(1, 1, 2048, 64) (1, 1, is1123, 64)
(1, 1, 2048, 64) (1, 1, is1149, 64)
(1, 1, 2048, 64) (1, 1, is1149, 64)
(1, 1, 2048, 64) (1, 1, is1175, 64)
(1, 1, 2048, 64) (1, 1, is1175, 64)
(1, 1, 2048, 64) (1, 1, is1201, 64)
(1, 1, 2048, 64) (1, 1, is1201, 64)
(1, 1, 2048, 64) (1, 1, is1227, 64)
(1, 1, 2048, 64) (1, 1, is1227, 64)
(1, 1, 2048, 64) (1, 1, is1253, 64)
(1, 1, 2048, 64) (1, 1, is1253, 64)
Running MIL default pipeline: 59%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 37/63 [00:01<00:00, 28.56 passes/s](1, 1, 2048, 64) (1, 1, is1289, 64)
(1, 1, 2048, 64) (1, 1, is1289, 64)
(1, 1, 2048, 64) (1, 1, is1315, 64)
(1, 1, 2048, 64) (1, 1, is1315, 64)
(1, 1, 2048, 64) (1, 1, is1341, 64)
(1, 1, 2048, 64) (1, 1, is1341, 64)
(1, 1, 2048, 64) (1, 1, is1367, 64)
(1, 1, 2048, 64) (1, 1, is1367, 64)
(1, 1, 2048, 64) (1, 1, is1393, 64)
(1, 1, 2048, 64) (1, 1, is1393, 64)
(1, 1, 2048, 64) (1, 1, is1419, 64)
(1, 1, 2048, 64) (1, 1, is1419, 64)
(1, 1, 2048, 64) (1, 1, is1445, 64)
(1, 1, 2048, 64) (1, 1, is1445, 64)
(1, 1, 2048, 64) (1, 1, is1471, 64)
(1, 1, 2048, 64) (1, 1, is1471, 64)
(1, 1, 2048, 64) (1, 1, is1497, 64)
(1, 1, 2048, 64) (1, 1, is1497, 64)
(1, 1, 2048, 64) (1, 1, is1523, 64)
(1, 1, 2048, 64) (1, 1, is1523, 64)
(1, 1, 2048, 64) (1, 1, is1549, 64)
(1, 1, 2048, 64) (1, 1, is1549, 64)
(1, 1, 2048, 64) (1, 1, is1575, 64)
(1, 1, 2048, 64) (1, 1, is1575, 64)
(1, 1, 2048, 64) (1, 1, is1601, 64)
(1, 1, 2048, 64) (1, 1, is1601, 64)
(1, 1, 2048, 64) (1, 1, is1627, 64)
(1, 1, 2048, 64) (1, 1, is1627, 64)
(1, 1, 2048, 64) (1, 1, is1653, 64)
(1, 1, 2048, 64) (1, 1, is1653, 64)
(1, 1, 2048, 64) (1, 1, is1679, 64)
(1, 1, 2048, 64) (1, 1, is1679, 64)
Running MIL default pipeline: 92%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 58/63 [00:03<00:00, 12.22 passes/s](1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 63/63 [00:04<00:00, 14.28 passes/s]
Running MIL backend_mlprogram pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 190.00 passes/s]
Any ideas?
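To make the shape pairs printed above easier to read, here is a minimal sketch of a per-dimension comparison like the one in the failing assertion (`self.x.shape[i] == self.indices.shape[i]`). It is not the coremltools implementation: `mismatched_dims` is a hypothetical helper, symbolic sizes are modelled as plain strings, and both the choice of `axis=2` (the dimension holding 2048) and the skipping of the gather axis are assumptions made for illustration.

```python
# Illustrative sketch only, not coremltools code.
# Known sizes are ints; symbolic sizes are strings named after the
# "isNN" symbols that appear in the debug output.

def mismatched_dims(x_shape, indices_shape, axis):
    """Return the dims (other than `axis`) where the two shapes differ."""
    if len(x_shape) != len(indices_shape):
        raise ValueError("x and indices must have the same rank")
    return [
        i
        for i, (a, b) in enumerate(zip(x_shape, indices_shape))
        if i != axis and a != b
    ]

# First pair printed while converting the PyTorch frontend:
# the two batch symbols are different.
frontend = mismatched_dims(("is13", 1, 2048, 64), ("is11", 1, "is12", 64), axis=2)

# A pair printed later, during the MIL default pipeline:
# the batch size is a concrete 1 on both sides.
pipeline = mismatched_dims((1, 1, 2048, 64), (1, 1, "is863", 64), axis=2)

print(frontend, pipeline)
```

Under these assumptions, the first pair differs only in dimension 0 (two distinct symbolic batch sizes), while the later pair has no mismatch outside the gather axis.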
`huggingface-cli env`
Copy-and-paste the text below in your GitHub issue.
- huggingface_hub version: 0.15.1
- Platform: macOS-13.4-arm64-arm-64bit
- Python version: 3.10.12
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /Users/kendreaditya/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: osxkeychain
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.0.0
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- gradio: N/A
- numpy: 1.24.2
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /Users/kendreaditya/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /Users/kendreaditya/.cache/huggingface/assets
- HF_TOKEN_PATH: /Users/kendreaditya/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
`pip freeze`
appnope==0.1.3
asttokens==2.2.1
attrs==23.1.0
backcall==0.2.0
cattrs==23.1.2
certifi==2023.5.7
charset-normalizer==3.1.0
comm==0.1.3
coremltools==7.0b1
debugpy==1.6.7
decorator==5.1.1
einops==0.6.1
exceptiongroup==1.1.1
executing==1.2.0
-e git+https://github.com/huggingface/exporters.git@d83cf6268fcaf1c6259511ddbd32dc9dcd79bc03#egg=exporters
fancycompleter==0.9.1
filelock==3.12.2
fsspec==2023.6.0
huggingface-hub==0.15.1
idna==3.4
ipykernel==6.23.2
ipython==8.14.0
jedi==0.18.2
Jinja2==3.1.2
jupyter_client==8.2.0
jupyter_core==5.3.1
MarkupSafe==2.1.3
matplotlib-inline==0.1.6
mpmath==1.3.0
nest-asyncio==1.5.6
networkx==3.1
numpy==1.24.2
packaging==23.1
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
platformdirs==3.6.0
prompt-toolkit==3.0.38
protobuf==3.20.1
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyaml==23.5.9
Pygments==2.15.1
pyrepl==0.9.0
python-dateutil==2.8.2
PyYAML==6.0
pyzmq==25.1.0
regex==2023.6.3
requests==2.31.0
six==1.16.0
stack-data==0.6.2
sympy==1.12
tokenizers==0.13.3
torch==2.0.0
tornado==6.3.2
tqdm==4.65.0
traitlets==5.9.0
transformers==4.29.2
typing_extensions==4.6.3
urllib3==2.0.3
wcwidth==0.2.6
wmctrl==0.4
Hi @kendreaditya! As we discussed via email, conversion worked for me. Thanks for sending your environment details, I'll try to identify where the incompatibility would be.
No problem, thank you for looking into it. It seems I might have gotten it working by downgrading to transformers==4.26.1.
Any thoughts?
certifi==2023.5.7
charset-normalizer==3.1.0
coremltools==6.2
-e git+https://github.com/huggingface/exporters.git@d83cf6268fcaf1c6259511ddbd32dc9dcd79bc03#egg=exporters
filelock==3.12.2
fsspec==2023.6.0
huggingface-hub==0.15.1
idna==3.4
Jinja2==3.1.2
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.1
numpy==1.25.0
packaging==23.1
protobuf==3.20.3
PyYAML==6.0
regex==2023.6.3
requests==2.31.0
safetensors==0.3.1
sympy==1.12
tokenizers==0.13.3
torch==1.13.1
tqdm==4.65.0
transformers==4.26.1
typing_extensions==4.6.3
urllib3==2.0.3
Original Output
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 1.13.1
Overriding 1 configuration item(s)
- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:488: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:260: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:212: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:219: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 2262/2263 [00:01<00:00, 1731.14 ops/s]
Running MIL Common passes: 10%|███████████████ | 4/40 [00:01<00:16, 2.12 passes/s]/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '2655', of the source model, has been renamed to 'var_2655' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:09<00:00, 4.44 passes/s]
Running MIL Clean up passes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:01<00:00, 5.80 passes/s]
Validating Core ML model...
-[✓] Core ML model output names match reference model ({'last_hidden_state'})
- Validating Core ML model output "last_hidden_state":
-[✓] (1, 128, 2048) matches (1, 128, 2048)
-[x] values not close enough (atol: 0.0001)
Traceback (most recent call last):
File "/opt/homebrew/Cellar/python@3.10/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/homebrew/Cellar/python@3.10/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
main()
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/exporters/src/exporters/coreml/__main__.py", line 166, in main
convert_model(
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/exporters/src/exporters/coreml/__main__.py", line 70, in convert_model
validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/latest-venv/exporters/src/exporters/coreml/validate.py", line 220, in validate_model_outputs
raise ValueError(
ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: 0.001491546630859375
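For reference, the validation failure above is an absolute-tolerance check: the log reports a maximum absolute difference of 0.001491546630859375 against an atol of 0.0001. The sketch below shows that kind of comparison in plain Python. The helper names are hypothetical and this is not the code in `validate.py`. The two one-element lists are placeholders chosen only to reproduce the reported difference.

```python
# Illustrative sketch only, not the exporters validation code.

def max_abs_diff(reference, exported):
    """Largest element-wise absolute difference between two flat sequences."""
    return max(abs(r - e) for r, e in zip(reference, exported))

def within_atol(reference, exported, atol=1e-4):
    """True when every element differs by at most `atol`."""
    return max_abs_diff(reference, exported) <= atol

# Placeholder values that reproduce the difference reported in the log.
reference = [0.0]
exported = [0.001491546630859375]

strict = within_atol(reference, exported, atol=1e-4)  # tolerance from the log
loose = within_atol(reference, exported, atol=2e-3)   # a looser, hypothetical tolerance
print(strict, loose)
```

The reported difference is roughly 15 times the tolerance, so the check fails at 0.0001 and would only pass with a looser tolerance. The traceback shows the tolerance being passed in as `args.atol`.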
Yes, I observed the same: it works using transformers==4.27.3 but not 4.28.1. We'll check it out! Meanwhile, you can downgrade transformers as you did, or use the conversion Space, which I just upgraded with the latest version of exporters.
Thanks for your report!
Sounds good, thank you for your help!