issue with CropSignal in Audio notebook
turgut090 opened this issue · comments
Hi. The notebooks are very helpful! However, I could not find the following function in this notebook:
CropSignal(1000)
Has it been changed to CropTime(1000)?
However, in this case I get:
>>> get_y = lambda x: x.name[0]
>>>
>>> aud_digit = DataBlock(blocks=(AudioBlock, CategoryBlock),
... get_items=get_audio_files,
... splitter=RandomSplitter(),
... item_tfms = item_tfms,
... get_y=get_y)
>>>
>>> dls = aud_digit.dataloaders(path_dig, bs=64)
>>> dls.show_batch(max_n=3)
RuntimeError: stack expects each tensor to be equal size, but got [1, 128, 1076] at entry 0 and [1, 128, 431] at entry 1
Looks like the fastai audio library has been redone/moved around. I'll look into updating this notebook soon.
ResizeSignal is the new name, since the signal isn't always cropped to the duration passed. If you were to apply CropSignal(1000) to an 800 ms audio clip, it would pad the signal to make it 1000 ms.
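To illustrate the pad-or-crop behaviour described above, here is a minimal sketch in plain Python. It is not the fastaudio implementation: `resize_signal` is a hypothetical helper, and padding with zeros at the end of the clip is an assumption made only for the example.

```python
def resize_signal(samples, sample_rate, duration_ms, pad_value=0.0):
    """Return a copy of `samples` that is exactly `duration_ms` long.

    Hypothetical helper for illustration only (not fastaudio's code).
    Longer clips are cropped from the end; shorter clips are padded
    at the end with `pad_value` (an assumption of this sketch).
    """
    # Number of samples that corresponds to the requested duration.
    target_len = sample_rate * duration_ms // 1000
    if len(samples) >= target_len:
        return list(samples[:target_len])  # crop
    return list(samples) + [pad_value] * (target_len - len(samples))  # pad


sr = 16000                              # assumed sample rate in Hz
short_clip = [0.1] * (sr * 800 // 1000)   # 800 ms clip
long_clip = [0.1] * (sr * 1500 // 1000)   # 1500 ms clip

short_out = resize_signal(short_clip, sr, 1000)
long_out = resize_signal(long_clip, sr, 1000)
print(len(short_out), len(long_out))
```

Both outputs end up with the same number of samples, which is what lets a batch of clips be stacked into a single tensor.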
Interesting. one_batch works, but show_batch does not.
>>> dls.show_batch(max_n=3)
AttributeError: 'Axes' object has no attribute 'get_array'
>>> dls.one_batch()
(AudioSpectrogram([[[[ -8.3181, -10.2000, -12.6481, ..., -22.9956, -22.1275, -21.1268],
[-14.6939, -9.6992, -5.7920, ..., -26.1172, -20.2113, -19.8419],
[-11.3864, -9.1018, -6.8128, ..., -28.3644, -22.6686, -19.2496],
...,
[-37.3387, -38.2659, -40.5929, ..., -23.8105, -25.2935, -26.5386],
[-38.1808, -39.1065, -41.6412, ..., -22.2388, -24.5692, -27.2292],
[-43.2318, -43.8571, -45.6337, ..., -26.2471, -28.8644, -31.5725]]],
[[[ -0.6156, -1.5698, -7.2658, ..., -8.8658, -3.9938, -2.8227],
[ 1.2372, -1.2855, -4.1364, ..., -7.8216, -3.2151, -1.4409],
[ 1.0344, 1.8436, -1.6778, ..., -4.4782, -2.6898, -2.9348],
...,
[-22.0949, -22.1624, -22.8304, ..., -12.0152, -13.1652, -13.7102],
[-19.2713, -19.8037, -21.3614, ..., -10.6371, -11.5922, -11.7593],
[-20.5059, -20.6659, -21.2461, ..., -9.1825, -11.1106, -12.0782]]],
[[[-18.3703, -19.5959, -24.2047, ..., -14.5802, -12.5391, -11.7561],
[-20.9951, -22.8386, -28.1964, ..., -20.4750, -19.0906, -19.6360],
[-22.2578, -22.9181, -25.7134, ..., -25.1796, -42.6287, -37.7331],
...,
[-49.6489, -49.1947, -48.2453, ..., -49.5460, -49.6013, -49.4587],
[-49.0422, -48.9614, -49.0389, ..., -51.5584, -51.6974, -51.9679],
[-50.9010, -51.6192, -53.4695, ..., -56.1152, -56.0454, -55.8673]]],
...,
[[[ -6.9652, -8.0364, -11.4502, ..., -33.1305, -35.5177, -42.4881],
[ -9.6232, -9.7865, -10.7225, ..., -33.2689, -32.6359, -32.6976],
[-24.9720, -14.7013, -12.5733, ..., -35.2582, -35.6559, -35.3164],
...,
[-20.1046, -20.1812, -20.5957, ..., -44.7309, -42.6575, -41.8783],
[-22.2047, -21.5025, -20.6947, ..., -51.7986, -52.2079, -52.0918],
[-30.9515, -29.7638, -28.4658, ..., -50.2453, -49.8501, -49.8834]]],
[[[-49.3383, -46.1890, -40.8572, ..., -17.6733, -12.8186, -11.3199],
[-45.6537, -42.6323, -41.6800, ..., -17.2358, -11.9648, -11.0973],
[-36.1794, -37.3899, -40.3495, ..., -14.3630, -11.4240, -8.9613],
...,
[-37.8178, -37.6950, -37.0713, ..., 5.2818, 5.9530, 6.1184],
[-41.7684, -42.2734, -43.5165, ..., 5.7649, 5.0034, 4.2097],
[-44.3601, -44.7549, -45.3909, ..., 2.0408, 1.5633, 0.9212]]],
[[[ -1.4403, -2.8520, -7.0391, ..., -19.8113, -15.4940, -14.0321],
[ 0.8504, -0.3509, -4.8551, ..., -18.1012, -14.2483, -13.2409],
[ 9.0323, 7.8041, 4.4682, ..., -15.9741, -10.1831, -8.4963],
...,
[-44.5452, -44.8842, -45.5113, ..., -3.3812, -4.8031, -5.7803],
[-48.4126, -48.5500, -48.6792, ..., -4.9052, -6.3405, -7.0175],
[-47.1862, -47.6854, -48.7584, ..., -2.8045, -4.2454, -5.1359]]]],
device='cuda:0'), TensorCategory([1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1,
0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1], device='cuda:0'))
Yes, apologies, we're on to that one as well: fastaudio/fastaudio#38