Normally, when loading data in PyTorch, we do the following:
for x, y in dataloaders:
    # Do something
However, the code for the MusicNet dataset declares its own dataset class and data loaders like this:
train_set = musicnet.MusicNet(root=root, train=True, download=True, window=window)#, pitch_shift=5, jitter=.1)
test_set = musicnet.MusicNet(root=root, train=False, window=window, epoch_size=50000)
train_loader = torch.utils.data.DataLoader(dataset=train_set, batch_size=batch_size, **kwargs)
test_loader = torch.utils.data.DataLoader(dataset=test_set, batch_size=batch_size, **kwargs)
Then they load the data like this:
with train_set, test_set:
    for i, (x, y) in enumerate(train_loader):
        # Do something
Question 1
I don't understand why the code doesn't work without the "with train_set, test_set:" line.
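For context, here is a minimal sketch of the pattern I suspect is at play, assuming the dataset is a context manager that only fills its records dictionary in `__enter__` and empties it again in `__exit__`. `LazyRecords` and its contents are hypothetical stand-ins written for illustration, not the actual MusicNet code.

```python
class LazyRecords:
    """Hypothetical stand-in for a dataset whose records exist only inside `with`."""

    def __init__(self, source):
        self._source = source  # pretend this is data sitting on disk
        self.records = {}      # empty until __enter__ runs

    def __enter__(self):
        # Load every record; a real dataset might open or mmap files here.
        self.records = dict(self._source)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Release the resources again; lookups fail after this point.
        self.records = {}
        return False  # do not swallow exceptions

    def access(self, rec_id):
        return self.records[rec_id]


ds = LazyRecords({2560: [0.1, 0.2, 0.3]})

try:
    ds.access(2560)  # outside `with`: records is still empty
except KeyError as err:
    outside = f"KeyError: {err}"

with ds:
    inside = ds.access(2560)  # inside `with`: the lookup succeeds

print(outside)
print(inside)
```

If the real class behaves like this, it would explain why the same call fails with a KeyError outside the `with` block.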
Question 2
Also, how do I access the data?
I tried
train_set.access(2560,0)
and
with train_set, test_set:
    x, y = train_set.access(2560,0)
These attempts either raise an error like
KeyError                                  Traceback (most recent call last)
 in
----> 1 train_set.access(2560,0)

/workspace/raven_data/AMT/MusicNet/pytorch_musicnet/musicnet.py in access(self, rec_id, s, shift, jitter)
    106
    107         if self.mmap:
--> 108             x = np.frombuffer(self.records[rec_id][0][s*sz_float:int(s+scale*self.window)*sz_float], dtype=np.float32).copy()
    109         else:
    110             fid,_ = self.records[rec_id]

KeyError: 2560
or return an empty x and y.
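One check that might narrow this down is to see which ids are actually present in `records` while the `with` block is active. The sketch below assumes `records` is a plain dict keyed by integer recording id (the `self.records[rec_id]` lookup in the traceback suggests this, but I have not confirmed it). `describe_lookup` and `fake_set` are hypothetical helpers with made-up ids, used only so the snippet runs on its own.

```python
from types import SimpleNamespace


def describe_lookup(dataset, rec_id):
    """Report whether rec_id is a key of dataset.records, without raising."""
    keys = sorted(dataset.records)
    if rec_id in dataset.records:
        return f"{rec_id} found among {len(keys)} record(s)"
    return f"{rec_id} missing; available ids: {keys}"


# Stand-in object with made-up ids; with the real dataset this call
# would go inside the `with train_set:` block.
fake_set = SimpleNamespace(records={100: b"", 2560: b""})

print(describe_lookup(fake_set, 2560))
print(describe_lookup(fake_set, 9999))
```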