The fastai library simplifies training fast and accurate neural nets using modern best practices. See the fastai website to get started. The library is based on research into deep learning best practices undertaken at fast.ai, and includes “out of the box” support for vision, text, tabular, and collab (collaborative filtering) models.
For now, the fastaudio extension can be installed via git; later it will be available on PyPI:
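A minimal sketch of the installation, assuming reticulate manages the underlying Python environment and that fastaudio is fetched straight from its GitHub repository:

reticulate::py_install('git+https://github.com/fastaudio/fastaudio.git', pip = TRUE)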
Grab data:
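One way to grab suitable recordings (a sketch, assuming the Free Spoken Digit Dataset, whose .wav files begin with the digit they contain, e.g. 0_jackson_0.wav, which matches the labelling rule used further below):

library(fastai)
library(magrittr)

# any folder of .wav files whose names start with the label would work the same way
download.file('https://github.com/Jakobovski/free-spoken-digit-dataset/archive/master.zip',
              destfile = 'fsdd.zip')
unzip('fsdd.zip')
path_dig = 'free-spoken-digit-dataset-master/recordings'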
See audio extensions:
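Assuming the package re-exports fastaudio's list of supported formats as audio_extensions():

# first few of the supported audio extensions, e.g. ".aif", ".aifc", ".aiff", ...
audio_extensions()[1:6]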
Read files:
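get_audio_files() (used again in the DataBlock below) collects every audio file under a path:

fnames = get_audio_files(path_dig)
length(fnames$items)   # number of audio files found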
Read audio data and visualize a tensor:
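A sketch, assuming fastaudio's AudioTensor.create is exposed as AudioTensor_create() and that, as elsewhere in the package, show() output can be passed to plot():

at = AudioTensor_create(fnames$items[[1]])
at$shape                            # (channels, samples)
at %>% show() %>% plot(dpi = 200)   # waveform of the first file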
fastaudio has an AudioConfig class which allows us to prepare different settings for our dataset. It currently ships presets such as BasicSpectrogram, BasicMelSpectrogram, BasicMFCC, and Voice. The Voice preset is the most suitable here because our dataset consists of human voices.
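For example, assuming the Voice preset is exposed as Voice():

cfg = Voice()   # voice-oriented defaults, e.g. 16 kHz resampling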
Turn data into spectrogram and crop signal:
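The spectrogram transform used in item_tfms below can be built from the config; this sketch assumes fastaudio's AudioToSpec.from_cfg is exposed as AudioToSpec_from_cfg(). ResizeSignal(1000) pads or crops every clip to 1000 ms:

aud2spec = AudioToSpec_from_cfg(cfg)   # audio tensor -> spectrogram
crop_1s = ResizeSignal(1000)           # fix every signal at 1 second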
Create a pipeline and see the result:
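For instance, chaining load, crop, and spectrogram (assuming fastai's Pipeline is reachable as Pipeline()):

pipe = Pipeline(list(AudioTensor_create, crop_1s, aud2spec))
pipe(fnames$items[[1]]) %>% show() %>% plot(dpi = 200)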
As usual, prepare a dataloader:
item_tfms = list(ResizeSignal(1000), aud2spec)
get_y = function(x) substring(x$name[1],1,1)
aud_digit = DataBlock(blocks = list(AudioBlock(), CategoryBlock()),
get_items = get_audio_files,
splitter = RandomSplitter(),
item_tfms = item_tfms,
get_y = get_y)
dls = aud_digit %>% dataloaders(source = path_dig, bs = 64)
dls %>% show_batch(figsize = c(15, 8.5), nrows = 3, ncols = 3, max_n = 9, dpi = 180)
We will use a ResNet model (xresnet18). However, spectrograms have a single channel rather than three, so the first convolution's channel count and weight dimensions have to be changed:
torch = torch()   # handle to the underlying Python torch module
nn = nn()         # handle to torch.nn

alter_learner = function(learn, channels = 1L) {
  # model[0] is the first ConvLayer of the xresnet stem; model[0][0] is its Conv2d
  learn$model[0][0][['in_channels']] %f% channels
  # keep a single input channel of the 3-channel weight and restore the channel dim
  learn$model[0][0][['weight']] %f% torch$nn$parameter$Parameter(
    (learn$model[0][0]$weight %>% narrow('[:,1,:,:]'))$unsqueeze(1L)
  )
}
learn = Learner(dls, xresnet18(pretrained = FALSE), nn$CrossEntropyLoss(), metrics = accuracy)

# read the channel count from one batch and patch the model to match
nnchannels = dls %>% one_batch() %>% .[[1]] %>% .$shape %>% .[1]
alter_learner(learn, nnchannels)
Find an optimal learning rate:
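lr_find() runs the usual learning-rate range test; plotting it via plot_lr_find() is assumed to work as in the package's other examples:

learn %>% lr_find()
learn %>% plot_lr_find(dpi = 200)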
And fit:
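A sketch of the training call behind the 10-epoch log that follows; the learning rate here is an assumption, so substitute whatever lr_find suggests:

learn %>% fit_one_cycle(10, lr_max = 1e-3)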
epoch train_loss valid_loss accuracy time
0 5.494162 3.295561 0.632812 00:06
1 1.962470 0.236809 0.877604 00:06
2 0.801965 0.174774 0.917969 00:06
3 0.391742 0.208425 0.881510 00:06
4 0.243276 0.149436 0.914062 00:06
5 0.174708 0.134832 0.929688 00:07
6 0.142626 0.127814 0.910156 00:06
7 0.131042 0.120308 0.924479 00:07
8 0.121679 0.126913 0.919271 00:06
9 0.118215 0.114659 0.924479 00:06