Hi, I am trying to make a cough recognition app, following the example in the VIP section.
I managed to gather 1000 samples of coughing sounds from the internet. As I understand it, I have to segment those recordings so that every wav file contains just one sound. I want to do that because I want to treat a cough as a single-word sound, exactly like in the example. At the moment, each of my samples is a wav file only 2-3 seconds long that contains multiple cough sounds.
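To make the segmentation idea concrete, here is a minimal sketch of what I have in mind, assuming a simple energy threshold is enough to separate individual coughs. The function name `split_on_silence` and all parameter values (frame length, threshold, minimum gap and length) are my own placeholders, not something taken from the example, and they would need tuning on real recordings.

```python
import math

def split_on_silence(samples, rate, frame_ms=20, threshold=0.1,
                     min_gap_ms=100, min_len_ms=50):
    """Return (start, end) sample indices of loud regions.

    samples: mono audio as a list of floats; rate: sampling rate in Hz.
    A frame is "active" when its RMS exceeds threshold * (loudest frame RMS).
    All default values are illustrative assumptions.
    """
    frame = max(1, int(rate * frame_ms / 1000))

    # RMS energy of each non-overlapping frame
    rms = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        rms.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))

    peak = max(rms, default=0.0)
    if peak == 0.0:
        return []  # empty or completely silent input

    active = [r > threshold * peak for r in rms]
    max_gap_frames = int(min_gap_ms / frame_ms)

    # Group active frames; a silent run longer than max_gap_frames ends a segment
    frame_segments = []
    start, gap = None, 0
    for idx, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = idx
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap_frames:
                frame_segments.append((start, idx - gap + 1))
                start, gap = None, 0
    if start is not None:
        frame_segments.append((start, len(active) - gap))

    # Drop very short segments and convert frame indices to sample indices
    return [(s * frame, min(e * frame, len(samples)))
            for s, e in frame_segments
            if (e - s) * frame_ms >= min_len_ms]
```

Each returned pair could then be used to slice the original recording and write one cough per wav file.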
I want to build a CNN with two output classes: cough and no cough.
For the no_cough samples, should I use random words and other sounds taken from the internet? How many no_cough samples should I have for approximately 3000 cough sounds?
Is my approach good? Do you have any tips?

Thank you,
