The Great Cockatoo Count is an annual census that collects data about the number and distribution of black cockatoos. It is a manual process involving hundreds of volunteers. I set out to determine whether machine learning could be used to identify and locate Carnaby cockatoos.
To build a machine learning model you need data, lots of data. Fortunately, during a previous fauna study, a dozen Carnaby cockatoos landed in front of my trail camera to drink from a pond. They stayed for a good 10 minutes, so I had enough audio to get started.
First I needed to look at the calls to see whether they were distinctive enough for machine learning to identify them. I stripped the audio from the video and manually cut out some of the calls. In Python, I applied a Fourier transform to isolate the dominant frequencies and plotted the frequency footprint of each call.
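As a rough illustration of this step (not the original script), the sketch below uses NumPy's FFT to find the dominant frequency of short frames of a clip, which gives the rise and fall of frequency over time. The function names, frame sizes, and the synthetic sweep standing in for a real call are all assumptions made for the example.

```python
import numpy as np

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) with the largest magnitude in the clip."""
    windowed = samples * np.hanning(len(samples))  # taper to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

def frequency_track(samples, sample_rate, frame_len=1024, hop=512):
    """Dominant frequency of each short frame, in time order."""
    starts = range(0, len(samples) - frame_len + 1, hop)
    return [dominant_frequency(samples[s:s + frame_len], sample_rate) for s in starts]

# Synthetic stand-in for a call: a one-second tone sweeping from 2000 Hz to 4000 Hz.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
sweep = np.sin(2 * np.pi * (2000 * t + 1000 * t ** 2))

track = frequency_track(sweep, sample_rate)
print(f"first frame: {track[0]:.0f} Hz, last frame: {track[-1]:.0f} Hz")
```

Plotting `track` against time gives a simple version of the frequency footprint described above.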
Looking at the sound graphs, I could see a clear pattern in the rise and fall of frequencies. This was encouraging enough for me to believe machine learning could distinguish the Carnaby calls from other birds.
I took the calls I had and used a data augmentation process to increase the number of samples.
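The text does not say which augmentation techniques were used. The sketch below shows three common ones for audio (random time shift, gain change, and added noise) purely as an assumed example; the function names, parameter ranges, and the synthetic clip are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is repeatable

def add_noise(samples, snr_db):
    """Mix in white noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(samples ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return samples + rng.normal(0.0, np.sqrt(noise_power), size=samples.shape)

def time_shift(samples, max_shift):
    """Roll the clip by a random number of samples, wrapping at the ends."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(samples, shift)

def change_gain(samples, low=0.5, high=1.5):
    """Scale the clip by a random factor to mimic nearer or farther birds."""
    return samples * rng.uniform(low, high)

def augment(samples, copies=5, max_shift=2000):
    """Return several randomly altered copies of one clip."""
    out = []
    for _ in range(copies):
        clip = time_shift(samples, max_shift)
        clip = change_gain(clip)
        clip = add_noise(clip, snr_db=rng.uniform(10, 30))
        out.append(clip)
    return out

# Synthetic stand-in for one recorded call (assumption: 22050 Hz sample rate).
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
call = np.sin(2 * np.pi * 3000 * t)

augmented = augment(call, copies=5)
print(len(augmented), "augmented clips from one original")
```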
To create my negative audio samples (audio without a Carnaby call) I used audio from other trail cameras, as it contained the ambient bush noise. I also liberated some audio of other bird species from YouTube.
I split my positive audio samples (samples containing a call) into training and test subsets and trained a Keras model. When the model was tested using the test data subset, it showed 97% accuracy.
The next question was whether this model could be implemented in a real-world scenario. For example, could a device be strapped to a tree to collect calls and triangulate the location of the bird?
Unfortunately, my model wasn’t designed to work with a constant stream of audio data. The first challenge I faced was that if I split the live audio stream at set intervals, I risked cutting a call in half and missing a detection. To overcome this, I duplicated the audio stream and split each copy every 5 seconds, with one copy's splits offset by 2.5 seconds. This meant that if one stream split a call (which never lasted longer than 2 seconds), the call would still be detected whole by the other stream. This worked, but it introduced the possibility that a call split by neither stream would register two positive detections. I worked around this by comparing the timestamps of the detections and deleting one if there was a match.
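The offset-stream idea can be sketched as follows. The 5 second chunk, 2.5 second offset, and 2 second maximum call length come from the text; everything else (the function names, the `detect` stand-in that replaces the real model, the 0.5 second matching tolerance, and the call times) is assumed for illustration.

```python
CHUNK_SECONDS = 5.0
OFFSET_SECONDS = 2.5

def chunk_bounds(total_seconds, offset=0.0, chunk=CHUNK_SECONDS):
    """(start, end) times of each full chunk in one stream."""
    bounds = []
    start = offset
    while start + chunk <= total_seconds:
        bounds.append((start, start + chunk))
        start += chunk
    return bounds

def detect(call_times, bounds, call_length):
    """Stand-in for the model: a call is detected only if it lies wholly inside a chunk."""
    hits = []
    for start, end in bounds:
        for t in call_times:
            if start <= t and t + call_length <= end:
                hits.append(t)
    return hits

def merge_detections(stream_a, stream_b, tolerance=0.5):
    """Combine both streams, keeping one detection when timestamps match."""
    merged = []
    for t in sorted(stream_a + stream_b):
        if not merged or t - merged[-1] > tolerance:
            merged.append(t)
    return merged

calls = [1.0, 4.0, 8.5, 13.0]  # hypothetical call start times in seconds
call_length = 2.0

a = detect(calls, chunk_bounds(20.0), call_length)
b = detect(calls, chunk_bounds(20.0, OFFSET_SECONDS), call_length)
detections = merge_detections(a, b)

print("stream A:", a)
print("stream B:", b)
print("merged:  ", detections)
```

In this example the calls at 4.0 s and 8.5 s straddle a boundary in the first stream and are caught only by the offset stream, while the call at 13.0 s is seen whole by both streams and is reported once after merging.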
I had a working machine learning model, but what I wanted was a device that could be placed in the field to collect data over several weeks. My use of Python really limited me to using a Raspberry Pi as the base. The Pi also had the advantage of having USB ports for three microphones, and USB microphones are relatively inexpensive.
I uploaded the model and program and hooked up the microphone. When I switched it on, the device worked! Live audio came in through the microphone and was split into chunks, and each chunk was fed into the model. When a call was detected, the LED blinked on.
However, feeding two audio streams into a machine learning model was too much for the processor on the Raspberry Pi, and it struggled to keep up. The calls all started above 2000 Hz, so to reduce processor load I only fed data into the model when the frequency rose above 2000 Hz.
It wasn’t long before I heard black cockatoos in the park near my house. I grabbed the Pi and a power source and raced outside. The birds were there for only a couple of minutes, but the device registered and recorded each of their calls.
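One way such a gate could work is sketched below. The 2000 Hz threshold is from the text, but testing the strongest frequency in each chunk is an assumption (the original check is not described), as are the function name and the synthetic test tones.

```python
import numpy as np

GATE_HZ = 2000.0  # the calls all started above this frequency

def passes_gate(chunk, sample_rate, gate_hz=GATE_HZ):
    """True if the strongest frequency in the chunk lies above the gate."""
    spectrum = np.abs(np.fft.rfft(chunk))
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)
    return bool(freqs[np.argmax(spectrum)] > gate_hz)

# Synthetic one-second chunks (assumption: 16000 Hz sample rate).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
low_rumble = np.sin(2 * np.pi * 500 * t)    # e.g. wind or traffic
high_tone = np.sin(2 * np.pi * 3000 * t)    # in the range where calls begin

for name, chunk in [("low_rumble", low_rumble), ("high_tone", high_tone)]:
    action = "send to model" if passes_gate(chunk, sample_rate) else "skip"
    print(name, "->", action)
```

A single FFT per chunk is far cheaper than a model inference, which is why a check like this can reduce processor load.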
Enthused by the successful field trial, I purchased two more microphones so that I could build a three-microphone array to calculate the direction of each call. The timestamps of the very highest frequency at each microphone were used to calculate the direction of the call.
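The geometry behind this can be sketched as below, under several assumptions not stated in the text: the bird is far enough away that the sound arrives as a plane wave, the microphone positions are known in metres, and the bearing is measured counterclockwise from the x axis. The function name, microphone layout, and speed of sound value are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at about 20 °C

def bearing_from_arrival_times(mic_positions, arrival_times, c=SPEED_OF_SOUND):
    """Bearing (degrees) toward the source from three arrival times.

    A plane wave from unit direction u reaches a microphone at position p
    at time t0 - (p . u) / c, so the delays between microphones give two
    linear equations in the two components of u.
    """
    p = np.asarray(mic_positions, dtype=float)
    t = np.asarray(arrival_times, dtype=float)
    baselines = p[1:] - p[0]   # vectors from mic 0 to mics 1 and 2
    delays = t[1:] - t[0]      # arrival delays relative to mic 0
    u = np.linalg.solve(baselines, -c * delays)
    return float(np.degrees(np.arctan2(u[1], u[0])) % 360.0)

# Hypothetical array: three microphones half a metre apart in an L shape.
mics = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]

# Simulate arrival times for a call coming from a bearing of 60 degrees.
true_bearing = np.radians(60.0)
u_true = np.array([np.cos(true_bearing), np.sin(true_bearing)])
times = [-(np.dot(m, u_true)) / SPEED_OF_SOUND for m in mics]

bearing = bearing_from_arrival_times(mics, times)
print(f"estimated bearing: {bearing:.1f} degrees")
```

The microphones must not lie in a straight line, otherwise the two equations are not independent and `np.linalg.solve` raises an error.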
Success! I had a working prototype that identified black cockatoos by their call and recorded the direction the call came from. With two of these devices in the field, I could determine the origin of the bird... somewhat.
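To show how two direction readings can be combined into a position, here is a minimal sketch that intersects the two bearing lines. It assumes flat ground, known device positions in metres, and bearings measured counterclockwise from the x axis; the function name and the example coordinates are hypothetical.

```python
import numpy as np

def locate(device_a, bearing_a_deg, device_b, bearing_b_deg):
    """Intersect two bearing lines and return the (x, y) crossing point."""
    a = np.asarray(device_a, dtype=float)
    b = np.asarray(device_b, dtype=float)
    da = np.array([np.cos(np.radians(bearing_a_deg)), np.sin(np.radians(bearing_a_deg))])
    db = np.array([np.cos(np.radians(bearing_b_deg)), np.sin(np.radians(bearing_b_deg))])
    # Solve a + s * da = b + r * db for the distances s and r along each line.
    s, r = np.linalg.solve(np.column_stack([da, -db]), b - a)
    return a + s * da

# Two devices 100 m apart, both hearing the same call.
position = locate((0.0, 0.0), 45.0, (100.0, 0.0), 135.0)
print("estimated position:", np.round(position, 1))
```

If the two bearings are parallel the lines never cross and `np.linalg.solve` raises `LinAlgError`. Small bearing errors also grow with distance, which fits the "somewhat" above.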
The direction calculation used an approximated speed of sound which was accurate enough for the small testing area. However, to be accurate over longer distances, the speed of sound would need to be precise. The speed of sound varies constantly depending on temperature, humidity, and barometric pressure. Sensors could easily be added to calibrate the device, but that was beyond the budget and motivation for what was a fun learning exercise.
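To give a feel for the size of the temperature effect, the sketch below uses the standard dry-air approximation for the speed of sound, c ≈ 331.3 · sqrt(1 + T / 273.15) m/s with T in degrees Celsius. It ignores humidity and pressure, which is a simplification, and the function names and the 100 m path are assumptions made for the example.

```python
import math

def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s) at a temperature in Celsius."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)

def travel_time(path_m, temp_c):
    """Seconds for sound to cover path_m metres at the given temperature."""
    return path_m / speed_of_sound(temp_c)

cool = speed_of_sound(10.0)   # a cool morning
warm = speed_of_sound(35.0)   # a hot afternoon
print(f"10 C: {cool:.1f} m/s, 35 C: {warm:.1f} m/s")

# Difference in travel time over a hypothetical 100 m path.
shift_ms = (travel_time(100.0, 10.0) - travel_time(100.0, 35.0)) * 1000
print(f"travel time over 100 m differs by {shift_ms:.1f} ms")
```

A temperature sensor feeding a function like `speed_of_sound` would be the simplest form of the calibration mentioned above.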
In addition to improving the accuracy of the direction, it would be interesting to see if individual birds could be identified by their calls. Perhaps, like whales, they have a very slight unique signature to their calls. If so, it may be possible to conduct real-time monitoring of the population, including tracking their movements.