One of the hottest ideas in robotics, AR, and VR is that of the "Uncanny Valley" - that strange intermediate zone of human-likeness, where a humanoid may look nearly identical to a human, yet comes across as creepier than an obviously robotic-looking AI. No clue what I mean? Think of how cute Wall-E is; he is obviously a robot, makes adorable non-humanlike noises, and saves the planet! How endearing, right?

Now check out this clip of a singing robot. Named "Tara," this robot has a creepy combination of human-like features and a human-like voice that makes her fall into the "uncanny valley" in our minds. We are comfortable with a robot like Wall-E, but Tara here, well - she is just flat-out disturbing.

I would argue that if she did not sing at all, she would not dip into this valley of our minds. In fact, she would more closely resemble a mannequin at Disney World, which we accept as non-human with ease, so it is not as off-putting. However, Tara's voice does not exhibit the natural timbre of a true human voice, nor do her sound waves hit the ear in the same way as those from our vocal cords. Thus, her creepy performance revolts the audience rather than entertaining it.

As many thought leaders point out in this post, there are obstacles to overcome before AI, VR, and AR reach the mass market. This valley in our minds is a serious one, and getting from revulsion to acceptance will take more advanced technology. One way we can get there is with audio. A VR experience may be uncannily realistic visually, but our brains will never fully accept the realism while our ears are not engaged with natural sound. When we see something so realistic, our brains want to accept that it is real - and have a hard time doing so if the accompanying sound does not measure up. Binaural and 3D audio might be the missing piece of the puzzle for getting as close to a realistic artificial experience as possible, without running the risk of creeping out the audience.