MIT paper | SoundNet: Learning Sound Representations from Unlabeled Video (with source code)

Selected from MIT, compiled by Heart of the Machine (Synced).

MIT researchers recently published SoundNet, a study on learning sound representations from unlabeled video, and have now open-sourced the SoundNet code. The paper can be downloaded by clicking "Read the original". SoundNet code address:

Abstract: We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. Using two million unlabeled videos, we leverage the natural synchronization between vision and sound to learn an acoustic representation. The advantage of unlabeled video is that large amounts of useful data can be obtained economically. We propose a student-teacher training procedure that uses unlabeled video as a bridge to transfer the discriminative visual knowledge of well-established visual recognition models into the sound modality. Our sound representation yields a significant performance improvement on standard benchmarks for acoustic scene classification. Visualizations suggest that some high-level semantics emerge automatically in the sound network, even though it is trained without ground-truth labels.

© This article was compiled by Heart of the Machine; please contact the official account for authorization before reprinting.
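The core of the student-teacher procedure described in the abstract is to train the audio (student) network so that its predicted category distribution matches the output distribution of a pretrained vision (teacher) network on the synchronized video frames, typically via a KL-divergence objective. The following is a minimal illustrative sketch of that objective, assuming random logits stand in for the two networks' outputs; the array shapes and variable names here are hypothetical, not taken from the SoundNet release.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) per row: p is the teacher's distribution, q the student's."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

rng = np.random.default_rng(0)
# Hypothetical stand-ins: in the paper the teacher is a pretrained vision
# CNN applied to video frames, and the student is a 1-D convolutional
# network applied to the raw audio waveform of the same clips.
teacher_logits = rng.normal(size=(4, 10))  # 4 clips, 10 visual categories
student_logits = rng.normal(size=(4, 10))

p = softmax(teacher_logits)  # soft "labels" produced by the teacher
q = softmax(student_logits)  # student predictions from sound alone
loss = kl_divergence(p, q).mean()  # minimized w.r.t. the student's weights
```

Because the teacher provides soft targets for every unlabeled clip, no human annotation is needed: the temporal alignment of image and sound is the only supervision signal.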