Check out this really interesting study over at Cognitive Daily, which explores differences between auditory and visual processing times. The authors used a simple, elegant protocol to show how accurately people can report synchrony and "dis-synchrony."
One side note was that raw auditory processing times are shorter than visual processing times. This may reflect the levels and depth of processing that visual stimuli undergo, and the amount of information (color, depth, size, position, movement, distance, etc.) that must be integrated into a coherent "picture" of the world. I would bet that when auditory input carries more information (speech, music, etc.), the depth of processing also increases and reaction times get a bit longer. Interestingly, auditory conduction velocity in humans appears to be quite a bit slower than in sonar-dependent dolphins, where huge auditory fibers appear to have evolved for especially rapid conduction (1).
(1) Ridgway, S. H., Bullock, T. H., Carder, D. A., Seeley, R. L., and Galambos, R. Auditory brainstem response in dolphins. Proceedings of the National Academy of Sciences, 1981, 78: 1943-1947.
This is completely anecdotal and unscientific, but in my experience doing VAPP (video-audio post-production) work, I needed a two-frame offset (about 67 ms) between sound and picture before I was really certain the sync was off, at least for typical match-the-lips-to-the-words material. For something more "hard-edged," like a drummer hitting a cymbal, I could spot a sync offset of about 1 to 1.5 frames.
Better engineers than I could spot a 1-frame offset on any program material.
It's way easier to detect a sync problem between two audio recordings than between an audio recording and a video recording. With two audio tapes, the drums obviously start to flam by 10 ms of offset, if not sooner.
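For anyone who wants the arithmetic behind these figures, here is a small Python sketch that converts between video frames and milliseconds. It assumes roughly 30 fps video, which is what "two frames, about 67 ms" implies. The comments above don't name a frame rate, so NTSC 29.97 fps and PAL 25 fps are included only for comparison. The function names are illustrative, not from any particular tool.

```python
# Rough arithmetic behind the sync thresholds quoted above.
# Assumption: "two frames ~ 67 ms" implies roughly 30 fps video; the
# NTSC 29.97 fps and PAL 25 fps rows are shown only for comparison.


def frames_to_ms(frames: float, fps: float) -> float:
    """Duration of `frames` video frames, in milliseconds."""
    return frames * 1000.0 / fps


def ms_to_frames(ms: float, fps: float) -> float:
    """Number of video frames spanned by an offset given in milliseconds."""
    return ms * fps / 1000.0


FRAME_RATES = {
    "30 fps": 30.0,
    "NTSC 29.97 fps": 30000 / 1001,
    "PAL 25 fps": 25.0,
}

if __name__ == "__main__":
    for name, fps in FRAME_RATES.items():
        print(f"{name}:")
        # Audio-video offsets mentioned in the comments, converted to ms
        for frames in (1.0, 1.5, 2.0):
            print(f"  {frames} frame(s) = {frames_to_ms(frames, fps):.1f} ms")
        # The ~10 ms audio-audio flam threshold, expressed in frames
        print(f"  10 ms = {ms_to_frames(10.0, fps):.2f} frame(s)")
```

At 30 fps a single frame lasts about 33 ms, so a 10 ms offset between two audio tracks is only 0.3 of a frame. That is several times smaller than the 1 to 2 frame audio-video offsets described above.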