Sound for Video

Discussion in 'Video' started by Ed_Ingold, Dec 5, 2020.

  1. I have been recording sound professionally for nearly 50 years, mostly music, and much of that classical (but "no" is not part of my vocabulary). Video, for me, is a fairly recent addition (20 years), and multi-camera shooting for the last 10 or so. With live performances curtailed by the COVID epidemic, good video with good sound helps both amateur and professional musicians reach a remote audience. I do live concerts and studio sessions, as well as interviews and presentations, each with unique requirements. I typically use 4 microphones, 2 for space and 2 for details, but up to 26 for large ensembles in session.

    There are whole books on recording techniques, audio and video, but seldom both together. Ask any two engineers how to mic a piano or drum set, and you will get 5 answers. For starters, I'd like to share things I have learned that are not found in books.
    • Sound Checks are Important! Digital recording does not tolerate overloads. Set the faders at zero, and set the trim (pre-fader) so that the meters peak at about -20 dB. Peak levels in a performance can exceed that by 12 dB or more. On the other hand, the S/N ratio of digital is so large that you can lift very low levels without excessive noise, other than that present in the room.
    • Once the input levels are set properly, you can mix and balance the tracks with the faders for a live production, front-of-house (FOH) or monitoring.
    • Record the pre-fader signals, with each microphone on a separate track. That way if you don't like the live mix, you can change it later.
    • Once the recording begins, don't change the trim levels unless you are getting overloads. It's very hard to track trim changes when mixing in post.
    • Start recording early (at least 15 minutes) and end late (e.g., when the last note dies away, or the audience stops clapping).
    • Let the recording run continuously for concerts and in session. It's disruptive for the talent to have to ask "are we running" in session, and you may miss something important. Punching in for each take is a relic of recording with tape (or film). With tape, you had 60-90 minutes on a reel, and you had to change reels early to avoid running out. With digital, you can record 50 hours of 8-channel music on a 128 GB card.
    • In lieu of starting and stopping, take notes and log the time code at the start of each take. It's much easier to locate the takes using a non-linear-editor (NLE) from notes (or visually) than to keep track of dozens of short clips. That's true for both audio and video.
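    The level and card-capacity arithmetic in the list above can be sketched roughly as follows. The function names are made up for illustration, and the sample rates and bit depths are assumptions, since the post does not say which format gives 50 hours. Uncompressed PCM and decimal gigabytes are assumed, and file-system overhead is ignored.

```python
def headroom_db(sound_check_peak_dbfs, performance_lift_db):
    """Margin left below 0 dBFS when the show runs louder than the sound check."""
    return 0.0 - (sound_check_peak_dbfs + performance_lift_db)


def recording_hours(card_gb, channels, sample_rate_hz, bit_depth):
    """Hours of uncompressed PCM that fit on a card (decimal GB, no overhead)."""
    bytes_per_second = channels * sample_rate_hz * (bit_depth // 8)
    return card_gb * 1e9 / bytes_per_second / 3600


if __name__ == "__main__":
    # Sound check peaks at -20 dBFS, performance runs 12 dB hotter.
    print("headroom left (dB):", headroom_db(-20, 12))

    # Assumed formats for an 8-channel recording on a 128 GB card.
    for rate, bits in [(44_100, 16), (48_000, 24), (96_000, 24)]:
        hours = recording_hours(128, 8, rate, bits)
        print(f"{rate} Hz / {bits}-bit: about {hours:.1f} hours")
```

    Under these assumptions the 50-hour figure lines up with 16-bit, 44.1 kHz recording. Higher sample rates and bit depths shorten the time in proportion, but still leave far more than a tape reel ever did.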
    Synchronizing the video and audio can be done several ways. The best way is to record the mix directly in the video, but that is often not practical because of distance and mobility. The professional way is to record the same time code on each device. This can be done wired, wirelessly, or by jamming the same time code on each device before starting. With modern cameras and recorders, jamming is good enough for a day's production. The cheapest way (the one I generally use) is to align the camera's sound track and the recorded sound using an NLE, visually and audibly (listen for phasing or echoes). Professional NLE software can automatically align time codes or even audio patterns. The latter can take a very long time, with a high failure rate. Don't use it if you are reluctant to buy green bananas.

    You have to resync each time either the video or audio stream is interrupted. That's another reason to keep them running.
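    For anyone curious what aligning by "audio patterns" amounts to, here is a minimal brute-force sketch. It slides one track against the other and keeps the lag where they match best (cross-correlation). This is an illustration only, not how any particular NLE does it; real tools work on much longer tracks and typically use faster FFT-based methods. The names `burst` and `best_offset` are invented for the example.

```python
import math


def burst(length, position, size=20):
    """Silence with a short decaying click at `position` (a stand-in for a clap)."""
    signal = [0.0] * length
    for k in range(size):
        signal[position + k] = math.exp(-k / 5.0) * (-1) ** k
    return signal


def best_offset(reference, other, max_lag):
    """Lag in samples at which `other` best matches `reference`.

    A positive result means the event shows up later in `other`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    recorder_track = burst(400, 100)  # event as captured by the audio recorder
    camera_track = burst(400, 137)    # same event, 37 samples later on the camera
    lag = best_offset(recorder_track, camera_track, max_lag=60)
    print("camera track is late by", lag, "samples")
```

    The search cost grows with track length times the lag range, which gives some feel for why automatic alignment of long recordings can be slow.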
  2. Thanks for the write-up Ed. Very useful.
    Those coming from still photography tend to forget that great sound often is more important than the visual side. Most people are more intolerant of poor sound than poor video.
  3. One more thing. For a one-person "do it all" setup, I hear that the new 32-bit float recorders like the Sound Devices MixPre series and the Zoom F6 can relieve a lot of stress when controlling the recording levels.
    Do you have experience with those?
  4. I have a Sound Devices (SD) MixPre 10ii with 32-bit float recording. It's almost impossible to overload, and the dynamic range is cited at 142 dB. The tracks may seem to be clipped in an editor (e.g., Cubase), but if you load from the memory card and normalize the level you see that the waveform is undamaged. If you use it as a USB I/O device, levels clipped in Cubase are permanently truncated. The inputs have compressor/limiter options, which keep damage to a minimum if overloaded. The 10ii and F8n have analog limiters on the preamp stage rather than downstream, which is a desirable feature. The output can be separately limited, to prevent accidentally overloading external devices (e.g., a Shogun 7 or ATEM switcher).

    While I like the MixPre sound, I prefer to use a Zoom F8n in the field. It has more useful features, including power to the FRC-8 mixing control panel. You can also use an iPhone or iPad to access nearly every setup parameter via Bluetooth. Sound Devices has the Wingman Bluetooth app, which is limited to record/stop and level display. Neither has remote monitoring, so I use a Bluetooth transmitter with Sony phones. I keep session notes referring to the time code of the sound recorder. The remote apps are much easier to see than the tiny LCD screen.

    You are unlikely to overload if you use due diligence setting up, and with limiting, it's seldom catastrophic.
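    The difference between a float track that only looks clipped and a fixed-point track that really is clipped can be shown with a toy example. This is a sketch under assumptions: Python floats are 64-bit rather than 32-bit, the "recorder" is just a list of samples, and all function names are hypothetical. The principle is the same, though. Float storage keeps values above full scale, so normalizing brings the waveform back intact, while integer storage flattens the peaks for good.

```python
import math

FULL_SCALE = 1.0  # 0 dBFS


def sine(peak, n=48):
    """One cycle of a sine wave with the given peak level."""
    return [peak * math.sin(2 * math.pi * i / n) for i in range(n)]


def store_fixed(samples, bits=24):
    """Quantize to signed integers; anything beyond full scale is clipped."""
    top = 2 ** (bits - 1) - 1
    return [max(-top, min(top, round(s * top))) / top for s in samples]


def store_float(samples):
    """Floating point keeps over-full-scale values as they are."""
    return list(samples)


def normalize(samples, target_peak=FULL_SCALE):
    """Scale so the largest sample sits at target_peak."""
    peak = max(abs(s) for s in samples)
    return [s * target_peak / peak for s in samples]


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


hot = sine(peak=2.0)    # about 6 dB over full scale
clean = sine(peak=1.0)  # what a properly set level would have produced

fixed_err = max_error(normalize(store_fixed(hot)), clean)
float_err = max_error(normalize(store_float(hot)), clean)

if __name__ == "__main__":
    print("error after normalizing fixed-point track:", fixed_err)
    print("error after normalizing float track:     ", float_err)
```

    The fixed-point error stays large because the flattened tops cannot be rebuilt, which is the same thing that happens when the signal passes through a fixed-point stage before it is saved.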

    Last edited: Dec 5, 2020
  5. duplicate post
  6. All video devices delay the video with respect to real life, including the camera itself. That delay is generally on the order of two frames (ref 30 fps), or about 1/15 second. That can often be ignored for speaking, but it can be obvious for music videos, especially drums and piano. The solution is to delay the audio by the same amount. Sony cameras do this with a menu option: "Live Sync or Lip Sync." (Use Lip Sync for external recording.) It's important to insert the sound upstream of further video processing, particularly in a computer. The computer can add 1/2 second or more latency, up to 4-5 seconds for live streaming, but if sound is embedded in the input signal, both the sound and video are delayed the same.
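    The frame-to-time arithmetic is easy to sketch. The helper names below are made up, and the 48 kHz sample rate is an assumption (it is the usual rate for video work, but the post does not specify one).

```python
def frames_to_ms(frames, fps):
    """Video delay expressed in milliseconds."""
    return frames / fps * 1000.0


def ms_to_samples(delay_ms, sample_rate_hz=48_000):
    """Audio samples that span the same time (48 kHz assumed by default)."""
    return round(delay_ms / 1000.0 * sample_rate_hz)


def delay_audio(samples, delay_samples):
    """Pad the start with silence so the audio arrives later by delay_samples."""
    return [0.0] * delay_samples + list(samples)


if __name__ == "__main__":
    delay_ms = frames_to_ms(2, 30)  # two frames at 30 fps
    print(f"video delay: {delay_ms:.1f} ms")
    print("audio delay needed:", ms_to_samples(delay_ms), "samples at 48 kHz")
```

    Two frames at 30 fps works out to about 67 ms, which is the figure to dial in when a mixer or switcher offers audio delay in milliseconds.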

    Audio recorded by the camera's microphone is usually of poor quality, and too far from the subject, hence subject to ambient noise and reflections, resulting in a hollow sound. However, it is useful as a reference when adding an external sound track. It is wise to check the latency of that signal. To do that you need an easily identifiable visual and audible event. The traditional method uses a clapper board, but that's not practical at a concert or event, and disruptive in a recording session. (Movies are shot in a series of 10 to 30 second clips. The main use of a clapper board is to identify the take and time code so they can be sorted and synchronized while editing.) You can measure the delay to the nearest frame while editing. The visual cue can be a person pronouncing a "p" or "b" word, and matching that sound in the audio file. This is usually well into the clip, so you have to be something of a lip reader. You can also use a drum beat, bow touching a violin string, etc. Be imaginative! Sometimes the audio track is missing, which makes things harder but not impossible. My PTZ Optics cameras are mute by design, and the mic/line switches get bumped in handling more often than you might wish.

    Monitor/recorders, including those by Atomos, can insert an external sound feed into the recording and output. The latter is essential when live-streaming, and saves a lot of time when editing multi-camera shoots. Most video switchers have audio inputs which can be mixed into the output. Mixers and switchers can often delay the sound in millisecond increments. This adjustment is global, so you need consistent performance from cameras and video processors. The Blackmagic switchers and Atomos Shogun 7 in my kit do not add latency which I can see, hear or measure.

    Audio/Video latency can have unexpected consequences. If you listen to your own voice with a small delay, your tongue will become glued to the roof of your mouth (figuratively speaking). A delay of 1/2 second or more is highly distracting if it can be heard at the same time as the venue sound (keep those iPhones off!). Live-stream broadcasts add delays of at least 4 seconds (e.g., Vimeo) to 30 seconds or more (e.g., YouTube).

    Churches, probably the most significant users of live video outside of commercial broadcasting, need to monitor the results discreetly. Churches often place TV monitors for overflow areas where there is significant sound bleed from the main body. In those applications you should use direct SDI or HDMI feeds, which don't add latency. NDI is a new technology licensed by NewTek, and uses the same Ethernet cable for camera control, audio and video. It also adds delays (to both audio and video) of 400 msec or more, depending on network traffic and configuration of intervening routers. What seems to be simpler and less expensive to implement has put forehead dents in a lot of walls.
  7. While there are many microphones which attach directly to a video, DSLR, or mirrorless camera and may offer improved sound quality, most are best suited for VLOG applications. They fall short for recording events, music, or even spoken word.
    • The main problem is that the best distance for the camera and the ideal distance for the microphone rarely coincide. Too far, and room noise and reflections spoil the quality. Too close, and production noises (breathing, pops, key clicks, etc.) become objectionable. Mic placement is highly subjective.
    • Direct analog connection to a camera is subject to a mismatch of signal levels, and a paucity of meaningful specifications. Plug-on microphones are mostly too hot. When you turn the levels down, the noise level increases, and the signal may still overload (and clip) depending on where the gain settings are in the signal chain.
    • Unbalanced audio and USB cables should be 15' or shorter to minimize hum and dropouts. That said, many situations work with a 15' cable or less.
    • For best quality and flexibility, use balanced cables (TRS or XLR) for mic or line signals. Secondly, look for a digital interface with mic preamps. USB is digital, and Sony has digital interfaces which fit in their smart flash shoe. The best microphones are condenser types, which require 48V phantom power. Most digital interface units have switchable phantom power.
    As a professional, I'm free to place microphones and cameras where needed, subject to safety and decorum issues. I still have the problem of getting video and audio connected together.

    I try to avoid attaching cables to a camera as much as possible, especially microphone cables. They're heavy and a trip hazard, which could dump your camera too. Mini and micro connectors are easily damaged. If I need live sound for streaming, I mix it in at a video switcher or recorder. Bluetooth has too much latency to be useful, up to 1/4 second, and it's not consistent. Wireless (~500 MHz) sets have nearly zero latency, but good ones are expensive. You can safely forget garage door frequencies (27 MHz) because of bandwidth congestion. Most of the time, I need an SDI or HDMI cable to connect remote cameras. If that's not practical, there are wireless sets which handle HD video and sound in the $600 to $1700 per pair price range. 4K transceivers cost a lot more.
  8. You should make a YouTube channel on OP.

    Thanks for all the tips and put up some of your video work.

    I work mainly with silent film, but about 20% of my archive is sound film. I still have to figure out something for it as a sound scanner is very $$. All I have is a silent scanner.

    I have to try this one day for the sound films.

    AEO-Light (
    Last edited: Jan 12, 2021
  9. Thank you. I have plenty of samples, all music performances, but I need permission to post anything, and nothing at all involving minors. I will see if I can post a few seconds of a performing adult. I have a private YouTube channel, also a Vimeo account, but the same restrictions apply. What is an "OP"?
  10. My guiding principle for recording classical music is to have a single microphone (or stereo pair) as the main viewpoint. For an orchestra, that's roughly 6' above and behind the conductor. Other microphones are to supplement those sounds or instruments that need reinforcement. With an orchestra, the next step would be a pair of microphones about 8 feet to either side of center to fatten the sound. From that point, you can go crazy and spot sections and section leaders throughout the orchestra. That leads to attempts to synthesize the sound as though the musicians had no say in the balance. That leads to what I call "pop up audio", where the levels are raised on each section on their entrances, while the camera zooms in. I say "Stop, and listen to the music." Not every engineer agrees, and the extreme separation technique seems to sell CDs. My second principle is that you must like what you're recording, in terms of genre and rendering. Underlying these principles is that the customer must like it too, so you must learn to hear things like your customer. If you really don't like something, it's best to decline, or say you have another commitment and offer someone you trust as an alternative.

    People speaking almost always need a separate microphone. If you want to keep the mic off-camera, use a short shotgun mic on a boom or low stand, or a wireless lapel mic. It's not really necessary to conceal the mic, so you can use one they can hold. Remind them to turn it on, or better, use one without a switch and control it from the console. If you are recording two or more people for broadcast, they must be isolated as much as possible. For informal interviews, a stereo pair may suffice.

    You will find sounding "natural" takes a lot of planning and experience, not to mention equipment suited to the job.
  11. Sorry it took so long to answer. Lost track of my posts.

    Ed...OP = Opening Post

    Why so many restrictions on what you can share?

    I ran across some transcriptions made with Cedar noise reduction equipment. Pretty impressive. But don't know what he used.
    Last edited: Mar 3, 2021 at 8:23 PM
  12. Nearly all of my work is with juveniles (under 18) or professional musicians who like to maintain tight control over distribution. Perhaps I can get permission from one of the adults to post a short sample (e.g., 10 sec).
