Sound for Video

Ed_Ingold · December 5, 2020

I have been recording sound professionally for nearly 50 years, mostly music, and much of that classical (but "no" is not part of my vocabulary). Video, for me, is a fairly recent addition (20 years), and multi-camera shooting for the last 10 or so. With live performances curtailed by the COVID epidemic, good video with good sound helps both amateur and professional musicians reach a remote audience. I do live concerts and studio sessions, as well as interviews and presentations, each with unique requirements. I typically use 4 microphones, 2 for space and 2 for details, but up to 26 for large ensembles in session.

There are whole books on recording techniques, audio and video, but seldom both together. Ask any two engineers how to mic a piano or drum set and you will get five answers. For starters, I'd like to share things I have learned that are not found in books.

Sound Checks are Important! Digital recording does not tolerate overloads. Set the faders at zero, and set the trim (pre-fader) so that the meters peak at about -20 dB. Peak levels in a performance can exceed that by 12 dB or more. On the other hand, the S/N ratio of digital is so large that you can lift very low levels without excessive noise, other than that present in the room.
Once the input levels are set properly, you can mix and balance the tracks with the faders for a live production, front-of-house (FOH) or monitoring.
Record the pre-fader signals, with each microphone on a separate track. That way if you don't like the live mix, you can change it later.
Once the recording begins, don't change the trim levels unless you are getting overloads. It's very hard to track trim changes when mixing in post.
Start recording early (at least 15 minutes) and end late (e.g., when the last note dies away, or the audience stops clapping).
Let the recording run continuously for concerts and in session. It's disruptive for the talent to have to ask "are we running" in session, and you may miss something important. Punching in for each take is a relic of recording with tape (or film). With tape, you had 60-90 minutes on a reel, and you had to change reels early to avoid running out. With digital, you can record 50 hours of 8-channel music on a 128 GB card.
In lieu of starting and stopping, take notes and log the time code at the start of each take. It's much easier to locate the takes using a non-linear editor (NLE) from notes (or visually) than to keep track of dozens of short clips. That's true for both audio and video.
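
The card capacity figure is easy to check with a little arithmetic. This is only a rough sketch, assuming uncompressed PCM (WAV) files, decimal gigabytes, and no allowance for file headers. The function name and the sample rate and bit depth combinations are my own assumptions for illustration, not settings taken from the post.

```python
def recording_hours(card_gb, channels, sample_rate_hz, bit_depth):
    """Hours of uncompressed PCM audio that fit on a card (file headers ignored)."""
    bytes_per_second = channels * sample_rate_hz * (bit_depth // 8)
    card_bytes = card_gb * 1_000_000_000  # decimal GB, as card makers count them
    return card_bytes / bytes_per_second / 3600


# Assumed format combinations, all for 8 channels on a 128 GB card.
for rate, depth in [(44_100, 16), (48_000, 24), (96_000, 24)]:
    hours = recording_hours(128, 8, rate, depth)
    print(f"{rate} Hz / {depth}-bit: {hours:.1f} hours")
```

Under these assumptions, roughly 50 hours corresponds to 16-bit, 44.1 kHz recording. Higher resolutions fill the card sooner, but still far outlast a reel of tape.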

Synchronizing the video and audio can be done several ways. The best way is to record the mix directly in the video, but that is often not practical, for reasons of distance and mobility. The professional way is to record the same time code on each device. This can be done wired, wirelessly, or by jamming the same time code on each device before starting. With modern cameras and recorders, jamming is good enough for a day's production. The cheapest way (one which I generally use) is to align the camera's sound track and the recorded sound using an NLE, visually and audibly (listen for phasing or echoes). Professional NLE software can automatically align time codes or even audio patterns. The latter can take a very long time, with a high failure rate. Don't use it if you are reluctant to buy green bananas.
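
For anyone curious what "aligning audio patterns" involves, here is a minimal sketch of the general idea: slide one track against the other and keep the shift where they match best (brute-force cross-correlation). This is not how any particular NLE does it. The function names and the synthetic "clap" are made up for illustration, and real tracks would need downsampling or an FFT-based method to run in reasonable time.

```python
import math


def best_lag(reference, other, max_lag):
    """Return the shift (in samples) that best lines `other` up with `reference`.

    A positive result means the event occurs that many samples later in `other`.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def burst(length, start):
    """Synthetic 'clap': a short decaying tone starting at sample `start`."""
    signal = [0.0] * length
    for n in range(50):
        signal[start + n] = math.exp(-n / 10) * math.sin(n * 0.9)
    return signal


camera_track = burst(1000, 100)    # stand-in for the camera's scratch audio
recorder_track = burst(1000, 137)  # same event, 37 samples later
lag = best_lag(camera_track, recorder_track, max_lag=200)
print(lag)
```

The search window (`max_lag`) matters: the wider it is, the longer the search takes and the more chances there are for a false match on repetitive material, which is one plausible reason the automatic tools can be slow and unreliable on music.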

You have to resync each time either the video or audio stream is interrupted. That's another reason to keep them running.

Niels - NHSN · December 5, 2020

Thanks for the write-up Ed. Very useful.

Those coming from still photography tend to forget that great sound often is more important than the visual side. Most people are more intolerant of poor sound than poor video.

Niels - NHSN · December 5, 2020

One more thing. For a one person “do it all” setup, I hear that the new 32 bit float recorders like the Sound Devices MixPre series and the Zoom F6 can relieve a lot of stress when controlling the recording levels.

Do you have experience with those?

Ed_Ingold · December 5, 2020

I have a Sound Devices (SD) MixPre 10ii with 32-bit float recording. It's almost impossible to overload, and the dynamic range is cited at 142 dB. The tracks may seem to be clipped in an editor (e.g., Cubase), but if you load from the memory card and normalize the level you see that the waveform is undamaged. If you use it as a USB I/O device, levels clipped in Cubase are permanently truncated. The inputs have compressor/limiter options, which keep damage to a minimum if overloaded. The 10ii and F8n have analog limiters on the preamp stage rather than downstream, which is a desirable feature. The output can be separately limited, to prevent accidentally overloading external devices (e.g., a Shogun 7 or ATEM switcher).
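
The difference between "looks clipped but normalizes fine" and "permanently truncated" comes down to number formats. Here is a toy sketch of the arithmetic only. It does not model any particular recorder's converters or preamps, the sample values are invented, and the helper names are mine.

```python
def to_int16(sample):
    """Quantize a sample (1.0 = full scale) to 16-bit, clipping anything over."""
    clipped = max(-1.0, min(1.0, sample))
    return round(clipped * 32767)


def normalize(samples, peak=1.0):
    """Scale samples so the largest magnitude equals `peak`."""
    largest = max(abs(s) for s in samples)
    return [s * peak / largest for s in samples]


# Hypothetical overloaded passage: peaks at 2.0, about 6 dB over full scale.
hot = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]

# A float file stores values above 1.0 as they are, so scaling down recovers the shape.
float_recovered = normalize(hot)

# A fixed-point path clips at full scale first; scaling afterwards cannot undo that.
fixed_recovered = normalize([to_int16(s) / 32767 for s in hot])

print(float_recovered)  # peaks are still twice the height of the shoulders
print(fixed_recovered)  # flat tops: peaks and shoulders are the same height
```

This only covers the digital side. An analog stage driven past its limits distorts before the numbers are ever written, which is why the analog limiters on the preamps still matter.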

While I like the MixPre sound, I prefer to use a Zoom F8n in the field. It has more useful features, including power to the FRC-8 mixing control panel. You can also use an iPhone or iPad to access nearly every setup parameter via Bluetooth. Sound Devices has the Wingman Bluetooth app, which is limited to record/stop and level display. Neither has remote monitoring, so I use a Bluetooth transmitter with Sony phones. I keep session notes referring to the time code of the sound recorder. The remote apps are much easier to see than the tiny LCD screen.

You are unlikely to overload if you use due diligence setting up, and with limiting, it's seldom catastrophic.

Edited December 5, 2020 by Ed_Ingold

Ed_Ingold · December 6, 2020

All video devices delay the video with respect to real life, including the camera itself. That delay is generally on the order of two frames (ref 30 fps), or about 1/15 second. That can often be ignored for speaking, but it can be obvious for music videos, especially drums and piano. The solution is to delay the audio by the same amount. Sony cameras do this with a menu option - "Live Sync or Lip Sync." (Use Lip Sync for external recording.) It's important to insert the sound upstream of further video processing, particularly in a computer. The computer can add 1/2 second or more of latency, up to 4-5 seconds for live streaming, but if sound is embedded in the input signal, both the sound and video are delayed by the same amount.
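
Mixers usually want the delay in milliseconds, and audio editors in samples, so it helps to convert the frame count. A small sketch, assuming a 48 kHz audio sample rate (my assumption, along with the function names):

```python
def frames_to_ms(frames, fps):
    """Duration of a number of video frames, in milliseconds."""
    return frames / fps * 1000


def ms_to_samples(ms, sample_rate_hz):
    """Number of audio samples (rounded) covering a delay in milliseconds."""
    return round(ms / 1000 * sample_rate_hz)


delay_ms = frames_to_ms(2, 30)           # the two-frame delay at 30 fps
print(round(delay_ms, 1))                # 66.7
print(ms_to_samples(delay_ms, 48_000))   # 3200
```

So two frames at 30 fps is about 67 ms, or 3200 samples at 48 kHz. At other frame rates the same frame count gives a different delay, so it is worth recomputing if you change the camera settings.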

Audio recorded by the camera's microphone is usually of poor quality, and too far from the subject, hence subject to ambient noise and reflections, resulting in a hollow sound. However, it is useful as a reference when adding an external sound track. It is wise to check the latency of that signal. To do that you need an easily identifiable visual and audible event. The traditional method uses a clapper board, but that's not practical at a concert or event, and disruptive in a recording session. (Movies are shot in a series of 10 to 30 second clips. The main use of a clapper board is to identify the take and time code so they can be sorted and synchronized while editing.) You can measure the delay to the nearest frame while editing. The visual cue can be a person pronouncing a "p" or "b" word, which you match to that sound in the audio file. This is usually well into the clip, so you have to be something of a lip reader. You can also use a drum beat, bow touching a violin string, etc. Be imaginative! Sometimes the audio track is missing, which makes things harder but not impossible. My PTZ Optics cameras are mute by design, and the mic/line switches are bumped in handling more often than you might wish.
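
Once you have found the visual cue and the matching sound on the timeline, the offset is just the difference between two time codes. A minimal sketch, assuming non-drop-frame time code at an integer frame rate (29.97 drop-frame needs different counting). The time code values and function names are hypothetical.

```python
def timecode_to_frames(timecode, fps):
    """Convert non-drop-frame 'HH:MM:SS:FF' to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def offset_frames(video_event, audio_event, fps):
    """Frames by which the audio event trails the video event (negative if it leads)."""
    return timecode_to_frames(audio_event, fps) - timecode_to_frames(video_event, fps)


# Hypothetical timeline readings at 30 fps: the lips part at 00:12:41:07,
# and the matching "p" shows up in the waveform at 00:12:41:09.
offset = offset_frames("00:12:41:07", "00:12:41:09", fps=30)
print(offset, "frames =", round(offset / 30 * 1000), "ms")
```

The same conversion is handy for session notes: logged time codes can be turned into frame counts to jump straight to a take.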

Monitor/recorders, including those by Atomos, can insert an external sound feed into the recording and output. The latter is essential when live-streaming, and saves a lot of time when editing multi-camera shoots. Most video switchers have audio inputs which can be mixed into the output. Mixers and switchers can often delay the sound in millisecond increments. This adjustment is global, so you need consistent performance from cameras and video processors. The Blackmagic switchers and Atomos Shogun 7 in my kit do not add latency that I can see, hear or measure.

Audio/Video latency can have unexpected consequences. If you listen to your own voice with a small delay, your tongue will become glued to the roof of your mouth (figuratively speaking). A delay of 1/2 second or more is highly distracting if it can be heard at the same time as the venue sound (keep those iPhones off!). Live-stream broadcasts add delays of at least 4 seconds (e.g., Vimeo) to 30 seconds or more (e.g., YouTube).

Churches, probably the most significant users of live video outside of commercial broadcasting, need to monitor the results discreetly. Churches often place TV monitors for overflow areas where there is significant sound bleed from the main body. In those applications you should use direct SDI or HDMI feeds, which don't add latency. NDI is a new technology licensed by NewTek, and uses the same Ethernet cable for camera control, audio and video. It also adds delays (to both audio and video) of 400 msec or more, depending on network traffic and configuration of intervening routers. What seems to be simpler and less expensive to implement has put forehead dents in a lot of walls.

Ed_Ingold · December 16, 2020

While there are many microphones which attach directly to a video, DSLR, or mirrorless camera and may offer improved sound quality, most are best suited for VLOG applications. They fall short for recording events, music, or even spoken word.

The main problem is that the distance from the camera and the ideal distance from the microphone rarely coincide. Too far, and room noise and reflections spoil the quality. Too close, and production noises (breathing, pops, key clicks, etc.) become objectionable. Mic placement is highly subjective.
Direct analog connection to a camera is subject to a mismatch of signal levels, and a paucity of meaningful specifications. Plug-on microphones are mostly too hot. When you turn the levels down, the noise level increases, and the signal may still overload (and clip) depending on where the gain settings are in the signal chain.
Unbalanced audio and USB cables should be 15' or shorter to minimize hum and dropouts. That said, many situations work with a 15' cable or less.
For best quality and flexibility, use balanced cables (TRS or XLR) for mic or line signals. Secondly, look for a digital interface with mic preamps. USB is digital, and Sony has digital interfaces which fit in their smart flash shoe. The best microphones are condenser types, which require 48V phantom power. Most digital interface units have switchable phantom power.

As a professional, I'm free to place microphones and cameras where needed, subject to safety and decorum issues. I still have the problem of getting video and audio connected together.

I try to avoid attaching cables to a camera as much as possible, especially microphone cables. They're heavy and a trip hazard, which could dump your camera too. Mini and micro connectors are easily damaged. If I need live sound for streaming, I mix it in at a video switcher or recorder. Bluetooth has too much latency to be useful, up to 1/4 second, and it's not consistent. Wireless (~500 MHz) sets have nearly zero latency, but good ones are expensive. You can safely forget garage door frequencies (27 MHz) because of bandwidth congestion. Most of the time, I need an SDI or HDMI cable to connect remote cameras. If that's not practical, there are wireless sets which handle HD video and sound in the $600 to $1700 per pair price range. 4K transceivers cost a lot more.

invisibleflash · January 13, 2021

You should make a YouTube channel on OP.

Thanks for all the tips and put up some of your video work.

I work mainly with silent film, but about 20% of my archive is sound film. I still have to figure out something for it as a sound scanner is very $$. All I have is a silent scanner.

I have to try this one day for the sound films.

AEO-Light (usc-imi.github.io)

Edited January 13, 2021 by invisibleflash

Ed_Ingold · January 13, 2021

Thank you. I have plenty of samples, all music performances, but I need permission to post anything, and nothing at all involving minors. I will see if I can post a few seconds of a performing adult. I have a private YouTube channel, also a Vimeo account, but the same restrictions apply. What is an "OP"?

Ed_Ingold · January 13, 2021

My guiding principle for recording classical music is to have a single microphone (or stereo pair) as the main viewpoint. For an orchestra, that's roughly 6' above and behind the conductor. Other microphones are to supplement those sounds or instruments that need reinforcement. With an orchestra, the next step would be a pair of microphones about 8 feet to either side of center to fatten the sound. From that point, you can go crazy and spot sections and section leaders throughout the orchestra. That leads to attempts to synthesize the sound as though the musicians had no say in the balance. That leads to what I call "pop up audio", where the levels are raised on each section on their entrances, while the camera zooms in. I say "Stop, and listen to the music." Not every engineer agrees, and the extreme separation technique seems to sell CDs. My second principle is that you must like what you're recording, in terms of genre and rendering. Underlying these principles is that the customer must like it too, so you must learn to hear things like your customer. If you really don't like something, it's best to decline, or say you have another commitment and offer someone you trust as an alternative.

People speaking almost always need a separate microphone. If you want to keep the mic off-camera, use a short shotgun mic on a boom or low stand, or a wireless lapel mic. It's not really necessary to conceal the mic, so you can have one they can hold. Remind them to turn it on, or better, use one without a switch and control it from the console. If you are recording two or more people for broadcast, they must be isolated as much as possible. For informal interviews, a stereo pair may suffice.

You will find sounding "natural" takes a lot of planning and experience, not to mention equipment suited to the job.

invisibleflash · March 4, 2021

Sorry it took so long to answer. Lost track of my posts.

Ed...OP = Opening Post

Why so many restrictions on what you can share?

I ran across some transcriptions made with Cedar noise reduction equipment. Pretty impressive. But don't know what he used.

Edited March 4, 2021 by invisibleflash

Ed_Ingold · March 4, 2021

Nearly all of my work is with juveniles (under 18) or professional musicians who like to maintain tight control over distribution. Perhaps I can get permission from one of the adults to post a short sample (e.g., 10 sec).

laylamorales · September 9, 2021

Thanks for the post. Can you recommend any free programs? Or ones with a trial? I'm a newbie and not sure I need to buy expensive software right away.

Ed_Ingold · September 13, 2021

Dropbox - Sound Sample.wav

This is a brief sound bite from a recording session I had a short time ago. As shown in the photo, I have an ORTF stereo array on a boom overhead, just out of the video area. Harder to see, but I also have an ORTF array under the piano lid, which I mix to have more presence than with the overheads alone. There is also a single spot mic low on the cello, again for presence.

I'm using Schoeps microphones and a Sony A7Siii for video. The selection is the Chopin - Polonaise Brillante, Op. 3 transcribed for piano and cello.

Edited September 13, 2021 by Ed_Ingold

Ed_Ingold · October 27, 2021

Sony cameras have an interesting feature. In certain cameras, the flash shoe is a multi-purpose interface, which can accept a microphone interface with a digital connection to the camera, bypassing the internal preamp and unbalanced phone jack port. Level adjustments are made on the interface, which has up to two XLR inputs plus an optional mounted microphone.

Dangling one or two heavy cables from the adapter is hardly ideal, even for fixed operation on a tripod, much less hand-held. Sony makes an adapter into which you can snap a Sony wireless mic receiver with one or two channels. There are two types of compatible transmitters - belt pack and microphone plug-on - either of which can be configured for mic or line inputs. The plug-on is designed for use with a conventional microphone: it simply plugs into the microphone's base and provides 48 V phantom power if needed. The belt pack and plug-on can also accept an XLR connection from a mixer.

Normally I don't bother with a mic feed, but on occasion I operate a camera at a considerable distance from the audio mixer/recorder. A wireless connection gives me a zero-lag, stereo backup should a non-fatal error occur at the mixer, such as a full memory card (don't ask). Bluetooth could be used too, but it incurs an excessive lag, from 60 msec to 250 msec, and not always consistently.

When streaming, I need to work from a central location, handling audio mixing, PTZ camera control and video switching. In that case, I like to use a camera in the balcony, for example, with a WiFi video transmitter. That signal carries both audio and video and, using Teradek equipment, has a lag of less than 1 msec.
