Free Music Software written by Norm Spier (


Musical Pitch/Spectrum Software for Live Sounds

Windows XP / VISTA / Windows 7 / Windows 8 / Windows 10 SOFTWARE (Full Version: Free, No Ads)

Overview and Instructions

Spectratune Software: MUSICAL SPECTRUM ANALYZER (For Live Sounds)

SpectratunePlus:
Windows Software that Generates Several Types of Spectrograms of Music: Both from Live Sound and from Recorded Sound, with Many Features for Musical Aural Feedback (MIDI notes, Sing-Along pitch overlay), Other Features to Aid in Chord and Key Recognition, as Well as Some Other Features to Support an Understanding of How Overtones Relate to Perceived Musical Sound.

However, some may prefer the Spectratune software on this page, if needs are limited to live sound, as the interface is substantially simpler.

SpectratunePlus Information

-- Displays In Real-Time For Any Sound Going Through a P.C.

-- Dual Analysis Modes: Simple Displayed Arrow (Showing Fundamental Like a Chromatic Tuner) for Single-Pitch Sources
and/or Musical Spectrum (Showing Fundamental and All Overtones) for Multi-Pitch Sources

-- Can Optionally be Used to Interconnect with Norms Music Visualizer to add Sing-Along feedback to any Piece of MIDI Music

Screenshot 1 (Above) -- Note this is a still shot of the software's "motion picture" display of the sound as it changes over time.

Importantly, the way I have the program set for the picture, notes go up like reading: as you go right (within an octave range), and as you go down (each new row is the next octave). (You may think I have up and down inverted from what is natural, but somehow the analogy for Westerners of the way we read seemed most natural to me. Anyway, I actually have "up goes up" as an option, so you can run it that way if you like, and that's what I did for the videos on this site.)

The screenshot was taken with my electronic keyboard feeding my computer sound-card as device 1. I also happen to be listening to that keyboard with headphones, and humming along and matching pitch into my webcam microphone, which is set as device 2.

The keyboard is playing the E right above middle C. What shows in the spectral analysis for the keyboard, in blue, are the note (fundamental) and the first 12 overtones. (The electronic keyboard was set to a saxophone -- a timbre rich in overtones. On, say, the piano setting of the keyboard, the higher overtones are less conspicuous. For the sax from the keyboard that I did use, there are in fact some overtones beyond the first 12 visible in that last half-octave when I raise the "Plot Gain" a little beyond what I have it at for the screen shot.)

Because the keyboard generates the sax with a little vibrato, if you watch the fundamental and its overtones in motion, they actually move up and down in complete synchrony by about 1/5 of a half-step. At the moment I snapped the screen shot, the vibrato was putting the true frequency just a tad above the note.
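A small Python sketch (not taken from the Spectratune itself) suggests why the fundamental and its overtones move together. It assumes the A below middle C is 220 cycles per second and that each overtone is an exact multiple of the fundamental. The function names are made up for the illustration.

```python
import math

def shift_by_half_steps(freq_hz, half_steps):
    """Move a frequency up (or down, if negative) by a possibly fractional number of half-steps."""
    return freq_hz * 2.0 ** (half_steps / 12.0)

def half_steps_between(f_low, f_high):
    """Distance from f_low up to f_high, measured in half-steps."""
    return 12.0 * math.log2(f_high / f_low)

# E right above middle C is 7 half-steps above the A at 220 cycles per second (assumed tuning).
E_ABOVE_MIDDLE_C = shift_by_half_steps(220.0, 7)

for n in (1, 2, 3, 4):
    nominal = n * E_ABOVE_MIDDLE_C                          # harmonic n with no vibrato
    sharp = n * shift_by_half_steps(E_ABOVE_MIDDLE_C, 0.2)  # vibrato at its high point
    moved = half_steps_between(nominal, sharp)
    print(f"harmonic {n}: {nominal:8.2f} -> {sharp:8.2f} cycles/sec ({moved:.2f} half-step)")
```

Every harmonic changes by a different number of cycles per second, but by the same fraction of a half-step, which is why they appear to move as one on a display laid out by musical pitch.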

If you were using the blue spectral display above to check tuning or intonation, you would look at the fundamental (the lowest component) to get the exact note, rather than at an overtone.

When you have a single sound (with overtones as there usually are, of course), you can also use the single-pitch detection mode, which works like a normal chromatic tuner. (And by the same algorithm -- called autocorrelation based.) This is shown above in red for the keyboard. (Beware that the autocorrelation algorithm is actually imperfect, and in particular it occasionally picks out the right note in the wrong octave -- whether in this software product, or another chromatic tuner product.)
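For readers curious about what "autocorrelation based" means, here is a minimal Python sketch of the general idea. It is not the Spectratune's actual code: the function name, the 80 to 1000 cycles-per-second search range, and the 90% threshold are all assumptions made for the illustration. The signal is compared with delayed copies of itself, and the delay at which it lines up best is taken as one period of the note.

```python
import math

def detect_pitch(samples, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate the fundamental (cycles/sec) of a single-pitch signal by autocorrelation."""
    min_lag = int(sample_rate / f_max)   # shortest period considered, in samples
    max_lag = int(sample_rate / f_min)   # longest period considered, in samples
    n = len(samples) - max_lag           # same number of products summed at every lag
    if n <= 0:
        raise ValueError("need more samples than the longest lag")
    scores = {}
    for lag in range(min_lag, max_lag + 1):
        scores[lag] = sum(samples[i] * samples[i + lag] for i in range(n))
    best = max(scores.values())
    if best <= 0.0:
        return None                      # silence, or nothing periodic in range
    # Lags of 2, 3, ... periods score almost as well as one period, which is
    # where wrong-octave answers come from. Taking the FIRST strong peak is one
    # simple (and imperfect) guard against that.
    threshold = 0.9 * best
    lag = next(k for k in range(min_lag, max_lag + 1) if scores[k] >= threshold)
    while lag < max_lag and scores[lag + 1] > scores[lag]:
        lag += 1                         # climb to the top of that peak
    return sample_rate / lag

# Test tone: A at 220 cycles/sec with two overtones, 0.10 second at 11025 samples/sec.
rate = 11025
tone = [math.sin(2 * math.pi * 220 * t / rate)
        + 0.5 * math.sin(2 * math.pi * 440 * t / rate)
        + 0.3 * math.sin(2 * math.pi * 660 * t / rate)
        for t in range(int(0.10 * rate))]
print(detect_pitch(tone, rate))          # close to 220
```

Because the answer is a whole number of samples per period, the estimate lands near, rather than exactly on, 220; real tuners usually refine the peak position further.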

As I indicated, I was humming along trying to match the note in the webcam microphone when I took the shot. I am only running the single-pitch detection on the webcam microphone channel -- no spectrogram. The single pitch detection is showing as the yellow arrow. It is picking up only my humming and not the keyboard because I am listening to the keyboard through headphones.

The spectrum display (blue above) works no matter how many sounds are present. When things are more complicated (say a singer with an instrument, or Peter, Paul, and Mary with several instruments), it gets a little difficult to figure out what's what. One hint is: (a) that a single note always shows a straight-down pattern on the display as set in the screenshot: the fundamental, then the first overtone one row (one octave) below it, and the 3rd overtone one row below that, plus occasionally the 7th overtone and 15th overtone straight down below those. Another hint, when you see it changing in time, is (b) that overtones from the same sound move together. You can often pick this up. Finally, (c) when you have more time to look at real detail, the precise pattern of overtones from a single sound that may exist must be as in the screen-shot (except with additional overtones after the 12th), and using this information, you can often deduce additional sounds. (Frequently, while looking at string quartet recordings or the audio from MIDI files, after going to the lowest fundamental, I can deduce two other fundamentals down within the first two octaves of that first fundamental.)
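Hint (a) follows from simple arithmetic, sketched below in Python. The sketch assumes ideal overtones at exact whole-number multiples of the fundamental, counts overtone k as harmonic k + 1, and ignores wrap-around at the end of a row. The function name is made up for the illustration.

```python
import math

def harmonic_offset(n):
    """Where harmonic n sits relative to its fundamental on an octave-per-row display.

    Returns (rows further on, half-steps to the right within the row).
    Harmonic 1 is the fundamental; overtone k is harmonic k + 1.
    """
    half_steps = 12.0 * math.log2(n)      # distance above the fundamental
    rows, within = divmod(half_steps, 12.0)
    return int(rows), within

for n in range(1, 17):
    rows, within = harmonic_offset(n)
    marker = "  <- straight down" if n > 1 and within == 0.0 else ""
    print(f"overtone {n - 1:2d}: {rows} row(s) on, {within:5.2f} half-steps right{marker}")
```

Only the 1st, 3rd, 7th and 15th overtones (2, 4, 8 and 16 times the fundamental) land at a whole number of octaves, so they are the ones that line up directly under the fundamental. The others fall in between, for example the 2nd overtone at about 7 half-steps to the right, one row on.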

The ear actually works much like the spectrogram. There is a long curled thing in the cochlea of the ear called the basilar membrane that vibrates at different sections corresponding to the fundamental and all overtones present. Several thousand nerves transmit information to the brain about what sections are vibrating. The brain then puts together single sounds basically by method (b) above. With my software, you won't be able to do as good a job at putting things together as your brain does with the ear, partly because my software doesn't respond as quickly as the basilar membrane and auditory nerves. But you get some idea what it does by looking at the software.

(NOTE: I have a slight oversimplification in the last paragraph. At the lowest frequencies, there is evidence that the brain may actually use an actual COUNT of the vibrations, either in addition to or instead of the information about WHERE the basilar membrane is vibrating. The count would be transmitted via neural firing frequency.)

For deaf folks with auditory nerves still intact, a cochlear implant actually works by taking something like the signals from a scattering of points across the spectrogram, and sending them to the appropriate nerves along the basilar membrane.

Some may think the spectrograms (whether from my program, or another spectrogram program) are defective because the overtones don't show on the graph in one sharp point, but rather are a little spread out. But that actually parallels what's going on in the ear on the basilar membrane -- which is again the data that the brain gets. The basilar membrane won't vibrate in just one precise point (even if fed a perfect sine wave), but will vibrate in a small zone, with the amount of vibration peaking at one point. The ear doesn't work by detecting absolutely precise sine waves -- it works by picking up where the basilar membrane is vibrating and figures out the pattern.

As a technical note, spectrograms work by breaking signals down to sine waves. One might wonder what is the significance of the sine waves -- why sines, isn't the choice by all those engineers of that mathematical sine shape arbitrary? No, it's not arbitrary. It works out that that's what the ear, at the basilar membrane, picks up. It vibrates at sections according to what sines are present.
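To give a feel for "breaking signals down to sine waves", here is a minimal Python sketch that measures how much of one particular sine is present in a signal. It uses a plain Fourier-style projection onto a sine and a cosine, which is an assumption chosen for simplicity and is not the method the Spectratune uses. The function name and test signal are made up for the illustration.

```python
import math

def sine_power(samples, sample_rate, freq_hz):
    """Power of the sine component at freq_hz, found by projecting onto sine and cosine."""
    n = len(samples)
    step = 2.0 * math.pi * freq_hz / sample_rate
    in_phase = sum(s * math.cos(step * i) for i, s in enumerate(samples))
    quadrature = sum(s * math.sin(step * i) for i, s in enumerate(samples))
    amplitude = 2.0 * math.hypot(in_phase, quadrature) / n
    return amplitude ** 2 / 2.0           # mean power of a sine with that amplitude

# Test signal: a strong sine at 200 cycles/sec plus a weak one at 400 cycles/sec.
rate = 8000
signal = [math.sin(2 * math.pi * 200 * i / rate)
          + 0.1 * math.sin(2 * math.pi * 400 * i / rate)
          for i in range(800)]

for f in (200, 300, 400):
    print(f, round(sine_power(signal, rate, f), 4))
```

The projection reports power at 200 and 400 cycles per second and essentially none at 300, where no sine is present. A spectrogram repeats this kind of measurement at many frequencies, over and over as the sound changes.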

Oh, I need to explain the first plot down from the top (on the panel displayed in screenshot (1)). That is the autocorrelation used in the "single-pitch detection" algorithm. (The plot is split into halves for each device. At the exact moment I snapped the shot, the autocorrelation for device 1 was not on screen.) The autocorrelation is not that interesting, but I put it there because it can help you figure out when the single-pitch detection will work well, and make adjustments, or move your mic, to make it work well. (It will work well when the plot hits very near the top line in clear places -- rather than being kind of ambiguous.) In this case, for device 2 it is hitting in 7 clear places -- and the single-pitch-detection algorithm is working well on that device. (For device 1 there was a similar situation, and, thus, the red detected-pitch arrow is showing, and agreeing with the fundamental in the spectrogram.)

Also, I may need to explain about the amount of each frequency that the spectrogram shows. It is in decibels (dB), which is on a "logarithmic scale". Every time you go up 10 dB, the sound has 10 times as much power. (If you go up 20 dB, the sound has 100x as much power; up 30 dB, 1000x as much power.) Now, how many dB of power range (i.e. ratio) the full range of my plot shows depends on the dynamic-range setting you make. In the picture it's 74 dB, so the range in the screen shot above runs over about a 25,000,000 to 1 power ratio. From "eyeballing" it, it looks like the first overtone is about 30 dB down from the fundamental, and so the first overtone has about one thousandth as much power. Some of the higher overtones are stronger than this. (The pattern of overtone strengths you get depends on the instrument and how it is played. The way these vary is perceived as "timbre".)
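The decibel arithmetic can be checked with a few lines of Python. This is just the standard definition of decibels for power ratios; the function names are made up for the illustration.

```python
import math

def db_to_power_ratio(db):
    """Power ratio corresponding to a difference of `db` decibels."""
    return 10.0 ** (db / 10.0)

def power_ratio_to_db(ratio):
    """Decibel difference corresponding to a power ratio."""
    return 10.0 * math.log10(ratio)

for db in (10, 20, 30, 74):
    print(f"{db:2d} dB -> {db_to_power_ratio(db):,.0f} to 1")

# Going the other way: one thousandth of the power is 30 dB down.
print(round(power_ratio_to_db(1 / 1000), 6))
```

The last line of the loop gives the power ratio covered by a 74 dB dynamic-range setting.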

By the way, if you've downloaded the program, to move any of those 5 "sliders" (of which dynamic range is one), you just left-click on where you want to be on the slider.

Screenshot 2 (Above; Spiral Mode) (This spiral form of display may be more useful when the key is unknown, and also for chords and music theoretical/psychoacoustics exploration and theorizing.) The particular example is of a major tonic triad.

Math/science notes:

The fundamental and overtones that the music theorists refer to are all, precisely, sines, though they rarely say it, and perhaps some do not know that detail. The most important reason overtones are defined this way is that sines are what the ear hardware breaks sound down into. I have a little more on that on my old pre-made spectrogram video page. My Spectratune's, and all common spectral analyses, display their output in terms of that same sinusoid "basis", which is the one relevant to hearing. Also, the exact frequencies are: A right below middle C has a fundamental that is a sine of 220 cycles per second. Each time you go up a half-step, you multiply this by the 12th root of 2 (about 1.059463094359); each time you go down a half-step, you divide by the 12th root of 2. In each case, the harmonics are sines (in varying strengths, dying out as you go up) at 2 times the fundamental frequency, 3 times, 4 times, 5 times, and so on. Thus, the B right below middle C has a fundamental of 246.94 cycles per second, with harmonics 493.88, 740.82, 987.77, etc. DETAIL: This way of determining notes is the common standard way, called the tempered scale (equal temperament). Sometimes instead of the A below middle C being 220 cycles per second, it is a few cycles different. My Spectratune uses 220 as the default, but this can be adjusted.
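The note arithmetic just described can be written out in a few lines of Python. This is only an illustration of the tempered-scale rule given in the text; the names are made up, and the 220 default is passed as a parameter since it can differ.

```python
A_BELOW_MIDDLE_C = 220.0          # cycles per second (the default mentioned in the text)
HALF_STEP = 2.0 ** (1.0 / 12.0)   # the 12th root of 2, about 1.059463

def note_frequency(half_steps_from_a, a_hz=A_BELOW_MIDDLE_C):
    """Fundamental of the note that many half-steps above (or below, if negative) the A."""
    return a_hz * HALF_STEP ** half_steps_from_a

def harmonics(fundamental_hz, count):
    """The first `count` harmonics above the fundamental: 2x, 3x, 4x, ..."""
    return [fundamental_hz * n for n in range(2, count + 2)]

# B right below middle C is 2 half-steps above the A.
b = note_frequency(2)
print(round(b, 2), [round(h, 2) for h in harmonics(b, 3)])
# prints: 246.94 [493.88, 740.82, 987.77]
```

Passing a different `a_hz` (say 221.0) shows how every note and harmonic shifts when the reference A is a few cycles different.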

(People familiar with the science of hearing will note that what you are looking at in the spectrum output (but not the single pitch arrows) is basically the raw data that the brain gets from the ear. However, there is one limitation--the ear puts out data, and the brain processes it quite quickly, perhaps every thousandth of a second. With my Spectratune, you will only be able to get perhaps 5 or 10 or 20 pictures per second, depending on your computer, and the exact way you have the Spectratune adjusted.

The curious might wonder about the spectrograms always showing overtones not at a precise point of pitch, but rather peaked at a pitch and tapering down perhaps 30 dB (1/1000th in energy) gradually over the adjacent lower and higher half step. The answer is that this is also the way the brain receives the information--it gets excitement levels along the varying-pitch-resonant basilar membrane, never a point of pitch indication. My software uses a resonant process similar to what happens on the basilar membrane. (I explain what I did in a bit more detail on my Spectrogram videos page; those videos preceded the Spectratune by a few years, but use basically the same method.)

You might also want to know that the single-pitch detection, which is the second type of analysis from the Spectratune, and which does give a precise point of pitch (of the fundamental of a single musical note from a single instrument or singer), does NOT work like the ear. It works differently, getting data that the brain does not get from the ear, but that the computer does get. It's the perfect thing to track pitch of a single instrument or singer, or to tune an instrument, but it happens to work using specific data that the computer gets from the microphone -- air pressure level very precisely in time -- that the brain does not get from the ear.

Incidentally, in people who are deaf, where the defect is in the ear hardware, but nerves leading from the ear to the brain, and the brain audio-processing area, are normal, a device called a "cochlear implant" can replace, to a limited extent, the signal from the ear hardware, and send it through the usual nerves up to the brain. The sound information replaced is similar to that shown in the spectrogram, but coarser, in that level information is sent only for about 24 different frequency zones. This is good for speech, but musical sound is not usually regenerated effectively. This National Public Radio audio story gives the state of the situation as of 2005, and includes some simulated sounds of music as heard by a deaf person with a cochlear implant. You can also find some information about the sound information replaced here at the FDA cochlear implant site.)

Screenshot 3 (Above; Device-Selection Panel)
Adjustment Controls and Tips to Tweak Execution Speed (i.e. rate of new frame display)

What you're doing with the Spectratune (tuning an instrument, checking your voice intonation, looking at intonation of notes on a recording, studying timbre, etc.) will affect how you have things set.

In general, to speed things up (and show a quicker, "more responsive" picture-rate), you can either reduce the number of tasks being performed, or do an adjustment which in some sense reduces the detail of the analysis.

Reducing the number of tasks being done means switching off the analyses you're not really using. In my own screen shot at the top of the page, I have both the spectrogram and the single-pitch detection switched on. This was to make a nice demonstration picture of the program working. But, for faster frame rate, one of those tasks could be turned off. (If the sound contains more than one pitch, the single-pitch detection won't even work, so it of course should be the one switched off.)

Here are the adjustments that trade-off speed vs. detail. Note that in many cases, depending on what you're doing, a lot of speed can be gained with only an imperceptible loss in detail.

  • Sound Amount / Analysis: Like virtually all sound analysis software, the Spectratune takes a small duration of sound, and analyzes it. Lengthening the duration increases the sharpness of the spectrograms (including quicker descent away from the center of the note), and also gives a bit more precision to the single-pitch detection. However, the tradeoff of lengthening the duration is slower response, that is, fewer frames per second. Note that the general sharpness of the spectrograms only (not the single-pitch detection) is a function of how many cycles of the tone in question you look at. For a fixed amount of sound / analysis, higher tones have more cycles and are shown more sharply in the spectrogram. The single-pitch detection behaves a bit differently from the spectrogram, in that you always need at least about two cycles of the sound in the amount per analysis--otherwise it won't work, but enlarging the sound amount in the analysis won't usually do any additional good.

    In the analysis displayed above, I have used a 0.10 second analysis sample time. NOTE: Typically a user's needs will be handled with sound amount set anywhere in the 0.05 to 0.10 second zone.

  • Device Sampling Rate and Undersample: The device sampling rate is how many readings of sound level your device (sound card, web-cam mike, etc.) is sending out every second. Without an undersample set, the program will use every one of those readings in the analysis. The higher the device sampling rate, the more accurately the signal going in represents the sound, but the slower the analysis. The lower the frequency, the less important this is. Undersampling is simply a way of making the program act like the device is sampling at a lower rate. You lose some precision, which is less important at the low frequencies, but gain some speed. NOTE: Typically, I use a device sampling rate of 11.025 kHz (lowest available on my system), with a 2X undersample. Equivalent to this is to use twice the device rate and a 4X undersample, etc. If you have too high a device rate or too low an undersample, the program will take longer than you probably want to display each update.

  • Number of Octaves Shown: For the spectrogram only, reducing this will speed up the analysis. (It has no effect at all on sharpness, actually. It just affects how much of the audible sound range is shown.)

  • Dynamic Range: In some cases, reducing this will speed up the analysis for spectrograms only. (No effect on sharpness.)

  • Release notes (and useable for determining latest version)

    2013.09.23: Fixed calculation error in relative power calculations. (Previously, relative decibels between different tones were scaled about twice as high as actual, due to a long-standing typing error in the code. That is, the frequencies and picture of the spiral were correct, but the labelling of relative power in dB according to the graph scales was incorrect, and about double the actual value.)

    2009.09.10 (actually, 2009.07.10 is the date--I goofed on the date code and am too lazy to fix it): Improved show minor/major buttons so that when one is pushed, if on, the other goes off. Added link for on-line help (under HELP menu item).

    2009.04.13: Enlarged autocorr arrows to make them show nicely even when buried in spectrogram; made spiral representation default
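As a rough illustration of the Sound Amount / Analysis and Undersample adjustments described under the adjustment controls, the following Python sketch works out how many readings and how many cycles of a tone go into one analysis. It is only arithmetic on the settings named in the text. How the program actually skips readings when undersampling is not covered here, and the function names are made up for the illustration.

```python
def effective_rate(device_rate_hz, undersample):
    """Readings per second actually analyzed when an undersample factor is set."""
    return device_rate_hz / undersample

def samples_per_analysis(device_rate_hz, undersample, seconds):
    """Whole number of readings that go into one analysis."""
    return int(effective_rate(device_rate_hz, undersample) * seconds)

def cycles_in_window(freq_hz, seconds):
    """How many cycles of a tone fit in the analyzed duration of sound."""
    return freq_hz * seconds

# 11.025 kHz with a 2X undersample behaves like 22.05 kHz with a 4X undersample.
print(effective_rate(11025, 2), effective_rate(22050, 4))

# Readings in one analysis at the 0.10 second setting.
print(samples_per_analysis(11025, 2, 0.10))

# Higher tones get more cycles in the same amount of sound, so they show more sharply.
for freq in (55.0, 220.0, 880.0):
    print(freq, cycles_in_window(freq, 0.10), cycles_in_window(freq, 0.05))
```

For the single-pitch detection, which needs at least about two cycles in the amount per analysis, the same arithmetic gives the lowest usable pitch: 2 / 0.05 = 40 cycles per second at the 0.05 second setting.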