All of My Free Sound and Music Software / Some Videos

Music Visualizer: A Music Comprehension / Music Analysis Tool: Interactively play and display MIDI files with tools to see chords and keys to aid comprehension of tonal aspects of the music, with optional overlaid pitch feedback for learning to sing along with the melody, or in harmony, and several other features.

Recently added: some Ear-Training features for pitch recognition within a key, and sing-along, that are more flexible than the other software I've seen. (It lets you adjust to a random key, random instrument, or change trial notes exactly when you want to and need to for best learning. It lets you play key identification sounds such as scales and tonic cadences exactly when you want them and need them. Sing-along pitch feedback can always be on -- just don't look at it when you don't need it but look at it when it will help you learn.)

Click here for Norm's Music Visualizer Information and/or Download.

Tonegen: A versatile tone generator, including multiple notes, and variable overtone frequencies and volumes, for exploring the science of timbre, "musicality" of sound, consonance, dissonance, roughness (theories of Helmholtz, Terhardt, Parncutt, etc.). Supports rapid switching between up to 3 variants of similar or related sounds (e.g. "probe tones"). Also has separate functionality for testing pitch-discrimination perception, effect of phase-difference between ears, etc. Also, you can take peaks at a point in time from a SpectratunePlus spectrum, and pass them to ToneGen via a clipboard, and you get the same sound, and you can also monkey around with them--shift individual overtone frequencies and intensities.

(One fun thing you can do is take a musical sound and switch any overtone off and then on. For louder overtones, you get, at least for my ears, a robust "hearing out" of the overtone switched on and off. That is, the overtone, once perceptually part of the composite sound with its timbre, stands out as a very engineeringish, non-musical sine. The "hearing out of individual overtone sines" is much studied in the psychoacoustics literature, and some people can learn to do it without turning off and on, or any other manipulation, of the overtone(s) they hear out.)

Click here for ToneGen Info.

SpectratunePlus: Does spectral/harmonic analysis on live sound, and also recorded sound. Several aural-feedback, and key/chord-related, features.

Click here for SpectratunePlus Info.

Spectratune: An earlier version of SpectratunePlus, limited to analysis of live sounds. Some with live-sound-analysis needs may prefer it, because it has a simpler interface than SpectratunePlus.

Click here for Spectratune Info.

My (now ancient) Spiral Spectrogram Videos (here). These are a video version of the spirals that you get in live mode of SpectratunePlus, recorded against mostly classical music recordings. There is even a full length video of Brahms' German Requiem.


A Music and Science Site

This page discusses, and has examples of, a kind of spectrogram video geared towards exploring music as it relates to fundamentals and overtones.

I (Norm Spier) put the page together way back in about 2002.

Some of the software I have been programming since that time allows you to get a music spectrogram of your own piece of music, and in many more formats. (That software is available on this site.)

Nonetheless, I am leaving these old examples up, as they give a person a way to look at a music spectrogram by just viewing the media (all .avi files) without monkeying with any software.

NOTE: I am able to play these old .avi files, getting both video and the accompanying musical sound, on Windows Media Player. (Some other players, which perhaps your computer manufacturer or someone else has set on your PC as the default player for .avi files, either don't produce audio or don't produce video. So you should view the files in Windows Media Player if you don't get both audio and video with your default player.)

You may also find useful some of the various other music software I have written and updated recently (thru 2016), described in the box at right, which allows you to look at your own music in the equivalent of the videos on this page (and much more).

For the purposes of looking at performed music, consider a spectral analysis (i.e., pitch breakdown) for a fixed time that looks like:

On this type of spectrogram, when you go around clockwise 1/12 of a revolution, you go up a half-step. The half steps (against a particular tuning, here the piano was tuned to A3=222hz) lie on the white rays emanating from the center. What we have in this type of spectrogram is that all of the occurrences of the same note in different octaves are on the same ray. Going out one level on the blue spiral curve brings you up an octave. The labels of note and octave (e.g. "E8") give the octave for the note shown on the ray on the outermost loop of the spiral curve. (For example, for the note E, the still shows some energy at E3, E4, E5, and E6, with E5 the strongest.)
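The layout described above can be sketched in a few lines of Python. This is only an illustration of the mapping, not the code used to make the videos: the function name `spiral_position`, the choice to measure the angle clockwise from the reference A ray, and the default 222 hz reference are assumptions made for the example.

```python
import math

def spiral_position(freq_hz, a_ref_hz=222.0):
    """Map a frequency to a place on the spiral.

    Returns (chroma, octave_offset, angle_deg):
      chroma        - half-steps above the reference A, ignoring octave (0 to 12)
      octave_offset - how many loops of the spiral above (+) or below (-) the reference A
      angle_deg     - clockwise angle from the reference A ray (assumed convention)
    """
    half_steps = 12.0 * math.log2(freq_hz / a_ref_hz)
    octave_offset = math.floor(half_steps / 12.0)
    chroma = half_steps - 12.0 * octave_offset
    angle_deg = 30.0 * chroma  # 1/12 of a revolution (30 degrees) per half-step
    return chroma, octave_offset, angle_deg

# An octave above the reference A lands on the same ray, one loop further out.
print(spiral_position(444.0))
# Seven half-steps above the reference A (an E) lands about 210 degrees around.
print(spiral_position(222.0 * 2 ** (7 / 12)))
```

Doubling the frequency leaves the angle unchanged and moves out exactly one loop, which is why all octaves of a note share a ray.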

Now, if you do a spectrogram like this every tenth of a second or so, and look at it synchronized with the playing music, you get a dynamic picture of the music.

(People familiar with the science of hearing should note that what you are looking at in such a dynamic spectrogram is the raw data that the brain gets from the ear.)

Here [.avi] is a 30-second segment of recorded music (from Bach's Goldberg Variations) as such a dynamic spectrogram. (NOTE: ".mov" format gives a sharper image, but on Windows, you need Apple's Quicktime player. ".avi" should work on any Windows machine. On a Windows computer, if you don't already have it, you can get the Quicktime player needed for the sharper ".mov" for free by clicking here. These files are about 4Mb each, and take about 20 seconds to load on a high-speed connection, longer on a low-speed.)

Here [.avi] is another segment of recorded music (about 35 seconds from the Brahms Requiem movement 2).

NOTE: You can pause these sample clips, go backwards, forwards, single frame, even in the I.E. browser view. To do so, use the controls on the bottom right for Apple Quicktime, and elsewhere for other players.

Note that what you are looking at in such a spectrogram is not just the notes in the score, but also the harmonics of those notes, as present in the recorded music. The amount of such harmonics will depend on the particular instruments and how they are played. (Where, exactly, will the harmonics be? --click here to see.) A detail here that winds up being important is that each fundamental and each harmonic is, by definition, a sine-wave shape. This is important because the ear (via the basilar membrane) discriminates and separates out the separate sine waves in a sound, i.e. it separates out the tones and harmonics. The spectrograms likewise separate out the sine waves, and display the strength of each sine component.

The fact that the spectrograms contain both the notes and some harmonics can be a bit confusing at first. It helps to look at a single note, intervals, and chords, which I have spectralized and which can be examined below (under MORE SAMPLE SPECTROGRAM CLIPS). One of the basic observations is that the patterns of a single note, or a consonant chord, have energy at a fundamental, 4 half-steps up, then 3 more up (counting angle only, not octave). However, when a note is played softly, some of the harmonics may not show up, and either the 4 or 3 more half-step-up positions may not show. (They are caught by my analytical method, actually, but are below the start of the scale that I have used on my spectrogram.)
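The "4 half-steps up, then 3 more" pattern follows from where the harmonics fall once octaves are ignored. A small Python sketch (the function name is made up for this example) computes the angular position of the first few harmonics, in half-steps around from the fundamental's ray:

```python
import math

def harmonic_chroma_offsets(num_harmonics=6):
    """For harmonics 1..n, return (harmonic number, half-steps above the
    fundamental with whole octaves removed), rounded to 2 decimals."""
    offsets = []
    for n in range(1, num_harmonics + 1):
        half_steps = 12.0 * math.log2(n)  # harmonic n is n times the fundamental
        offsets.append((n, round(half_steps % 12.0, 2)))
    return offsets

for n, offset in harmonic_chroma_offsets():
    print(f"harmonic {n}: {offset} half-steps around from the fundamental's ray")
```

Harmonics 2 and 4 are whole octaves and land on the fundamental's own ray. The 3rd and 6th harmonics land about 7.02 half-steps around (very near the ray 7 half-steps up), and the 5th lands about 3.86 half-steps around (near, but slightly short of, the ray 4 half-steps up).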

Unification of Single Sounds: As in these spectrograms, the ear does output the information about all the different harmonics of each separate sound as separate signals, and, in its marvelousness, the brain puts it all together into the correct picture of actual distinct sounds. This is done substantially using the perceptual mechanism of "common outcome" -- that is, the harmonics of a given sound go through parallel changes in volume and frequency-shift (as well as having a prescribed harmonic frequency relationship) -- this allows the reconstruction. With a little practice, you can somewhat see the parallel changes and separate out different sounds in these spectrograms. (It is much harder to also incorporate the harmonic frequency relationship information -- everything is happening too fast.) Though the visual perceptive apparatus can't sort this out all that well (and further, my frame rate is a bit slow to give the visual perceptive apparatus its best shot) -- the aural perceptive apparatus apparently can.

Consonance and Dissonance: Psychoacoustic studies show that dissonance is caused when music contains sine components (i.e. fundamentals or overtones) that are close to each other (less than the critical bandwidth), but not at virtually the same tone. The critical bandwidth is about 3 half-steps when we are dealing with tones above about A5, and a bit larger (about 100 hz -- a frequency-varying number of half-steps) below A5. The dissonance occurs when substantial such close sines exist, either in any of the fundamentals or overtones. (This yields surprising results: sines 11 half-steps apart DO NOT sound dissonant, though the major 7th interval is classified as dissonant. The sines do not sound dissonant only because there are no overtones -- there would be clashes if we added in overtones. See my example moving sine spectrogram below.) At any rate, dissonance should be pretty perceivable from these spectrograms -- look for close, but not negligible, differences.
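As a rough illustration of the rule of thumb above, here is a minimal Python sketch that flags two pure sines as clashing when they are closer than the critical bandwidth but not virtually identical. It is a crude yes/no caricature, not a psychoacoustic model. The function names, the 880 hz value for A5 (taking the A below middle C as 220 hz), and especially the 1 hz "virtually the same tone" threshold are assumptions made for the example.

```python
A5_HZ = 880.0  # assumed: two octaves above the 220 hz A below middle C

def critical_bandwidth_hz(freq_hz):
    """Rule-of-thumb critical bandwidth: about 3 half-steps at or above A5,
    about 100 hz below it."""
    if freq_hz >= A5_HZ:
        return freq_hz * (2.0 ** (3.0 / 12.0) - 1.0)  # width of 3 half-steps, in hz
    return 100.0

def sines_clash(f1_hz, f2_hz, same_tone_hz=1.0):
    """True if the two sines are inside one critical band but not
    'virtually the same tone' (same_tone_hz is a hypothetical threshold)."""
    lo, hi = sorted((f1_hz, f2_hz))
    gap = hi - lo
    if gap <= same_tone_hz:
        return False
    return gap < critical_bandwidth_hz(lo)

print(sines_clash(261.63, 277.18))  # one half-step apart
print(sines_clash(261.63, 493.88))  # 11 half-steps apart
```

With these assumptions, the half-step pair is flagged and the pair 11 half-steps apart is not, matching the observation about pure sines a major 7th apart.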

Do note that these spectrograms supersede neither the score nor music theory for appreciation and analysis purposes. They are simply an additional tool, which provides certain information about the sound. This information is likely closer to the input feeding into the nervous system than is the musical score, though the score, particularly after a music-theoretic interpretation is applied, may be closer to what is actually perceived after processing by the brain, particularly for melodic elements.

Though the spectrograms are scientifically pure and without any artistic enhancement, you may notice certain visual symmetries and beauties, and certain beauties in the combined visual and aural, as in dance.


Basically, the ear distinguishes sound pitch by use of a long membrane called the basilar membrane in the inner ear, whose thickness and stiffness varies over its length. The varying thickness and stiffness cause different portions of the membrane over its length to tend to sustain vibration in response to particular different frequencies of sound. (In engineering terminology, the different portions "resonate" at different frequencies.) At any rate, when the ears receive a sound, the nerves sense which portions of the basilar membrane are vibrating, and thus can determine the pitches present.

Now, the computer program that generates these spectrograms does NOT simulate the basilar membrane exactly. But it uses a similar idea of resonation induced by vibration of an elastic object. Instead of a vibrating basilar membrane, the program uses a large number of (simulated) tuned, damped mass-springs picking up the sound. Each mass-spring looks like:

The sound vibration (change in air pressure) puts an alternating leftward and rightward force on the mass. The resulting motion builds up to a very large amount if and only if the spring tension is just right so that the mass moves a little, springs back and forth, and does the springing back and forth so that it is in synch with the continuing back and forth of the sound pressure, that is, so that the sound pressure is always going in the same direction as the building up motion. If this happens, we have resonance. It works out that the resonance will occur only close to one particular frequency (with the resonance being less as we move away from the frequency). How quickly the resonance reduces as we move away from the resonant frequency depends on the amount of frictional/air-resistance forces encountered by the mass-spring.

All in all, by selecting lots of mass springs with varying spring tension, masses, and friction/air-resistance levels, my program comes up with about 3000 mass-springs which have resonant frequencies spread throughout the audio frequencies. (The tendency of each spring to resonate as we go off the frequency of greatest resonance is set as appropriately as the laws of physics allow.) The program simulates the sound pushing the mass-springs, and plots the amount of vibration of each mass-spring at the appropriate point for its resonant frequency on the spiral spectrogram. It does this each 1/10 of a second or so.

NOTE: The power that I have plotted is decibels of power, where the total number of decibels in the scale is indicated on the plot in the lower left. (Less-technical people: This is a logarithmic scale. When you go up 10 decibels, you multiply the power by 10. When you go up 20 decibels, you multiply the power by 100. Thus, for a recording where the plotted range is 30 decibels, a bit of sound that goes 1/3 of the way up the scale is about 10 times as powerful as one you can just barely see. One that goes 2/3 of the way up the scale is 100 times as powerful as one you can just see, and one that goes all the way up is 1000 times as powerful as one you can just see.)

As another example, on the still at the top of this page (where the indicated range is 25 decibels, not 30), the E5 is roughly 20 decibels, or 100 times, more powerful than the E3 or E6.
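The decibel arithmetic in the note above can be checked with a short Python sketch. The function names are made up for this example.

```python
def db_to_power_ratio(db):
    """Power ratio corresponding to a difference of `db` decibels."""
    return 10.0 ** (db / 10.0)

def scale_fraction_to_power_ratio(fraction, scale_db):
    """Power ratio, relative to a barely visible sound, for a plotted level
    that reaches `fraction` of the way up a scale spanning `scale_db` decibels."""
    return db_to_power_ratio(fraction * scale_db)

# On a 30-decibel scale: 1/3, 2/3, and all the way up.
for fraction in (1 / 3, 2 / 3, 1.0):
    print(round(scale_fraction_to_power_ratio(fraction, 30.0)))

# The E5 versus E3 example: a 20-decibel difference.
print(round(db_to_power_ratio(20.0)))
```

The loop gives ratios of about 10, 100, and 1000, and the 20-decibel difference gives about 100, as stated in the text.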

Engineering and Math people: I have a few technical details at the bottom of this web page.

MORE SAMPLE SPECTROGRAM CLIPS (each plays for about 30 seconds)

[.avi] MOVING SINE AND FIXED SINE. The moving sine starts more than an octave down from the fixed one, and goes to more than an octave above. It is not dissonant when anywhere within a few half-steps of an octave down, or anywhere within a few half-steps of an octave above. It is only dissonant within a few half-steps of the fixed note (but not when almost exactly at the fixed note).

[.avi] of a playing of intervals on the piano (middle C is always the lower note):
First consonant: unison, octave, perfect 5th (7 half-steps), perfect 4th (5 half steps)
Then imperfect-consonant: major 3rd (4 h-s), minor 6th (8-hs), minor 3rd (3-hs), major 6th (9-hs)
Then dissonant: major 2nd (2 h-s), minor 7th (10-hs), minor 2nd (1-hs), major 7th (11-hs), augmented 4th (6-hs)
Note: The explanation in Cook's book of the presence of close tones seems to hold. However, if it also has to do a bit with the more dissonant patterns being simply unlike the patterns you get more used to from listening to harmonics in nature -- like the human voice (energy at the rays 4 and 7 half-steps up from the fundamental -- e.g. like the first sound--the unison), this wouldn't surprise me. (My moving sine example doesn't seem to support the latter, but I'd need to see more variations on that to be sure. And pass them through expert ears, not my own.)

[.avi] SOME TRIADS: a C, followed by a major triad, a minor triad, a diminished triad, and an augmented triad (all with C as root). Again, the explanation in Cook's book would do it. And again, I wonder if the deviation from the standard harmonic pattern adds some bite for the dissonant ones.

[.avi] APPLAUSE. The spectrum is diffuse (noiselike), running over about 3 octaves. (There is also at times some energy at a very low frequency. This is not the applause, but some recording equipment rumble.)

[.avi] SOME INDIAN MUSIC. There is a Sitar (Plucked fretted instrument playing the melody in an improvised fashion within the bounds of the "raga" or formula), Tamboura (Drone: Playing open-string always, tuned to the main tones of the raga), and a Tabla (tuned-pitch hand drums, tuned to some main tones of the raga). This is the Hindustani variant of Indian music, and is in the Dadra raga.
An Indian raga is roughly a formula combining particular notes and orders of playing of those notes. The notes are roughly (but not exactly) a subset of those spaced as within a tempered Western scale. Thus, after setting my software to show A at 227.5 hz (an unusual tuning for Western music), the fundamental tones of the music appear roughly where they would in Western Music -- that is, on my 12 outward rays.

[.avi] from the Beethoven 4th Piano Concerto.

[.avi] from the Beethoven 2nd String Quartet.

THANK YOU NAXOS: I am grateful to Naxos for making available a number of its high-quality professional recordings for this project. Here is the Naxos site.


I have not used the somewhat standard Fourier techniques to do these spectrograms. The battery of tuned damped mass-springs seems closer in functioning to the basilar membrane than Fourier transforms. Further, the efficiency of the FFT does not come into play so much, since the evenly spaced musical intervals are not evenly spaced in frequency.

I have no knowledge of how the method I have used might compare with using windowed Fourier transforms. (My guess is that the results would be roughly similar. However, this comparison does not apply to DFT/FFT -- my technique is much sharper in tone distinctions.)

With Fourier transforms, there is a well-known tradeoff called the Uncertainty Principle (absolutely NOT related to Heisenberg's) where the shorter the sample in time, the less precise the image in frequency. This tradeoff is clearly visible when I look at the examples in my method as well. In my dynamic spectrograms, I choose parameters to place a cap on the rate of spring slow-down after the sound signal is removed (keeping lag or sluggishness of response under control). Doing so yields frequency images which are less and less sharp (even on synthetic pure sines) as we go down in frequency. (Some charts in Cook confirm that the same type of thing happens in the basilar membrane. Further, the wider critical bandwidth (in terms of half-steps) may be another manifestation of this.)
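The loss of sharpness at low frequencies can be made concrete with a small sketch. It relies on a standard approximation for a lightly damped resonator: if the ringing dies out as exp(-a t), the resonance peak has a full width at half power of about a/pi in hz. Assuming, for illustration only, that the same decay rate a is used at every frequency, the width is constant in hz and therefore grows in half-steps as the frequency goes down. The function name and the decay rate of 20 per second are assumptions, not values from the program.

```python
import math

def resonance_width_half_steps(center_hz, decay_per_s):
    """Approximate half-power width of a lightly damped resonance,
    expressed in half-steps around center_hz."""
    width_hz = decay_per_s / math.pi  # full width at half power (light damping)
    lo = center_hz - width_hz / 2.0
    hi = center_hz + width_hz / 2.0
    return 12.0 * math.log2(hi / lo)

for f in (110.0, 220.0, 440.0, 880.0):
    print(f, round(resonance_width_half_steps(f, 20.0), 3))
```

With these numbers the peak is roughly one half-step wide at 110 hz but only about an eighth of a half-step wide at 880 hz.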

The precise modelling I use for each damped mass-spring is:

x" = -(k/m)x - (c/m)x' + s/m

Here, x(t) is the one-dimensional position as a function of time, k is the spring constant, c is a damping constant, m is the mass, and s(t) is the one-dimensional force placed on the mass by the sound vibration in the air.

This mass-spring model and differential equation is covered in virtually all basic physics and differential equations texts, and the solution to the equation is given (with or without proof).
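To see the resonance behavior of this equation numerically, here is a minimal Python sketch of a single mass-spring driven by a sine. The step-by-step integration scheme (semi-implicit Euler), the function name, and all parameter values are assumptions chosen for illustration, and are not necessarily how the spectrogram program solves the equation.

```python
import math

def simulate_mass_spring(drive_hz, resonant_hz, decay_per_s=20.0,
                         sample_rate=44100, seconds=0.5):
    """Drive one damped mass-spring with a unit-amplitude sine force and
    return the peak |x| over the last fifth of the run (after settling)."""
    m = 1.0
    k = m * (2.0 * math.pi * resonant_hz) ** 2  # sets the resonant frequency
    c = 2.0 * m * decay_per_s                   # sets how fast ringing dies out
    dt = 1.0 / sample_rate
    x, v = 0.0, 0.0
    peak = 0.0
    n_steps = int(seconds * sample_rate)
    for i in range(n_steps):
        s = math.sin(2.0 * math.pi * drive_hz * i * dt)  # force from the sound
        a = -(k / m) * x - (c / m) * v + s / m           # x'' from the equation
        v += a * dt  # semi-implicit Euler: update the velocity first,
        x += v * dt  # then the position using the new velocity
        if i >= n_steps * 4 // 5:
            peak = max(peak, abs(x))
    return peak

on_resonance = simulate_mass_spring(440.0, 440.0)
half_step_off = simulate_mass_spring(466.16, 440.0)
print(on_resonance / half_step_off)
```

With these parameter choices, the spring tuned to 440 hz responds several times more strongly to a 440 hz drive than to one a half-step higher. A bank of such springs, each with its own k, c, and m, would give one reading per resonant frequency.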

It is important to comment that, of course, it is not in the nature of biological things, like a basilar membrane, to be precisely engineered so that neurological detectors can be pre-wired to know that this position on the membrane resonates precisely 2 octaves above where this other point resonates. That correspondence would be learned, either from music, or from simply the experience of hearing sounds in nature. Thus, the spiral layout that I have used, with outward rays representing the same note in all octaves, presents the information not quite raw to the nervous system, but actually after a bit of neurological processing.

What Are the Frequencies of the Notes?: "A" right below middle C has a fundamental that is a sine of 220 cycles per second. Each time you go up a half-step, you multiply this by the 12th root of 2 (about 1.059463094359), down a half step, divide by the 12th root of 2. In each case, the harmonics are sines (in varying strengths, dying out as you go up) at 2 times the fundamental frequency, 3 times, 4 times, 5 times. Thus, the B right below middle C has a fundamental of 246.94 cycles per second, with harmonics 493.88, 740.82, 987.77, etc. DETAIL: This way of determining notes is the common standard way, called the tempered scale. Sometimes instead of the A below middle C being 220 cycles per second, it is a few cycles different. (My spectrogram software used to make the videos was adjustable, and sometimes I adjusted to the apparent tuning used by the ensemble in the recording -- e.g. 222hz in the first screenshot.)
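The tempered-scale arithmetic above can be reproduced with a few lines of Python. The function names are made up for this example.

```python
def note_frequency(half_steps_from_a, a_hz=220.0):
    """Fundamental of the note `half_steps_from_a` half-steps above
    (negative: below) the A right below middle C, in the tempered scale."""
    return a_hz * 2.0 ** (half_steps_from_a / 12.0)

def harmonic_series(fundamental_hz, count=4):
    """The fundamental followed by its 2x, 3x, ... multiples."""
    return [fundamental_hz * n for n in range(1, count + 1)]

b_below_middle_c = note_frequency(2)  # B is two half-steps above A
print([round(f, 2) for f in harmonic_series(b_below_middle_c)])
# prints [246.94, 493.88, 740.82, 987.77]
```

Passing a different `a_hz` (for example 222.0) models an ensemble tuned slightly away from 220 hz.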

A discerning observer's question: why sines?: The notion of frequency and harmonics implicitly defines that a vibration of a certain frequency is a sine function of that frequency. Why does everybody seem to choose this (the set of sine waves) as our "basis"? Most books just start out by looking at the sine as the fundamental wave, without saying why.

I am not sure of all the reasons, but the fundamental and best reason is that the sine is what the ear perceives. As above, the psychoacoustics literature seems to bear this out. (However, the common choice of sine probably predates the psychoacoustics literature.)

A second reason is that the general model for most acoustical transformation, the linear-time-invariant system (supported by physics and usually reasonably accurate), preserves sines (just shifting and changing amplitude). Incidentally, using the knowledge that linear time invariant systems are those systems that are describable as the effect of an impulse response (i.e., essentially a large linear combination of shifted versions of the input wave), then the fact that linear time invariant systems preserve sine waves boils down to the well-known elementary trigonometric identity cos(x+y)=cos(x)cos(y)-sin(x)sin(y).
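A small numerical check of this claim, as a sketch: take an arbitrary short impulse response h (the values below are made up), feed in a cosine, and compare the output with a single cosine of the same frequency whose amplitude and phase are predicted from the identity. Expanding cos(w(n-j)) with the identity gives y[n] = a cos(wn) + b sin(wn), where a and b are sums over h, and that is the same as amplitude * cos(wn - phase).

```python
import math

def filtered_cosine(h, omega, n):
    """Output at sample n when cos(omega * n) goes through impulse response h:
    a linear combination of shifted copies of the input."""
    return sum(hj * math.cos(omega * (n - j)) for j, hj in enumerate(h))

def predicted_amplitude_phase(h, omega):
    """Amplitude and phase of the single output sinusoid, from the identity."""
    a = sum(hj * math.cos(omega * j) for j, hj in enumerate(h))
    b = sum(hj * math.sin(omega * j) for j, hj in enumerate(h))
    return math.hypot(a, b), math.atan2(b, a)

h = [0.5, -0.3, 0.2, 0.1]  # hypothetical impulse response
omega = 0.37               # radians per sample, arbitrary
amplitude, phase = predicted_amplitude_phase(h, omega)
worst = max(abs(filtered_cosine(h, omega, n) - amplitude * math.cos(omega * n - phase))
            for n in range(200))
print(worst)  # tiny: only floating-point rounding error remains
```

The output is a sinusoid of the same frequency, with only its amplitude and phase changed. The same check applied to an input that is not a sine (a square wave, say) would generally not give back a scaled, shifted copy of the input.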

If anyone knows of any other reasons for using sines, please let me know.

SPIRAL REPRESENTATION: Of course it's not new. The spiral representation is pretty obvious, so one does not expect it is new. Indeed, I have bumped into a few people who have used it recently, and the book by Perry R. Cook indicates that the German physicist Moritz Drobisch proposed a helical representation (essentially the same thing--just pull up my flattened spring to form a stretched-out spring -- a helix). I expect the representation is even older than that, and I do know of one of my old Math professors who would be pretty surprised if Archimedes didn't think of this arrangement. (Oh, by the way: there is a terminology for the angular position around the spiral or helix (i.e. the note without reference to octave) -- it is called the chroma of the tone.)