Screenshot 1 (Above): Spiral Live Analysis by SpectratunePlus. Here is a shot of a single pitch being sung by me (the D below middle C at this instant) in real-time "live" analysis mode. Here, I have both the live spectrogram (blue) and the live single-pitch detection (red arrow, mostly covered over at the moment) turned on.
The spectrogram displays, at its rounded peaks, all the fundamentals and overtones.
In the mode I used (both real-time-spectrogram and real-time-single-pitch turned on for input device 1) you get levels of fundamentals and overtones, both printed and with a histogram (green). (The overtone levels are taken from the spectrogram by the program, automatically.)
The real-time spectrogram (blue) works whether there is one note, or multiple notes being played. The exact pitch detection (red arrow) only works when there is one pitch being played or sung.
(The "D1" and "D2" checkboxes on the above screenshot refer to the fact that live mode actually supports simultaneous presentation of spectrogram and single-pitch on two separate live sources. In that case, the spectrograms and single-pitch arrows are in different colors for each source.)
Some possible uses for live-mode are as intonation aids, and timbre-understanding aids to musicians (relating overtone relative volume to timbre produced).
Spiral setup: Going clockwise brings you up in pitch, by a full octave every time you go around once. The height plotted wherever there is sound is in dB (deciBels). The range, in deciBels, of each band in the spiral is set by the user ("Dynamic range" slider), with the lowest power that can be observed in a spiral band being adjustable with "Plot Gain".
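To make the "one octave per turn" layout concrete, here is a minimal Python sketch of how a frequency could be mapped to a position on such a spiral. It is only an illustration, not SpectratunePlus code: the function name `spiral_position` and the choice of A55 as the zero-angle reference are assumptions.

```python
import math

A55_HZ = 55.0  # assumed zero-angle reference; the program's actual layout may differ


def spiral_position(freq_hz, ref_hz=A55_HZ):
    """Return (turn, clockwise_angle_degrees) for a frequency on an octave spiral.

    Each full turn (360 degrees) corresponds to one octave, i.e. a doubling
    of frequency, so the position depends on log2 of the frequency ratio.
    """
    octaves_above_ref = math.log2(freq_hz / ref_hz)
    turn = math.floor(octaves_above_ref)          # which band of the spiral
    angle = (octaves_above_ref - turn) * 360.0    # position within that band
    return turn, angle
```

With these assumptions, A110 and A220 land at the same angle as A55, one and two bands further out, and the D below middle C (about 146.83 Hz) lands 5 half-steps, or 150 degrees, clockwise from A.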
"Unrolled" Across Spiral For some purposes, such as finding "formants" in live speech or live singing, better than a spiral is a straight-across spectrum. This option is shown above: it's the plot that starts with A55 (indicating the A at 55 cycles per second).
For those less familiar with spectrograms:
The fundamentals and overtones of the instrument or voice are at specific frequencies. (About the first 7 of these are clearly picked up in the above screenshot as the "peaks" of the blue spectrogram, with perhaps the next 5 weaker overtones more faintly visible.)
The spectrogram, which breaks down sound into the individual frequencies (sines) it is composed of for each short instant of time, is a pretty good representation of the information the ear sends to our human nervous system, resulting in us hearing any sound. This is because: The ear delivers sound to its inner basilar membrane within the cochlea. The long and thin basilar membrane tapers from thicker to thinner as you go along it, the physics of this arrangement making its "resonant frequency" vary from low to high along it. Each little bit along the basilar membrane vibrates most when the sound contains frequencies near the resonant frequency of that particular little bit of the membrane. In turn, the nerve cells along the basilar membrane send the vibration magnitudes all along the membrane to the brain. Thus, the ear separates the sound mechanically (well, actually there is a bit of electromechanical amplification by nerve structures near the basilar membrane--which is irrelevant for the key idea and can be ignored here) into its component frequencies, and passes the information about the separate frequencies to the brain. The brain uses the information about the levels of the different frequencies (essentially a spectrogram), and we perceive the sound.
Note that the vibration along the basilar membrane is not an all-or-none thing. The 200 cycle-per-second point on the basilar membrane gets most excited (i.e. vibrates most) from 200 cycles per second, but the sections near the 200 cycle-per-second point also get excited, a little less, by 200 cycles, with the excitement level going down as you get away from the 200 cycle point on the membrane. Thus, what the ear sends to the brain, when it receives a musical sound with precisely-pitched fundamental and overtones of say precisely 200hz, 400hz, 600hz, etc., is like in screenshot (1) in that the precise frequencies of the sound are only evident as smoothly-approached peaks in the spectrogram (or info from the ear to brain). Then, amazing brain-processing yields, from that data, sense of pitch, harmonic richness, combinations of sound, etc. For simple musical sounds, the tone and overtone peaks result in a perception of 1 pitch with overtones whose relative intensity gives a certain timbre. [Aside: in certain deaf people, the ear malfunctions and does not feed the spectral information (again the information in blue in screenshot (1)) to the brain. A "cochlear implant" is slipped into the cochlea, and the brain gets the information, and the person can actually hear, though with not quite the same versatility as a person with a normal ear.]
NOTE on deciBels: deciBels are a "logarithmic scale", and the idea is every time you go up or down 3 dB, power roughly doubles or halves (respectively), up or down 10 dB power is multiplied or divided by 10, up or down 20 dB power is multiplied or divided by 100. A change of 3 dB is a small audible change -- like just raising or lowering the volume on your stereo system by "just a tad". (Now, a critical thinker might ask further whether, say, the same dB level at two different frequencies sounds equally loud, and the answer is no, not exactly, but close enough. If you really get obsessed with this detail, there exist conversion tables designed based on the average human.)
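The rules of thumb above follow from the standard definition of a power ratio in deciBels, dB = 10 * log10(P2 / P1). Here is a small Python sketch of the conversion in both directions; the function names are just illustrative.

```python
import math


def db_to_power_ratio(db):
    """Convert a level difference in dB to the corresponding power ratio."""
    return 10.0 ** (db / 10.0)


def power_ratio_to_db(ratio):
    """Convert a power ratio (P2 / P1) to a level difference in dB."""
    return 10.0 * math.log10(ratio)
```

For example, `db_to_power_ratio(3)` comes out just under 2 (so "3 dB doubles the power" is a close approximation rather than exact), while 10 dB and 20 dB give factors of 10 and 100.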
The rest of the screenshots are of the many views that you can get from the non-live view, that is, what you can get if you start from a .wav recording file instead of a live sound. If you are only interested in live mode, you may want to skip down to the instructions about 2/3 of the way down the page (you can get there by clicking here), and perhaps to the site links also farther down on the page (get there by clicking here).
Screenshot 1b (Above) "Over-Time" "All Power" view. Here is the over-time view spectrogram showing a few seconds of a Brahms piano+cello+violin piece. Time is shown on the horizontal axis, while frequency=tone/overtone height is on the vertical axis. The software is set to show "all power" of the spectrogram, meaning relative sound power of every frequency in the selected vertical frequency range is represented (darker shades of gray/black represent more power). (This contrasts with another mode, shown in various screenshots below, where we only plot frequencies where the spectrogram has peak powers.)
The all power mode gives a purer and more fully informative view of the sound: essentially it shows something like the vibration level along the basilar membrane in the ear. Besides pitch, it tends to make rhythmic elements in the music visible, and you can distinguish certain instruments. (As an example of the latter, the piano notes have a distinct nail-head-like higher volume at the start. Note also there is here a bit of vibrato in the violin/cello.) The all power mode can also pick up non-pitched sounds (certain drums, noises, "shhhh" sound, etc.), though no such sounds exist in this particular example.
A disadvantage of "all power" mode is that, although note frequency is visible from the scale on the side, there is no place to underlay that scale under the spectrum for easier reading of pitches (as I do on the peaks-only spectrograms). To help the situation, that little narrow striped scale about 2 inches from the left of the all power gray display is actually horizontally slidable, so you can put it right next to any tones or overtones you want to find the note-equivalent-frequency of. (Note 1: To slide this, left click the mouse over it and drag it. Also, this slidable scale needs to be toggled on to appear. You toggle it on or off by hitting the "n" key.) (Note 2: there are other ways to get the approximate or exact note value of points in the spectrogram, involving the left, middle, and right mouse buttons, which are described in the instructions and other examples.) Note on Striped Scale: Whether in all-power mode, where the striped scale is narrow and moveable, or in peak-only mode, where the scale is under the full spectral peaks display, you can set major or minor known or trial potential keys, and the darker stripes correspond to the non-scale notes. If you don't know the key and aren't trying to figure it out, you can leave it at the default of C major, and the darker stripes then represent the black piano-key correspondences.
Also shown in the screenshot: towards the left, I have told the software to place several vertical-stack overtone-position markers, which mark where (natural non-computer) pitched instruments will place any overtones if the fundamental is at the bottom of the stack. The stack on the very left is a visually lighter version, which also doesn't go up as many overtones.
At some point before taking the screenshot, I asked the program to tell me the relative power and exact frequency / note-value of a particular point on the spectrogram by clicking on it with the middle mouse button, and that's what the "Marked General POWER" line is.
Besides querying for the power at any frequency and time, you can get a plot of powers at all frequencies at any time (not shown above, but shown in screenshot (1c) immediately below).
Note: To produce any over-time view, such as the one above, you cannot use live sound (i.e. directly from a microphone or currently-playing CD), but rather you must use a .wav file of recorded sound. [You have to have the program make a stage-1 analysis file (.spf, for "SPectratune file"), which will contain the spectrum information over time. The process of making the .spf takes roughly as long as the .wav file takes to play, depending on exactly how many cores your cpu has.] Once you use the SpectratunePlus to make the .spf file, you can then use the SpectratunePlus in "play back" mode: you can look at music paused, or play it or sections of it, and see this over-time view simultaneously, generate MIDI sounds, get sing-along pitch feedback, etc. You can also get the spiral-view if you want. (In playback mode, the SpectratunePlus uses both the .wav and .spf files to allow it to quickly get and display the spectral information while also playing the music.)
Screenshot 1c (Above) Over-Time All Power view, adding the panel graphing measured power of all frequencies at any instant. To get that graphing panel, which comes up attached to the sliding scale, you left click anywhere on the all-power view at the desired time point while the "M" key is down. You can get rid of the little graphing panel, together with the striped scale, if you want, by hitting the "N" key (i.e. toggling off that striped scale).
In the shot, I have also requested the exact frequency and relative power corresponding to the point of the lowest yellow bar (by middle-mouse-button clicking on it--that made all the bars pop-up at the overtone positions as an added feature). Anyway, the power level is 127.2 dB (which is relative to all other powers you will get for the same recording).
The recording happens to be a bit of the beginning of Judy Collins singing "Suzanne" in the key of F#. (I was able to determine this key using other features of the software, though of course a person with a good ear might be able to do it without any software!) Anyway, the time cursor is over her first few notes of the word "Suzanne", which is at the dominant of the key.
Screenshot 2 (Above) Over Time View of Peaks Only: This is an over-time view of 6 seconds of music, showing "peaks" rather than all power. (Note: the program gives you the option of displaying "all-power", "peaks", or both.)
The bit of music is in the key of c minor (4 string instruments, a Beethoven String Quartet), and I've set the SpectratunePlus software in playback to overlay the note guides for that key. The gray horizontal lines plotted are at spectrogram peaks, and identify each fundamental (=note played) and overtone. These lines show fundamentals and overtones all mixed together. (Some single lines at a single time will correspond to both a note played from one or more instruments and overtones of one or more notes from other instruments.) In this case, I have set the program so that the width of those lines is proportional to the power, in decibels. (The program has other options: constant width with varying darkness, as in some examples below.)
As in the prior screen shot, the vertically stacked arrows are potential-overtone-markers.
Incidentally, note the pattern in the overtone spacing stacks, showing, as they teach in music theory, that the first overtone is an octave up, second a fifth above that, etc.
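The spacing pattern can be checked with a little arithmetic: the n-th harmonic sits at n times the fundamental, so the musical interval between neighboring harmonics is 12 * log2((n+1)/n) half-steps. The Python sketch below (the function name is just illustrative) computes those intervals.

```python
import math


def interval_between_harmonics(n):
    """Half-steps between harmonic n and harmonic n+1 (harmonic 1 = fundamental)."""
    return 12.0 * math.log2((n + 1) / n)


# Fundamental -> 1st overtone -> 2nd overtone -> 3rd overtone -> 4th overtone
steps = [interval_between_harmonics(n) for n in range(1, 5)]
```

The first step is exactly 12 half-steps (an octave), the second is about 7.02 (a fifth, very slightly wider than the equal-tempered 7), then about 4.98 (a fourth) and about 3.86 (a major third, noticeably narrower than the equal-tempered 4).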
The little gray plot above the little black plot down by the bottom is something that I took out of the current program. Still in the program is the black plot below the gray one, which shows total power of the signal over all frequencies, to help gauge total volume, etc. visually.
Screenshot 2b (Above) adding Octave-Overlaid View: This is a part of the same piece of music as in prior screenshot (2).
One thing added, in my settings of the SpectratunePlus, is that, on the bottom, all octaves are overlaid together. The idea is to give some guidance about key and chords. However, I have also set it to not overlay anything more than 2 octaves below A220 or more than 3 octaves above A220. In particular, this gets rid of high octaves, which have lots of overtones grouped together all over the place, and keeps you in the zone of chordally relevant frequencies. (This setting, limiting the overlaid octaves, is what causes the yellow area on the top non-overlaid portion -- it's a reminder that the overlay is restricted in octave range.)
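As an illustration of the octave-overlay idea, here is a minimal Python sketch that folds a frequency into a single octave (its position in half-steps above A) and drops anything outside the allowed range around A220. This is only a sketch of the general technique under stated assumptions, not the program's actual code; the function name and parameter names are hypothetical.

```python
import math


def fold_to_octave(freq_hz, ref_hz=220.0, octaves_below=2, octaves_above=3):
    """Return half-steps above A (0 <= value < 12), or None if out of range.

    Frequencies more than `octaves_below` octaves under `ref_hz`, or more
    than `octaves_above` octaves over it, are excluded from the overlay.
    """
    half_steps_from_ref = 12.0 * math.log2(freq_hz / ref_hz)
    if half_steps_from_ref < -12.0 * octaves_below:
        return None
    if half_steps_from_ref > 12.0 * octaves_above:
        return None
    return half_steps_from_ref % 12.0
```

With the 2-below / 3-above setting described above, A55 and A440 both fold onto position 0, while A27.5 and anything above A1760 are left out of the overlay.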
Also shown: I middle-mouse-button clicked on a portion of the display, and got actual text values of power. The display also shows raw power at the first approximately 20 overtone positions (labelled "Raw dB H2 on up"). (Note: Some of the information you get relates to a feature that I "deprecated" because it doesn't work so well -- automatic classification of fundamentals vs overtones, but you do get exact frequency and power at that frequency, which are not deprecated and work well. Incidentally, the "deprecated" fundamentals vs overtones classification, though not in any screenshots on this page, is still in the program, but you won't get it unless you change the program settings from the default. The general picking out of fundamentals vs. overtones is an extremely hard problem, with the best rates in current research in the IEEE literature being maybe 70% to 80% accuracy; beyond that, my approach was based on too simplistic a model of overtone vs fundamental powers, and has a worse general accuracy rate than the stuff in the recent literature, so I "deprecated" it. Those who, despite its limits, wish to play around with it: you get it by setting certain checkboxes in section (M2), and monkeying with parameters in section (V). At the moment, I am thinking it still may have a role in key and chord detection despite not working terribly well at what it was directly designed for.)
Screenshots 2b(ii-iii) (Above and the narrower panel below): Adding additional Chord- and Key- help panels:
The octave-overlaid view alone, as in prior screenshot (2b), or this screenshot (2b-ii), when used diligently, can give some pretty good guidance as to the chords and key of the piece of music you are looking at. (This is so even though ultimately chords are defined by music people to depend on notes played (=fundamentals) and not all the notes and overtones mixed together. This is because if you restrict to the more powerful overtones, which are usually just the first few overtones of each played note, you are adding overtones that are mostly in the same pitch class as the fundamental, plus maybe a perfect fifth above the fundamental. Thus, the octave overlaid view of the more powerful fundamentals and overtones should usually give a good idea of the actual chords played, even though you could perhaps do a little better figuring out chords by fishing through the non-octave-overlaid spectrogram and trying to figure out exactly what pitch-classes are fundamentals.)
So, accepting that you get some idea of chords from the octave-overlaid view of the most powerful peaks of power, it still is not always that quick to figure out key and chords from that octave-overlaid view. Thus, I have put in something that I originally did for MIDI sounds. (Acknowledgement: based on important ideas that I picked up reading David Temperley's book The Cognition of Basic Musical Structures; also note that my MIDI software for this method can be found here.)
The colorful narrower panel on the right of the above screenshot is brought into existence and turned on by clicking, in area (R) of the SpectratunePlus control panel, "Chord/key assist panel when OTW" and also checking "Chordal". There is a plot for each of the 24 potential keys. There are two plots against each note letter: the top one is for major, the bottom one is for minor. What it does is tell you, in each of the 24 major and minor potential keys, how many pitch-classes (= the 12 octave-overlaid note-pitches) which are present in high power in the peaks of the spectrogram, are part of the "best-fit" (= "using the most high-power-present-pitch-classes") chord in that potential key. (The scale giving the number of pitch-classes is those little horizontal lines on the left of the panel.) The "function" (based on chord root pitch relative to the key) is as follows: Tonic chords (I or i -- major and minor chords are colored the same way) are that rich darker green as in the G major shown, whereas the traditional I/i "functional" substitute, VI/vi, is a paler green (existing a little in the line for G major in the shot). The dominant function V/v is given by a bold red, while the substitute, VII/vii is a paler red. IV/iv is given by a bold yellow, while the substitute II/ii is a paler yellow. III/iii is orange.
For each of the 24 potential keys, cross-hatching, and/or one way diagonal, fills up the distance from the number of notes in the best-fit chord for that key, to the number of notes in the best-fit chord for the key whose best-fit chord has the most notes. (Detail: In each of the 24 potential key lines, I use one-way diagonal to the extent that there are other in-scale pitches in that key, and cross hatching for the rest. The detail can probably be ignored, but my idea was that diagonals indicated an in-scale "non-harmonic" tone.)
What this all adds up to is that, in the true key, you expect less cross-hatching and diagonals, as well as chord-patterns (represented by colors) characteristic of Western music. You expect a lot of richer red and darker green, as there is in the G major line in the shot. (Tomato red and tomato-stem green.) G major fits better than G minor because minor has more cross-hatching. (Thus, yes, G major is the key of this part of some music by Mozart that I used to make the screenshot.)
Note that I've had pretty good success using this to quickly read off keys of recordings of songs: the basic idea is that, in the introduction, there is a lot of dark green: a I chord to get everyone's singing and sense of the key going.
Also note that, as you are exploring for potential keys, you can express-change the software configuration to any key by left-clicking, on the chord/key assistance panel, a spot over a key. (This is a quicker alternative to using the Control Panel functions in section (F). This will change the key-scale on the over-time-window plots to let you assess the key fit by additional means.)
Once you have the key, you can also get some idea as to actual chords and chord functions from the colors. (Also, note, I actually try to compute inversion number relative to detected bass note: those are the little white horizontal lines that interrupt the bars when the inversion is not 0 -- i.e. not root position.) (Of course, the harmonic analysis that the method tries to do -- even in the MIDI version where the actual notes are known -- is incomplete, in the sense that you have to look at the actual notes to figure out if a non-harmonic tone is a passing tone or whatever; also other aspects of harmonic analysis are left out.)
There is also a somewhat helpful alternate version of the chord/key assistance panel I added for key guidance only, not chords, which is the mustard-yellow and black panel just below. You get this when, in area (R) of the SpectratunePlus control panel, "Chord/key assist panel when OTW" is checked and also "Chordal" is UNchecked. (Or left click below the section of 24 potential key-lines on the panel to toggle from one version to the other.) The mustard height, within the total mustard and black range, is the proportion of notes that are in the key scale. In the shot, from the same exact bit of music as in the prior screenshot, it is apparent that G major fits best. (Note 1: sometimes it is best to look separately at only-major-key and only-minor-key fits on this alternate-version chord/key assistance panel, as the best fit visually may stand out better that way. You can cycle between both major and minor, just major, and just minor if you click the right mouse button over the bottom part of this panel.) (Note 2: The key is not always that clear from this alternate version of the chord/key assistance panel -- usually the colorful key and chord plot from the prior shot is more helpful.) (Detail: what particular of the 3 minor scales the minor plots are determined from in this plot depends on the checked option in (F) on the control panel. For the colorful chord/key version, the plot doesn't depend on that checked minor version-- the chords for all 3 minor scales are all tried for the one using the most pitch-classes. There is a technical reason I did them with this difference, but never mind.)
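To illustrate the "proportion of notes that are in the key scale" idea, here is a minimal Python sketch that scores all 24 potential keys against a list of detected pitch-classes. It is a sketch of the general idea only, not the program's code: pitch-classes are numbered 0 = C through 11 = B, the natural minor is assumed for the minor keys (one of the 3 minor scales), and all names are hypothetical.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
NATURAL_MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)  # assumption: natural minor only


def in_scale_fraction(pitch_classes, tonic, steps):
    """Fraction of the given pitch-classes that belong to the scale on `tonic`."""
    if not pitch_classes:
        return 0.0
    scale = {(tonic + step) % 12 for step in steps}
    hits = sum(1 for pc in pitch_classes if pc % 12 in scale)
    return hits / len(pitch_classes)


def key_fits(pitch_classes):
    """Map each of the 24 (tonic, mode) keys to its in-scale fraction."""
    fits = {}
    for tonic in range(12):
        fits[(tonic, "major")] = in_scale_fraction(pitch_classes, tonic, MAJOR_STEPS)
        fits[(tonic, "minor")] = in_scale_fraction(pitch_classes, tonic, NATURAL_MINOR_STEPS)
    return fits
```

For example, if the detected pitch-classes are exactly the notes of the G major scale, G major scores 1.0 while G minor scores only 4/7. Note that the relative minor (E natural minor) also scores 1.0 under this simple measure, which is one reason a plot like this does not always make the key clear on its own.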
Another thing I should point out about the octave-overlaid plot shown in the above screenshot: it shows only the peaks that are within 15 dB of the peak power at each instant. In the newer versions of the software, this is the default for the octave-overlaid view (which you get when, in area (M2), "sig. funds + rest sig. on ovld" is checked). The fact that the plot has both blue and orange has to do with some experimenting I was doing and is unimportant -- both colors represent peaks within 15 dB of the peak. (However, the 15 dB can be adjusted to a different number of dB with the slider in (V)(c) "signif. pwr: > db bel. mx", and varying this may create a clearer view of chords and keys in the overlay, and also in the "chord/key assistance window" plots as shown, which are affected by that parameter.) (Also note: if the musicians' tuning is not to A=220 hz, as is the SpectratunePlus default, it can affect the chord/key assistance panel analysis. Thus, if you note a lower or higher tuning by the peaks showing a little off the center of the scales on the Over-Time Panel, you should adjust it (section (D) of the Control Panel).)
Screenshot 2c (Above) Adding Sing-Along Pitch Feedback:
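The "within 15 dB of the peak power" rule is simple to state in code. The following Python sketch keeps only the peaks at one instant whose power is no more than a chosen number of dB below the strongest peak. The data layout (a list of (frequency, power in dB) pairs) and the names are assumptions for illustration, not the program's internals.

```python
def significant_peaks(peaks, window_db=15.0):
    """Keep the (freq_hz, power_db) peaks within `window_db` of the strongest one."""
    if not peaks:
        return []
    top_power_db = max(power_db for _, power_db in peaks)
    return [
        (freq_hz, power_db)
        for freq_hz, power_db in peaks
        if power_db >= top_power_db - window_db
    ]
```

Raising `window_db` lets more of the weaker overtones through; lowering it keeps only the strongest peaks.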
The music is Pete Seeger doing Where Have All The Flowers Gone. This is a very simple arrangement, being just a banjo and Pete singing somewhat louder than the banjo, so it's quite easy to pick out the fundamental of the singing. The red diamonds indicate the sung-along pitch, and I am singing along matching pitch with Pete. (The diamonds appear both over the exact bit of music being played back, on the right, as well as on the left, near the scales, for use depending on whether you are interested in knowing your sung pitch, or in knowing where you are relative to the sound in the spectrogram.)
In this shot, the top, non-overlaid plot shows power (I am back in "all power" mode for the top) in shades of grey. I have tossed in a "peaks-only" octave-overlaid portion, which overlays octaves in the default range of 3 octaves below A220 to 3 octaves above A220, giving some (imperfect) idea of chords. The software is set to make line width proportional to power of each overlaid fundamental or overtone.
Pete and I both are singing a D#, the third scale note in the music's key of B.
Screenshot 2c2 (Above) Adding At-Play-Position Note-Scale Ruler and Left to Right Indication for Stereo Recordings: (The two features shown were added in the 2014.07.31 release of the software.)
New feature #1: There is now an optional at-play-position note-scale ruler, whenever the all-power display is on. This is a note scale like the slidable ruler. However, it follows the play position cursor, moving along as the spectrogram and .wav sounds play back together. My idea is to help you read off in-scale notes as the music is playing, or as it is playing and you are also singing along. (In the screenshot above, the slidable ruler is to the left. The music is being played back, and the at-play-position note-scale ruler is towards the right, and is moving along as the music plays back. I am singing along, and I have on the pitch-feedback diamonds, to indicate my sung pitch.)
New feature #2: It is now an option, when you make the .spf file, to have spectrograms of both left and right channels included. (To do this, when you make the .spf, before you hit "Start", check the "st" box in section (K) of the control panel.) If you make a stereo .spf, whenever you play it back, you then have the option of, in the "all power" display, displaying either L, R, or center. (Here I have center.) However, also, on any all-peaks display, you can get a color coding based on where the peak is, left to right. ("R"ight is "R"ed, left is green, center is gray, with brighter red and green being more extremely to a side.) These are the red and green lines above.
In the above screenshot, Peter, Paul, and Mary are singing "Puff the Magic Dragon" in A major. I have the all-power display for center (which is shades of gray), and, in addition, I have narrow peaks showing, color coded left to right. I am singing along with Peter, who is sharply to the right in the recording, and is singing a tonic. I am also singing the tonic, shown by the diamond. At this moment, Mary, towards the left in the recording, is humming a harmony, at the third of the next higher octave.
Screenshot 2c3 (Above). This is an all-power (no peaks) view of a stereo recording with widely separated sound sources ("Lucy in the Sky With Diamonds"). I have set the all power display to be stereo, which codes left as green, right as red, and center as blue. You get this kind of display by checking the "L-R.." checkbutton in section M1. It only works if you "made a stereo .spf" (checked "st" in section J). It also is not a particularly effective display if you don't have instruments fairly widely spaced from left to right, which is why I went right to the Beatles to get an example where it works well.
This short video [.mp4] shows a few seconds around the screenshot, with sound. (Note the clip was set in the SpectratunePlus to autorepeat a small section, and that is why it jumps back in time at one point.)
Screenshot 2d (Above): (A pure sine = pure tone = what a fundamental or overtone really is science-wise).
I used the (free, by the way) Audacity recorder/sound editing tool/plus more to make a recording of a 440hz sine wave (click here and you should hear this very simple therefore musically boring sound). My SpectratunePlus gives what is pictured above, showing something very simple, simpler than the complicated music, and even simpler than simple music. In fact, there is just one power clump around A440. (The gray/black shades are spectrogram power proportional to darkness, and I have also set the SpectratunePlus to put a white line at peaks of power, and in this simple case, there is just one peak. The single panel shown is just the non-octave-overlaid panel.)
Fundamentals and Overtones are Sines: It would be irresponsible of me not to interject some appropriate science here (which will run until the next screenshot): the sound-pressure-level functions of time that are referred to, when scientists, engineers, or musicians talk about fundamental pitches and overtones are really mathematical sines. (The mathematical sine is really a single very exact shape, graphed below left, and, recall from school, defined geometrically (as below right) as y-coordinates of points going around the circumference of a unit circle centered at the origin. Anyway, all sines, of any frequency, are just this same single exact shape either squished or stretched vertically, horizontally, or moved left or right or up or down.)
Now, the main reason sines are so important for music is that A: the ear picks up sounds via vibration on its long basilar membrane, which is inside the cochlea of the ear. (A set of thousands of nerves along the basilar membrane sends vibration levels at each section further along to the brain, for further processing, yielding the final amazing, and certainly not well-understood, result of perceived sound.) The basilar membrane has a stiffness and thickness which vary along its length. The stiffness and thickness variations yield, substantially via basic Newtonian physics, "resonances" such that: one section along the membrane vibrates if the membrane is fed a sine of around a certain frequency, and another section will vibrate if it is fed a sine of a different frequency. Each little section of the basilar membrane vibrates only when the membrane, via the rest of the ear, receives sines very near a precise frequency, with the frequency of the vibration-stimulating sine going up as you go in one direction along the membrane. (And, further, combined vibration of the basilar membrane sections when there are sines of different frequencies in the air is at the sections and vibration-levels corresponding to each individual sine.)
Technical detail, unfortunately spoiling the simplicity: I've kind of carefully chosen my words "substantially via basic Newtonian physics" in describing what causes the basilar membrane to respond to sines of different frequencies with peak vibration at different points along the membrane. Indeed, the basilar membrane, as a piece of inactive elastic material with its arrangement of varying width and stiffness, would respond to sines of different frequencies with peak vibration at different points along the membrane. But it is known that the vibration magnitudes of such a piece of inactive elastic would be lower than they actually are, and that certain "outer hair cells" right along the basilar membrane actually electromechanically amplify the motion. The exact details of how this happens are not known (2014), but the long and short of it is that, pretty closely, sines of different frequencies excite different sections along the basilar membrane, even after the electromechanical amplification, and that further, these vibration amounts all along the membrane are passed along to the brain via nerves. I have a few links in the links section to sites about the ear and the basilar membrane that discuss the electromechanical detail a bit.
Now, for reasons of physics, B: all the pitch-generating instruments that man was able to devise over the millennia (pre-electronic and digital) and including the human voice, generate single notes (e.g. pluck one string, hit one piano key) whose sound-pressure (i.e. air pressure exactly like from the barometer except variations tiny compared to what a barometer can detect) functions are sines of a frequency (fundamental) added to sines of double the fundamental frequency (first overtone) added to triple (second overtone), etc., with the height of the overtone sines getting negligibly small, or stopping completely, at high enough overtones (maybe 10th overtone, depending on the instrument and pitch within the instrument and also how it is played). The separate sines that the ear picks up from a single instrument are received separately by the ear along the basilar membrane and transmitted to the brain that way, and any putting together into single sounds is done by the brain. Further, when multiple instruments produce sounds, the brain puts them together in a way that the relationship of all the sines has an effect on the perception of harmony, dissonance, timbre, etc.
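The "fundamental plus double plus triple the frequency" recipe can be written down directly. Here is a minimal Python sketch that evaluates such a sum of sines at a time t; the function name and the example overtone heights are made up for illustration and do not correspond to any particular instrument.

```python
import math


def harmonic_tone(fundamental_hz, amplitudes, t_seconds):
    """Pressure variation at time t for a sum of harmonically related sines.

    amplitudes[0] is the height of the fundamental, amplitudes[1] the height
    of the first overtone (double the frequency), and so on.
    """
    return sum(
        amplitude * math.sin(2.0 * math.pi * fundamental_hz * (k + 1) * t_seconds)
        for k, amplitude in enumerate(amplitudes)
    )


# A 200 cycle-per-second note with three progressively weaker overtones
# (heights chosen arbitrarily for the example).
example_heights = [1.0, 0.5, 0.25, 0.125]
sample = harmonic_tone(200.0, example_heights, 0.00123)
```

Because every component frequency is a whole-number multiple of the fundamental, the summed waveform repeats exactly once per period of the fundamental (every 1/200 second in this example), which is consistent with the mix being heard as one note at one pitch.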
I need to add to the above that C: spectrograms generally, and particularly all of the spectrograms produced by my software, detect sines. (That is, when they get a sine of a certain frequency within a mix of many frequencies, the part of the spectrogram corresponding to that sine shows high power, in proportion to how big that particular frequency sine component is.) And, especially in the case of my software, spectrograms pick them up in a "resonance-zone, non-discrete" way like the ear does it. That is, if the ear gets a 440 cycle-per-second pure sine, it won't just vibrate in an exact spot, but it will vibrate in areas around an exact spot, with the peak vibration at an exact spot. And, like the ear, the spectrogram in the above screenshot (vibration level in shades of black/gray) is picking up peak energy at the note A above middle C at 440hz, but it also shows energy, tapering off, when you are close to, but move away from, 440hz. Now, the white horizontal line is the actual place of peak power, which is at almost exactly 440hz in the spectrogram above. (The print reveals a tiny .2 hz difference at the point where I clicked to ask for exact frequency -- this is due to my interpolation techniques and possibly what are called, in computer science terms, "rounding and discretization errors", but this difference is acoustically tiny and beyond what the ear can detect -- it's only 8 one-thousandths of a half-step, i.e. less than one musical "cent".)
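To check the arithmetic on that .2 hz difference, here is a minimal sketch that converts a frequency ratio into cents (100 cents = 1 half-step in equal temperament). The function name `cents_between` is made up for this illustration and is not part of SpectratunePlus.

```python
import math

def cents_between(f_measured, f_reference):
    """Signed interval from f_reference to f_measured in cents (100 cents = 1 half-step)."""
    return 1200.0 * math.log2(f_measured / f_reference)

# The case from the text: a reading of 440.2 Hz against a true 440 Hz.
offset = cents_between(440.2, 440.0)
print(f"{offset:.2f} cents = {offset / 100:.4f} half-steps")
```

The result is a bit under 0.8 cents, that is, roughly 8 one-thousandths of a half-step, in line with the figure quoted above.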
Some electronically-generated, and particularly computer-generated sounds, are completely liberated from the non-electronic/digital physics laws that say a sound is always a sine, plus a sine of double that frequency, triple that frequency, on up. ("Double" can be freely moved to "11.45 times" or whatever you choose, etc.) That is, the spacings on my spectrograms (either in angle on the spiral, or vertical distances on the over-time view) do not have the same regularities on these post-acoustical-instrument free-form sounds. (In my software and the screenshots shown, the markers are all based on these regularities, which is perfectly appropriate, as all the sounds I looked at are generated by standard instruments, which obey these regularities.) Note 1: most electronic instruments (organ, electric guitar, etc.) are designed to obey these regularities, so that they give sounds in-line with traditional instruments. Note 2: If you want to experiment with the new types of sounds that you get by freeing up these restrictions, you can use my versatile multiple-tone generator (here), as well as other sound generation/synthesis software that may be around.
There is a further reason that engineers and psychoacousticians always break down sounds into sums of sines besides just that the sines are what the basilar membrane in the ear can find and sends information about to the brain. D: When a sound that is a sum of sines goes from one place to another, through or using some physical thing (like a duct, the walls of a room, or the body of a violin), to a close approximation that is quite accurate except in unusual physical situations, each sine will stay a sine of the original frequency, but will be multiplied in magnitude, and possibly be shifted in phase. Further, for a fixed physical entity that the sound is going through (particular duct, particular violin body, etc.), the multiplication in magnitude will depend on the frequency of the sine, but, for each fixed frequency, is the same whatever the sound. Thus, the walls of a particular room may reflect more of higher frequency fundamentals and overtones than lower frequency fundamentals and overtones, but for all musical sounds, that tendency will be the same, those particular walls tending to give the sound a "brighter" character. Further, the reflection will not change the particular tones and overtones present--just their magnitudes (=volumes). (This last sentence means it will not change the pitch: good thing, what chaos there would be if you played a sound and reflecting from the walls changed the pitch!) Similarly, the body of a violin will take the sines from the vibrating string, and adjust its sound without changing any frequencies of the fundamental or overtones, but making some louder relative to others. When you switch to a sound going through the body of a cello from that of a violin, fundamental and overtone frequencies are also preserved, but the multiples of magnitude that happen to each frequency change. (Going to the cello, for lower frequencies, we expect a bigger multiple than with the violin body.)
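Point D can be tried out numerically. The sketch below stands in for "some physical thing" with a very simple one-pole low-pass filter (my choice purely for illustration; a real room or violin body is far more complicated). It passes a mix of three sines through the filter and checks two things: each sine's magnitude multiple is the same whether the sine is alone or in the mix, and no energy appears at a frequency that was not in the input. The sample rate, the filter coefficient and all function names are assumptions.

```python
import numpy as np

SR = 8000  # sample rate in Hz (assumed for this sketch)

def one_pole_lowpass(x, a=0.9):
    """A simple linear, time-invariant system: y[n] = a*y[n-1] + (1-a)*x[n]."""
    y = np.zeros_like(x)
    prev = 0.0
    for i, s in enumerate(x):
        prev = a * prev + (1.0 - a) * s
        y[i] = prev
    return y

def amplitude_at(x, freq):
    """Amplitude of the sine component at freq, measured over the final second."""
    seg = x[-SR:]
    t = np.arange(SR) / SR
    return 2.0 * abs(np.mean(seg * np.exp(-2j * np.pi * freq * t)))

t = np.arange(2 * SR) / SR
parts = {220: 1.0, 440: 0.5, 660: 0.25}   # fundamental plus two overtones
mix = sum(amp * np.sin(2 * np.pi * f * t) for f, amp in parts.items())
out = one_pole_lowpass(mix)

# Magnitude multiple for each sine when it is part of the mix ...
gain_in_mix = {f: amplitude_at(out, f) / amplitude_at(mix, f) for f in parts}
# ... and when the same sine is sent through the filter all by itself.
gain_alone = {}
for f, amp in parts.items():
    alone = amp * np.sin(2 * np.pi * f * t)
    gain_alone[f] = amplitude_at(one_pole_lowpass(alone), f) / amp

for f in parts:
    print(f, round(gain_in_mix[f], 4), round(gain_alone[f], 4))
print("energy at 330 Hz (not in the input):", amplitude_at(out, 330))
```

With this particular filter the higher sines are reduced more than the lower ones, so the sound comes out "duller" rather than "brighter", but the frequencies present are unchanged.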
Summing up the reasons fundamental or overtone always means "sine" and why spectrograms are the tool of choice to examine these:
Detail: Why not break sounds down into square waves or triangle waves? You might try looking at a musical sound as a sum of triangle waves or square waves of different frequencies, and in particular you could devise some kind of special spectrogram software that breaks down sound into component triangle or square waves. But, the problem is, (A) and (D) above would fail if you replace there "sine" by either "square" or "triangle". A square wave of a single frequency will excite the basilar membrane in the ear at multiple points, not one point. And it will change from being a square wave when passed through an environment, instrument resonator, etc. Thus, a triangle- or square- wave spectrogram wouldn't be terribly useful either for understanding music or for studying the propagation of sound in acoustical environments.
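To see why a single square wave excites the basilar membrane at multiple points, here is a small sketch that breaks a 100 Hz square wave into its component sines with an FFT. (The FFT is used here only as a convenient measuring tool; it is not the spectrum method of my software. The sample rate and variable names are assumptions for the example.)

```python
import numpy as np

SR = 8000                      # sample rate in Hz (assumed)
PERIOD = 80                    # samples per cycle, so the square wave is at 100 Hz
n = np.arange(SR)              # exactly one second of samples
square = np.where((n % PERIOD) < PERIOD // 2, 1.0, -1.0)

# With one second of signal, FFT bin k is the component at k Hz.
amps = 2.0 * np.abs(np.fft.rfft(square)) / SR

for harmonic in range(1, 8):
    freq = 100 * harmonic
    print(f"{freq} Hz: sine amplitude {amps[freq]:.3f}")
```

The sines at 100, 300, 500, 700 Hz, etc. all come out with substantial amplitude (close to 4/pi divided by the multiple), while 200, 400, 600 Hz come out at essentially zero. So one "single-frequency" square wave is really many sines, each of which finds its own place along the membrane.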
Watch out: don't assume the Fundamental is the strongest tone/overtone from a played note--Overtones may be stronger. Often, the fundamental is strongest, but there are many gross exceptions (such as very low notes on the piano, where the explanation is that the fundamental from the string doesn't get carried through by the sound board resonator, but higher overtones do). However, it isn't complete chaos: for a given instrument, playing attack, and importantly: particular note played, the overtone profile tends to be similar each time. If you monkey around with music spectrograms for a while, you pick up these patterns, and often you can problem-solve enough to even decide exactly what notes must have been played to produce the spectrogram.
Detail: sine is a single exact shape, but the engineering and physics of the ear, instruments, etc. work out so that you don't get from instruments exact sines or sums of sines -- they are things just a little off from sines and sums of sines, and certainly, they don't, like a real sine, go on forever -- they are truncated. But it's O.K. -- the ear on the sections of its basilar membrane will detect things that are shaped a little off from exact sines. It also tends to work when the sines do not go on forever but are truncated (as they are in real music with notes of, of course, finite duration), as long as there are at least enough cycles present. So the mathematically absolutely exact single sine function shape can be a little off in real world fundamentals and overtones, but musical instruments still generate sums of sines of a fundamental frequency and multiples of that fundamental frequency pretty closely, and these approximate sines are what really are fundamentals and overtones in the real world, and these approximate fundamentals and overtones are in most cases close enough to exact sines to be detected just like the exact sine would be in the ear. (Mathematics/physics calculations about ears and musical instruments, using the real world approximate sines rather than pure sines, and using pretty accurate knowledge of the physics of materials, can be done carefully to give pretty exact results -- as accurate as the knowledge of the physical behavior of the materials incorporated in the analysis. Often this analysis has gobs of calculations and so is done on the computer. On the other hand, some people, notoriously sometimes engineers, get sloppy and assume they are getting exact sines out of instruments, and that may work well enough, but I just thought, for the detail-oriented complete-truth seekers, I should make the point that actually, the fundamentals and overtones in music are inexact sines.
But they are close enough to exact sines so that the vibration on the basilar membrane corresponds closely to the frequencies of these inexact sines.)
For final completeness, let me toss in some details that may cause more confusion than they're worth: the varying sine-frequency sensitivity along the basilar membrane is mostly due to basic mechanical physics of an elastic basilar membrane being moved back and forth by sound pressure, but it is also known that there is some kind of electro-mechanical vibration amplification going on around the membrane, not well understood, that may give some special ability to detect very low-intensity sounds. Further, there seem to be some feedback nerves that go from the brain to the ear's hardware near the basilar membrane, that might command the ear's electromechanical hardware to adjust Q (the tuning-sharpness vs. time reaction speed tradeoff). Finally, for low frequencies only, it may be that the brain gets the pitch in part not just by looking at the magnitude of the basilar membrane vibrations all along the membrane, but the brain may actually count vibrations at some points along the membrane in some way. (In any case, these details do not change the point that the spectrogram, in particular those with the default SpectratunePlus Q setting, gives a pretty good representation of what the brain gets from the ear. And in particular, psychological studies and musical experience confirm that the ear breaks down frequencies into sines, closely if not exactly, and sends that information on to the brain, even if the mechanical-only basilar membrane model, which implies sines are the relevant commodity, is not quite the full story of what happens in the electromechanics of the ear.)
Screenshot 2e (Above): A single note (the E almost 2 octaves below middle C) played on a piano, from one of the free sound-sample libraries that I have listed later on this page. I have set the program to show "all power" in gray shades, without adding explicit lines for peaks, though you can of course make out the peaks. The program has printed the relative-to-fundamental powers of all of the detected overtones as "Raw db Ov1. on up". I have also placed the program's markers of positions where all overtones should be over the fundamental. (Acoustical physics details: for various reasons, sometimes overtone positions can vary a few percent from simple "predicted" frequency multiples of the fundamental.)
Screenshot 2f (Above): A single note (the A just below middle C--that is A220hz) played on a clarinet, from one of the free sound-sample libraries that I have listed later on this page. I have set the program to show "all power" in gray shades, without adding explicit lines for peaks, though you can of course make out the peaks. The program has printed the relative-to-fundamental powers of all of the detected overtones as "Raw db Ov1. on up". Note that the early odd overtones (i.e. even multiples of the fundamental frequency) have little or no power. This is a known property of the clarinet--a "closed pipe" (see UNSW clarinet page), and may be responsible for its "funky" sound. (Don't quote me on this last bit about it causing the funky sound until I do some more experiments.)
Screenshot 2g (Above): This is a drumroll. Shown is the middle to end of the drumroll, where the last time the drum is hit it is hit harder. (I have shown powers at all frequencies in shades of gray -- without showing any peaks.)
I think the drum is a snare drum. In any case, it is a non-pitched instrument, thus, every time the drum is hit, the energy is distributed over a continuous range of pitches, rather than showing the tight concentrations around notes and its overtones, as for a pitched instrument. There is still some concentration at certain pitches, though, due to various resonances.
Screenshot 3 (Above): Adding Spiral Single-Time View to Accompany Over Time View. This shows the Over-Time view and the Spiral View together as I play through the .wav and .spf from a segment of Judy Collins singing Turn, Turn, Turn (voice plus guitar). The spiral, which is by default turned OFF when you replay a .wav + created .spf, can be turned on if you uncheck "Skip Spiral Window for .spf replay" in section C of the SpectratunePlus Control Panel. (I have also specifically chosen to show only peaks of the spectrogram on the spiral, and only those within 30 dB of the maximum power at each time instant, with the other settings shown in section C of the Control Panel in this screenshot. Also note that the "Gain--Device 1" and "Dynamic Range" sliders on the Spiral Display Panel itself affect this display and may need to be adjusted, although the other controls on that panel are inactive (in "non-live"=".wav+.spf replay" mode).)
The purpose of having the spiral, in replay mode, is that for certain tasks it is easier to organize visually in terms of trying to replicate with your brain's visual processing section the pattern-matching that goes on in your brain's sound-processing section. (The latter, as discussed somewhere above, receives data about the vibrations along the various sections of the basilar membrane that is similar to what's displayed in either of the spiral or over-time spectrograms. However, with the spiral spectrogram, power-of-2 harmonics run on rays outward on the spiral, and other overtones are at fixed particular angles.) My idea is to let people (including myself!) see if they can develop a knack for picking up tonality, chords, local tone sets, sound complexity, tonal complexity (a la Tymoczko although here the data is spectrogram--more complicated than Tymoczko's "music notes"=fundamentals), perhaps also candidate virtual pitches (a la Terhardt).
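The remark about rays and fixed angles follows from the spiral geometry (one full turn per octave). Here is a minimal sketch of that arithmetic, measuring the angle clockwise from the fundamental's own position; the function names are made up for the illustration and say nothing about how the program draws its display.

```python
import math

def spiral_angle_deg(freq_ratio):
    """Angle (degrees, clockwise) from the fundamental, with one full turn per octave."""
    return 360.0 * (math.log2(freq_ratio) % 1.0)

def spiral_turns(freq_ratio):
    """How many complete octave turns outward from the fundamental."""
    return math.floor(math.log2(freq_ratio))

for multiple in range(1, 9):
    print(f"{multiple} x fundamental: {spiral_angle_deg(multiple):6.1f} degrees, "
          f"{spiral_turns(multiple)} turn(s) out")
```

Multiples 2, 4 and 8 land at 0 degrees (on the same ray as the fundamental, further out), while 3 and 6 share one angle, and 5 and 7 each get their own.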
In the screenshot, besides the spiral, I have, on the over-time display, on the non-overlaid section, all-power (no peaks added). The overlaid section has peaks, with width proportional to power. I have played through bits of the music, and am paused (shown on the over-time panel) where the vertical line "time cursor" is. (This is a section without Judy singing, though you can clearly see her singing a tad earlier where her squiggly vibrato is.) The paused instant with the time cursor is also displayed on the spiral (which can of course show only one instant of music). The SpectratunePlus is set to show everything relative to the actual key of the song, which is G#. I have also manually marked, with the SpectratunePlus markers, little yellow dots identifying the fundamentals and overtones that I can find (the lowest dot in each set is a fundamental--this automatically yields a red arrowhead on the octave-overlaid display). Anyway, though we could have an extra note hiding in an overtone of one of the ones I've identified, what seems to be playing at this instant is a tonic triad. The pattern on the spiral corresponds to this tonic triad, and a musically-trained brain will, in response to signals from the cochlea about basilar membrane activity, say "ah, a tonic triad" and a less musically trained one will say "ah, time to relax". (As you step through the music, in this setting with the spiral on, you can try to see if you can make visual correlations with other chords, tonal features, etc. You can of course also do this with the non-spiral over-time spectrogram, where overtone number corresponds not to fixed angles but rather fixed distances, but this alternative may help for certain things.)
As expected, the patterns on the spiral are simpler on simpler music, and get more complex when tonality is relaxed or eliminated. Bartok is fun to look at on the spiral.
Screenshot 4 placeholder. It has been removed due to redundancy.
Screenshot 5 placeholder. It has been removed due to redundancy.
Screenshot 6 (Above): Adding Combined "All-Power" and "Peaks" Display on Top
A bit of the first movement of a recording of Mozart's String Quartet # 14 in G. . The upper part shows all powers in black/gray/to white (black higher power), with the peaks drawn in orange to white (white higher power). In the lower panel, peaks are overlaid over octaves in gray to red (darker gray higher power, red highest powers), giving some chord and tonality information.
Screenshot 7 (Above)
A bit of "Both Sides Now" sung by Joni, showing peaks-only mode (fundamentals and overtones both), with displayed thickness according to intensity.
Screenshot 8 Placeholder: It has been removed, since I found it redundant.
Screenshot 9A (Above)
This is a bit of the final (rondo) movement of a recording by Alfred Brendel of the Beethoven Waldstein Sonata. I am showing peaks of the spectrogram, and thus this is for fundamentals and overtones all mixed in. I have set higher power levels to be represented by higher widths. (Aside: since at least for non-low notes, piano has relatively few significant-power overtones, transcribing from a spectrogram is an easier task for piano than for overtone-rich instruments.) The all-octaves-overlaid version (on the bottom) sort of gives chord information. The catch is that overtones are mixed in, though I have set frequencies more than 3 octaves above A220 not to be overlaid to reduce this problem. (Aside: the overlay suppression is what the yellow on the top is actually indicating.) Thus, though some of the peaks shown on the overlay may represent an overtone with no fundamental, note that the first overtone corresponds to the same note as the fundamental, and that overtones beyond the first that are high in power at least contribute to a perceptual chordal sound quality, though not a literal music-theory chord.
By the way, I think I have the Spectratune Plus set to the correct key for this part of the movement, C major.
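The octave overlay used for Screenshot 9A can be sketched as follows: fold every peak frequency down to a note name regardless of octave, and skip peaks more than 3 octaves above A220. This is only an illustration assuming equal temperament; the function name, the note-name table and the rounding to the nearest half-step are my assumptions for the example, not the program's internals.

```python
import math

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def overlay_note(freq_hz, ref_hz=220.0, max_octaves_above_ref=3.0):
    """Note name with the octave folded away, or None if too high to be overlaid."""
    half_steps = 12.0 * math.log2(freq_hz / ref_hz)
    if half_steps > 12.0 * max_octaves_above_ref:
        return None                      # overlay suppressed up here
    return NOTE_NAMES[round(half_steps) % 12]

# Middle C, its first overtone, its second overtone, and a very high peak.
for f in (261.63, 523.25, 784.88, 2000.0):
    print(f, overlay_note(f))
```

The first overtone (double the frequency) folds onto the same note as its fundamental, while the second overtone (triple) folds onto a different note, a fifth up, which is how overtones end up mixed into the overlaid chord picture.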
Screenshot 9B (Above)
All-power corresponding to section of music in 9A.
Screenshot 10 (Above)
This is a nice way to set the program. The non-overlaid view on top (shades of gray) is all powers without any peaks added. (As above, this is kind of the purest view corresponding to data on the ear's basilar membrane, viewable without obstruction by the peaks. To find via text or hear notes corresponding to a blob of higher energy, you can still click on the blob to get the midi sound and text note.)
However, below, the overlaid view shows peaks as lines (through the 2nd octave above A220), with width proportional to power. This gives you, roughly, chord information. Also, often you can use it to read off the note of a blob on the top non-overlaid view without having to click on the blob.
The piece of music is a recording of Beethoven's String Quartet op 18 # 4, 3rd movement, very end. The key of that movement is c minor, and note that you can see the final c minor chord easily at the very end. (I have used the program to put 3 overtone-spacing stacks on the final note, showing the 3 deducible fundamentals, which happen to be the c minor chord tones. If you check the score, these are in that final chord. There are also, in the score, 2 other fundamentals which are octaves above these deducible fundamentals. These latter 2, of course, are exactly mixed in with overtones of the first 3.)
To get this "combined" view, that is, all power without peaks on top, and peaks on the bottom, set, in section (M), "show all power (OT)" to on, and "Show note fit or Peaks (as below, OTW)" to off, with, in (Q), "Center-of-Stripe Note Center + Overlaid Pitch" chosen. With these settings, since "show all power (OT)" is a non-overlaid-only feature, I have the program ignore the "off" setting for "Show note fit or Peaks (as below, OTW)" on the overlaid pitch section so that you get some useful information on that section.
Screenshot 11 (Above) SpectratunePlus Control Panel (The Control Panel is always around, as well as the Main Spiral Panel (screenshot (1)). The Over-Time Panel (screenshot (1b), (2), etc.) pops up automatically whenever an .spf+.wav is being played back.)
For Live Mode, about half of the controls are on the spiral panel, but a few affecting Live Mode are also on the Control Panel. A few of the controls on the Control Panel affect making of an .spf. However, most of the controls affect playing back an .spf + .wav file. Fortunately, you don't need to use most of the controls, and can leave them at their default values, so using the program is really not that complicated.
Screenshot 12a (Above)
Showing some complex orchestral music with peak detection high sensitivity
-- control panel section T parameters:
Peak needs db up = 0
(at least once above and below)
over hsteps = 1.0 (note: over-hsteps has no effect when the prior dB up parameter is 0)
Min absolute dB for peak (OT Only)= 130
(Non-overlaid portion: all power in gray shades, plus peaks with width proportional to power;
octave-overlaid portion: only peaks, with width proportional to power)
Screenshot 12b (Above)
Showing same exact music with peak detection lowered sensitivity, reducing undesired "snowlike" quirky peaks, but perhaps missing some musically relevant low-energy peaks, overtone peaks, etc.
--control panel section T parameters:
Peak needs db up = 6.5
(at least once above and below)
over hsteps = 1.0
Min absolute dB for peak (OT Only)= 130
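To make the effect of these section T parameters concrete, here is a rough sketch of one way such a peak test could work. It is my reading of the parameter names only, written for illustration, and is not the program's actual code: a point counts as a peak if it is a local maximum, if it reaches a minimum absolute level, and if the power falls by at least "dB up" somewhere within "over hsteps" half-steps on both sides. The function name, the `bins_per_hstep` argument and the made-up dB values are all assumptions.

```python
def find_peaks(power_db, bins_per_hstep, db_up=6.5, over_hsteps=1.0, min_abs_db=0.0):
    """Indices of spectrum points accepted as peaks under the assumed rules."""
    reach = max(1, round(over_hsteps * bins_per_hstep))   # search width in bins
    peaks = []
    for i in range(1, len(power_db) - 1):
        p = power_db[i]
        if p < min_abs_db:
            continue                                       # too quiet overall
        if not (p > power_db[i - 1] and p >= power_db[i + 1]):
            continue                                       # not a local maximum
        below = power_db[max(0, i - reach):i]
        above = power_db[i + 1:i + 1 + reach]
        # Power must drop by db_up at least once on each side, within reach.
        if min(below) <= p - db_up and min(above) <= p - db_up:
            peaks.append(i)
    return peaks

# One strong peak (index 6) surrounded by small "snowlike" ripples.
spectrum = [100, 101, 100.5, 101.2, 100, 110, 120, 110, 100, 100.8, 100]

print(find_peaks(spectrum, bins_per_hstep=2, db_up=0.0))   # high sensitivity
print(find_peaks(spectrum, bins_per_hstep=2, db_up=6.5))   # lowered sensitivity
```

With "dB up" at 0 every little ripple passes, and the half-step reach makes no difference, matching the note in 12a; at 6.5 only the strong peak survives.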
Screenshot 12c (Above): Speech
"Brown fox jumped over the lazy dog".
Note also, (affecting only sinusoidal resynthesis if you select that) my sinusoidal resynthesis range is set to a range of about 5 half steps around the fundamental of that bit of speech. (That's what the little triangles on the left are showing.)
Screenshot 12d (Above): "virtual pitch" inspired spectrogram on main window
This spectrogram display alternative is for the main window, that is, the one that displays a spiral. As with the spiral, you can get it for live sounds, and also when you replay an .spf+.wav.
It was inspired by a "virtual pitch" idea, due to Terhardt some years back, where basically the theory is that a sound will sound like a given pitch if it has a lot of overtones, particularly low overtones, associated with that pitch. To help you identify such pitches, the bottom row of the spectrogram shows the spectrogram (note-scaled) at true frequency. One row up shows the same note-scaled spectrogram based on half actual frequency (i.e. a single note will have first overtone over where the fundamental is in the first row). The next row up shows the note-scaled spectrogram based on 1/3 of actual frequency (i.e. a single note will have 2nd overtone over where the fundamental is in the first row).
Thus, the idea of doing the spectrogram this way is that virtual pitches should correspond to where you have overtones lined up vertically, especially in the lower rows.
Thus, in this display, which was of a single note sung by me, if you know the software was set with context key C, you can see that the sung note was 4 half-steps above C, that is E. (You could pick this off easily from the spiral display, but the idea is, in more complicated situations with various notes going on at once, you can attempt to pick off virtual pitches, which, according to the virtual pitch idea, are what a musical human ear would likely perceive to be notes present.)
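The row idea can be sketched with simple arithmetic: take each spectral peak, divide its frequency by 1, 2, 3, ... (one divisor per row), and see which half-step column collects entries from the most rows. This is a toy illustration in the spirit of the display, not the program's code and not Terhardt's full theory; the function name, the A55 reference and the choice of 4 rows are assumptions.

```python
import math
from collections import Counter

def lined_up_columns(peak_freqs_hz, n_rows=4, ref_hz=55.0):
    """Count, per half-step column above ref_hz, how many row entries land there."""
    votes = Counter()
    for f in peak_freqs_hz:
        for row in range(1, n_rows + 1):          # row 1 = true frequency
            column = round(12.0 * math.log2((f / row) / ref_hz))
            votes[column] += 1
    return votes

# Peaks at 2x, 3x and 4x of 165 Hz, with no peak at 165 Hz itself.
votes = lined_up_columns([330.0, 495.0, 660.0])
column, count = votes.most_common(1)[0]
pitch_hz = 55.0 * 2.0 ** (column / 12.0)
print(f"best column: {column} half-steps above A55 ({pitch_hz:.1f} Hz), {count} rows lined up")
```

Here the three overtones line up in the column 19 half-steps above A55, an E close to 165 Hz, even though nothing is actually sounding at that frequency.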
To see references on Virtual Pitch, search for it and "Terhardt" together. There is also a short article in Wikipedia on it.
(By the way, my yellow, as in my spirals, is just a warning device showing the start of the point where, because of chosen sample rate and undersample, the plotted spectrograms may have inaccurate frequency due to "aliasing". In this case, you can see all spectrogram points are a few octaves below this, so there is nothing to worry about.)
Screenshot 13 (Above): What's this?
If you get this red screen on the over time window while replaying a set of .spf and .wav files, firstly, make sure you didn't forget to hit "Start" in section (J) of the Control Panel after selecting the .spf and .wav files, to actually start the replay. If you get this while replaying, it means that the program is working on stage 2 data (peak fit).
[When you create an .spf file, it contains power-at-each-frequency-bin data for each little bit of sound. But, much of what is displayed in the over-time window also needs peak-determination processing, which is not in the .spf file because I want it to be adjustable at playback time. (You can adjust these parameters in section (T) of the control panel at playback time, and they affect which peaks in the spectral power are actually used in the display--ignoring certain tiny peaks reduces the sensitivity to tiny little noises and meaningless tiny spectral variations. Also, actually though I have deprecated "note fit" (= fundamental-finding) in the software, and none of the screenshots show note fit, note-fit processing is also done at stage 2.)]
You get the red screen when the program can't display peaks for the bit of music you are trying to display. It may happen when you first try to replay an .spf+.wav pair, or if you move to a part of the recording where the peaks and notefits are not yet determined, or if you adjust parameters in control-panel sections (T) or (V), which makes the program need to do recalculation of peaks and note-fits. Usually, the red screen goes away in a few seconds by itself, when the calculations complete. If you have trouble, try moving the play position, either with the right mouse button, and/or with the controls in section (K).
Light technical notes: The stuff I have been doing with music spectrograms started a bit over 10 years ago. Since I was not living near a good university library at the time, I proceeded using my own simple ideas, ignoring the vast literature (IEEE, Acoustical Society of America, etc.) related to speech analysis, speech synthesis, music transcription, and music pitch/harmonic modification. I had read about the position-dependent resonances along the basilar membrane caused by the physical variations over its length (whose energy levels act as the main intermediate data to the brain for hearing), and my spectrograms are based on very simple physics to capture, very roughly, those resonances.
Now that I live near a good university library, and now that internet access to certain needed research paper sources has become cheap and sometimes free, examination of some of the research papers and books has led me to see that I probably have done O.K. even though not working connected to the literature.
My precise spectrum method used in the SpectratunePlus is to use a battery of standard-model damped-mass-springs, with spring parameters set to give constant Q, to simulate the basilar membrane vibration levels. This, of course, does NOT simulate the basilar membrane exactly, but it is a commonly used model and should be close.
(For those not familiar with the mass-spring model, each mass-spring, which simulates a small little bit of the basilar membrane, is a mass attached to a damped spring and free to move left and right. (In a gravity-free environment, of course!) The sound vibration (change in air pressure) puts an alternating leftward and rightward force on the mass. The resulting motion builds up to a very large amount if and only if the spring tension is just right so that the mass moves a little, springs back and forth, and does the springing back and forth so that it is in synch with the continuing back and forth of the sound pressure, that is, so that the sound pressure is always going in the same direction as the building up motion. If this happens, we have resonance.)
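For the curious, here is a toy version of a battery of damped mass-springs with constant Q. It is only a sketch of the textbook model, not the SpectratunePlus algorithm: the sample rate, the Q value of 30, the half-step spacing of the resonators, the simple step-by-step integration, and the scaling of the driving force (chosen so every resonator has the same response to a steady push) are all my assumptions for the example.

```python
import math

SR = 44100      # samples per second (assumed)
Q = 30.0        # same Q for every resonator, i.e. "constant Q" (value assumed)

def resonator_level(signal, f0, q=Q, sr=SR):
    """Peak displacement of one damped mass-spring tuned to f0 Hz, driven by signal."""
    w0 = 2.0 * math.pi * f0
    dt = 1.0 / sr
    x = v = 0.0
    peak = 0.0
    for s in signal:
        # acceleration = driving force + spring restoring force - damping
        a = w0 * w0 * (s - x) - (w0 / q) * v
        v += a * dt      # semi-implicit Euler: update velocity first,
        x += v * dt      # then position using the new velocity
        peak = max(peak, abs(x))
    return peak

# A quarter second of a pure 440 Hz sine.
tone = [math.sin(2.0 * math.pi * 440.0 * n / SR) for n in range(SR // 4)]

# One resonator per half-step, from an octave below to an octave above 440 Hz.
levels = {h: resonator_level(tone, 440.0 * 2.0 ** (h / 12.0)) for h in range(-12, 13)}
best = max(levels, key=levels.get)

for h in (-2, -1, 0, 1, 2):
    print(f"{h:+d} half-steps: level {levels[h]:.1f}")
print("strongest resonator:", best, "half-steps from 440 Hz")
```

The resonator tuned to 440 Hz builds up the most, and its neighbors respond too but progressively less, which is the "resonance-zone, non-discrete" behavior described earlier.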
I think my algorithm is a fairly efficient programming method, even though not the more common FFT or an FFT-related wavelet transform. (I think the filters will turn out to be relatively simple gammatone ones, when I get around to verifying that.)
The method used in the SpectratunePlus software is, equivalently to my description above, a filterbank method, but not the one used in the Schoerkhuber/Klapuri constant Q toolbox: Matlab toolbox and examples of their spectrograms here. I have not examined how my method compares in time for computation to theirs. Their resultant spectrograms seem to be similar looking to mine.