
A crash course on the standard MIDI specification

This guide is meant as a quick start to programming and manipulating raw MIDI data (at the byte level). It is by no means exhaustive, but concentrates on the structure of the MIDI (SMF) format and some of the main commands, so it should help you learn the basics very quickly! Using the knowledge contained on this page, you'll be well on your way to creating a program like my MIDI Transform applet. For more details and a complete command reference, you should visit the links throughout and at the bottom of the page.


Basic MIDI byte layout:

There are three types of MIDI file: Type-0, Type-1 and Type-2. Type-2 is very rare though, so we'll concentrate on 0 and 1. Type-0 and Type-1 differ only in the way the data is stored, so anything that can be heard in Type-1 can also be heard in Type-0. Consequently, you can even convert between the two types with no loss of data or precision.
Type-0 is where all the data is put into only one 'Track'. A Track (not to be confused with a 'Channel') is a contiguous (uninterrupted) data stream in the file, where all the bytes are next to each other. Type-1 is more versatile as the channel data can be compartmentalized into 1 or more tracks (up to 65535, the largest number that fits in the two-byte track count!). So imagine you have a simple bass-line and a melody. With Type-0, you would alternately encode notes of the bass-line and melody next to each other. In Type-1, all the melody notes would go into one track/datastream, and all the bass notes would (optionally) go into another track. Also in Type-1, you can imitate the Type-0 'interleaving' style in any of the tracks.

MIDI allows up to 16 'channels'. Each channel can have one or more voices at once to make a chord, but can only use one instrument (analogous to a 'part' in music theory) at a time. If you want to use a piano and guitar simultaneously, then you'll need to use 2 channels. The table below only has one track chunk (sections F, G, H, I) at the moment. It also just uses one voice of music, so if we were to add more voices of the same or a different instrument, we would either interleave data into the existing track (being careful to specify the channel number every few bytes or so), or we could instead add another track (saving some memory, as you wouldn't need to keep restating the channel number if you were using more than one instrument). Don't worry if all that doesn't make too much sense, as there are examples later on! For now, take a look at this:

MIDI section  <-------------MIDI Header---------------> <-----Track Header----> <-Track data-> <Track out>
Byte number   0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 [22 to x]      [x+1 to x+4]
Byte data     4D 54 68 64 00 00 00 06 00 01 00 01 00 80 4D 54 72 6B 00 00 00 0A blah blah..... 00 FF 2F 00
MIDI section  A---------> B---------> C---> D---> E---> F---------> G---------> H------------> I--------->


Minus the actual note data (green 'blah blah'), you've just seen a template of a one-track MIDI file! The parts highlighted in red (and green) are the parts you'll probably want to edit. Be especially careful with section G, as many players require it to state the exact number of bytes in the rest of the track (H & I)! "Blah blah....." is where all the good stuff goes - i.e. the music data!

Here's a rundown of the various sections:
A = The very first 4 bytes (hex for "MThd") show that the file is a MIDI.
B = The next four bytes indicate how big the rest of the MIDI Header is (C, D & E). It's always 00000006 for Standard MIDI Files (SMF).
C = MIDI has sub-formats. 0000 means that it's a Type-0 MIDI file. 0001 (shown) is Type-1.
D = The number should reflect the number of tracks in the MIDI. Type-0 is limited to 1 track.

E = The timing resolution, which together with the tempo sets the speed of the music. The hexadecimal value 0080 shown means 128 ticks (time units) per quarter note (crotchet).
F = "4D 54 72 6B" is hexadecimal for the ascii "MTrk", and marks the start of the track data.
G = This should be the number of bytes present in H & I (Track data & Track Out). Shown is 0000000A, so that means 10 more bytes (10 is decimal for hex A).
H = All the music data. See further below for details.
I = 00 FF 2F 00 is required to show that the end of the track has been reached.
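
To make the rundown concrete, here is a minimal Java sketch that reads sections A to E back out of the first 14 bytes of a file. The names SmfHeader, parse and readUInt are made up for this illustration (they don't come from any MIDI library), and the sketch assumes the whole file has already been loaded into a byte array.

```java
final class SmfHeader {
    final int format;      // section C: 0 or 1 (rarely 2)
    final int trackCount;  // section D
    final int division;    // section E: ticks per quarter note

    SmfHeader(int format, int trackCount, int division) {
        this.format = format;
        this.trackCount = trackCount;
        this.division = division;
    }

    /** Reads sections A to E from the start of a file held in memory. */
    static SmfHeader parse(byte[] file) {
        // Section A: the four bytes 4D 54 68 64, i.e. "MThd".
        if (file.length < 14 || file[0] != 'M' || file[1] != 'T'
                || file[2] != 'h' || file[3] != 'd') {
            throw new IllegalArgumentException("not a Standard MIDI File");
        }
        // Section B: size of the rest of the header (normally 6).
        if (readUInt(file, 4, 4) < 6) {
            throw new IllegalArgumentException("header too short");
        }
        return new SmfHeader(readUInt(file, 8, 2), readUInt(file, 10, 2), readUInt(file, 12, 2));
    }

    /** Big-endian unsigned integer made of 'width' bytes starting at 'offset'. */
    static int readUInt(byte[] data, int offset, int width) {
        int value = 0;
        for (int i = 0; i < width; i++) {
            value = (value << 8) | (data[offset + i] & 0xFF);
        }
        return value;
    }
}
```

Fed the template bytes from the table, parse should report format 1, one track and 128 ticks per quarter note.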

The music data

A quick list of possible MIDI pitches and their HEX code:
7F = G9
7E = Gb9
7D = F9
7C = E9
7B = Eb9
7A = D9
79 = Db9
78 = C9
77 = B8
76 = Bb8
75 = A8
74 = Ab8
73 = G8
72 = Gb8
71 = F8
70 = E8
6F = Eb8
6E = D8
6D = Db8
6C = C8
6B = B7
6A = Bb7
69 = A7
68 = Ab7
67 = G7
66 = Gb7
65 = F7
64 = E7
63 = Eb7
62 = D7
61 = Db7
60 = C7
5F = B6
5E = Bb6
5D = A6
5C = Ab6
5B = G6
5A = Gb6
59 = F6
58 = E6
57 = Eb6
56 = D6
55 = Db6
54 = C6
53 = B5
52 = Bb5
51 = A5
50 = Ab5
4F = G5
4E = Gb5
4D = F5
4C = E5
4B = Eb5
4A = D5
49 = Db5
48 = C5
47 = B4
46 = Bb4
45 = A4
44 = Ab4
43 = G4
42 = Gb4
41 = F4
40 = E4
3F = Eb4
3E = D4
3D = Db4
3C = C4
3B = B3
3A = Bb3
39 = A3
38 = Ab3
37 = G3
36 = Gb3
35 = F3
34 = E3
33 = Eb3
32 = D3
31 = Db3
30 = C3
2F = B2
2E = Bb2
2D = A2
2C = Ab2
2B = G2
2A = Gb2
29 = F2
28 = E2
27 = Eb2
26 = D2
25 = Db2
24 = C2
23 = B1
22 = Bb1
21 = A1
20 = Ab1
1F = G1
1E = Gb1
1D = F1
1C = E1
1B = Eb1
1A = D1
19 = Db1
18 = C1
17 = B0
16 = Bb0
15 = A0
14 = Ab0
13 = G0
12 = Gb0
11 = F0
10 = E0
0F = Eb0
0E = D0
0D = Db0
0C = C0
0B = B(-1)
0A = Bb(-1)
09 = A(-1)
08 = Ab(-1)
07 = G(-1)
06 = Gb(-1)
05 = F(-1)
04 = E(-1)
03 = Eb(-1)
02 = D(-1)
01 = Db(-1)
00 = C(-1) 
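
The list above follows a simple pattern (12 semitones per octave, with 00 as C(-1)), so a program doesn't need to store it. Here's a small Java sketch that works the name out instead; the class name MidiPitch and method name are just picked for this example.

```java
final class MidiPitch {
    // Same spellings as the list above (flats rather than sharps).
    private static final String[] NAMES =
        {"C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"};

    /** Returns the name used in the list above for a pitch byte in the range 00 - 7F. */
    static String name(int pitch) {
        if (pitch < 0 || pitch > 0x7F) {
            throw new IllegalArgumentException("pitch must be 00 - 7F");
        }
        int octave = pitch / 12 - 1;  // pitch 00 sits in octave -1
        String octaveText = octave < 0 ? "(" + octave + ")" : String.valueOf(octave);
        return NAMES[pitch % 12] + octaveText;
    }
}
```

For example, MidiPitch.name(0x3C) gives "C4" (middle C) and MidiPitch.name(0x00) gives "C(-1)".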
As mentioned previously, section H is where all of the music data is stored. Let's start with the simplest scenario. Say we wanted to play 3 notes, middle C, D and E. We would do this:

Event    <--- W --->     <--- X --->     <--- Y --->     <--- Z --->     ...
Byte no. 1  2  3  4      1  2  3  4      1  2  3  4      1  2  3  4
         00 90 3C 60     7F 90 3E 60     7F 90 40 60     7F B0 7B 00

(NB: Often a midi will have several bytes of meta event data before the actual note data by using the FF byte - more on this later).

Here you can see four Events, each containing 4 bytes (each byte is comprised of two 'nibbles' - two hexadecimal digits), and each beginning with a time-stamp on byte 1 (colored red). An Event can be lots of things - it can be a message to play or stop a note, or it could be a message to add vibrato or change the instrument. But all events begin with a time-stamp. In the example shown, the first three events (W, X, Y) play the musical notes - middle C, D, E, respectively. The fourth event - Z - silences all notes. Now here's a rundown of the various byte numbers:

Byte 1 is the time-stamp for each event. The time accumulates for each time-stamp, so to represent a steady rhythm, instead of using time-stamps of say... 00, 10, 20, 30, 40, 50, you would say: 00, 10, 10, 10, 10, 10. Looking at the table examples, 00 means no time has passed. The next 7F means wait 7F time units. The next 7F means wait 7F more time units...and so on. If we want to wait longer than this, then MIDI does something special. Instead of what you might expect - 80 - we would instead use two bytes: 81 00.
Instead of the original.....
   00 90 3C 60      7F 90 3E 60      7F 90 40 60      7F B0 7B 00
...we could've written it (with each wait one time unit longer, and with 80 00 as a two-byte way of saying 00) as:
80 00 90 3C 60   81 00 90 3E 60   81 00 90 40 60   81 00 B0 7B 00
Converting from the MIDI time-stamp value to decimal in Java:
// n holds the time-stamp bytes in file order, e.g. {0x81, 0x00} for 128.
int midiDecTime2normalTime(int[] n) {
  int t = 0;
  for (int b : n) {
    // Drop the top (continuation) bit and append the remaining 7 bits.
    t = (t << 7) | (b & 0x7F);
  }
  return t;
}
81 00 waits very fractionally longer than 7F (128 time units instead of 127). As you might imagine, it then goes 81 01... 81 02... 81 03......... up to....... 81 7F, and the next one is 82 00 (256 time units), and then eventually 83 00.... 84 00...., FF 7F...., 81 80 00...., 81 80 80 00 up to a massive FF FF FF 7F. You'll know when the time stamp has ended because the last (or only) byte will always be less than 80, while any preceding bytes are always 80 or over. For a quick Java conversion algorithm, see the boxout code just above. For more details (and sample C/C++ code) see this site. One last thing to say: the real length of a time unit depends on the ticks-per-quarter-note value defined in the MIDI header (section E), together with the tempo (120 beats per minute unless a tempo meta event says otherwise).
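
Going the other way (from a plain number to time-stamp bytes) is what you need when writing a file. The following is a hedged sketch rather than a reference implementation; MidiTime and encode are names chosen for this example, and the upper limit is the FF FF FF 7F maximum mentioned above.

```java
final class MidiTime {
    /** Encodes a wait of 'ticks' time units as 1 to 4 time-stamp bytes. */
    static int[] encode(int ticks) {
        if (ticks < 0 || ticks > 0x0FFFFFFF) {
            throw new IllegalArgumentException("time-stamp out of range");
        }
        // Count how many 7-bit groups are needed (at least one).
        int length = 1;
        for (int rest = ticks >>> 7; rest != 0; rest >>>= 7) {
            length++;
        }
        int[] out = new int[length];
        for (int i = length - 1; i >= 0; i--) {
            out[i] = ticks & 0x7F;      // low 7 bits go into this byte
            if (i < length - 1) {
                out[i] |= 0x80;         // every byte except the last is 80 or over
            }
            ticks >>>= 7;
        }
        return out;
    }
}
```

So MidiTime.encode(127) yields the single byte 7F, while MidiTime.encode(128) yields 81 00.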

Here below is the table again, so you don't have to keep scrolling the page up and down:

Event    <--- W --->     <--- X --->     <--- Y --->     <--- Z --->     ...
Byte no. 1  2  3  4      1  2  3  4      1  2  3  4      1  2  3  4
         00 90 3C 60     7F 90 3E 60     7F 90 40 60     7F B0 7B 00

Byte 2 is the Status byte (event type). In the case of W, X and Y, the event type is 'Note On'. The '9' digit of '90' is the 'Note On' message, and the '0' digit is the channel to which it applies. This most often used event type takes two parameters, which you can see as Byte 3 and Byte 4. Byte 3 is the note's pitch (3C = middle C), and Byte 4 is the note's volume, which the MIDI spec calls 'velocity' (both range from 00 to 7F).

In summary, 7F 90 3E 60 means: first wait 7F time units, and then play on channel 0 - the musical note D at volume 60 (hex).

9 and B aren't the only event types. There are a number of others ranging from 8 to E. Here's a quick list:
  • 8 = Note Off
  • 9 = Note On
  • A = AfterTouch (ie, key pressure)
  • B = Control Change (try these links too)
  • C = Program (patch/instrument) change
  • D = Channel Pressure
  • E = Pitch Wheel

    As mentioned previously, a single digit follows the above types to represent the channel the message acts upon. For example, 92 is Note On for channel 2, and AB applies aftertouch to channel B. Be careful: in the example I gave, there were four bytes to each event, but some events use fewer or more than this. Program Change (C) and Channel Pressure (D) take only one parameter byte, while a longer time-stamp or a meta event adds bytes.
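
Splitting a Status byte into its two digits is a couple of bit operations. Here's a minimal Java sketch; the class MidiStatus and its method names are invented for this example, and it only covers the 8 to E event types listed above.

```java
final class MidiStatus {
    /** First hex digit: the event type, 8 to E for the list above. */
    static int type(int status) {
        return (status >> 4) & 0x0F;
    }

    /** Second hex digit: the channel, 0 to F. */
    static int channel(int status) {
        return status & 0x0F;
    }

    /** How many parameter bytes follow the Status byte. */
    static int dataByteCount(int status) {
        int t = type(status);
        if (t < 0x8 || t > 0xE) {
            throw new IllegalArgumentException("not one of the 8 - E event types");
        }
        // Program Change (C) and Channel Pressure (D) carry one parameter, the rest carry two.
        return (t == 0xC || t == 0xD) ? 1 : 2;
    }
}
```

With this, 92 splits into type 9 on channel 2, and a C1 event is known to have just one parameter byte after it.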

    Running Status

    It's possible to use a neat 'trick' called 'running status' - cutting out subsequent repeats of the Status byte. Observe:
    Instead of.....
    80 00 90 3C 60   81 00 90 3E 60   81 00 90 40 60   81 00 B0 7B 00           ...we could've written it as:
    80 00 90 3C 60     81 00 3E 60     81 00 40 60     81 00 B0 7B 00
    
    This way, '90' is only used once at the beginning after the first time-stamp. That's because the client will 'remember' it, and apply it to subsequent events.

    In the case of the last event (event Z), the event type (Byte 2) is 'Control Change' ("B") applied to channel 0, and Byte 3 ("7B") is the 'All Notes Off' Controller. Byte 4 is ignored, so just fill it with a 00. See this site for more details.

    Be careful though, running status only applies to status bytes 8x - Ex. On a live MIDI connection, F0 to F7 cancel the running status, while the real-time bytes F8 to FF are temporarily ignored (thus the running status carries on from before). Inside a MIDI file, FF marks a meta event instead, and both system exclusive (F0/F7) and FF events cancel the running status, so the next channel event must state its Status byte in full.
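
Here's how a reader might cope with running status, as a rough Java sketch. The name RunningStatusReader and the row layout are assumptions made for this example; it only understands the 8x - Ex events, it simply refuses F0 - FF bytes, and it does no bounds checking on truncated data.

```java
import java.util.ArrayList;
import java.util.List;

final class RunningStatusReader {
    /** Returns one {wait, status, param1, param2} row per event; param2 is -1 if absent. */
    static List<int[]> read(int[] bytes) {
        List<int[]> events = new ArrayList<>();
        int pos = 0;
        int running = -1;  // no Status byte remembered yet
        while (pos < bytes.length) {
            // Time-stamp: keep reading while the byte is 80 or over.
            int wait = 0;
            int b;
            do {
                b = bytes[pos++];
                wait = (wait << 7) | (b & 0x7F);
            } while ((b & 0x80) != 0);

            int status;
            if ((bytes[pos] & 0x80) != 0) {
                status = bytes[pos++];   // an explicit Status byte: remember it
                running = status;
            } else if (running != -1) {
                status = running;        // a parameter byte: reuse the remembered Status
            } else {
                throw new IllegalArgumentException("parameter byte with no Status to reuse");
            }
            if (status >= 0xF0) {
                throw new IllegalArgumentException("F0 - FF events are not handled in this sketch");
            }

            int type = status >> 4;
            int param1 = bytes[pos++];
            int param2 = (type == 0xC || type == 0xD) ? -1 : bytes[pos++];
            events.add(new int[] {wait, status, param1, param2});
        }
        return events;
    }
}
```

Run on the shortened line above (80 00 90 3C 60, 81 00 3E 60, ...), it gives back four events, with the second and third carrying the remembered 90.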

    Getting it all together

    Okay, just to prove to you that we're actually creating a real MIDI file, I'm going to use the above data, and insert it into section H of the original template near the top of the page! (making sure to edit byte 21 in the G section). You can copy and paste this code into your favourite hex editor, and save it out as a MIDI file to see if it works. Or if you're lazy, here's the job done for you.
    MIDI Header chunk:  4D 54 68 64 00 00 00 06 00 01 00 01 00 80
    MIDI Track  chunk:  4D 54 72 6B 00 00 00 16 80 00 90 3C 60 81 00 3E 60 81 00 40 60 81 00 B0 7B 00 00 FF 2F 00
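
If you'd rather not count the bytes for section G by hand, a program can do it. Below is a minimal Java sketch that wraps some music data in the one-track template and fills in the length itself. OneTrackMidi, parseHex, build and the output name cde.mid are all made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class OneTrackMidi {
    /** Turns text such as "4D 54 68 64" into bytes (hex pairs separated by spaces). */
    static byte[] parseHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** Builds a whole file: sections A to E, F, a computed G, then H and I. */
    static byte[] build(String musicDataHex) {
        byte[] header = parseHex("4D 54 68 64 00 00 00 06 00 01 00 01 00 80");
        byte[] body = parseHex(musicDataHex + " 00 FF 2F 00");  // H followed by I
        int n = body.length;                                     // value for section G
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header, 0, header.length);
        out.write(parseHex("4D 54 72 6B"), 0, 4);                // section F, "MTrk"
        out.write(n >>> 24);                                     // section G, high byte first
        out.write(n >>> 16);
        out.write(n >>> 8);
        out.write(n);
        out.write(body, 0, n);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = build("80 00 90 3C 60 81 00 3E 60 81 00 40 60 81 00 B0 7B 00");
        Files.write(Paths.get("cde.mid"), file);  // hypothetical output file name
    }
}
```

For the music data shown, the computed length comes out as 16 (hex), matching byte 21 in the listing above.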
    
    So far, so good, but there's one slight problem. All of the notes overlap each other. What if we want each note to stop before the next one begins? Just like using the 90-9F command to start a note, one can use the 80-8F event type to stop a note on a channel. Alternatively, an exact equivalent is to use the 90 command, but with the new volume set as zero. This acts (internally as well) as though the note has stopped. Let's add these note 'silencers' to what we already have:
    1st line: Original with overlapping notes.
    2nd line: Note 'silencers' added on 2nd, 4th, and 6th events.
    
    80 00 90 3C 60                  81 00 3E 60                  81 00 40 60                   81 00 B0 7B 00
    80 00 90 3C 60     81 00 3C 00     00 3E 60     81 00 3E 00     00 40 60     81 00 40 00
    
    In the second line, as you can see, because each silencer event (2nd, 4th, 6th events) has had a delay of 81 00, the time stamps for subsequent sounded notes (3rd and 5th events) should be zero (instead of the usual 81 00), because the 'wait' has already been done by the previous event. Also, the last event has been deleted in line 2, as it isn't necessary anymore to stop all notes, as they were already stopped! Finally, remember that all the events in both lines have an implied Status of 90 due to the 'Running Status' trick as described earlier. Here's the MIDI for you to see/hear.

    A small note on MIDI meta data

    In the track data (section H in the diagram), before plunging straight into the note data, a MIDI will often contain meta events, and you'll know this because the FF byte will be used. A special case of the FF event is also used to indicate a track has finished (FF 2F 00), but that one's compulsory (see 'footers' in the data below). Apart from that one, meta events mostly just provide names for the instruments, tracks, lyrics etc., although a few (such as the tempo event, FF 51) do change how the music plays. They take the form: "FF XX YY", where XX is the meta event type, and YY is how many data bytes follow (written in the same variable-length style as a time-stamp). So for example, "FF 03 07" means the following 7 bytes specify the Track Name. See here for more information about the various meta event types.
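
As an illustration of the "FF XX YY" layout, here's a small Java sketch that builds a Track Name event, time-stamp included. MetaEvents and trackName are names invented for this example, and to keep things short it assumes plain ASCII text of at most 127 characters, so that YY fits in one byte.

```java
import java.nio.charset.StandardCharsets;

final class MetaEvents {
    /** Builds "00 FF 03 YY text...": a Track Name meta event with a zero time-stamp. */
    static byte[] trackName(String name) {
        byte[] text = name.getBytes(StandardCharsets.US_ASCII);
        if (text.length > 0x7F) {
            throw new IllegalArgumentException("name too long for a one-byte length");
        }
        byte[] event = new byte[4 + text.length];
        event[0] = 0x00;                 // time-stamp: no wait
        event[1] = (byte) 0xFF;          // meta event marker
        event[2] = 0x03;                 // XX: Track Name
        event[3] = (byte) text.length;   // YY: number of text bytes that follow
        System.arraycopy(text, 0, event, 4, text.length);
        return event;
    }
}
```

A seven-letter name such as "Melody!" produces the "FF 03 07" opening described above, followed by the seven text bytes.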



    Simultaneous voices/instruments


    To finish off, I'll discuss using more than one voice and implement this using both Type-0 and Type-1 techniques. We'll build on what we've already done - C, D, E, by adding a new note - lower G - when C plays, and lower A when E plays.

    First off, here's how the Type-1 version would look. C, D, & E go into Track 1, and G & A go into Track 2. Basically, Track 2 plays while Track 1 plays, so they're effectively superimposed upon each other. Also included are the Track headers and 'footers', and the MIDI header to complete a full MIDI file. I've aligned the data so as to make it easier to visualize. (Obviously, in the real file, the bytes are all next to each other).

    Type-1

    (This section has alignment gaps deliberately, to help with visualization)
    MIDI Header    4D 54 68 64 00 00 00 06 00 01 00 02 00 80

    Trk 1 Header   4D 54 72 6B 00 00 00 1A
    Trk 1 data                 00 90 3C 60    81 00 3C 00    00 3E 60    81 00 3E 00    00 40 60    81 00 40 00
    Trk 1 footer   00 FF 2F 00

    Trk 2 Header   4D 54 72 6B 00 00 00 16
    Trk 2 data     00 C1 18    00 91 37 60                               82 00 37 00    00 39 60    81 00 39 00
    Trk 2 footer   00 FF 2F 00
    
    Here's the MIDI file

    A few things to take into account here.
  • First off, take a look at the third from last byte in the MIDI Header. It's 02 - which means two tracks. Also, the fifth from last is '01' which means this is a Type-1 MIDI.
  • Next take a look at the last byte in the Trk 1 Header. It's 1A, and that represents the number of bytes in the rest of Track 1 (similar story with 16 in Track 2's header).
  • Now take a look at Track 2's data. The first event uses the digits C1 for the Status byte! This means that we change the instrument on channel 1 to the instrument '18' (the next byte, in hex; that's 24 in decimal, counting from zero). This happens to be the Acoustic Guitar (see this site for a complete MIDI instrument reference).
  • The second event in Track 2's data plays the note Lower G (pitch 37) on channel 1 (note the '1' from '91'). The next event is important. Instead of the usual 81 00, we have 82 00, which means we hold the note G for twice as long.

    Now the Type-0 MIDI equivalent for comparison:

    Type-0

    MIDI Header    4D 54 68 64 00 00 00 06 00 00 00 01 00 80

    Trk 1 Header   4D 54 72 6B 00 00 00 2E
    Trk 1 data     00 C1 18    00 91 37 60    00 90 3C 60    81 00 3C 00       00 3E 60    81 00 3E 00
                   00 40 60    00 91 37 00    00 91 39 60    81 00 90 40 00    00 91 39 00
    Trk 1 footer   00 FF 2F 00
    
    Here's the MIDI file
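
The Type-1 to Type-0 conversion can be automated: turn every time-stamp into a time since the start, pool the events from all tracks, sort them, and turn the times back into waits. Here's a rough Java sketch of that idea. TrackMerger and its row layout ({wait, status, parameters...}) are assumptions made for this example, and it expects every event to carry its Status byte in full (no running status).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TrackMerger {
    /** Merges several tracks of {wait, status, params...} rows into a single track. */
    static List<int[]> merge(List<List<int[]>> tracks) {
        List<int[]> timed = new ArrayList<>();
        for (List<int[]> track : tracks) {
            int now = 0;
            for (int[] event : track) {
                now += event[0];            // waits accumulate into time since the start
                int[] copy = event.clone(); // leave the caller's rows untouched
                copy[0] = now;
                timed.add(copy);
            }
        }
        // List.sort is stable, so events that share a time keep their original order.
        timed.sort(Comparator.comparingInt(e -> e[0]));
        int previous = 0;
        for (int[] event : timed) {
            int absolute = event[0];
            event[0] = absolute - previous; // back to "wait since the previous event"
            previous = absolute;
        }
        return timed;
    }
}
```

Given the two tracks from the Type-1 example, the merged track holds the same 11 events with the same timing as the Type-0 listing, though events that fall on the same instant may come out in a different order.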







    Links:

  • MIDI Transform - A simple web applet to edit MIDI music files. Upload a file and change the volume, speed, instruments, key, and (especially interesting and unique to this software) the scale's mode. Listen to Paul McCartney's 'Yesterday' or Mozart's 'Eine kleine Nachtmusik' in a minor key!
  • Waveform to MIDI Conversion Software Roundup - Although the human brain can easily distinguish between instruments and notes in a piece of music, getting a computer to perform the same feat is a very tricky artificial intelligence problem. This article reviews and rates the top four programs and provides five music examples so you can hear how science is progressing in this curious field of research.
  • Converting MIDI to traditional music score - We compare and contrast 24 music notation packages to see which one transcribes MIDI files to score best.

    Offsite links:

  • MIDI controllers
  • MIDI File Format Specification
  • MIDI spec
  • MIDI File Parsing (nice and simple)
  • The MIDI Specification
  • MIDI modes
  • The MIDI File Format (includes event time info)
  • More midi info
  • MIDI FAQ (includes difference between type 0 and type 1)
  • Guide to the MIDI Software Specification






    All pictures and text on this page are copyright 2005 onwards Daniel White.
    If you wish to duplicate any of the information from this page, please contact me for permission.