Every Beats song starts out life as a YAML file, and if lucky, undergoes a metamorphosis into a beautiful *.wav file. Let’s look at what happens during this process.
When you install the Beats gem, it adds bin/beats to your path. This is the entry point. It collects the command line arguments, and then calls Beats.run() (located in lib/beats.rb), which is the real driver of the program. When Beats.run() returns, bin/beats displays an exit message, or alternatively lists any errors that occurred.
Beats.run() manages the transformation of the YAML file into a *.wav file, by calling the appropriate code to parse the YAML file, normalize the song into a standard format, convert the song into an equivalent song which will be generated faster, generate the song’s audio data, and save it to disk. For more info on each of these steps, read on below.
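Putting these steps together, the overall flow looks roughly like the sketch below. SongParser, SongOptimizer, and AudioEngine are real Beats classes discussed later in this article, but the method signatures and the normalize_for_output and write_to_file names are assumptions for illustration, not the actual implementation.

# A rough sketch of the pipeline driven by Beats.run(); method signatures and
# the normalize_for_output / write_to_file names are assumptions, not the
# actual Beats implementation.
def run_beats(input_file, output_file)
  yaml_string = File.read(input_file)

  # 1. Parse the YAML into Song and Kit domain objects
  song, kit = SongParser.new.parse(yaml_string)

  # 2. Normalize the Song (e.g. to handle the -p and -s options)
  song = normalize_for_output(song)        # hypothetical helper

  # 3. Optimize the Song so its audio can be generated faster
  song = SongOptimizer.new.optimize(song)

  # 4. Generate the audio data and save it to disk
  AudioEngine.new(song, kit).write_to_file(output_file)
end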
The SongParser object parses a raw YAML song file and converts it into domain objects. It contains a single public method, parse(). The input is a raw YAML string, and the output is a Song object and a Kit object.
A Song object is like the “sheet music” for the song. It is a container for Pattern objects (which are in turn containers for Track objects). It also stores the song flow (i.e. the order in which patterns should be played). The flow is internally represented as an array of symbols. For example, when this song is parsed:
Song:
  Flow:
    - Verse: x2
    - Chorus: x4
    - Bridge: x1
    - Chorus: x1
the resulting Song object will have this flow:
[:verse, :verse, :chorus, :chorus, :chorus, :chorus, :bridge, :chorus]
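The expansion from the YAML flow to that symbol array is essentially a matter of repeating each pattern name by its multiplier. Here is a minimal sketch of the idea; it is illustrative only, not the actual SongParser code.

# A minimal sketch of expanding a parsed flow into repeated symbols;
# illustrative only, not the actual SongParser implementation.
raw_flow = [{ "Verse" => "x2" }, { "Chorus" => "x4" }, { "Bridge" => "x1" }, { "Chorus" => "x1" }]

flow = raw_flow.flat_map do |entry|
  pattern_name, multiplier = entry.first
  count = multiplier.delete("x").to_i          # "x2" -> 2
  [pattern_name.downcase.to_sym] * count
end

flow  # => [:verse, :verse, :chorus, :chorus, :chorus, :chorus, :bridge, :chorus]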
A Kit object provides access to the raw sample data for each sound used in the song.
A nice thing about YAML is that it’s easy for humans to read and write, and support for parsing it is built into Ruby. For example, reading a YAML file from disk and converting it into a Ruby hash can be accomplished with one line of code:
hash = YAML.load_file("my_yaml_file.txt")
Despite this, SongParser is still 200+ lines long, due to the logic for validating the parsed YAML file and converting it into the Song and Kit domain objects.
After the YAML file is parsed and converted into a Song and Kit, the Song object is normalized to a standard format. This is done to allow the audio generation logic to be simpler.
As far as the audio engine knows, there is only one type of song in the universe: one in which all patterns in the flow are played, all tracks in each pattern are mixed together, and the result is written to a single *.wav file. If that’s the case though, then how do we deal with the -p option, which only writes a single pattern to the *.wav file? Or the -s option, which saves each track to a separate *.wav file?
The answer is that before audio generation happens, songs are converted to a normalized format. For example, when the -p option is used, the Song returned from SongParser is modified so that the flow only consists of a single performance of the specified pattern. All other patterns are removed from the flow.
Original Song:

Song:
  Flow:
    - Verse: x2
    - Chorus: x4
    - Verse: x2
    - Chorus: x4

Verse:
  - bass.wav: X...X...X...X...
  - snare.wav: ....X.......X...

Chorus:
  - bass.wav: X.............X.
  - snare.wav: ....X.........X.
After Normalization For -p Verse Option:

Song:
  Flow:
    - Verse: x1

Verse:
  - bass.wav: X...X...X...X...
  - snare.wav: ....X.......X...
(This transformation occurs in the Song object, not the actual input YAML. The “after” example is what the equivalent YAML would look like).
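In code, the -p transformation amounts to shrinking the flow to one performance of the chosen pattern and discarding the rest. A minimal sketch follows, assuming a Song exposes its flow as an array of symbols and its patterns as a hash; these attribute names are assumptions, not the actual Beats API.

# A minimal sketch of -p normalization; the flow and patterns attribute
# names are assumptions, not the actual Beats API.
def normalize_for_pattern_option(song, pattern_name)
  song.flow = [pattern_name]                                  # play it once
  song.patterns.select! { |name, _| name == pattern_name }    # drop everything else
  song
end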
When the -s option is used, the Song is split into multiple Songs that contain a single track. If the Song has a total of 5 tracks spread out over a few patterns, it will be split into 5 different Song objects that each contain a single Track.
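A minimal sketch of the -s split, using plain hashes in place of real Song and Pattern objects; the data shapes here are assumptions for illustration, not the actual Beats implementation.

# A minimal sketch of splitting a song into one single-track song per track
# name; data shapes are simplified for illustration.
def split_by_track(flow, patterns)
  track_names = patterns.values.flat_map(&:keys).uniq

  track_names.map do |track_name|
    single_track_patterns = patterns.transform_values do |tracks|
      tracks.slice(track_name)      # keep only this track in each pattern
    end
    { flow: flow, patterns: single_track_patterns }
  end
end

patterns = {
  verse:  { "bass.wav" => "X...X...", "snare.wav" => "....X..." },
  chorus: { "bass.wav" => "X.X.X.X." }
}
split_by_track([:verse, :chorus], patterns).length  # => 2 single-track songs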
The benefit of song normalization is that it moves complexity out of the audio domain and into the Ruby domain, where it is easier to deal with. For example, the output of the audio engine is arrays of integers thousands or even millions of elements long. If a test fails, it can be hard to tell why one long list of integers doesn’t match the expected long list of integers. Song normalization reduces the number of tests of this type that need to be written. Normalization also allows the audio engine to be optimized more easily, by making the implementation simpler. (The audio engine is where almost all of the run time is located).
In contrast, normalizing Song objects is generally straightforward, easy to understand, and easy to test. For example, it’s usually simple to write a test that verifies hash A is transformed into hash B.
After the initial Song object is normalized, the resulting Song object(s) are further transformed into equivalent Song objects whose audio data can be generated more quickly by the audio engine.
Optimization consists of two steps: breaking each pattern into smaller pieces, and replacing patterns that contain identical track data with a single canonical pattern.
Performance tests show that generating audio for four 4-step patterns is faster than generating a single 16-step pattern. In general, dealing with many short arrays of sample data appears to be faster than dealing with one very long array.
Replacing two patterns that have the same tracks with a single canonical pattern takes advantage of the fact that the audio engine caches the sample data for previously generated patterns. If two Pattern objects have the same track data, the audio engine will not realize they are the same, and will generate audio data from scratch more often than is necessary.
Humans are probably not that likely to define identical patterns twice in a song. However, breaking patterns into smaller pieces can often allow pattern consolidation to “detect” duplicated rhythms inside (or across) patterns. So, these two optimizations actually work in concert. The nice thing is that this optimization algorithm is simple, but effective.
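A minimal sketch of the consolidation idea, using plain hashes keyed on track data; this is illustrative only, not the actual SongOptimizer implementation.

# A minimal sketch of pattern consolidation: any two patterns with identical
# track data collapse to one canonical pattern in the flow.
def consolidate_patterns(flow, patterns)
  canonical = {}   # track data => name of the canonical pattern

  new_flow = flow.map do |pattern_name|
    track_data = patterns[pattern_name]
    canonical[track_data] ||= pattern_name
    canonical[track_data]
  end

  [new_flow, patterns.slice(*canonical.values)]
end

patterns = {
  verse_a: { "bass.wav" => "X...X..." },
  verse_b: { "bass.wav" => "X...X..." }    # identical track data
}
consolidate_patterns([:verse_a, :verse_b], patterns)
# => [[:verse_a, :verse_a], { verse_a: { "bass.wav" => "X...X..." } }]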
The SongOptimizer class is used to perform optimization. It contains a single public method, optimize(), which takes a Song object and returns an optimized Song object.
Before reading this section, it might be helpful to read up on the basics of digital audio, if you aren’t already familiar with them.
All right! We’ve now parsed our YAML file into Song and Kit domain objects, converted the Song into a normalized format, and optimized it. Now we’re ready to actually generate some audio data.
At a high level, generating the song’s audio data consists of iterating through the flow, generating the sample data for each pattern (or pulling it from cache), and then writing it to disk. The two main classes involved in this are AudioEngine (the main driver) and AudioUtils (general utility methods for working with audio data).
Audio generation begins at the track level. First, an array is created with enough samples for each step in the track at the specified tempo. Each sample is initialized to 0. Then, the sample data for the track’s sound is “painted” onto the array at the appropriate places. The method that does all this is AudioEngine.generate_track_sample_data().
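Conceptually, the painting step looks something like the sketch below: a simplified, mono version that ignores overflow, not the actual AudioEngine.generate_track_sample_data() implementation.

# A simplified sketch of "painting" a sound onto a track's sample array.
# Assumes a mono sound and ignores overflow past the end of the track.
def paint_track(rhythm, sound_samples, samples_per_step)
  track_samples = Array.new(rhythm.length * samples_per_step, 0)

  rhythm.each_char.with_index do |char, step|
    next unless char == "X"                                     # trigger steps only
    start = step * samples_per_step
    length = [sound_samples.length, track_samples.length - start].min
    track_samples[start, length] = sound_samples[0, length]     # paint the sound in
  end

  track_samples
end

# A 4-step rhythm at 4 samples per step, with a 3-sample "sound":
paint_track("X..X", [9, 9, 9], 4)
# => [9, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 9, 9, 0]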
Generating the sample data for a pattern consists of generating the sample data for each of its tracks, and then mixing them into a single sample array by using AudioUtils.composite().
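Mixing, in its simplest form, is just summing the tracks sample by sample. A minimal sketch of the idea, not the actual AudioUtils.composite() implementation:

# A minimal sketch of compositing: sum the tracks at each sample index.
# Illustrative only, not the actual AudioUtils.composite() implementation.
def composite(sample_arrays)
  length = sample_arrays.map(&:length).max || 0

  (0...length).map do |i|
    sample_arrays.reduce(0) { |sum, samples| sum + (samples[i] || 0) }
  end
end

composite([[1, 2, 3], [10, 10], [100]])  # => [111, 12, 3]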
Once each pattern is generated, it is written to disk using the WaveFile gem.
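For reference, writing a buffer of samples with a recent version of the WaveFile gem looks roughly like this (a minimal mono, 16-bit, 44.1kHz example, separate from the Beats internals):

# Writing sample data to disk with the WaveFile gem (mono, 16-bit, 44.1kHz).
require "wavefile"

format = WaveFile::Format.new(:mono, :pcm_16, 44100)
samples = [0, 8000, 16000, 8000, 0, -8000, -16000, -8000] * 5512  # ~1 second

WaveFile::Writer.new("example.wav", format) do |writer|
  writer.write(WaveFile::Buffer.new(samples, format))
end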
One complication that arises is that sounds triggered in a track can extend past the track’s end (and therefore also past its parent pattern’s end). For example, a long cymbal crash occurring in the last step of a track can easily extend into the next pattern. If this is not accounted for, sounds will suddenly cut off once the track or the parent pattern ends. This can especially be a problem after song optimization, since optimization chops patterns into smaller pieces; during playback, sounds would continually cut off at seemingly random times.
To help deal with overflow, AudioEngine.generate_track_sample_data() actually returns two sample arrays: one containing the samples that occur during the normal track playback, and another containing samples that overflow into the next pattern.
When pattern audio is generated, overflow needs to be accounted for at the beginning and end of the pattern. AudioEngine.generate_pattern_sample_data() requires a hash of the overflow from each track in the preceding pattern in the flow, so that it can be inserted at the beginning of each track in the current pattern. This prevents sounds from cutting off each time a new pattern starts. The pattern must also return a hash of its outgoing overflow in addition to the composited primary sample data, so that the next pattern in the flow can use it.
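The hand-off looks roughly like the sketch below, where each track’s generated data is a [primary_samples, overflow_samples] pair. The names and data shapes are simplified for illustration and are not the actual Beats implementation.

# A minimal sketch of the overflow hand-off between patterns. track_data maps
# track name => [primary_samples, overflow_samples]; incoming_overflow maps
# track name => samples spilling over from the previous pattern.
def apply_overflow(track_data, incoming_overflow)
  outgoing_overflow = {}

  primary_arrays = track_data.map do |track_name, (primary, overflow)|
    carried = incoming_overflow[track_name] || []
    carried.each_with_index do |sample, i|
      primary[i] += sample if i < primary.length   # mix the spill-over into the start
    end

    outgoing_overflow[track_name] = overflow       # pass this track's spill-over along
    primary
  end

  [primary_arrays, outgoing_overflow]              # ready for compositing + next pattern
end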
Patterns are often played more than once in a song. After a pattern is generated the first time, it is cached so it doesn’t have to be generated again.
There are actually two levels of pattern caching. The first level caches the result of compositing a pattern’s tracks together. The second level caches the composited sample data after it has been converted into native *.wav file format.
The reason for these two different caches has to do with overflow. The problem is that when caching composited sample data you have to store it in a format that will allow arbitrary incoming overflow to be applied at the beginning. Once sample data is converted into *.wav file format, you can’t do this. Cached data in *.wav file format is actually tied to specific incoming overflow. So if a pattern occurs in a song 5 times with different incoming overflow each time, there will be a single copy in the 1st cache (with no overflow applied), and 5 copies in the 2nd cache (each with different overflow applied).
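The difference between the two caches is really a difference in cache keys: composited sample data depends only on the pattern, while *.wav-format data also depends on the incoming overflow that was baked into it. A minimal sketch with the audio generation and format conversion stubbed out (illustrative only, not the actual AudioEngine code):

# A minimal sketch of the two cache levels, with generation and conversion
# stubbed out. The point is what each cache is keyed on.
composite_cache = {}   # pattern name              => composited sample array
wave_cache      = {}   # [pattern name, overflow]  => data in *.wav file format

flow = [:chorus, :chorus, :chorus]

flow.each_with_index do |pattern_name, i|
  incoming_overflow = { "cymbal.wav" => [i] }                   # differs each time

  samples = (composite_cache[pattern_name] ||= [0] * 44_100)    # stub: generate once
  wave_cache[[pattern_name, incoming_overflow]] ||= samples     # stub: convert + cache
end

composite_cache.size  # => 1 (one entry per pattern)
wave_cache.size       # => 3 (one entry per pattern/overflow combination)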
Track sample data is not cached, since performance tests show this only gives a very small performance improvement. Generating an individual track is relatively fast; it is compositing tracks together which is slow. This makes sense because painting sample data onto an array can be done with a single Ruby statement (and thus the bulk of the work and iteration is done at the C level inside the Ruby VM), whereas compositing sample data must be done at the Ruby level one sample at a time.
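To illustrate the difference, painting can be done with a single bulk slice assignment, while compositing has to visit every sample from Ruby code (a rough illustration, not the actual Beats code):

# Painting: one Ruby statement; the element-by-element copying happens
# inside the VM, in C.
track = Array.new(44_100, 0)
sound = Array.new(4_410, 500)
track[0, sound.length] = sound

# Compositing: a Ruby-level loop that touches every sample individually.
other_track = Array.new(44_100, 100)
mixed = track.each_index.map { |i| track[i] + other_track[i] }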