How UYKFD Works
UYKFD is still being developed. The next section is generally still correct, but build auto calculates better distance estimates by building progressively larger "backlink" graphs. See here for a more recent (but higher level) discussion.
Outdated Explanation
A playlist is constructed through the following steps:
- The ID3 tags of MP3 files are read and stored in a database
- To help identify artists, the ID3 artist tag is simplified and split into separate components. This means that a track may be associated with several different "artist strings".
- The Last FM web API for artist-related "top tags" is queried for each distinct "artist string". Again this information is stored in the database.
- This service returns lists of tags and weights, so the results form a directed, weighted graph over various strings in the database, with weights based on how popular the tags are.
- In addition, each artist is given a low weighted link to the tag "music".
- An undirected, weighted graph is constructed between the different artists read from the ID3 tags. The weighting reflects the "similarity" between the two artists.
- This is created by looking for tags that are common to "artist strings" associated with the two artists that are being connected. The weight is the sum of the weights for different tags; the weight for a particular tag is the product of the maximum weight for the tag for each artist (through all possible "artist strings").
- This graph is not guaranteed to be complete.
- To generate a playlist an artist is selected, based on the initial/previous artist, then a track for that artist is selected:
- To select an artist:
- A list of all connected artists (to the initial/previous artist) is made. Each is given a weight equal to the weight of the connection.
- Artists that have already been visited have their weight reduced by a factor of 10 for each visit (the list of visited artists has a maximum length defined by the property ARTIST_MEMORY).
- An artist is selected at "weighted random".
- To select a track:
- A list of all the artist's tracks. Each is given an equal initial weight.
- Tracks that have already been visited have their weight reduced by a factor of 10 for each visit (the list of visited tracks has a maximum length defined by the property TRACK_MEMORY).
- A track is selected at "weighted random".
- A second algorithm uses a backlink table, which is a directed graph generated as follows:
- A "forward link" graph is created from the connected artists table, in which each node links to BACK_FWD highest weighted neighbours.
- A "backward link" graph is also created, in which edges are added from each each artist's BACK_BACK highest weighted neighbours towards the artist.
- The backlink graph is created by adding these two graphs. Typically BACK_BACK is larger than BACK_FWD, hence the name.
- A playlist can be generated from the backlink graph using a similar algorithm to that described above, but with modified artist weights:
- Each node has an initial weight proportional to its degree (so nodes that lead to many other nodes are selected more often).
- Already visited artists and tracks are penalised as before.
- In addition, a distance (link count) table can be calculated.
- Candidate artist nodes then have an additional weight proportional to the distance, normalised relative to the largest distance, raised to the power ARTIST_DEXPONENT, between the candidate and the oldest known artist in the list of visited artists (see above).
- If ARTIST_DEXPONENT is negative, then the playlist will stay close to similar artists; if it is positive it will move away towards more varied choices.