This is one of my favorite projects, and a very early use of crowd intelligence: YADB is a CD metadata database.
YADB stands for Yet Another DataBase, a riff on YACC -- Yet Another Compiler-Compiler.
Music CDs do not include any metadata: no artist name, no CD title, no track titles. All they have is the
number of tracks and the length of each track. J. River built a nice media player, and implemented YADB
to populate CD metadata when CDs are ripped. A server holds a database that maps the number of tracks
and their lengths to the title, artist, and track titles. Users both look up info and submit their
own versions.
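
To make the mapping concrete, here is a minimal sketch in Python. It is illustrative only and not the actual YADB code: the names DiscKey, DiscInfo, submit and lookup are made up, and it assumes track lengths are stored in whole seconds and must match exactly, which the real server may not require.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiscKey:
    """Hypothetical lookup key: the only facts a music CD provides."""
    track_lengths: tuple[int, ...]  # assumed: seconds per track, in disc order

    @property
    def track_count(self) -> int:
        return len(self.track_lengths)


@dataclass
class DiscInfo:
    """The metadata a user submits or receives."""
    artist: str
    title: str
    track_titles: list[str]


# Assumed in-memory stand-in for the server's database.
database: dict[DiscKey, DiscInfo] = {}


def submit(track_lengths: list[int], info: DiscInfo) -> None:
    """Store metadata under the disc's track layout."""
    database[DiscKey(tuple(track_lengths))] = info


def lookup(track_lengths: list[int]) -> Optional[DiscInfo]:
    """Return the metadata for a disc with this layout, or None if unknown."""
    return database.get(DiscKey(tuple(track_lengths)))


# Placeholder data, not a real disc.
submit([251, 187, 412], DiscInfo("Example Artist", "Example Album",
                                 ["Track One", "Track Two", "Track Three"]))
found = lookup([251, 187, 412])
missing = lookup([251, 187, 413])
```

Because the key is nothing more than the track layout, two different discs with identical track lengths would collide in this sketch. Longer discs make that less likely, since every extra track adds another number that has to match.
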
User-submitted data is notoriously inaccurate. For example, at the time I left the company, we had seen 187
different spellings or capitalizations of Pink Floyd. To improve the quality of the data, I implemented
a voting system. All user submissions were kept forever, each with a count. If another user submitted
matching data, the count was increased; otherwise, a new entry with a count of 1 was created. When a user
queried the data, the matching entry with the highest count was returned. In the case of Pink Floyd, the
proper spelling and capitalization overwhelmingly had the highest count.
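
The voting scheme can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not the production code: VotingStore and its methods are hypothetical names, submissions count as "matching" only when they are exactly equal, and ties go to whichever entry was submitted first.

```python
from collections import Counter, defaultdict


class VotingStore:
    """Keeps every submission forever, with a count per distinct version."""

    def __init__(self):
        # disc key -> Counter mapping each distinct submission to its votes
        self._votes = defaultdict(Counter)

    def submit(self, disc_key, metadata):
        """Add one vote: bumps the count of an identical entry,
        or creates a new entry with a count of 1."""
        self._votes[disc_key][metadata] += 1

    def lookup(self, disc_key):
        """Return the submission with the highest count, or None if unknown."""
        counts = self._votes.get(disc_key)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


store = VotingStore()
disc = (2, (251, 187))  # assumed key: track count and lengths in seconds

# Placeholder submissions that differ only in how the artist is written.
for artist in ["Pink Floyd", "pink floyd", "Pink Floyd", "Pink Floyd", "PINK FLOYD"]:
    store.submit(disc, (artist, "Example Title", ("Track One", "Track Two")))

best = store.lookup(disc)
print(best[0])  # prints "Pink Floyd"
```

Nothing is ever deleted or corrected by hand here. A misspelling simply stays at a low count while the common version keeps pulling ahead.
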
YADB is still running today, though you must use J. River's Media Center to access it. The quality of the
data continues to improve. The same algorithm is also applied to cover art, where it has been very successful as well.