Thursday, September 27, 2012

Machine Learning With K-Pop

I've finally found some time to write a few blog posts about some areas of interest that are too lengthy to cover in a G+ post. This post is about Korean pop music, machine learning, and Super K-Pop Radio, the Pandora Internet radio station that I recently created for my new favorite music genre.

If you follow my Google+ or Twitter accounts, you may have noticed all the links I've been posting to K-pop music videos on YouTube. From now on, I'm only going to do that when I find something particularly noteworthy, so as to avoid annoying my friends who have different tastes in music. I am sharing a big YouTube playlist of K, J, and C-Pop music videos, so you can check out that playlist if you're interested in all the new videos that I've discovered.

A few months ago I decided to make a serious effort to start learning Korean, primarily for business reasons (there are a few times when I think it would've been helpful to know a little Korean when working with one of our OEM partners on previous Nexus phones). It's something I've wanted to do for a while. Once I got started on it seriously (I'm a big fan of the Pimsleur audio CD lessons, so I ordered the Korean I 30-lesson course and am working through it right now), I discovered that I really enjoy the way the language sounds, along with the abundance of high-quality pop music and dramas coming out of South Korea in the past few years.

About 5 years ago, I made a similar decision to learn Japanese, and I managed to learn a thing or two (also using the Pimsleur CDs plus a few books) before losing interest and moving on to other studies. I definitely want to continue studying and improving (or at least not losing) my limited Japanese, but for the moment, it's more exciting for me to work on Korean, and I have many more opportunities to practice hearing the language. There are lots of things I'd like to write about learning a new language, and about aspects of Korean that remind me of things I'd previously learned from other languages, like Japanese and also French (similarities in the vowel sounds and the way the words connect together when spoken), but this post is about music, so I'll save those thoughts for another post.

The event that really kicked off my current K-obsession started a few months ago when Google released a native YouTube player for the Playstation 3. When I dusted off my PS3 to try it out, one of the top-ranking music videos was, you guessed it, PSY's Gangnam Style (technically I believe it was the duet version of the song with Kim Hyuna). Needless to say, I was hooked, and discovered many other cool K-pop videos in short order through the updated PS3 YouTube interface. The original Gangnam Style has over 278 million views on YouTube as I write this, and the duet version has over 54 million. Clearly this is a world phenomenon. So I started to get really seriously into K-pop, as it encompasses genres of music that I already enjoy, namely catchy dance-pop with good melodies, interesting rhythms, and R&B and electronica influences.

I've been a fan of J-Pop for many years, but J-pop as a genre doesn't really encompass the entire spectrum of uptempo pop music styles that I like. For example there's not a lot of R&B or funk influence in Pizzicato Five or Puffy AmiYumi songs that I can discern. K-pop seems to draw from a broader base of influences that match my own musical tastes, and I have some interesting data to prove that point, and maybe a few others. In fact, I think I can explain the popularity of PSY's Gangnam Style and Carly Rae Jepsen's Call Me Maybe (nearly 270M views!) quite easily based on their shared musical attributes.

The subject of this post is "machine learning," and one of my favorite tools for discovering new music in my favorite genres is the Pandora Internet radio service. I must apologize here to my international readers, because Pandora is currently only available in the U.S. (with limited access in Australia and New Zealand, according to Wikipedia) due to the ridiculous complexity and expense of today's music licensing landscape.

What makes Pandora different from other Internet radio services is that every song in their archive has been listened to by a trained reviewer, who tags the song in Pandora's database with a set of attributes describing its various musical characteristics (genre, tempo, major/minor key, instrumentation, singing style, etc.) according to well-defined criteria for each genre of music. For more detail, see this 2009 article or this 2011 interview.

This is different from other music-matching systems such as iTunes' "Genius playlist" / "Genius Mix" feature, which creates playlists of similar songs using patterns discovered from the user's own play history and ratings, plus the listening habits of the millions of other iTunes users on the service. A "Genius Mix" is just a larger playlist built around the genres that the service has determined the user listens to frequently; it's shuffled and looped like a radio station, but stored locally. The user can create any number of Genius playlists based around any single track in their library, but the list of Genius Mixes and their contents is determined by the user's listening history and Apple's algorithm, and is not configurable.

Another common approach to playlist creation is taken by Google Play Music's "instant mix" feature, which uses automatic algorithmic analysis of the frequencies and rhythms of the songs in your library to find other songs with similar sonic features (according to the algorithm) to create a playlist around songs with similar themes. I would also recommend a free program called MusicIP Mixer, which is currently unsupported (the company was sold to Gracenote) but still freely available for Windows, Mac OS X, and Linux, that scans your local music collection and then generates playlists that seem to be about as good as Google's instant mixes.

Both types of approaches (user behavior analysis or algorithmic analysis) work very well for creating playlists from your own music collection (presumably composed mostly of songs that you already like), but not so well for discovering new songs. With Pandora's approach, you create a new station starting from a specific genre, or using an individual song or artist as a seed (you can have more than one seed, but it's usually best to start with something specific). I tried this with a few K-pop artists but then discovered I got a much better result by starting with the entire K-pop genre as the seed. For each song that plays on a station, you can give Pandora a thumbs-up or thumbs-down, and it uses those scores to determine which songs to play on that station in the future.

So we have human reviewers who have cataloged each song in Pandora's library, and then human listeners scoring each song that plays on each of the millions of channels that people create on the service. Where does the machine learning come in? When people think of artificial intelligence, they often think of neural networks and other complex systems designed to simulate the behavior of human brains. That makes for great science fiction, and can actually be useful in certain areas such as speech and vision recognition, but for this application, we don't need anything nearly that complicated.

Instead, the simple approach (simpler and more predictable than a neural net, anyway) is to imagine each song occupying a position in a very high dimensional space (you can't actually visualize this, but try to imagine it conceptually). In this space, instead of the X, Y, and Z dimensions of the real world, you have a dimension to represent each type of musical attribute that is recorded in the system. Because there are so many attributes, there are very many dimensions, but for each dimension, there may be only a very limited number of possible values. In fact, for many dimensions the only values may be 0 or 1: a song either does or does not have an acoustic piano, or a laid-back female vocalist, or whatever, and every single attribute has its own unique dimension in the problem space.

Because there are so many dimensions, each song can be thought of as having its own unique (or nearly unique) position within this high-dimensional space, even if a particular dimension has only 2 or 3 possible values. Instead of an X,Y,Z position in space, each song's "position" is a long list of numbers representing its value for each of the many attributes along which it has been scored by Pandora's reviewers. "Nearby" songs in this high-dimensional space will have close or identical values along many dimensions, while "distant" songs will share few. For each radio station, Pandora has to decide which song to play next based on the feedback given by the user (thumbs up or down) for previous songs that it has played on that channel. Songs with thumbs down are eliminated from the channel, but how can the thumbs-up scores be used to discover similar songs with similar attributes?
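To make the vector idea concrete, here's a tiny Python sketch. The attribute names and values are completely invented for illustration; Pandora's real attribute set is far larger and more detailed:

```python
# Toy sketch: each song is a vector of attribute values, and "similar"
# songs are the ones a small distance apart. Attribute names and values
# here are made up, not Pandora's actual data.

def distance(a, b):
    """Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Invented attribute order: [acoustic piano, laid-back female vocal,
#                            electronica influence, minor key, tempo 0-3]
gangnam_style = [0, 0, 1, 1, 3]
call_me_maybe = [0, 1, 1, 0, 2]
slow_ballad   = [1, 1, 0, 1, 0]

# With these made-up values, the two dance-pop hits land closer to each
# other than either does to the ballad.
print(distance(gangnam_style, call_me_maybe) <
      distance(gangnam_style, slow_ballad))
```

The same idea scales to hundreds of dimensions; the arithmetic doesn't care how many there are.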

This post is long enough already and I'm not an expert in the field, but the Wikipedia entry for support vector machines goes into much greater detail on how the math works. Essentially what the Pandora service does (or is likely doing, since I have no inside knowledge of their algorithm) is a whole bunch of matrix arithmetic to construct a hyperplane (or more complex shape) that cuts through this high-dimensional space of musical attributes, dividing the songs that are likely to be appropriate to play on that channel (based on the weighted thumbs-up scores of previously played nearby songs) from the ones that are likely not to be appropriate. Thumbs-down scores only eliminate a particular song from the channel; they don't count against similar songs the way a thumbs-up counts in favor of them, though I think they do signal a slight dislike for other songs by that artist. Similar models are used for tasks such as email spam classification or automated sorting of news stories into categories (e.g. Google News).
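For the curious, here's the simplest possible version of the separating-hyperplane idea: a perceptron, which learns a weight vector and bias from labeled examples. This is only an illustration of the concept, not Pandora's actual algorithm (an SVM additionally picks the maximum-margin hyperplane, and handles non-separable data), and the attribute vectors are invented:

```python
# A minimal perceptron: learns a hyperplane (weights w plus bias b) that
# separates thumbs-up vectors from thumbs-down vectors.

def train(samples, labels, epochs=20, lr=0.1):
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):   # y is +1 (up) or -1 (down)
            # If the current hyperplane misclassifies x, nudge it toward x.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy data: the third (invented) attribute happens to decide everything.
songs  = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]
labels = [1, 1, -1, -1]
w, b = train(songs, labels)
```

After training, `predict(w, b, new_song)` classifies an unseen attribute vector as likely-appropriate (+1) or not (-1) for the channel.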

The primary weakness of this approach for music is that there isn't a dimension in the Pandora database to express whether or not a singer or a song is actually any good, by the individual listener's subjective standards of quality. The only way that Pandora could build a system that scales up to its current size is by limiting their list of attributes to reasonably objective attributes for which a relatively large number of human listeners can be trained to give consistent scores across songs and reviewers. Whether a song uses an acoustic piano or an electric guitar is something everyone can agree on, but whether a singer or a song is any good is not so clear-cut. So here's what happened when I created Super K-Pop Radio and started rating songs based on my personal standards of good pop music.

The main problem that people typically have with Pandora is that they always get the same 20 or 30 songs and have a lot of trouble adding more high-quality variety to a particular station. For those who aren't familiar with the service: the Devil's pact that Pandora had to make with the recording industry in order to be considered an Internet radio station (as opposed to some other kind of music service with an even more exorbitant royalty structure than what they currently pay) requires them to limit the number of songs you're allowed to skip. Currently the limit is 12 skips across all channels every 24 hours. The limit used to be 6 skips per channel and 72 skips total in any 24-hour interval, but the new system actually works better for me now that I'm focused on fine-tuning a single station.

So this limitation makes creating and tuning your stations a sort of game, because when a bad song comes on, you have to decide between listening to the whole thing and rating it thumbs down after the next song starts (which has no penalty) vs. hitting thumbs down immediately to skip the rest of the song and losing a chance to skip an even worse song that might come on in the future.

Because of the limitation on skips, sometimes it's difficult to build up a large thumbs-up list due to the combination of horrible songs in the exact genre that you want, horrible songs in a completely random genre, and songs you've already rated that it's already played for you 5 times in the past 5 hours. Fortunately for the last case, there's a separate "I'm tired of this song" button which will remove that song from all of your channels for some period of time, which works fairly well for me once I have a large enough collection of thumbs-up on a channel.

Pandora really needs to add an "achievements" system like Xbox's to give you trophies for listening to a particular song 20 or 50 or 100 times on a channel. The service does seem to have a habit of finding 5 or 6 songs that it thinks are particularly representative of a channel and playing them for you as often as possible until you are really and truly tired of them, but once you do mark one "tired", usually a few new songs pop up in the list that might have been hidden by the greater "star power" of the songs it wants to play all the time. As a challenge, I like to listen to those songs as many times as I can absolutely stand before clicking "I'm tired". My playlist is long enough that by the time a song comes around again, I'll probably really be wanting to hear it.

So here's the interesting thing that happened with the K-pop channel. Instead of getting the same 10 songs, I actually accumulated a collection of nearly 100 thumbs-up (out of about 200 songs total) before Pandora completely ran out of closely related South Korean bands to play. Then it switched completely to songs from non-Korean bands. What I discovered is that the songs I like in the K-pop genre blend seamlessly into R&B, synth pop, and J-Pop, so the base of songs matching all the Pandora attributes I'd added to the station became huge.

But I didn't actually like more than a small number of songs from a small number of the non-Korean bands. The other thing I quickly discovered is how much bad R&B, bad synth pop, bad Latin pop, bad dance pop, etc. there is out there. By this I mean that when it comes to music I really like and want to listen to, I hold it to a very high standard in a few areas:
  1. Very high quality singing. Auto-Tune can conceal a multitude of flaws, but somehow there are so many bad singers (to my ear), including some big names, like Kesha (horrible!), Lady Gaga (I like a few of her songs, when she's singing in tune), Katy Perry, Justin Bieber, etc. that if I didn't have plenty of good K-pop songs to listen to, I'd probably abandon the entire genre.
  2. Equally high quality musical performance and production values.
  3. Interesting and enjoyable composition, melody and rhythm.
It's the third item that is the most challenging to judge, and of course the most subjective, but usually I can come down on one side or the other after one listen. BTW, it amazes me how many Disney child singers and musical soundtracks come up (because they're in exactly the K-Pop style) and how, even when the singing is not bad, the composition of the songs is always just so soulless and boring.

If you're a parent of young children who has to listen to those Disney soundtracks all the time, and you think it's horrible soulless corporate garbage, it's not because you're old and out-of-touch, it's because it really is soulless corporate garbage that doesn't have any reason to justify its existence. Buy your kids some Girls' Generation or Super Junior or the K-Pop version of whatever style they like, and you still might not like that kind of music, but at least it'll have some soul and some actual musicality to it, and they'll be better off for having some quality music to listen to.

So after many hours of listening and 300 or so rankings of non-Korean bands, Pandora is now starting to mix in songs from the K-Pop bands again, and even branching out into J-Pop and some other areas that I also like. But as I write this, I only liked 124 out of 527 songs: 100 songs from S. Korean bands plus 24 songs from the rest of the world.

If you like stats, I actually made a spreadsheet to add up the numbers, which I know marks me as a hopeless geek. I'm continuing to update the spreadsheet whenever I listen to the station, so here's a link to view the current version and here are snapshots at 500 songs and 400 songs ranked (you can open them in different tabs and flip back and forth). BTW, if you use Gmail and are concerned about your email privacy, you may want to open those links in an incognito window or log out of Google before opening them because otherwise your username will probably show up in the list of viewers where any other viewer could see it. Edit: because I made the doc world readable, Google Docs will keep your name and email address private and you'll show up to other users as "Anonymous user #x" if you're signed in.

Column A is the artist, column B is the # of songs from that artist that I rated thumbs-up, column C is the # rated thumbs-down, column D is 1 for a South Korean band, or 0 otherwise. B2 is the total thumbs-up count, C2 is the total thumbs-down count, D2 is the # of Korean bands in the list, E2 and F2 are the # of Korean bands I gave thumbs-up and thumbs-down ratings to (columns E and F are formulas I copied down the page, multiplying the cells in that row in columns B and C by the value in D, so the values are set to 0 for the non-Korean bands). G2 and H2 are the non-Korean thumbs-up and thumbs-down values, and in column H and I, I've put some formulas to provide the interesting stats.
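To sanity-check the arithmetic, here's a small Python sketch of the same tallies the spreadsheet computes. The per-artist rows below are invented placeholders (the real sheet has one row per artist); only the totals come from the numbers in this post:

```python
# Recomputing the headline stats: 124 liked out of 527 rated, split into
# 100 thumbs-up for South Korean artists and 24 for everyone else.
rows = [
    # (artist, thumbs_up, thumbs_down, is_korean) -- placeholder rows
    ("All Korean artists combined",     100, 100, 1),
    ("All non-Korean artists combined",  24, 303, 0),
]

up     = sum(r[1] for r in rows)
down   = sum(r[2] for r in rows)
k_up   = sum(r[1] for r in rows if r[3])
k_down = sum(r[2] for r in rows if r[3])

print(f"liked {up} of {up + down} songs")                # liked 124 of 527 songs
print(f"Korean hit rate: {k_up / (k_up + k_down):.0%}")  # 50%
```

The striking part is the split: roughly half of the Korean songs earned a thumbs-up, versus well under a tenth of everything else.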

So if you want an explanation for the popularity of both "Gangnam Style" and "Call Me Maybe", I think the simplest explanation is that they're catchy pop songs with high musical production values and good composition in a genre where it's easy to stand out from the rest of the pack, because the rest of the pack (in this particular genre) tends to be pretty mediocre. I think it's no coincidence that PSY, aka Park Jae-Sang, attended Boston University and Berklee College of Music. He knows his music theory, and he had 10 years of experience making records in South Korea before Gangnam Style rocketed to the top of the charts. Edit: the Pandora bio for PSY says he graduated from both schools, but other sources say that he attended but did not graduate.

If there's a distinguishing characteristic of K-Pop as a genre, I would say it's the quality and attention to detail, as opposed to something that's specifically Korean (they're not wearing hanbok and playing the gayageum or anything like that). I'm definitely a bit unusual for being so particular and not really being able to enjoy music that isn't of super high quality; a lot of people don't have a good ear for pitch, so they don't even notice the little things that turn me off, and it definitely limits the number of songs I enjoy listening to. Then again, I had private piano lessons as a child and played in the school band for many years, so I have a certain set of standards that people who didn't play musical instruments as kids don't have. I wouldn't be surprised if the South Korean school system and society place a much higher value on musical education compared to the U.S.

I'll keep listening to the Pandora station and adding thumbs up and thumbs down, and feel free to listen along if you enjoy the genre as much as I do. I'll try to start on my next blog post in a day or two.

Tuesday, July 10, 2012

Adventures In Analog Audio

I recently started to digitize my parents' vinyl record collection, as well as any interesting LP's I can find at music stores and from friends. Vinyl has made a big resurgence in recent years, so it's a great time to pick up the right equipment to do the job properly. While you're reading this, check out the videos I posted to YouTube of the first three tracks of Breezin' by George Benson, which happened to be the album with the best audio quality in my collection so far. Be sure to set the video quality to 720p HD in order to hear the audio at full quality (384 kbps stereo AAC).
First, a comment on the myth that vinyl sounds "superior" to CD's. I know there are a lot of people who believe this, but it's not literally true. Vinyl imparts a particular sound to music which some people enjoy. I know I have songs in my library where the artists have intentionally inserted the pops and hiss and dynamic compression of a vinyl record into the song itself, but in the best case scenario, CD quality digital audio will offer superior audio quality to even the best mastered LP records.

There's a historical element of truth to the belief that some albums sound better on LP than on CD, because good mastering of LPs has been well understood since the 1970s, while poor mastering of audio CDs was common until the mid-1990s. There are a lot of tricks that needed to be discovered to build up a set of best practices for mastering digital audio, just as there are tricks for mastering LPs for the best quality. Unfortunately, CD's issued in recent years sometimes suffer from the disease of over-compression, a consequence of the Loudness War, which leads to poor audio quality.

Don't get me started on the loss of audio quality caused by the common psychoacoustic (lossy) compression techniques used by MP3, AAC, Ogg Vorbis, and other popular audio codecs to remove portions of the audio that your ears and brain are not supposed to notice are missing. That's a topic for another post. I'm digitizing LPs at DAT quality (48 kHz, 16-bit samples), saved as FLAC (lossless compression which preserves the original bits), from which I can convert to MP3 or AAC if needed, so the appropriate comparison is to CD's converted to FLAC.

For those who want to believe in some mystical purity of analog audio, you're buying into some voodoo. For more on the psychology of the vinyl vs. CD debate, as well as the thoughts of recording engineers with experience in these matters, see this NPR story, Why Vinyl Sounds Better Than CD, Or Not.

The upshot is that in the very best case, an LP can sound indistinguishable from a CD, but there are a whole bunch of little details that you have to get right in order to get the best sound. For a CD, the only piece that needs to be of a high quality is the final digital-to-analog conversion stage, which usually happens in the receiver connected to your speakers, or the audio chip connected to the headphone jack in a portable device. The only requirement for the CD is that it not be so scratched or damaged that the bits can no longer be read from it. For an LP, the quality of the analog signal from the record needle has to be preserved as much as possible until the samples are digitized. This means that the record needs to be clean, the needle needs to be clean, and the turntable has to be well-designed to avoid distortions such as rumble, wow, and flutter.

Traditionally, a turntable will output an analog stereo signal at the unamplified voltage generated from the magnetic fluctuations in the cartridge, measured in millivolts. Then a phono pre-amp is used to amplify the signal to the more common line-level voltage used by analog audio inputs to a stereo receiver. The pre-amp also has the important task of reversing the RIAA equalization that is performed before mastering the disc. RIAA equalization reduces the bass frequencies by up to 20 dB so that the needle doesn't jump out of the groove on tracks with heavy bass. It also boosts the treble frequencies by up to 20 dB so that the background hiss picked up by the needle due to imperfections in the surface is reduced when the record is played back (similar to Dolby noise reduction in cassette tapes). Finally, the analog line-level signal can be digitized by an analog-to-digital converter.
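As an aside, the RIAA curve itself is easy to compute from its three published time constants (3180 µs, 318 µs, and 75 µs, which are part of the RIAA standard). Here's a small Python sketch of the playback (de-emphasis) side, normalized to 0 dB at 1 kHz, which shows the roughly ±20 dB swing described above:

```python
import math

# The standard RIAA playback (de-emphasis) curve, built from its three
# published time constants: one zero at ~500 Hz, poles at ~50 Hz and
# ~2122 Hz. Gain is conventionally normalized to 0 dB at 1 kHz.
T1, T2, T3 = 3180e-6, 318e-6, 75e-6  # seconds

def riaa_playback_db(f):
    w = 2 * math.pi * f
    mag = math.sqrt(1 + (w * T2) ** 2) / (
        math.sqrt(1 + (w * T1) ** 2) * math.sqrt(1 + (w * T3) ** 2))
    return 20 * math.log10(mag)

def normalized(f):
    return riaa_playback_db(f) - riaa_playback_db(1000)

# Roughly +19 dB of bass boost at 20 Hz, -20 dB of treble cut at 20 kHz.
for f in (20, 100, 1000, 10000, 20000):
    print(f"{f:>6} Hz: {normalized(f):+6.1f} dB")
```

The pre-amp applies exactly this shape in reverse of what the cutting lathe did, which is why getting it right matters so much for fidelity.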

Rather than three separate boxes, I'm using a single turntable that was designed for this application, as well as for DJ's spinning records in clubs: the Stanton T.92 USB. In addition to outputting analog audio at either phono or line-level, it includes an A/D converter and both USB and coax S/PDIF outputs. For digitizing, I've connected it to my MacBook running the free Audacity audio editor. So all of the analog stages are happening completely inside the turntable, which minimizes the chances of introducing interference or reducing audio quality due to cheap cables connecting the components, and eliminates having to set the various volumes to the appropriate levels to avoid clipping.

I had to change two settings in Audacity to get the best audio quality. The first was increasing the sample rate from 44.1 kHz to 48 kHz. Normally this makes only a small difference in practice, especially if your hearing in the upper frequencies isn't good enough to notice, but with this turntable, the quality sounded worse to me at 44.1 kHz than would be expected merely from the lower sample rate. Either way, it's better to capture at the higher sample rate and resample down to 44.1 kHz if necessary than to have never captured that information in the first place.

The second change was to disable dithering in Preferences: Quality. Dithering is a step intended to mask the quantization distortion introduced when reducing the bit depth of audio samples, but in this case, because the samples are captured at 16 bits and saved at 16 bits, they don't need any further processing or rounding of values, and any such dithering only adds noise. I've discovered that iTunes on Mac OS appears to automatically dither its output, which leads to slightly reduced quality in my experience (it's especially obvious that this is what's happening if you try to play a high-frequency 16 kHz "mosquito tone" through iTunes, which will play with loud low-frequency overtones introduced by the dithering process).
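To illustrate why requantization is pointless here, below is a sketch of TPDF (triangular) dither, one common dithering method: when the samples already fit the target bit depth, plain rounding leaves them untouched, and adding dither first can only perturb them.

```python
import random

random.seed(0)  # make the "noise" repeatable for this demo

def requantize(samples, dither=False):
    """Round samples to integers, optionally adding TPDF dither first."""
    out = []
    for s in samples:
        if dither:
            # TPDF dither: sum of two uniform noises spanning +/-1 LSB total
            s += random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)
        out.append(round(s))
    return out

# Samples that are already exact integers, i.e. already at the target
# bit depth (like a 16-bit capture saved as 16-bit FLAC):
already_quantized = [100, -3, 57, 0, -128]

print(requantize(already_quantized))               # unchanged
print(requantize(already_quantized, dither=True))  # noise added for nothing
```

Dither earns its keep only when you actually reduce bit depth (say, mastering a 24-bit mix down to 16-bit CD audio); applied to samples that need no rounding, it is pure added noise.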

Finally, I replaced the turntable cartridge bundled with the T.92 with a replacement cartridge designed specifically for digitizing records, the Ortofon Arkiv Concorde. It was a simple plug-in replacement for the head shell bundled with the T.92. One word of caution: the cartridge came with an O-ring to insert between the cartridge and the tone arm, which was sized differently from the O-ring on the bundled head shell. I originally used the O-ring from the Ortofon, and it did not dampen vibrations sufficiently: the tone arm vibrated so much that the music was clearly audible coming from the arm itself, which is bound to introduce distortion. Using the O-ring from the Stanton head shell with the Ortofon cartridge eliminated the vibration transfer to the tone arm.

The last point I wanted to make was the importance of cleaning the album before playing it. Vinyl albums can easily build up a static electricity charge, which attracts dirt and dust, leading to pops during playback. I was able to acquire a Discwasher cleaning brush from a friend, which works like a large lint brush to pick up the dust from the records. These cleaning kits originally came with isopropyl alcohol-based cleaners, which are not recommended because they can damage the vinyl. Instead, I'm using a fuzzy cleaning cloth from an LCD cleaning kit (the kind used for cleaning glossy LCD screens), along with a bottle of Xtreme Klean screen cleaner from Fry's Electronics. The ingredients are listed as deionized water and "proprietary polymers." More important is what it doesn't contain: alcohol or ammonia. It's also antistatic, which is good. Records should be cleaned by brushing in a circular fashion, so as to minimize the chances of scratching the grooves.

I'll add some photos and additional comments later, but I think this covers the basics. Please let me know in comments if you'd like any more info on any of the steps. Happy listening!

Thursday, January 12, 2012

The Myth of Obsolescence

There's an insidious belief held by many people that computer hardware inevitably becomes obsolete and useless after only a few years, at which point it must simply be thrown away as E-waste and replaced with the "next new thing." It's true that the capabilities of computer hardware continue to improve at an exponential rate, but this doesn't imply that the computer that was good enough to perform a task in the past is necessarily no longer adequate today, simply because something else has come along that can perform the same task slightly faster, or otherwise better.

One point I'm trying to make with this series on old computers is that we don't have to buy into the assumptions of planned obsolescence that companies often try to foist on consumers in order to make more money. In fact, with nearly 7 billion people on the planet today, our collective environmental impact, including peak oil and global climate change, will force us to make more conservative use of our natural resources in the 21st century than the United States and other industrial countries ever had to consider in the last half of the 20th.

If we continue to treat computers and other electronics as merely cheap and disposable products, and don't give any concern to reusing our older equipment, then eventually the new products will become more expensive due to higher demand and lower supply of raw materials, and they'll probably be of lower quality, because when people don't expect for something to last for a long time, they're less likely to demand (and be willing to pay the slightly higher price for) products that are designed to last.

Where hardware is concerned, you do often get what you pay for, and high-quality components that were expensive when new are often available used for a tiny fraction of the original price. The Apple IIe (Platinum series) and Apple IIc+ (upgraded to 8MHz) that I recently purchased on eBay for $154.70 and $249.99 respectively were admittedly nostalgia-driven purchases, but they are so solidly built and the keyboards are so nice to type on that I was willing to pay the price. They were also extremely expensive computers when new: I would never have paid $750 in 1988 for an Apple IIc+, or $1400 for an Apple IIe, at a time when an Amiga 500 cost about $550, and was far more powerful, but now that I own one of each, I can almost see why some people were willing to pay such high prices for Apple II gear.

One area I will spend a lot of time covering is the SCSI bus, as this is the interface used to connect hard drives, CD-ROMs, and tape drives to many of the older machines, including my Amiga, Alphas, and VAXstations, and there are a number of tricky details to cover. Some years ago, Seagate released an informative white paper explaining why SCSI drives almost always cost more than IDE drives of the same era. The differences all came down to the higher performance and reliability requirements of the server applications where SCSI drives were used, versus the lower prices that users were willing to pay for IDE drives (accepting lower performance and reliability in exchange).

In the paper they mention a typical MTBF of 1,000,000 hours for a typical high-end SCSI drive. That's over 114 years of continuous usage! I have no idea how manufacturers justify this estimate without a time machine, but the point is that these are the goals the drives were built to achieve. I had to go through 5 or 6 different 36 GB SCSI drives before I found one that was quiet enough to put in the Amiga: most of them had annoying high-pitched whines, and I have extremely sensitive ears. Fortunately, I live near Weird Stuff, which has an excellent selection of used computer gear and a good return policy, which came in handy for this project. According to smartctl, the drive I kept had a lifetime usage of 13,756 hours (about 1.5 years) when I bought it (the drive itself is probably about 10 years old). If I'm lucky, this drive will last another 20-30 years, and hopefully the Amiga itself will last as long.

Tuesday, January 10, 2012

Restoring Old Computers

Now that I'm fairly well adjusted to my new, happier, life working at Google on Android, I'm returning to blogging with a series of articles on one of my favorite hobbies: restoring and reusing old computers. I'm going to focus on three very different computers: the Commodore Amiga, the DEC Alpha, and the Apple IIe.

What do these systems have in common (besides having names that start with the letter A)? The main feature that attracts me is that none of them uses an Intel processor or runs MS-DOS or Windows (and they're not Macs either). So if your definition of PC is a generic personal computer, then they all qualify, but far more people use the term PC primarily to mean a very specific type of computer, namely one with an Intel (or compatible) "x86" processor that runs Windows (as in Apple's famous Get a Mac ad campaign).

If you're under 25, you might be too young to remember this, but there was a time before Microsoft's relentless, predatory, and sometimes illegal business practices forced nearly the entire computer industry to standardize on its Windows-branded operating systems and the Intel (or Intel-compatible) CPUs that ran the most compatible flavors of Windows. It was a time of great innovation. For example, the first Web browser (along with the early versions of the HTTP protocol and the HTML format that serve as the backbone of the Web itself) was invented by Tim Berners-Lee on the NeXT computer and OS, Steve Jobs's ill-fated proprietary UNIX-based platform of the 90's. NeXT was failing miserably in the marketplace despite being quite advanced technically, and it would have disappeared completely from the mainstream, just like the Amiga, the Alpha, BeOS, OS/2, and many other quirky and often cool platforms that vanished under the onslaught of the MS-DOS/Windows juggernaut, if Mr. Jobs hadn't been canny enough to convince Apple to buy the company. Steve then worked his way back into the reins of power, and NeXTSTEP became the foundation of the now amazingly successful Mac OS X.

The significant thing about the invention of the Web is that it might never have happened if Mr. Berners-Lee had been working on a PC running Windows, because building something like a Web browser on that platform meant running into all sorts of annoyances and weaknesses. He might have written the first Web browser on an Amiga, or perhaps on a Mac, but I think it's fairly well established that NeXT had the best development environment if you wanted to create powerful and reliable software with an extremely small team (Berners-Lee worked with another computer scientist and an intern) in a short amount of time. Windows took another 10 years before its development tools really caught up to what NeXT (itself a very small company) created back in 1990.

Similarly, the PC hardware of the 90's left a lot to be desired. Non-autoconfiguring devices, such as cards for the standard 8-bit and 16-bit ISA bus, included the fun activities of setting jumpers or DIP switches on the card to manually select a port range, IRQ, and possibly DMA channels, making sure your choices didn't conflict with the other cards in the system, and editing your CONFIG.SYS and AUTOEXEC.BAT startup files to inform the OS and programs of your choices. Add to that the time spent trying to get all your MS-DOS drivers to load into "high memory" in order to maximize your "conventional memory" below 640K, and a whole bunch of other annoying stuff that I remember wasting literally hundreds of hours on as a young PC hacker, before the PCI bus came along and things slowly started catching up to what the other platforms had been doing right all along.
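For readers who never had the pleasure, here's roughly what that memory juggling looked like: a hypothetical CONFIG.SYS for MS-DOS 5 or later (the exact paths and driver names varied from system to system, and the SCSI driver shown is just an illustrative placeholder):

```
REM Load the extended memory manager, then use EMM386 to map
REM upper memory blocks (UMBs) between 640K and 1MB.
DEVICE=C:\DOS\HIMEM.SYS
DEVICE=C:\DOS\EMM386.EXE NOEMS
REM Load DOS itself into the high memory area and enable UMBs.
DOS=HIGH,UMB
REM DEVICEHIGH loads drivers into UMBs instead of conventional memory.
DEVICEHIGH=C:\DRIVERS\SCSI.SYS
FILES=30
BUFFERS=20
```

Getting a configuration like this to work often meant rebooting over and over while shuffling the DEVICEHIGH lines around, because drivers had to fit into whatever UMB fragments happened to be free.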

That's one reason I want to point out the Apple II, because Woz had "Plug & Play" functionality in the Apple II in 1977, and IBM didn't bother to make their PC similarly user friendly, even though it came out four years later and had the full weight of IBM's engineering resources behind it (compared to a single geek in a garage). IBM didn't rectify this mistake until 1987, when they introduced a new proprietary computer bus called MCA in the PS/2 series, which other vendors couldn't use in their PC compatibles without paying patent royalties to IBM. This didn't sit well with Compaq, HP, Tandy (remember them?), and the rest of the Gang of Nine, who created their own alternative architecture called EISA, which also didn't catch on. VESA Local Bus was another short-lived standard for video cards, but eventually PCI won out and we're better off for it. But the "PC compatible" architecture still includes a lot of legacy junk from the ISA days, even though it's all emulated inside a single chip these days.

The Apple II has auto-configuring cards, my two Alpha workstations use PCI (including a few slots supporting the rare 64-bit variant), and the Amiga includes a proprietary auto-configuring bus called Zorro. So at least from a usability perspective, they're all as friendly to upgrade as a PC from the year 2000, and far superior to the PC's of just five years earlier. It's a similar story on the OS front: MS-DOS and Windows (particularly the 16-bit and CE variants) were broken and pathetic compared to the far more capable OS's of the other platforms, and it wasn't until Windows XP that the PC-compatible industry had an OS that was both easy to use and not completely broken when compared to a typical UNIX system. You could make the same argument about the Mac's OS, which didn't really get good until Mac OS X 10.3 came out in 2003 (in my opinion).

What did UNIX (and other heavy-duty OS's like VMS) do that the wimpier platforms like DOS and Windows could not? Quite a number of things. Virtual memory that protected the system from buggy apps, so they couldn't bring it down or interfere with each other. Support for multiple users, with file system security so users couldn't access each other's files without permission or modify critical system files. And the X Window System, which evolved quickly from simple beginnings because it was extensible enough to support a variety of different approaches to making the platform more user friendly and better looking, from the popular commercial graphical toolkit called Motif (now available as open source) to today's GTK and Qt toolkits that form the basis of the popular GNOME and KDE desktops on Linux and other UNIX systems. X has been so wildly successful that one of the nice features of Mac OS X (for a UNIX geek at least) is Apple's inclusion of an X server with the OS, which makes running nearly any UNIX app really easy, compared to Microsoft's half-hearted efforts to provide UNIX compatibility in Windows, which they only grudgingly seem to support.

So I'll be running UNIX on the Amiga and the Alphas, specifically the NetBSD flavor, which is open source, and well supported on a huge number (57 at last count!) of different platforms, including x86 PC's, of course. I've been running the Amiga version of the 5.x stable branch, and it has been rock solid reliable for me. I'll be switching over to the current branch that will become NetBSD 6.0, so that I can help fix bugs and submit a few changes and improvements that I've written to the NetBSD community (the current branch is where active work is done for new features).

I'll have more to say about NetBSD and the Commodore Amiga series in the next post, so stay tuned if you're interested in this sort of stuff. Comments and feedback welcome. I also have a Google+ post about some hacking I did to get CD quality audio out of the Amiga, something that I didn't think possible until I learned about the 14-bit hack, the Paula calibration hack, and the 31.5 kHz video hack.
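I'll save the details for that post, but the core idea of the 14-bit hack is simple enough to sketch here: Paula's per-channel volume is a 6-bit scale (0-64), so if you pair two 8-bit channels, one playing the high byte of each sample at full volume (64) and one playing the remaining bits at volume 1, the quiet channel fills in detail at 1/64th the amplitude of the loud one, for roughly 14 bits of effective resolution. Here's a toy Python sketch of the sample split and the arithmetic behind it (my own illustration, not actual Amiga driver code):

```python
def split_14bit(sample):
    """Split a signed 16-bit sample into the two 8-bit values that the
    loud (volume 64) and quiet (volume 1) channels would play."""
    hi = sample >> 8                 # top 8 bits, played at volume 64
    lo = (sample - (hi << 8)) >> 2   # next 6 bits, played at volume 1
    return hi, lo

def reconstruct(hi, lo):
    """What the two mixed channels sum to, in 16-bit sample units.
    Volume 1 is 1/64th of volume 64, so each step of the quiet
    channel is worth 256 / 64 = 4 sample units."""
    return hi * 256 + lo * 4

# The combined output tracks the original to within 4 units (~14 bits),
# instead of the 256-unit steps (8 bits) of a single channel.
for s in (-32768, -12345, -1, 0, 1, 12345, 32767):
    hi, lo = split_14bit(s)
    assert 0 <= s - reconstruct(hi, lo) < 4
```

The calibration hack exists because the real hardware's volume steps aren't perfectly linear, so the ideal 1:64 ratio assumed above has to be measured and corrected per machine.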