2009
11.25

I travel quite a bit and my favorite site to use is Kayak. It has a lot of features that make it the easiest to use and best airfare search out there. Within the last year, Kayak implemented a feature that allows you to search for airfare using flexible dates. Many times I’m traveling somewhere and I just want the lowest fare within a 3 or 4 day window. Perfect, I don’t have to load up 8 windows and toggle back and forth to find the lowest airfare.

I live in Salt Lake City, and every year I travel to somewhere far away from SLC for a few weeks of vacation. In this case, I’m traveling to Saigon (Ho Chi Minh City) right after Christmas. I’m somewhat flexible on my dates since it is a vacation, so I decided to give the flexible dates feature a whirl to find me the lowest airfare.

Kayak flexible dates
The expected result is that this will produce the same results as searching for SLC -> SGN departing on Dec 26 and returning on any of the following days: Jan 09, 10, 11, 12, 13, 14 or 15.
Here are the prices broken down by day on the results page:

Kayak flexible dates

The results seemed odd. Why would Jan 12 be ~$700 cheaper than returning Jan 13? And returning Jan 11 would be almost 3 grand? That didn’t seem to make sense, so I did a regular search on Kayak departing Dec 26 and returning Jan 13. Below is the result for that search.Kayak flexible dates

The actual cheapest roundtrip airfare for that itinerary: $1565. A difference of about $500.
I did a search departing on Dec 26 and returning Jan 11, the one that should cost almost 3 grand. Actual cheapest cost: $1678. That’s a difference of about $1300.
That’s a massive price difference for those days.
Shouldn’t the prices be the same whether I use the flexible dates feature or search for the days individually? Clearly they’re not.

So what’s going on here?
After some sleuthing it became apparent what was going on. Whether or not this is entirely Kayak’s fault is up for debate, but what happens is that when you search for a flight departing on let’s say Dec 26 and returning Jan 12, you get exactly that result. Kayak goes and searches all the travel websites with those parameters. When you choose the flexible dates feature, it does not, in fact launch parallel searches and aggregate the results as expected. Instead, it takes the results of your base search dates (Dec 26 -> Jan 12), and then asks the travel websites if any of the results are also available on other days. Since itineraries are often bundled together when you search for them by 3rd party companies like cheaptickets.com, what you get is the cheapest available itinerary that is available on your base date and available on one of your flexible dates. Usually the only places that have identical itineraries from day to day are the actual airline websites, so the prices are almost always higher using the flexible dates feature. From a user perspective it doesn’t make any sense, but from Kayak’s perspective it makes sense since it saves them from having to perform extra searches (searches cost them $$$). Kayak should seriously consider revamping this feature to actually perform parallel searches, or make it clear that the flexible dates feature is not the same as searching for the days individually.

Summary:
Using Kayak.com’s flexible dates feature does not give you the same results as searching for your itinerary on the days individually. Often it will be more expensive, sometimes a lot more expensive.

Update:
Kayak’s CTO says improvements are coming to the flexible dates feature. He also says they’ll try to make the user interface more clear on this issue.

2009
03.15

phpBrainz update

I’m going to be adding some new features as well as complete the documentation for phpBrainz in the upcoming days. The trac page for phpbrainz is located here. Any requests for features are appreciated.

Update: August 14, 2010

Fixed link.

2008
11.24

This is a problem from my class in Discrete Mathematics that I found fun/interesting. You have two measuring cups. One holds 8 ounces and the other holds 13 ounces. These cups have no marks to show individual ounces. All you can measure is either a full 13 or a full 8 ounces. If you want to measure, say, 5 ounces, you can fill the 13-ounce measuring cup, use it to fill the 8 ounce cup, and you will have 5 ounces left in the larger cup.

The Problem:

Show how to use the 13 ounce and 8 ounce cups to measure exactly 1 ounce. You may assume you have a large bowl for holding liquid, but this large bowl has no marks for measuring. At the end, the bowl should contain exactly 1 ounce. Remember, you cannot mark any of the cups. You have an unlimited supply of water (if you wish to pour some out).

… Update …

Solution is below the fold.

Read More >>

2008
04.14

UPHPU Talk on memcached

This Thursday (April 17, 2008), I’ll be giving a talk to the Utah PHP Users group about memcached. Specifically, how to increase site availability using memcached. I’ll be giving a brief powerpoint overview (around 20 minutes), and then I’ll delve into some real-world scenarios where memcached can be a real life saver.

Meeting Info:

April 17, 2008 @ 7pm

Bill Good Marketing
12393 Gateway Park Place Suite 600
Draper, Utah, 84020

I’ll post all my example code and slides to this blog after the talk.

2008
01.10

Well the fall semester is finally over and in the course of working on a new open-source project involving django, musicbrainz, and lots of music files I’ve come across a situation that they gloss over in my computer science classes. Complexity analysis, in short, is the process whereby you figure out the “cost” of an algorithm. The goal is to figure out where your algorithm could operate more efficiently. You want to avoid O(n^2) algorithms.

In this project we need to scan a bunch of music files and we have two different approaches. One is to just use id3 tags, and then query musicbrainz to see if they match up, if they do then get all the track information from musicbrainz and store it in the db. The songs that don’t match up or don’t have id3 tags will have puids generated and then musicbrainz will be queried using the puid. The problem is that we are shooting for accuracy, and the logic of verifying if the id3 tags and the musicbrainz entry line up is inherently flawed since we’re still basing it off of the id3 tag. This is one of those situations where theĀ  the algorithm we choose has an impact on the quality of the data retrieved as well as the speed. In addition to that, there isn’t a direct trade-off between quality and speed because there’s more factors involved as well as possible solutions to alleviate some of the issues (like downloading the musicbrainz server to query the data locally). The speed factor is huge, but I figure that if we’re going to scan the files, then we might as well do it right the first time.

In summary, I would just like to highlight the fact that in academia the focus seems to be on cut and dry examples (like sorting algorithms) where the result is the same, there’s just different ways to get there. But if your destinations are not the same, then it becomes much harder to compare algorithms. Some would say that you can’t compare them, but in many real-world scenarios you have to compare apples to oranges and determine the best course of action.

2007
12.19

memcached + mythweb

In the course of my research on the web about how to scale websites to hundreds of thousands of users (this is the kind of stuff I do in my spare time), I came across memcached which

” … is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.”

Since we have lots of processing power for our stuff at work, we don’t have much need to implement memcached other than my own perverse desire to see how fast it really is. I began to think about things that are slow that I’d like to speed up. The first thing that came to mind was my mythweb setup. Mythweb is the web interface plugin for mythtv. It uses php and it just frustratingly slow. Viewing the listings with 500+ channels takes a long long time, so I’m going to toy around with implementing memcache to speed things up. I’m pretty excited at the possibilities this may hold for speeding this up since the slowness of mythweb is the bane of my existence. Ok, maybe I’m being a little bit dramatic but it’s still very annoying how long it takes to do anything in mythweb.

2007
10.29

First off, phpBrainz is very much being worked on. I’m trying to incorporate some added functionality and the code is messier than I would like at the moment, which is why I haven’t committed it to the SVN repository (svn://svn.zzq.org/phpbrainz). I’m also noticing that sometimes the phpBrainz requests to the MusicBrainz XML WebService inexplicably fail. Further exacerbating the problem is the lack of any error checking in that section of the code. This will be fixed. As mentioned in a previous post, I’m working on a class that can identify an album given just a directory of songs. Right now it is still very much a proof-of-concept more than anything else. One of the issues I’m running into is that sometimes the genpuid program provided by MusicIP does not generate a puid for a track. I’m not sure why, and it seems consistent with regard to the track, but inconsistent as far as format, length, bitrate, etc. Another problem with quite possibly no solution is that CD-Single releases are very very difficult to identify for some reason. My best guess after manually examining the problem is that there simply aren’t enough tracks to widdle down the collection of all possible releases. To counteract this problem I’m thinking about developing a “confidence” level that goes with the prediction. This would take into consideration the “score” attribute returned by the MusicBrainz Webservice when doing a “find” lookup (there is no score attribute when doing a “get”).

Also, I upgraded my Macbook Pro to OS X 10.5 Leopard this past weekend. It was a giant pain the arse. Most notably I ran into this gem of a problem: Leopard Bug: Can’t Login Post-Install . My roommate was able to solve it from the instructions posted in that thread (not the solution Apple posted). However that solution did not work at all for me and I was forced to break out my portable hdd and copy all my important stuff off and start from scratch. What a pain. I thought Apple stuff was “just suppossed to work.” So much for that theory. Despite that, I’m still fairly happy with my Macbook Pro purchase. As for OS X Leopard sans installation issues, it’s actually pretty nice with some cool features. I think all in all it is an improvement, except for the stupid reflective dock and the crappy new folder icons.

2007
10.17

phpBrainz update

phpBrainz appears to be used by more people than I thought. I’ve added some functionality that will be committed to the svn repository within the next few days. It adds some comparison functionality to compare releases to one another as well as comparisons between artist and track objects. The reason for adding this functionality is to allow easier filtering of results. There are many instances of albums appearing multiple times in the MusicBrainz database. Hopefully the comparison methods will allow a user of phpBrainz to more quickly filter through those results.

In addition to the aforementioned reasons, which are valid in and of themselves, is that I am working on a mp3 jukebox application that ties in phpBrainz. One of the core features is the ability to figure out the album, artist, track title, etc using the phpBrainz class, genupid from musicdns, and some clever code to figure out the best match. Yes, yes, there are a million packages and programs that do this with id3 tags. But that’s my beef, I don’t like id3. It’s non-standarized and very prone to user-error. So instead, I’ve written a small php script (under 100 lines) that will take a folder, parse through all the files, generate the puids, then do a lookup on each track using phpBrainz, and then figure out which album most commonly appears amongst the results for all the songs. There are still a lot of boundary and edge cases to test since it is quite easy to fool at this point, but with some sophisticated scoring and filtering it should work like a charm. The only problem I’m having at this point is the speed of the genpuid program and the required 1 second interval between queries to the MusicBrainz Webservice. I might consider hosting a mirror of the database on a private server to speed up the whole process. But for now, here’s an example:


jeff-laptop:~/Documents/projects/phpbrainz/examples jeff$ php processDirectory.php /Users/jeff/Music/The_Shins-Oh_Inverted_World/
Best Guess
---------------
Artist: The Shins
Album: Oh, Inverted World

Yes, I could have just parsed the directory name, but that defeats the purpose. The purpose is to develop a script that can figure out the album, given just a folder with songs in it. I hope to have more soon!

Update: without the 1 second pauses in between webservice queries, it took 9.6 seconds to generate the puids, query the webservice and figure out the right album. Not too bad if you ask me.

2007
09.05

phpBrainz documentation

I’ve created an API Guide using phpDocumentor for the phpBrainz project. It is available here: http://www.hoodlu.ms/phpbrainz_phpDoc/

2007
07.30

MythFrontend on OS X (14088)

I’ve compiled the latest svn (revision 14088) on my Macbook Pro. Here is the MythFrontend.app file.

Revision: 14088 (trunk)

I’ve uploaded it as a tar.gz file and a zip file. Take your pick, they’re the same.

Zip file: 79MB MythFrontend_r14088_trunk.zip
md5sum: bbe94922c1fac6903b462cf2d765c22f

tar.gz file: 52MB MythFrontend_r14088_trunk.tar.gz
md5sum: d8310c78fa43e96dcef62b79ec5c6a73

Let me know if you have any problems.

2007
07.24

MythFrontend on OS X

I recently (finally) was able to compile and run MythFrontend on my Macbook Pro, so I thought I’d share the compiled application for those of you who don’t want to compile it yourself. It worked on my roomate’s Macbook Pro, so I guess it should at least work for other Macbooks and maybe all Intel Macs? Note that this is only the frontend, since I don’t know if the backend works or not in OS X (my backend is a linux machine on my home network). Here are the details and the link to the disk image.

Compiled on 10.4.10 Macbook Pro

Revision 14021 – Trunk branch (uses protocol version 35)

md5 checksum: a4c8d2b6600ee3459403c7b1fb28edce

size: 52 Mb

MythFrontend – r14021 disc image

2007
06.25

After doing many upgrades with my MythTV box in the past few weeks. Here are a few helpful queries, links and whatnot.

Dish Network Channel Chart

Bell ExpressVu Channel Chart

A script and instructions for creating a user job in mythtv that will automatically flag and remove commercials from your recordings. This saves space by removing the commercials completely from the recordings. I would highly recommend that this is only used on recording schedules where you know the commercial flagging works properly.

Instructions for getting channel icons (includes a script that is very helpful).

Query to delete channels whose corresponding multiplex does not exist for whatever reason.

DELETE FROM channel
USING channel LEFT JOIN dtv_multiplex
ON channel.mplexid=dtv_multiplex.mplexid
WHERE dtv_multiplex.mplexid IS NULL;

Query to remove programs whose starttime is greater than 2 weeks into the future. This is mainly to remove weird data that gets into the program table through the EIT.

DELETE FROM program WHERE starttime>DATE_ADD(NOW(), INTERVAL 2 WEEK);

Those are just a few for now. I’ll have some more later this week (maybe tomorrow morning).

2007
06.05

Random updates

phpBrainz development has slowed down a little bit over the past few weeks. The core of the project is quite functional, and the code is usable, so for the time being I’m going to put it on hold. I should be able to revisit the project in a few weeks when I have some more time. Right now a lot of my time is going into my MythTv setup. I have two satellite cards in my MythTv box and there are just odds and ends that require maintenance that I’m in the process of automating. I will also post some updates on various MythTv related projects as they transpire.

2007
05.03

Quick status update

phpBrainz has taken a backseat over the past week due to final exams. Also, I picked up a macbook pro and I’m thoroughly pleased. My roomate has had one for a while and he really likes it (and he’s a picky person), so I was pretty sure I’d like it. I purchased the 15″ 2.16ghz Intel Core 2 duo with 1gb of RAM. I figured I can always upgrade the RAM later on if it starts to bug me. I considered getting a refurbished or used Macbook Pro, but the price difference just didn’t seem to warrant the risk. A brand new MacBook Pro after the education discount is $1800 ($200 off from retail). The educational discount for the Applecare Protection Plan (2 additional years) was $240 instead of $350. I figured that after those discounts it was probably just better to get the new one.

As far as phpBrainz stuff goes, I hope to have some time to work on it next week. I’d like to at least get all the base functionality working before I worry about the relationships and disc objects.

2007
04.24

Well, a very basic version of the library is up for public consumption. Please note that it is very basic at this point and does not contain (and will not contain) all of the features of PythonMusicBrainz2. However, I do plan on implementing all the features that I can (within reason).

A few notes though. First off, the trac server is having issues, so if it gives you some nasty error, just hit reload until it works. The box admin knows of the problem and the solution, so it should be fixed soon. Also, if you look at the code, it will look like the OO got outta control. Well it kinda did, but it’s not all my fault. You see, the XMLWebService page for MusicBrainz states that

If you are planning to write bindings for another programming language, you are encouraged to follow the programming model, object oriented schema, and terminology of PythonMusicBrainz2 as far as it makes sense for your language.

Well, I tried to emulate it a little too much I think and some of it may seem excessively object oriented. To some extent, I agree with that criticism. You will also note that documentation is scarce. I am aware of this. Sometime this week (maybe tomorrow), I’ll add more method headers and examples to the examples folder. I should have a phpDoc generated documentation page up soon in addition to the trac wiki.

2007
04.22

phpBrainz is born!

The first few steps towards a PHP client library of the musicBrainz web service are completed. I have a trac page setup for the project at http://phpbrainz.zzqforge.org. After being pointed in the right direction, Muti pointed out that the MusicBrainz folk prefer if you implement a new client library in the same “style” as the existing python client library. Some of the features I’m not going to worry about for a while, like submitting new information to musicBrainz. Instead I’m going to focus on the most common types of information retrieval from MusicBrainz and then branch outwards towards the relations, and then probably submissions to MusicBrainz. Feel free to submit bugs and feature requests on the trac page.

2007
04.21

Over 2 years ago a local friend of mine, Muti, wrote a PHP class to retrieve information from the MusicBrainz RDF web service. A while ago MusicBrainz started to push everything over to their new XML Web Service which is probably a wise move. Unfortunately, as of this writing, there are only client libraries written in C, C++, Perl, and Python. An upcoming project I want to tackle will require a very robust PHP class to handle various types of queries to the MusicBrainz database. As a result, I’ve decided to venture forth and write the class myself since I can’t find any relevant information as to why someone has not already written a class in PHP. I’ve sketched out the core components of how it should work and figure it could be mostly written in about 10 hours worth of work. I’ll hopefully have a trac page setup for the project by the end of tomorrow and some basic code up and running to work with. It will be released under the GNU General Public License

2007
04.17

PHP’s Summer of Code

PHP is once again participating in Google’s summer of code, and as such they have an expendable source of labor. Of the various projects that are going to be worked on by the interns, a couple strike me as being quite vital to the progress of PHP as a legitimate language to use with OO concepts. Of particular interest is the first one.

The PHP Interpreter uses reference counting to keep track of which objects are no longer referenced and thus can be destroyed. A major weakness in the current implementation is that it cannot detect reference cycles, that is objects that reference each other in a circular graph structure which is not referenced itself from outside the circle. Mentored by Derick Rethans, David Wang will implement a new reference counting algorithm that will alleviate this problem.

This is an extremely important and tricky issue to deal with. Basically what happens right now with php objects is that if an object reference tree ends up becoming a loop, all those objects are maintained in memory even if there is nothing referencing the root object of the circular references. Here is a small diagram for reference:

Object A –> Object B –> Object C –> Object D–> Object E –> Object B

Now let’s say that for whatever reason Object A no longer needs Object B. That’s all fine and dandy, but PHP still thinks they’re needed because something is referencing Object B. This sounds like a trivial issue to deal with, but it’s not. Part of the problem is recognizing what objects are actually being referenced and what other ones are simply in some sort of self referencing circle (or loop). This is a very tricky issue to dissect and I wish David Wang all the best in dealing with this issue. I hope that the improvements that David Wang and his mentor are striving for are achieved because it would allow for much less memory-intensive classes and patterns.