Wednesday, December 2, 2009
Let the forcible re-rankings begin ... and end
Warning: The following post is long, esoteric and inscrutable. Read at your own risk.
As you know from about four previous posts -- check my labels if you're interested in reading them -- I am involved in ranking all the movies I've ever seen using Flickchart (www.flickchart.com). At least, all the movies I've seen that exist in their database.
I've been stopping every 10,000 rankings to take a snapshot of where the list stands, noting the changes in the rankings since the last time. I do this by recording the rankings in an Excel spreadsheet, a painstaking process that consumed numerous free moments over the last eight days. My wife thinks I'm crazy. Maybe I am.
The first time I did it, just after 20,000 rankings, I recorded only the ranking itself, along with the director and year. But at 30,000 and 40,000, which is the milestone I just passed, I also recorded the number of duels for each film, as well as the percent won. This made the recording process significantly more tedious -- to get the additional stats, I had to drill down one further level into each movie.
The number of rankings interested me only as an academic exercise. I wanted to take a random selection of 10,000 rankings and see how many times a film would come up randomly for duel in that selection. I guess in my mind this would help test how successfully the site randomized the duels -- and if any of the organizers of Flickchart are reading this, I don't doubt you, I just like this kind of anal number-crunching exercise. (As for the percent won, I recorded it mainly because I was already on the page, so might as well.)
The results validated the success of Flickchart's algorithm. The most-ranked film in those 10,000 was Saw with 24 duels, while only Ang Lee's The Hulk and Joel Schumacher's Bad Company didn't come up once. Based on the 2,565 films I had ranked, every film should come up an average of 7.79 times per 10,000 duels (which feature 20,000 films). Of course, my own ability to transcribe proved plenty fallible. For example, according to my records, Eyes Wide Shut had two fewer rankings at the end of that 10,000 than at the start, which obviously means I recorded its initial total incorrectly. And when I summed the total, I came up with 20,011 films ranked, when it should have been 20,000. That could explain the high total for Saw, which may have had 69 duels at the start of that period, rather than 59.
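For the curious, that expected figure can be reproduced with a few lines of Python. This is only a rough sketch: it assumes every duel slot is filled by an independent, uniformly random pick from the 2,565 films (ignoring the fact that a film cannot duel itself), which may not be how Flickchart actually chooses. The variable names are made up for illustration.

```python
# Rough model: 10,000 duels, two film slots per duel, 2,565 films.
films = 2565
duels = 10_000
slots = 2 * duels  # each duel shows two films

# Average number of appearances per film if slots are filled uniformly.
expected_per_film = slots / films

# Chance that one given film never appears, and how many such films to expect.
p_never = (1 - 1 / films) ** slots
expected_never_seen = films * p_never

print(f"expected appearances per film: {expected_per_film:.2f}")
print(f"films expected to sit out entirely: {expected_never_seen:.2f}")
```

Under those assumptions, roughly one film would be expected to sit out all 10,000 duels, so a couple of no-shows is about what pure chance would produce.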
But as I said, this is all academic. I'm more interested in the project of getting the films ranked in the correct order, and thought I had a new method of doing this, which I planned to implement starting at duel #40,001.
You see, my great disappointment from the previous 10,000 rankings was that there wasn't a single change in my top 20. Several weeks ago I bemoaned that Cocoon had made it to #20, and taken root. Because none of the 100-200 movies that would beat it have randomly come up against it, it's still there. And because Cocoon should only come up 7.79 times out of every 10,000 -- it actually came up 7 times this period -- the chances of it dueling against one of those 100-200 are fairly remote. Not to mention that those 7.79 times are only good for this period of 10,000. Next time I'll have more movies, which means even fewer ranking opportunities.
By the same logic, films that deserve to be in the top 20 -- Raising Arizona, for example -- also fight only 7.79 other titles per 10,000. For the sake of simplicity, let's round that up to eight. Eight divided by 2,564 -- you have to subtract Raising Arizona because it can't duel itself -- is .003, or less than 1/3 of 1%. So on average, Raising Arizona will duel only .3% of all my movies over the course of 10,000 rankings. Put another way, each of its eight duels has a 20-in-2,564 chance (about .78%) of being against a top 20 movie, so over eight duels Raising Arizona has only about a 6% chance of coming up against any movie in the top 20 at all. Then it actually has to beat that movie in order to move up, and perhaps it would only beat half the movies in my top 20. Which gives Raising Arizona roughly a 3% chance of actually moving into my top 20 during the course of 10,000 rankings. This math may be a bit fuzzy, and the films presented for Raising Arizona to duel are probably not entirely random, but you get the idea.
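Here is a minimal Python sketch of that calculation for anyone who wants to play with the numbers. It assumes each opponent is an independent, uniformly random pick from the other films (which, as noted, is probably not exactly how Flickchart works), and the 50% win rate is the same guess used above. The function names and parameters are hypothetical.

```python
def chance_of_meeting(top_n, total_films, duels):
    """Chance of facing at least one top-N film in a number of random duels."""
    p_single = top_n / (total_films - 1)  # a film cannot duel itself
    return 1 - (1 - p_single) ** duels


def chance_of_beating(top_n, total_films, duels, win_rate):
    """Chance of facing AND beating a top-N film at least once."""
    p_single = win_rate * top_n / (total_films - 1)
    return 1 - (1 - p_single) ** duels


# Eight duels per 10,000 rankings, 2,565 films, top 20, assumed 50% win rate.
meet = chance_of_meeting(20, 2565, 8)
move_up = chance_of_beating(20, 2565, 8, 0.5)

print(f"chance of meeting a top-20 film: {meet:.1%}")
print(f"chance of beating one: {move_up:.1%}")
```

Changing the number of duels or the size of the collection shows how quickly the odds shrink as more films are added.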
The solution? Make the duels less random.
That's right, Flickchart gives you the ability to forcibly re-rank any film at any time. You simply go to the movie's home page, click the link that says Re-Rank This Movie, and boom -- you are presented with three new duels, each of which features the movie in question. Once you've selected the winner in the first duel, you get the second duel, and so forth. In fact, I'm sure this feature exists precisely to address the Raising Arizona dilemma described above.
This is probably as good a time as any to tell you about a second concern I've had. Namely, when my friend Don and I originally discovered the site, we didn't know how to get more titles eligible for ranking than the original 350 or so presented to us. It took us about 8,000 rankings before we figured it out. That means that about 350 films have significantly more duels than the rest of them, which means they have had significantly more opportunity to entrench themselves in the rankings. Only three films in my top 20 -- Glengarry Glen Ross, Poltergeist and Cocoon -- are from outside that original list of 350, having won the lottery that Raising Arizona could not. Five of my bottom six films are also from that original 350. The more films there are -- and there are more and more all the time -- the fewer opportunities any of them has to move. The main difference is, those 350 films had an advantage at one point, and they used it to build a home in the top 20.
The solution, part II? Forcibly re-rank all the films that were not part of the original 350, until they have reached approximately the same number of duels as those 350. To figure out what that benchmark should be, I took the average number of duels of those 350 films, which came out to 67, and decided that any film that did not have at least 67 duels would be re-ranked until it did.
Here was my logic: Although the films with more than 67 duels would still have more, and would inevitably increase their own totals as part of the re-ranking process, they would ultimately gain little ground relative to the films that were being re-ranked dozens of times. And once all the films had reached that threshold, tens of thousands of rankings from now, the advantage of the original 350 would be nullified. Only then would the playing surface be leveled, and would I truly be able to tell which films belonged in the hallowed top 20. If I went alphabetically, it would take quite some time for Raising Arizona to come up, but eventually, it too would have its day in the sun. And by doing all the films rather than just select ones, I'd be preserving the element of randomness/fairness that I consider to be a defining and indispensable part of the Flickchart process.
So yesterday after work, I sped through the last 600 titles of my snapshot in a flurry of blurry vision, alphabetized my titles, and got started. The numbered titles came up first.
Robert Luketic's 21 had plenty of rankings, so the first one up for re-ranking was Mark Christopher's 54. And here was where Don, a veteran of the forcible re-ranking system, had warned me I would run into trouble. He described a fluke in Flickchart whereby you'd get the same titles over and over if you tried multiple re-rankings in a row. As each session of re-ranking involves three duels, and as 54 needed 43 more duels to reach 67, that meant I'd need either 14 or 15 consecutive sessions of re-ranking to reach the threshold. (I apologize if all these numbers are killing your brain.)
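The session count is simple enough to sketch in Python. The threshold of 67 and the three duels per session come from the description above; the function name is made up for illustration, and the starting count of 24 for 54 is just 67 minus the 43 duels it still needed.

```python
import math


def sessions_needed(current_duels, threshold=67, duels_per_session=3):
    """Re-rank sessions required for a film to reach the duel threshold."""
    remaining = max(0, threshold - current_duels)
    return math.ceil(remaining / duels_per_session)


# 54 needed 43 more duels to reach 67, so it started with 24.
sessions_for_54 = sessions_needed(24)
print(sessions_for_54)
```

Fourteen sessions would leave 54 one duel short at 66, which is why the answer hovers between 14 and 15 depending on how strictly the threshold is applied.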
At first, it seemed I was having the same experience Don had. 54 came up against Basquiat, lost to it, and then won two lesser duels. When I refreshed the page and submitted it for re-ranking again, Basquiat was again its first opponent. The cracks in my seemingly perfect system became clear. But on the third time through, Basquiat was not the first opponent, nor any opponent at all. In the dozen more rounds, it came up again once. I decided I could live with that.
300 and 1408 each had enough duels, so next up for ranking was 2012. And here a different kind of disturbing thing happened. As I drove it upwards from four times ranked to 67, it jumped in the rankings from #1495 to #389. Now, I already feel a bit of shame for liking this movie more than most people whose opinions of film I trust. Having it land in my top 400 of all time was just too much. Nonetheless, I felt satisfied that the process had worked correctly. Flickchart had spoken, with me as its medium, and who was I to contest the results?
So I moved onward: (500) Days of Summer, 10,000 B.C., 10 Items or Less, 101 Dalmatians. It wasn't until after I'd finished 13 Conversations About One Thing this morning that I found a problem that did stop me in my tracks. And it related to something I'd only done originally as a curiosity.
Namely, the winning percentage of these films was getting artificially inflated.
You see, the way the re-ranking works, the film does not get put up against totally random opponents. Flickchart assumes that if you want to re-rank a film, it's because you are unsatisfied with its position in the standings. Therefore, it uses a goal-oriented approach to the three duels it presents. From what I can gather, the first is a film that is higher than it in the standings -- not significantly higher, but higher. If your film beats that film, you get another film that is again higher in the standings. If it wins there, you get a third higher film. Either it beats that film and lands one ahead of it, or it loses and stays one ahead of the film in the second duel.
But if the film loses its first duel, there is no reason to make the second duel against a film that is even higher. It's the transitive property: If Film A is better than Film B, and Film B is better than Film C, then there is no way that Film C is better than Film A. So when 54 lost to Basquiat, it got to fight something much easier. And it probably beat that film, and may have beaten the next one as well. (I may be anal, but I can't remember every little detail.) Every time this happens, 54 loses to one good film and then either beats or loses to two films that are not as good. End result: 54 is "randomly" dueling against a majority of not very good films.
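To make that guess concrete, here is a small Python sketch of how such a goal-oriented session might behave. To be clear, this is not Flickchart's actual code. It is one possible reading of the behavior described above, and the function name, the `step` parameter and the rule of halving the step after a loss are all assumptions made for illustration.

```python
def rerank_session(chart, film, beats, step=4, duels=3):
    """Simulate one guessed-at re-rank session.

    chart -- list of titles, best first
    beats -- beats(a, b) returns True if a wins the duel against b
    step  -- how many places above the film the first opponent sits (assumed)
    Returns the updated chart and a list of (opponent, won) results.
    """
    chart = list(chart)  # work on a copy
    pos = chart.index(film)
    floor = 0  # best index the film is still allowed to challenge
    results = []
    for _ in range(duels):
        if pos <= floor:
            break  # nothing left above the film that it hasn't already lost to
        target = max(floor, pos - step)
        opponent = chart[target]
        won = beats(film, opponent)
        results.append((opponent, won))
        if won:
            # Win: the film lands one place ahead of the film it just beat.
            chart.pop(pos)
            chart.insert(target, film)
            pos = target
        else:
            # Loss: by transitivity, skip everything at or above this opponent
            # and try a closer (easier) opponent next time.
            floor = target + 1
            step = max(1, step // 2)
    return chart, results


# Toy example: K is stuck in last place but truly belongs between G and H.
true_order = list("ABCDEFGKHIJ")


def beats(a, b):
    return true_order.index(a) < true_order.index(b)


new_chart, results = rerank_session(list("ABCDEFGHIJK"), "K", beats)
print(new_chart)
print(results)
```

In the toy example, K loses its first duel to a better film and then wins two duels against weaker ones, which is the same one-loss, two-wins pattern that 54 showed against Basquiat.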
And the new stats didn't lie. After 30,000 rankings, 54 had won 18.75% of its duels. After 40,000, it was down to 16.67%. But after 40,043 rankings -- the last 43 of which were duels in which it competed -- 54 had gone all the way up to a 39.39% winning percentage, not only reversing the trend, but more than doubling its peak winning percentage. Reason? Duels against crappy films.
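Those percentages can be sanity-checked in Python. The win and duel counts below are not stated anywhere in this post. They are simply the smallest whole-number records consistent with the percentages and with 54 having 24 duels before re-ranking, so treat them as an inference. In particular, 39.39% matches 26 wins in 66 duels, which would mean 14 full sessions of three duels.

```python
def win_pct(wins, duels):
    """Winning percentage, rounded to two decimal places."""
    return round(100 * wins / duels, 2)


# Inferred records for 54 (assumptions, not figures quoted from Flickchart).
after_30k = win_pct(3, 16)
after_40k = win_pct(4, 24)
after_rerank = win_pct(26, 66)

# Win rate within the re-rank duels alone, under the same inferred counts.
rerank_only = win_pct(26 - 4, 66 - 24)

print(after_30k, after_40k, after_rerank, rerank_only)
```

If those inferred counts are right, 54 won more than half of its re-rank duels, compared with about one in six of its random ones.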
But why should the winning percentage matter, right? It's "all academic," right?
I may have thought so, but now I don't. And it goes back to the randomness/fairness that I think is essential to Flickchart. There was something pure about the 16.67% victory rate of 54, a movie that is probably in the bottom 20% of movies I've seen. It's a victory rate that could only have seemed so perfectly accurate because it was up against a random assortment of movies, rather than a purpose-driven selection chosen to help re-rank it.
And suddenly I felt that my forcible re-rankings of about ten films had somehow diluted the purity of this wonderful process. For this same reason, I'm especially reluctant to just cherry-pick the films I'm rooting for. Forcibly re-ranking only select films would blacken Flickchart's eye even more.
As much as I would like to steer this resource in a particular direction, and as much as I would like to see Raising Arizona carve out its rightful place in the top 20 sooner rather than later, I now think that I may just have to be patient. It may not be a perfect ranking system, but it's the best one I've got, and even describing it in this way is vastly underselling it. More correctly, the very existence of Flickchart is a revelation for a list-obsessed film lover like me. It's just that patience is a hard virtue for someone who's obsessed.
I do still have the nagging desire to level the playing field. I do still regret the inequity between those original 350 films and all the ones that have followed. But the only way I can think of to truly make things equal would be to start over again. To wipe all my Flickchart rankings, and to enter all the titles at the same time. To make them duel with equal footing from the start. But even this would only be a temporary solution. With each new film I see, and each new film I add, a gulf is created between the films that have been ranked more, and those that have been ranked less.
Besides, to consider starting over, after all this work ... well, that might just put an obsessive like me in his grave.
All hail Flickchart. Thy will be done.
2 comments:
But I think that there is a way to make forced re-ranking work for you…I’ve advocated for this method with you before, but when I re-rank a film in Flickchart, I re-rank it as you’ve described until it loses two battles. This works on a few different levels for me as it serves to level the playing field that for you is still dominated by the “original 350” films (films that weren’t in that original 350 but deserve to be very high up in my rankings are given an easier path to the top rankings), and because I stop after the film loses two battles I think the randomness of Flickchart is somewhat preserved and winning percentages aren’t artificially inflated for films that don’t deserve to be inflated.
I admit that this method violates the absolute purity of Flickchart, but I tend to think that this is a minor transgression that doesn’t sully the validity of my results. I can tell myself this because I’ve been strict in my methodology: films that are sitting inside the top 1000 are not subject to the forcible re-ranking (I chose this benchmark because I set 1000 as an imaginary dividing line between good and bad movies (and of the 2597 films that are currently in my Flickchart, I would say that it is conservative to estimate that I only consider 1000 at least in the generic “good” category) and I decided that if a film was within the threshold of “good” then I was comfortable enough letting the randomness of Flickchart work its magic); I only forcibly re-rank films that I have at one time described as “great” (meaning that they would have ended up in my top 5 of the year they came out in); and I always stop after two losses (even if those two losses are to the same film). Oh yeah…I will also apply this process to new films that I am adding to FlickChart for the first time, giving those newly added films a chance to compete for top prizes early on.
I tend to think of Flickchart as a religious experience, and what good religion doesn’t have seemingly random rules to protect its integrity? For me, I need my religion to instantly reward me with some concrete returns on my investment and thus I need Flickchart to more quickly show me an adequate representation of my rankings. However, I do also respect the time-investment aspect of Flickchart and fully believe that over time, the randomization of the rankings will give me the complete and perfect picture of how I feel about every single film I’ve ever seen…this will take forever, and since this is my religion, I’d rather get a little peek into nirvana now. Maybe this is cheating, or maybe my Flickchart-based religion is just a more liberal sect than yours.
As for your half-cocked idea of wiping the slate clean and re-ranking everything from scratch, it actually COULD work if you took it one step further to preserve the purity of the system…maybe you should start over with all 2500+ films you currently have in Flickchart. Get them all into your Flickchart and then rank them against each other on an even playing field for a set amount of rankings (30,000??). During this period, do NOT add any new movies to Flickchart. Once you get to your pre-decided ranking benchmark, you would need to start all over again with the original 2500+, but you would also add in all the new movies you’ve seen into Flickchart. It would be interesting to see how your top 20s stack up against each other after a few cycles. That’s a lot of work just to remain pure…but that’s religion for you…
If you ever want to talk with us on the phone sometime, Jeremy and I (us 2 guys running and developing Flickchart) would love to chat with you about the site. Your blog posts are awesome, and it's great to see how much you're enjoying our site and what you're getting out of it. Send us an email and we'll find a time to talk with you! Happy ranking!