This week I invested a bunch of time in improving my poker hand converter. Most online poker sites allow you to record your hands for later review, in some kind of plain-text format. Several sites will record any hand you observe, meaning you just have to leave a window open to gather data.
Unfortunately, every site has a different format and the formats are not well-specified. For example, several sites will include player chat in the record, with nothing to indicate what text is chat and what is actually part of the hand. Also, there aren't many restrictions on what players may choose as a handle, and things get tricky when a user named "$0.02" sits down. My converter parses the text input from one of several poker networks and converts it into a standardized internal format.
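To make the handle problem concrete, here is a minimal sketch of one way to tell actions apart from chat when a player can be named "$0.02". It is not necessarily how the converter itself works. It assumes the handles are already known from the seat listing at the top of the hand, and it uses a made-up line format (`<handle> <verb> [$amount]`); `ACTION_RE`, `Action`, and `parse_action` are hypothetical names for illustration only.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical action grammar; each real network uses its own wording.
ACTION_RE = re.compile(
    r"^(?P<verb>folds|checks|calls|bets|raises to)"
    r"(?: \$(?P<amount>\d+(?:\.\d{2})?))?$"
)


@dataclass
class Action:
    player: str
    verb: str
    amount: Optional[Decimal]


def parse_action(line: str, players: list[str]) -> Optional[Action]:
    """Return an Action, or None if the line is not a recognizable action (e.g. chat)."""
    # Try longer handles first so a short handle cannot shadow a longer one
    # that happens to start with the same characters.
    for name in sorted(players, key=len, reverse=True):
        prefix = name + " "
        if not line.startswith(prefix):
            continue
        # Everything after the handle must match the action grammar exactly.
        match = ACTION_RE.match(line[len(prefix):])
        if match:
            raw = match.group("amount")
            amount = Decimal(raw) if raw else None
            return Action(name, match.group("verb"), amount)
    return None


players = ["$0.02", "alice"]
print(parse_action("$0.02 bets $0.50", players))
print(parse_action("alice: $0.02 bets $0.50 lol", players))
```

Matching the handle first and then requiring the whole remainder of the line to fit the grammar means a chat line that merely quotes an action falls through and returns `None`.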
At the start of the week, my parser failed on around 10% of the hands in my database. I'm pretty happy with how far I was able to push that down: a failure rate of 0.018% is pretty nice!
I did find some hands that were completely bogus--I examined them by hand and found them incomprehensible. For example, there's a bunch where a player folds... and then bets later, and another bunch where a chunk of the hand history is clearly missing. Odd.
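A check for the fold-then-bet case is easy to sketch. The snippet below assumes a hand has already been reduced to an ordered list of `(player, verb)` pairs; that representation and the function name `acts_after_folding` are assumptions for illustration, not the converter's actual internal format.

```python
def acts_after_folding(actions):
    """Return the handles of players who act again after folding, in order seen."""
    folded = set()
    offenders = []
    for player, verb in actions:
        if player in folded:
            # Any further action from a folded player marks the hand as suspect.
            if player not in offenders:
                offenders.append(player)
        elif verb == "folds":
            folded.add(player)
    return offenders


hand = [
    ("alice", "bets"),
    ("bob", "folds"),
    ("alice", "checks"),
    ("bob", "bets"),
]
print(acts_after_folding(hand))  # ['bob']
```

A hand that returns a non-empty list here can be set aside for manual review rather than counted as a parser failure.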
My parser handles 8 different poker networks at the moment (Poker Stars, Party Poker, Ultimate Bet, Absolute Poker, Prima, CryptoLogic, Pacific Poker, and Full Tilt Poker).