Hack on Drizzle Full-Time for Rackspace!
Given the current state of the economy, here's a quick job plug for
anyone interested and qualified.
At the Drizzle Developer Day on Friday, I got to meet Adrian Otto
from Rackspace. Rackspace has
a cloud offering (think Amazon EC2) called Mosso and is willing to employ full-time
developers who spend all their time working on Drizzle.
Here's what he sent to the mailing list.
I was speaking with Eric Day at the developer conference,
and I mentioned that Rackspace is willing to employ full time
developers for the specific purpose of furthering the Drizzle
project's mission. He suggested that I email you on this list because
he expected there would be interest in this offer. If you work on the
project now part time, and want to make it a full time job working
exclusively on the Drizzle project, let me know. The Rackspace Cloud
believes in open source, and we want to do our part to make Drizzle a
wild success.
Talking with him a bit, the rationale is simple: Rackspace wants to
offer the best cloud resources they can. Part of that means having
infrastructure that their customers need and that works well. They're
betting that Drizzle is part of their future, and hiring a few people
to work on it makes that future a reality sooner rather than later.
It looks like Mark Callaghan likes
the idea too.
Anyway, ping me if you're interested and I'll put you in touch.
Publication date: 2009-04-28
Slides from What Craigslist wants and needs from Drizzle
As I previously mentioned, on Friday I attended the Drizzle Developer Day at Sun in Santa Clara. While there I had the chance to speak to the group while everyone ate their salad, pizza, and cookies.
The talk was titled "What Craigslist wants and needs from Drizzle" and is available as a Google Docs presentation here. I've also embedded a version of the slides below.
I should note here, as I did at the talk, that this presentation is neither comprehensive nor completely representative. That is to say that I'm sure there are things I've forgotten. Plus, the fact that I was working with MySQL in other high-volume web shops before coming to Craigslist means that there's definitely some personal bias and pet peeves addressed in there too.
Anyway, that's what I presented.
Thanks to the fine folks at Sun (soon to be Oracle) for hosting and organizing the day. And special thanks to the Drizzle developers for getting together and showing the rest of us how things work and taking time to talk about their plans.
Publication date: 2009-04-27
MySQL and Percona Conferences Rocked
Now that it's Friday, I can finally come up for air and say that both the MySQL and Percona Conferences (which I wrote about earlier) surpassed my expectations. Having the two going on semi-concurrently worked out pretty well. At no time did I find myself without at least two sessions I wanted to see. Often times I had to really cut lunch short to make sure I didn't miss anything.
Other MySQL Conference veterans I asked had very similar responses.
There's a ton of other stuff to digest, and I hope to write up some thoughts in the coming days and weeks.
As usual, a big part of the conference for me was being able to meet up with people I don't see often or who I've never met face to face. Meeting up with people solving similar problems at places like Facebook, Google, Mixi, and other high-traffic sites is invaluable.
Being able to get a good sense of what new storage engines are out there and how modern hardware is changing database systems (SSDs, multi-core servers) proved to be very educational in ways I didn't expect.
All the politics and posturing around the Oracle/Sun buyout weren't as significant as I'd expected. People really wanted to get down to business and technology.
I enjoyed giving my talks and answering questions about problems that we have and haven't solved yet at Craigslist.
I'm heading over to Drizzle Developer Day in a couple hours.
Publication date: 2009-04-24
MySQL and Percona Performance Conference Lineup
Amidst all the
Oracle/Sun/MySQL news
today, the MySQL
Conference kicks off this week. So I just spent a few minutes
putting together my picks for the sessions I'd like to attend at the
MySQL Conference and
the Percona
Performance Conference
(schedule).
There's quite a lineup and I have some hard choices to make. Both
groups have put together excellent events. And, wow, there
are a lot of new storage engines
and appliances coming out.
To make my life easier, I'm putting the list of interesting
sessions from both conferences here so I can try to decide where to
spend my attention.
Sessions I'm Presenting or Part Of
I'm
presenting MySQL
and Search at Craigslist on Tuesday morning and am part
of The
Great Open Cloud Shootout on Wednesday morning.
On Friday, I'll be at
the Drizzle
Developer Day to talk about "What Craigslist Needs from
Drizzle."
Tuesday Sessions: MySQL
8:30am - State of the Dolphin - Karen Padir
9:15am - This is Not a Web App: MySQL at Google - Mark Callaghan
10:50am - MySQL and Search at Craigslist - Jeremy Zawodny
11:55am - InnoDB: Innovative Technologies for Performance and Data Protection - Ken Jacobs, Heikki Tuuri
2:00pm - Falcon Storage Engine - Designed for Speed - Kevin Lewis, Ann Harrison
3:05pm - The PBXT Storage Engine: Meeting Future Challenges - Paul McCullagh
4:25pm - Solving Common SQL Problems with SeqEngine - Beat Vontobel
5:15pm - Hadoop and MySQL: Friends with Benefits - Frank Mashraqi
Wednesday Sessions: MySQL
8:50am - The Great Open Cloud Shootout - panel
10:50am - Build Your Own MySQL Time Machine - Chuck Bell, Mats Kindahl
11:55am - Using Q4M: A Message Queue Storage Engine for MySQL - Kazuho Oku
11:55am - libdrizzle: A New Client Library for Drizzle and MySQL - Eric Day
2:00pm - Maria: The New Transactional Storage Engine for MySQL - Monty Widenius
2:00pm - SAN Performance on a Internal Disk Budget: The Coming SSD Revolution - Matthew Yonkovit
2:00pm - Crash Recovery and Media Recovery in InnoDB - Heikki Tuuri
3:05pm - MySQL Performance on EC2 - Mark Callaghan
4:25pm - Perl Stored Procedures for MySQL - Antony Curtis
4:25pm - High Availability and Scalability Patches from Google - Ben Handy, Justin Tolmer
5:15pm - Optimizing MySQL Performance for ZFS - Allan Packer, Neelakanth Nadgir
5:15pm - Redundant Storage Cluster: For When It's Just Too Big - Bob Burgess
5:15pm - Inserts at Drive Speed: Designing a Custom Storage Engine for Write-Mostly Applications - Ben Haley
Wednesday Sessions: Percona
9:00am - Maria In Depth - Monty Widenius
9:55am - The Return of Gearman - Eric Day
11:15am - Fighting Replication Lag - Peter Zaitsev
12:45pm - Evaluating Disk Backends for MySQL Servers - Ewen Fortune
1:35pm - Database Performance with Proxy Architectures - Robert Hodges
2:00pm - Covering Indexes: Orders-of-Magnitude Improvements - Dr. Bradley C. Kuszmaul
5:00pm - Sphinx and MySQL: A Perfect Match - Andrew Aksyonoff
6:15pm - InnoDB Performance Tuning - Peter Zaitsev
7:05pm - CouchDB: Behind the Buzz - Jan Lehnardt
7:55pm - Linux Filesystems: Who, What, and Where - Stewart Smith
9:10pm - Open Q&A: Performance - panel
Thursday Sessions: MySQL
8:30am - The SmugMug Tale - Don MacAskill
10:50am - SQL is Dead - Monty Taylor
10:50am - Map/Reduce and Queues for MySQL Using Gearman - Eric Day, Brian Aker
10:50am - Dormando's Proxy for MySQL - Alan Kasindorf
11:55am - Memory Management in MySQL and Drizzle - Stewart Smith
11:55am - Improving Performance by Running MySQL Multiple Times - MC Brown
2:00pm - MySQL Row Change Event Extraction and Publish - Gene Pang
2:00pm - InnoDB Performance and Usability Patches - Vadim Tkachenko, Ewen Fortune
2:50pm - Make Your Life Easier with Maatkit - Baron Schwartz
2:50pm - BLOB Streaming: Efficient Reliable BLOB Handling for all Storage Engines - Barry Leslie
3:50pm - Database We Can Believe In: Stories from the Front Lines (of the Obama Campaign) - many speakers
Thursday Sessions: Percona
9:55am - Pushing the Envelope - Don MacAskill
10:50am - The Life of a Dirty Page - Mark Callaghan
1:35pm - High Performance MySQL from a Boring Architecture - Baron Schwartz
3:15pm - Hypertable - Doug Judd
4:30pm - Drizzle's Approach to Improving Performance of the Server - Jay Pipes
7:50pm - MySQL Replication: Getting The Most From Slaves - Peter Zaitsev
9:05pm - Open Q&A: Feature Request Bonanza - panel
Friday: Drizzle Developer Day
Drizzle
Developer Day is on Friday at Sun in Santa Clara. I'm looking
forward to many of the talks (some of which will be completely
over my head but interesting anyway).
Back To Work!
And, with that, I need to go work on my presentation! See you at
the conference... I'll try to post interesting
tidbits on Twitter and tag
'em
with #MySQLConf
like others appear to be doing.
I just found out that I can share my public MySQL Conference Schedule.
Publication date: 2009-04-20
Oracle Buying Sun, Gets MySQL
Interesting news this morning, just as the 2009 MySQL Conference is starting. As is being reported all over the place, Oracle has agreed to buy Sun at $9.50 per share, giving them a ton of great technology (Solaris, ZFS, MySQL, DTrace, etc.).
One of the biggest threats to Oracle's core database business (at the low end, at least) for a while now has been MySQL. And now they're poised to own MySQL after Sun bought it not long ago. (It seems like yesterday that Oracle bought Innobase.)
As I noted a while back, the MySQL landscape is changing.
This news is sure to make the conference more... interesting.
Oracle, please get the InnoDB team together with the MySQL team and see about GPLing ZFS.
Publication date: 2009-04-20
Sponsor Our Ride For Diabetes (Tour de Cure 2009)
In early May, Kathleen and I will be participating in the Tour de Cure 2009, a bike ride to raise awareness and money for diabetes. Craigslist (my employer) is sponsoring a team that we'll both be riding on. Collectively, our team is trying to raise $75,000 during this year's ride.
If you have a few bucks to spare for a good cause, please consider sponsoring me or sponsoring my wife (or both!). It's for a very good cause.
We're both riding the 25-mile course and would love even a $1/mile contribution. As a bonus, Craigslist is matching all our donations. So if you donate $25, your contribution becomes $50 thanks to the company's generosity.
Here are links for a bit more information:
Our Team Page
Kathleen's Page
My Page
You can visit either of our pages to pledge on-line. And if you're interested in riding, visit our team page.
Thanks for any support you can offer!
Publication date: 2009-04-06
Janmedia to build Strayer University's 2009 Virtual Commencement Site
Washington DC - March 19th, 2009 - DC-based digital agency Janmedia and Strayer University have teamed up to build Strayer's 2009 Virtual Commencement website. The new application will provide an interactive venue for Strayer's graduates to participate in a commencement ceremony recognizing individual achievement.
Publication date: 2009-04-02
The Real or Official MySQL? Does Not Matter!
Yesterday Patrick Galbraith asked What is the official branch of MySQL? which got a lot of attention, including on Slashdot (and the token PostgreSQL comments quickly appeared).
Here's the funny thing. It doesn't matter anymore. Patrick's question is interesting in an academic sense, but it's mainly a distraction from what really matters. (Hint: What's the official Linux and who really cares? Ubuntu? RedHat? Debian? CentOS?)
Storage Engines
Nowadays what matters is the set of available storage engines. InnoDB, Percona's XtraDB, PrimeBase's PBXT, Maria, Falcon, and several others are available or will be soon. I predict that for the foreseeable future, any MySQL distribution or derivative must support the storage engine plug-in API that MySQL 5.1 defined. And since that's the case, it largely won't matter which flavor you're using.
Protocol(s)
Look at what's happened in the world of key/value databases in the last few years. More than a few of them speak the memcached protocol as either their native and default or an optional add-on. I suspect the same thing will be the case here. All MySQL distributions and derivatives will speak the "traditional" MySQL protocol (just like memcached has the old protocol). Some of them, notably Drizzle, will have other (newer, better) protocols available as well (much like memcached has the new binary protocol).
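To make the memcached comparison concrete, here is a minimal Python sketch of the classic text protocol's framing, following memcached's published protocol description: a `set <key> <flags> <exptime> <bytes>` line followed by the data block, and a `get` reply made of `VALUE <key> <flags> <bytes>` lines terminated by `END`. The function names are hypothetical, and the sketch only builds and parses bytes; it does no network I/O and is not a client library.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a text-protocol 'set' command; sending it is left to the caller."""
    # memcached keys are limited to 250 bytes and may not contain whitespace.
    if len(key.encode()) > 250 or any(c.isspace() for c in key):
        raise ValueError("invalid memcached key")
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
    return header + value + b"\r\n"


def parse_get_response(raw: bytes) -> dict:
    """Parse a text-protocol 'get' reply into {key: value_bytes}."""
    result = {}
    pos = 0
    while True:
        end = raw.index(b"\r\n", pos)
        line = raw[pos:end].decode()
        pos = end + 2
        if line == "END":
            return result
        # Each hit looks like: VALUE <key> <flags> <bytes> [<cas>]
        _, key, _flags, nbytes = line.split()[:4]
        size = int(nbytes)
        result[key] = raw[pos:pos + size]
        pos += size + 2  # skip the data block and its trailing \r\n
```

Because the framing is this simple, many key/value stores could adopt it as a compatibility layer while offering their own native protocols alongside it.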
Summary
In summary, the choice of MySQL version or derivative won't matter as much as you might think because they'll have the same Storage Engine plug-ins available (thanks to the shared plug-in API), they'll all speak a common protocol (this may not be true for replication--watch that area closely), and will largely offer the same subset of SQL and SQL extensions.
They'll all be supported by different groups/companies (including some "database appliance" vendors), will all be tuned differently and aimed at slightly different use cases, and will certainly benefit from a lot of cross-pollination.
That doesn't sound so bad to me.
The fact that nobody can point to the "real" MySQL in a few years just won't matter. Does anyone ask (anymore) which is the "real" Linux? Nope. And for very similar reasons. Think of MySQL as "kernel" and Storage Engine as "filesystem" and you'll realize we've been down this road before.
We're looking at the upgrade from 5.0 to 5.1 soon at Craigslist and don't know if we'll be using InnoDB or XtraDB yet. Time will tell.
See Also: The New MySQL Landscape, which I wrote a few months back--before a good chunk of the MySQL team had left Sun.
Publication date: 2009-03-31
Garlic Shrimp and Scallops Recipe
Last week we tried a new Garlic Shrimp recipe that was so simple and delicious that we planned to try it again. On Saturday my wonderful wife came back from the local Safeway and presented me with 1 pound of shrimp as well as about 1/3rd of a pound of small sea scallops.
Here's what the final product looked like:
Preparation is simple and quick.
Cut 2 red chilies lengthwise and remove the seeds.
Rinse the shrimp and scallops, keeping them separate.
Using a garlic press, crush 6-9 cloves of garlic.
Add 5-6 tablespoons of olive oil to a skillet or wok. Put the wok on high heat.
Once the oil is hot, add the garlic, chilies, and shrimp. Stir frequently. After about 2 minutes, add the scallops. Continue cooking for another 2-3 minutes until the shrimp are pink and the scallops are just starting to get a gold color on the outside (they'll still be tender inside).
Serve and enjoy!
Publication date: 2009-03-30
Enable Visual Effects in Ubuntu to Increase Performance
For a while now, I'd been running Ubuntu 8.04 on my Thinkpad T61 (work machine) with Visual Effects disabled.
Why?
There were weird bugs with compiz and xterm that caused corruption at times. So I shut it off and never thought about it again. But a few days ago, I upgraded to 8.10 despite the apparent increase in WiFi-related lock-ups I can expect to see (apparently I don't have the Intel wireless in this machine... grumble).
Switching virtual desktops, or "Workspaces" as they're called, seemed to be even slower than before--almost intolerable. Just for kicks I decided to go play with the settings.
Imagine my surprise when switching that selection from "None" to "Normal" resulted in a dramatic increase in virtual desktop redraw performance.
Yay!
Counterintuitive, but yay anyway.
Publication date: 2009-03-25
Aircraft Insurance Surprise
I haven't flown my glider much in the last year and probably won't be flying it again for many months. While that may not be ideal, it means I can spend less money by not paying for an annual inspection and can greatly reduce or eliminate the insurance costs. Or so I thought.
It just so happens that my insurance carrier emailed the other day to ask about renewing my policy (it's that time of year). I explained that I probably wouldn't be flying it and would probably let the policy lapse. They countered with an offer of "storage only" or "ground" coverage, which means that they'd still insure it for non-flight related damage.
Now gliders are kind of expensive to insure in the first place. The annual insurance bill is roughly the same as it is for our Cessna 182Q (which is worth twice as much as my glider). So I was looking forward to paying a lot less.
Wrong!
It turns out that moving to storage only coverage still costs roughly 67% (2/3rds) of what the full flight coverage is. I'm still trying to process that figure. That's like State Farm Insurance telling me that if I agree to keep my car in the garage for a year, they'll give me a 33% discount.
Apparently, (1) there is a lot of overhead in the insurance industry, and (2) they think I'm far more likely to encounter non-flying damage.
And, the best part is this... If I were to cancel coverage altogether for the year, I'd have higher rates when I come back next year because of discounts I've accrued with them. "If you let this policy go and then come back later, the new policy offered will be about 15% higher in cost just due to the loss of those discounts." Strangely, I thought those discounts were the result of earning additional ratings (like my Commercial) and gaining experience and flight time.
It's no surprise that the first four letters of the insurance company most glider pilots use are "Cost", huh?
Publication date: 2009-03-20
New Craigslist Search Features
I haven't said a lot here about what I've been working on at Craigslist recently. But Craig mentioned me today in his blog and that made me remember that I should say something. :-)
Much of my work has been behind the scenes infrastructure stuff, but some of that is translating into new features that craigslist users can see. And, as of this morning, a lot more users are seeing the fruits of that labor.
As I noted a few weeks back in Sphinx Search at Craigslist, I've been hacking a lot on search. Here's a screen shot to show you what I've been calling "nearby search" (though "nearby results" is probably more appropriate).
If you run a search in a city and there aren't many results, we'll also run the search in nearby areas to see if we can find matches there too. The above example was a search for "2008 mazda" in my hometown of Toledo, Ohio. The "nearby" results are clearly separated from local matches and local matches are still given priority.
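The fallback described above can be sketched roughly as follows. This is a hypothetical illustration, not Craigslist's code: `search_area` stands in for whatever search backend is queried, the neighbor map and the `min_local` threshold are invented, and postings are represented as plain IDs. Local matches are always returned first, and nearby matches are kept in separate per-area buckets, mirroring the separation the post describes.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResults:
    local: list                                  # matches in the requested city
    nearby: dict = field(default_factory=dict)   # area name -> extra matches


def search_with_nearby(query, city, search_area, neighbors, min_local=10):
    """Search one city, widening to nearby areas only when results are sparse.

    search_area(query, area) -> list of posting IDs (hypothetical backend).
    neighbors maps a city to its nearby areas, closest first (hypothetical).
    min_local is an invented threshold for "not many results".
    """
    results = SearchResults(local=search_area(query, city))
    if len(results.local) >= min_local:
        return results  # enough local matches; no need to look further
    seen = set(results.local)
    for area in neighbors.get(city, []):
        # Skip anything already collected so no posting appears twice.
        hits = [pid for pid in search_area(query, area) if pid not in seen]
        if hits:
            results.nearby[area] = hits
            seen.update(hits)
    return results
```

Keeping `nearby` as its own structure, rather than merging it into `local`, makes it easy to render the two groups separately and to adjust the search radius (here, the neighbor list) without touching local ranking.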
The feedback has been generally positive so far. Though, with any change, some folks aren't happy. I can't say it's going to stay in this exact form. We may need to tweak the interface, the radius of the nearby search areas, and so on. But on the whole I think it's a helpful improvement when you're looking for something that's a bit harder to find and you're willing to drive an hour or two.
As of earlier today, it's available in most smaller and medium sized US cities. It'll probably come to the remainder of cities before long too. I've been testing it for about a week and a half, starting with about a dozen cities and then adding about twenty more late last week. This morning I mostly flipped the big switch.
Of course, this opened the flood gates for similar feature requests: custom radius searches, state wide searching, search ALL of craigslist, etc.
In related news, a couple months back I expanded the search help page to include advanced search syntax, including grouping, negation, OR queries, and more.
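The help page itself isn't reproduced here, so what follows is only a toy sketch of how such syntax can be evaluated, under assumed conventions: a leading `-` negates a term or group, `|` means OR, parentheses group, and adjacent terms are ANDed. It uses a small recursive-descent parser to turn a query into a predicate and tests that predicate against the set of words in a posting. The function names are hypothetical, and this is not Craigslist's implementation.

```python
import re

# Tokens: "(", ")", "|", a "-" that negates what follows, or a bare word.
TOKEN = re.compile(r"[()|]|-(?=\S)|[^\s()|]+")


def compile_query(query: str):
    """Compile a query string into a predicate over a set of lowercase words."""
    tokens = TOKEN.findall(query.lower())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        branches = [parse_and()]
        while peek() == "|":
            take()
            branches.append(parse_and())
        return lambda words: any(b(words) for b in branches)

    def parse_and():
        parts = []
        while peek() not in (None, "|", ")"):
            parts.append(parse_unary())
        if not parts:
            raise ValueError("expected a term")
        return lambda words: all(p(words) for p in parts)

    def parse_unary():
        tok = peek()
        if tok in (None, "|", ")"):
            raise ValueError("expected a term")
        take()
        if tok == "-":
            inner = parse_unary()
            return lambda words: not inner(words)
        if tok == "(":
            inner = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return inner
        return lambda words: tok in words

    predicate = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return predicate


def matches(query: str, text: str) -> bool:
    """True if the posting text satisfies the query."""
    words = set(re.findall(r"[\w'-]+", text.lower()))
    return compile_query(query)(words)
```

For example, under these assumed rules `(mazda | honda) 2008 -truck` matches "2008 honda civic" but not "2008 mazda b2300 truck".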
Publication date: 2009-03-06
Janmedia Teams Up with Fairfax Partnership for Youth
Washington DC
Publication date: 2009-03-04
Washington DC Agency Collaborates with Washington Gas Light Company
Washington DC
Publication date: 2009-02-20
Windows 7 Impressions
I'll admit it. I still run Windows on a few machines--mostly because I have software that needs it (like flight planning or my scanner tools). And it's good on a notebook where drivers are tricky in Ubuntu at times.
But I've also been using Windows XP Professional on all my Windows boxes (one desktop, one laptop, and one HTPC) for a long time now. However, as of a couple days ago I'm running the Windows 7 Beta on my Thinkpad T61. And you know what?
I completely agree with the reviews I've seen. It's good. I basically never touched Vista (since it was teh suck) but Windows 7 is snappy, easier to use, and the transition from XP isn't that hard at all. Plus it has drivers for everything.
This definitely doesn't feel like a beta at all. In fact, it reminds me of the Windows NT 4.0 beta days. I ran the beta as my desktop operating system for quite some time and loved it.
For a long time I believed that nothing produced by Microsoft would displace Windows XP Professional, but I'm really starting to think they've got a fighting chance. And if it's even a bit faster and leaner when the full release comes, that's all the better.
I just hope there's an in-place upgrade option for those of us using the beta. And I hope they're smart about the pricing--especially if they really want to get folks off of XP.
Publication date: 2009-02-13
White House CTO? Bigger fish to fry first...
Over on the Sunlight Foundation blog, Ellen Miller asks White House: Where is the CTO?. Pardon my bluntness, Ellen, but what are you smoking? Don't you think there are higher priorities right now?
It seems to me that Obama and his administration have their hands more than full working on the economic problems we're facing along with rebuilding some of our important international relationships. I'm as much of a technology geek as the next guy, but it really won't bother me if they punt on the whole CTO thing for a few months while some of the bigger fish are fried.
I can't say quite why, but this call for immediate action on a CTO feels like a bit of headline grabbing and irresponsibility at the same time. Sure, they could come out and name a CTO tomorrow and I'd applaud the move. But I really hope they're keeping their priorities in check. Part of being a good leader is deciding what can wait and what cannot. Appointing a CTO can wait. Fixing our economy cannot.
Update: It looks like Kara has jumped on this too.
Publication date: 2009-02-12
Playing With CouchDB: First Impressions
About a week ago, Nat
posted Open
Source NG Databases on O'Reilly Radar. That caught my interest
because I'm playing with some "alternative" databases for some of our
data at Craigslist. Don't get me wrong, MySQL is great. But MySQL
isn't well suited to every use case out there either. (I'll talk more
about
this at
the MySQL Conference.)
Meanwhile, I
left a
comment on that posting about CouchDB and have been playing with
it a bit more since then--mostly loading in test data, figuring out
the data footprint, performance, etc.
Overall, I'm impressed and encouraged. I agree with
what Ben
Bangert said. The simple API is great but the lack of a schema to
worry about really makes my life simple in this application. I don't
have any initial plans for views, but writing them in Javascript is an
interesting idea. I can definitely appreciate the flexibility there.
And having good replication built-in solves one of my big needs.
I'm sure my thinking will have evolved after I've loaded a few
hundred million documents in, but so far I'm really liking it. The
CPAN modules
in Net::CouchDb
do a pretty good job and get you up and running quickly. I had a
knee-jerk response to tweak a few things there but quickly realized
that they're far from being the bottleneck anyway.
It seems that without any tuning or fancy work, I can get about
75-100 inserts/sec on my desktop-class Ubuntu box (Intel Core 2 Duo,
2.66GHz, 1GB RAM, single 80GB SATA disk). That's not bad for
out-of-the-box performance. And doing the math on space used for a
document set (after compaction), I'm seeing roughly 3KB/doc. That's
a bit more than I expected but really not bad at all.
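A quick back-of-the-envelope check shows what those figures imply at the "few hundred million documents" scale. The sketch uses only the numbers quoted above (75-100 inserts/sec, about 3KB per document after compaction); the 200 million document count is a hypothetical stand-in, and it assumes documents are inserted one at a time at a steady rate.

```python
SECONDS_PER_DAY = 86_400


def load_days(num_docs: int, inserts_per_sec: float) -> float:
    """Days to insert num_docs one at a time at a steady rate."""
    return num_docs / inserts_per_sec / SECONDS_PER_DAY


def disk_gb(num_docs: int, kb_per_doc: float = 3.0) -> float:
    """Approximate post-compaction size in decimal gigabytes (1 GB = 10**6 KB)."""
    return num_docs * kb_per_doc / 1e6


# Hypothetical target: 200 million documents ("a few hundred million").
docs = 200_000_000
for rate in (75, 100):
    print(f"{rate} inserts/sec: {load_days(docs, rate):.1f} days")
print(f"~{disk_gb(docs):.0f} GB on disk")
```

This prints about 30.9 days at 75 inserts/sec, 23.1 days at 100 inserts/sec, and roughly 600 GB of disk, which suggests that at untuned single-document rates, batching writes or tuning would probably matter far more for a full load than the client modules do.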
I wonder if there's a future for gzip compression in CouchDB. Or
maybe we should just use ZFS...
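One cheap way to get a feel for the gzip question is to compress a single JSON document and compare sizes. The sketch below uses Python's standard `gzip` and `json` modules on a made-up, posting-like document; its field names and text are invented, and its highly repetitive body will compress far better than real postings would. It measures per-document compression only, which is not the same thing as block-level compression in a filesystem such as ZFS.

```python
import gzip
import json


def gzip_ratio(doc: dict) -> float:
    """Return compressed_size / raw_size for one JSON-serialized document."""
    raw = json.dumps(doc).encode("utf-8")
    packed = gzip.compress(raw)
    if gzip.decompress(packed) != raw:  # round-trip sanity check
        raise RuntimeError("gzip round trip failed")
    return len(packed) / len(raw)


# A made-up, posting-like document padded toward the ~3KB size mentioned above.
sample = {
    "_id": "posting-123",
    "title": "2008 mazda 3, low miles",
    "body": "Clean title, one owner, garage kept. " * 70,
    "city": "toledo",
}
print(f"gzip keeps {gzip_ratio(sample):.0%} of the original bytes")
```

Running the same function over a sample of real documents would give a more honest estimate than this synthetic one.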
Publication date: 2009-02-11
DC Based Digital Agency Deploys New Center for Security Policy Website
Washington DC
Publication date: 2009-02-03
Janmedia has an Optimistic Outlook on 2009
Washington DC
Publication date: 2009-02-01
Making Homemade Pasta is Fun, Easy, and Delicious
Over the last year or so we've slowly been accumulating new kitchen toys and cookbooks. And we've been experimenting with new recipes during that time. See Jeremy's Crockpot or Slow Cooker Chili Recipe for an example.
But things seem to have been kicked into a higher gear recently. You see, we asked for (and received--thanks Mom and Dad) a KitchenAid Artisan 5-Quart Stand Mixer back during Giftmas. And my wonderful wife got me the KitchenAid KPRA Pasta Roller Attachment and The Complete Book of Pasta and Noodles to go along with the mixer.
My expectation was to mostly use the mixer for the occasional bread mix (which I haven't tried yet) or cookie dough (ditto). But Kathleen is a big pasta fan and the meals in the book sounded quite tasty.
So a few weeks ago I began to experiment with making my own pasta. Much to my surprise, it's a fairly easy and fun process. To make basic pasta, all you really need is some eggs and flour. In fact, 3 large eggs and 2 cups of all-purpose white flour is enough to get started.
The real trick, as it turns out, is getting the moisture level of the pasta right and working with the resulting dough. You want it to stick together just the right amount with the right texture. Not too dry and not too wet or sticky. And you need to let it "rest" long enough that you can work with it.
Anyway, last night I made my third round of basic pasta and feel like I'm getting the hang of it. Combined with grilled chicken breasts, grilled asparagus, and a tasty olive oil and garlic sauce, it's just fantastic. Fresh pasta really tastes so much better than the dried pasta you buy at the store. It's hard to describe the difference. It's lighter, tastier, and less prone to sticking. You simply must try it.
I highly recommend that pasta book too. If you're getting serious about pasta and want a variety of recipes (both for the noodles and sauces), it's a wealth of good information.
Next we need to try some of the more interesting pasta recipes that use more exotic flours and added spices.
Pictures of my first and second pasta making adventures are on Flickr in Making Pasta.
Have you made your own pasta? What's your experience been like?
Publication date: 2009-02-01
Time To Change
It's the end of the year; a time for nostalgia and looking back on the past year. Nick Finck, Digital Web Magazine's founder and publisher, recalls where we've been, what we've achieved, and discusses the potential for dramatic change in where we are going as a publication. This is your chance to influence the future structure and focus of Digital Web.
Publication date: 2009-01-27
My Dumb Cat Video
It's Friday and this is the Internet, so I present to you Cats Eating Chicken, or "My Dumb Cat Video" (embedded below too).
The background is that we had a bit of leftover grilled chicken the other night and decided to bust it up and feed it to the cats. Amusingly, they all got together to partake of the feast, but a couple of them got curious about the camera too.
Both Timmy (white and grey) and Thunder (mostly grey) give the camera a sniff or two. My boys (Barnes and Noble) remained single-mindedly devoted to devouring the meat.
Anyway, we found it rather amusing.
Have a good weekend...
Publication date: 2009-01-27
The New MySQL Landscape
Interesting things are afoot in the MySQL world. You see, it used to be that the MySQL world consisted of about 20-40 employees of MySQL AB (this funny distributed Swedish company that built and supported the open source MySQL database server), a tiny handful of MySQL mailing lists, and large databases were counted in gigabytes not terabytes. A Pentium III was still a decent server. Replication was a new feature!
Hey, anyone remember the Gemini storage engine? :-)
How times have changed...
Nowadays MySQL is sort of a universe unto itself. There are multiple storage engines (though MyISAM and InnoDB are still the popular ones), version 5.1 is out (finally), and the whole company made it over 400 employees before it was gobbled up by Sun Microsystems (a smart move, IMHO, though history will judge that) a while back.
If I had to guess 5 years or so ago what would be interesting to me today about MySQL, I'd have been really, really wrong. The future rarely turns out like we think. Just ask Hillary Clinton.
Here's a little of what's rattling around in the MySQL part of my little brain these days...
Outside Support, Patches, and Forks
The single most interesting and surprising thing to me is both the number and necessity of third-party patches for enhancing various aspects of MySQL and InnoDB. Companies like Percona, Google, Proven Scaling, Prime Base Technologies, and Open Query are all doing so in one way or another.
On the one hand, it's excellent validation of the Open Source model. Thanks to reasonable licensing, companies other than Sun/MySQL are able to enhance and fix the software and give their changes back to the world.
Some organizations are providing just patches. Others, like Percona, are providing their own binaries--effectively forks of MySQL/InnoDB. Taking things a step further, the OurDelta project aims to aggregate these third party patches and provide source and binaries for various platforms. In essence, you can get a "better" MySQL than the one Sun/MySQL gives you today. For free.
Meanwhile, development on InnoDB continues. Oh, did I mention the part where they were bought by Oracle (yes, *that* Oracle) a while back? Crazy shit, I tell you. But it makes sense if you squint right.
Anyway, the vibe I'm getting is that folks are frustrated because there's not a lot of communication coming out of the InnoDB development team these days. I can't personally verify that. It's been years since I corresponded with Heikki Tuuri (the creator of InnoDB). So folks like Mark Callaghan of Google have been busy analyzing and patching it to scale better for their needs.
And we all benefit.
Drizzle
Taking things a step further yet, the Drizzle project is a re-making of MySQL started primarily by Brian Aker, who worked as MySQL's Director of Architecture for years. Brian is now at Sun and, along with a handful of others at Sun and elsewhere, is ripping out a lot of the stuff in a fork of MySQL that doesn't get used much, needlessly complicates the code, or is simply no longer needed.
In essence, they're taking a hard look at MySQL and asking what it really needs to provide for a lot of its uses today: Web and "cloud" stuff. He visited us at Craigslist a few months ago to talk about the project a bit and get our input and feedback. I believe it was that day I joined one of the mailing lists and started following what's going on. Heck, I even build Drizzle on an Atom-powered MSI Wind PC regularly.
It's great to see a re-think of MySQL going on... keeping the good, getting rid of the bad, and modularizing the stuff that people often want to do differently (authentication, for example).
It's even better to see the group that's hacking on it. They really have their heads on straight.
Unanswered Questions
Why is all this even necessary? Are the "enterprise" customers and their demands taking focus away from what used to be the core use and users of MySQL? Is Sun hard to work with?
It's clear that both the MySQL and InnoDB teams could be doing more to help. But having worked at a large company for long enough, I realize that things are rarely as simple as they should be.
Will this stuff get integrated back into mainline MySQL? Will Linux distributions like Ubuntu, Debian, and Red Hat pick up OurDelta builds? What about Drizzle?
Will Drizzle hit its target and be the sleek and lean database kernel that MySQL once could have been?
Hard to say.
It's hard to guess what the future holds and too easy to play armchair quarterback about the work of others. But these are questions worth wondering about a bit.
What's it all mean?
Nowadays MySQL has a much slower release cycle than it used to. It's still available in "commercial" and free ("community") releases. There's still a company behind it--a much larger one in fact. But one that also has a vested interest in showing how it works better on their storage appliances or 256 "core" computers and whatnot.
Clustering is still very niche. Transactions are not.
Meanwhile, all the cutting edge stuff (at least from the point of view of scaling) is happening outside Sun/MySQL and being integrated by OurDelta and even Drizzle. The OurDelta builds are gaining steam quickly and Drizzle is shaping up.
Heck, I'm hoping to get an OurDelta box or two on-line at work sometime soon. And I'd like to put a Drizzle node up too. I want to see how the InnoDB patches help and also play with the InnoDB plug-in (and its page compression).
The next few years are shaping up to be far more interesting than I might have expected from a project and technology that looked like it was on track straight for Open Source maturity.
And you know what? I like it. (comments)
Publication date: 2009-01-27
more
Talk Announcement: MySQL and Search at Craigslist
I recently learned that my talk has been accepted for the 2009 MySQL Conference in Santa Clara, California. It is currently scheduled for Tuesday the 21st and titled MySQL and Search at Craigslist.
Here's the abstract (which I've promised to expand upon soon):
Millions of people search for things every day on craigslist: tickets, cars, garage sales, jobs, events, and so on.
This talk will look at the recent evolution of database and search architecture at Craigslist, including performance, caching, partitioning, and other tweaks. We'll pay special attention to the unique challenges of doing this for a large data set that has an especially high churn rate (new posts, edits, and deletes).
And we strive to do this using as little hardware and power as possible.
If you're coming to the conference, drop by and harass me. :-)
If you're not sure, check out the full schedule--there's a lot of good stuff packed into the conference already, and a lot of talks are still not even posted. (comments)
Publication date: 2009-01-27
more
Twitter as a Dynamic DNS Service
I occasionally wish to know the IP address of my home Cable Modem or DSL connection but don't really care if it's available in DNS or not. It occurred to me that if I could programmatically detect the IP change, I'd be able to notify myself via Twitter.
At first, I wanted a simple web service that'd tell me my IP address--something like WhatIsMyIP.com but with an API suitable for simple scripting.
Not finding anything, I created this massive PHP script instead and hosted it on my server:
That made it easy to write a simple bash shell script that can be run from cron every few minutes. It uses curl to hit that script and compares the result with the previous result (stored in ~/.last_ip). If they differ, it updates the file and tells Twitter, again using curl.
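Neither script is reproduced in this post, so here is a minimal bash sketch of the cron side, written under a few stated assumptions: IP_URL is a hypothetical placeholder for an endpoint that returns the caller's IP as plain text, notify() only echoes where the original posted to Twitter with curl, and the function and variable names are made up for illustration.

```shell
#!/bin/bash
# check_ip.sh: a sketch of the cron job described above (not the original).
# Assumptions: IP_URL is a hypothetical endpoint returning the caller's IP
# as plain text; notify() is a placeholder for the Twitter-posting step.
STATE_FILE="${STATE_FILE:-$HOME/.last_ip}"
IP_URL="${IP_URL:-http://example.com/myip.php}"

# Placeholder notifier: swap in whatever actually sends the message.
notify() {
    echo "notify: $1"
}

# check_ip NEW_IP: compare NEW_IP with the last recorded address. If it
# differs, save it and send a notification (exit status 0). If it is
# unchanged or empty, do nothing (exit status 1).
check_ip() {
    local new_ip="$1" old_ip=""
    # An empty result usually means the fetch failed; keep the old state.
    [ -n "$new_ip" ] || return 1
    if [ -f "$STATE_FILE" ]; then
        old_ip=$(cat "$STATE_FILE")
    fi
    if [ "$new_ip" != "$old_ip" ]; then
        printf '%s\n' "$new_ip" > "$STATE_FILE"
        notify "Home IP changed to $new_ip"
        return 0
    fi
    return 1
}

# Only touch the network when invoked with --run (e.g. from cron).
if [ "${1:-}" = "--run" ]; then
    check_ip "$(curl -s --max-time 10 "$IP_URL")"
fi
```

A crontab entry along the lines of `*/5 * * * * /path/to/check_ip.sh --run` (path hypothetical) would run it every five minutes. Skipping the update on an empty response is a design choice: a failed curl shouldn't overwrite the last known address and trigger a bogus notification.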
Of course, I had to create that new Twitter account and then follow it from my main account. But, hey, that wasn't so hard. Now I have a Web 2.0ish social dynamic DNS thingy that uses Twitter.
Aren't I cool and buzzword compliant?! (comments)
Publication date: 2009-01-27
more
Don't Bet on Moore Saving Your Ass
Over on the 37signals blog, DHH writes Mr. Moore gets to punt on sharding. His argument is basically that if you continually delay fixing your data storage and retrieval layer, Moore's Law will be there to save your ass--over and over again.
Bzzzt. Wrong answer.
Depending on future improvements to fix your own bad planning is a risky way to build an on-line service--especially one you expect to grow and charge money for.
It's easy to forget history in this industry (as Paul pointed out in the comments on that post). There was a point a few years ago when people still believed the clock speed of CPUs would be doubling roughly every 18 months for half the cost. Putting aside that Moore's Law is really about transistor density and not raw speed, we all ended up taking a funny little detour anyway.
Until recently, the sweet spot (in terms of cost and power use) was probably a dual CPU, dual core server with 16 or 32GB of RAM. But soon that'll be dual quads with 32 or 64GB of RAM. And then it'll be quad eight-core CPUs with 128GB or whatever.
But notice that nowadays we're not all running 6.4GHz CPUs in our servers. Instead we're running multi-core CPUs at slower clock speeds. Those two are definitely not equivalent.
A funny thing happens as you add cores and CPUs. You begin to find that the underlying software doesn't always... get this... scale. That's right. Software designed in a primarily single or dual CPU world starts to show its age and performance limitations in a world where you have 8, 16, or 32 cores per server (and more if you're running one of those crazy Sun boxes).
You see, David is talking specifically about MySQL (and probably InnoDB), which is currently being patched by outside developers precisely because it has multi-core issues. Its locking is expensive and not granular enough to utilize all those cores. It's expensive in terms of memory use too. And there are assumptions built into the I/O subsystem that don't scale well in today's world of fast multi-disk RAID units, SSDs, and SANs. People are hitting these issues in the real world and it's definitely becoming a serious bottleneck.
See Also: The New MySQL Landscape.
Moore's Law is no silver bullet here. A fundamental change has occurred in the hardware platform and now we're all playing catch-up in one way or another.
I'll discuss this a bit in my upcoming MySQL Conference Talk too. The world is not nearly as clear or simple as DHH is suggesting. Perhaps they can get by with constantly postponing the work of sharding their database, but that doesn't mean you should follow their lead. (comments)
Publication date: 2009-01-27
more
Jeremy's Crockpot or Slow Cooker Chili Recipe
I've been making variations on a crock pot chili recipe for the last few months and finally have a combination we really like.
Ingredients
1.5 - 2 pounds of ground beef
1 medium red onion
1/2 medium or large yellow onion
1/2 - 1 cup of frozen yellow sweet corn [see notes below]
1 green bell pepper
1 jalapeno pepper
1 14-16 oz. can of petite diced tomatoes [see notes below]
2 15-15 oz. cans of pinto beans [see notes below]
1 11.5 oz. can of V8 juice (hot if you can find it, regular otherwise)
1/2 tsp. salt
1 tbsp. chili powder
1 tbsp. cayenne pepper
Directions
Chop the red onion and add it with the ground beef. Brown over medium heat.
While the meat and onion are browning, add beans, V8, and spices to the crockpot. Chop green pepper, yellow onion, and jalapeno and add them as well.
Once the meat has browned and onions softened, add them to the crock pot as well.
Cook on low heat for 6-8 hours, stirring occasionally.
Serve with freshly made corn bread or fresh noodles. Optionally top with shredded cheddar cheese and onion. Enjoy with a nice cold beer, if that's your sort of thing. :-)
Notes
Safeway sells 15 oz. cans of Pinto Beans that are "Mexican Chili Pinto Beans." They work very well if you can find them.
Safeway also sells 14.5 oz. cans of Petite Diced Tomatoes with Garlic and Olive Oil. Also highly recommended. Some people use canned Stewed Tomatoes in their recipes but I find them to be too chunky. I like a nice uniformly thick chili.
Trader Joe's sells some truly excellent frozen Organic Super Sweet Cut Corn. Get it if you can. They also sell a good corn bread mix. (comments)
Publication date: 2009-01-27
more
A Job That Matters
In Tim O'Reilly's Work on Stuff that Matters he elaborated on three criteria that constitute "stuff that matters" for his readers:
Work on something that matters to you more than money.
Create more value than you capture.
Take the long view.
A number of folks were surprised when I announced that I was joining craigslist back in July but it's an organization that I really admire. Having been there about 6 months now, I can definitely say that it's a job that matters based on Tim's thinking and my own.
Every time I meet someone and tell them where I work, their reaction is quite positive. They've had a good experience with craigslist, like the service, love the philosophy, and so on. Craigslist matters to ordinary people--not just technology nuts.
Similarly, I know that we create more value than we capture. The majority of our service is free and usage seems to be growing all the time. People I talk to get such good responses with craigslist classifieds (compared to, say, newspapers) that I know we're giving people more than their money's worth.
As for taking the long view, I think being a non-public company helps that a lot. I've rarely thought about what "the next quarter" will bring. It's quite a contrast from my years at Yahoo. When we're discussing technology infrastructure, I'm always trying to think ahead a year or two (or more). But the day to day ups and downs just don't feel as important the way we operate. I like that.
All in all, I've been very happy with the change and am glad that Tim posted something that helped me to explain what I like about it. (comments)
Publication date: 2009-01-27
more
Sphinx Search at Craigslist
A couple days ago, Andrew posted a news item titled Sphinx goes billions to the Sphinx web site.
Last but not least, Powered By section, now at 113 sites and counting, was updated and restyled. I had long wondered how much Sphinx search queries are performed per month if we sum all the sites using it, and whether we already hit 1B page views per month or not. Being open-source, there's no easy way to tell. But now with the addition of craigslist to Powered By list I finally know that we do. Many thanks to Jeremy Zawodny who worked hard on making that happen, my itch is no more. :-)
Well, I guess the cat's out of the bag! My first project at Craigslist was replacing MySQL FULLTEXT indexing with Sphinx. It wasn't the easiest road in the world, for a variety of reasons, but we got it all working and it's been humming along very well ever since. And I learned a heck of a lot about both Sphinx and craigslist internals in the process too.
I'm not going to go into a lot of details on the implementation here, other than to say Sphinx is faster and far more resource efficient than MySQL was for this task. In the MySQL and Search at Craigslist talk I'm giving at the 2009 MySQL Users Conference, I'll go into a lot more detail about the unique problems we had and how we solved them.
For what it's worth, the implementation isn't really done. I did update the search help page on the site to reflect some of the capabilities (hey, look! OR searches!) but there are features I have planned that I'd like to expose as time allows. (comments)
Publication date: 2009-01-27
more
Obama Interest in Kenya During the Primaries
A little over a year ago, my wife and I traveled to Africa for our wedding and honeymoon (and a lot of sightseeing--more on that over the next few weeks). Part of that time was spent in Tanzania and part of it was in Kenya. This was during the craziest part of the 2008 presidential primary race when Hillary Clinton had the perceived lead over Barack Obama and every other would-be Democratic nominee.
What was surprising to us is how aware of Obama and the primary process the average folks in Kenya appeared to be. We were asked on many occasions if Obama was going to be President of the United States of America. Even back then, over a year ago when he was in second place, there was an undeniable interest, hope, and genuine excitement about his prospects.
Given the post-election turmoil that erupted in Kenya near the end of our trip, it's no surprise that Kenyans were celebrating his election and inauguration a few days ago. If anyone needed hope for change and a promising future after political unrest, it was the people of Kenya.
When is the last time that a presidential election had such a far-reaching effect on ordinary people? (comments)
Publication date: 2009-01-27
more
Hetch Hetchy and Wapama Falls Hike Pictures and Panorama
Last weekend we flew up to Pine Mountain Lake and drove into Hetch Hetchy to hike to Wapama Falls. The weather was fantastic for mid-January: clear and in the high 50s to low 60s. After about 15 minutes on the trail, jackets and outer shirts came off, and we were down to jeans and t-shirts.
Kathleen took several pictures of the Yosemite Valley area and the Sierra Nevada Mountains on the flight up with our Canon SD800 IS. Here are a few of them.
You can see the full set in the Flickr photo set titled January 2009 Flight to Pine Mountain.
I shot about 250 more with my Canon 300D and you can see a few here.
The full set is on Flickr in Wapama Falls Hike in Hetch Hetchy Valley.
The picture at the top of this post was stitched together with autostitch on Windows and touched up in Picasa.
There are still more pictures of the hike that she took with the SD800 IS to come as soon as I get them on-line... You can always see my full photo stream here. (comments)
Publication date: 2009-01-27
more
Is The Web Really Helping Us Find New Music?
With exactly one month to go until Christmas, Digital Web Magazine is changing pace for our last article of 2008. Tempers have flared in recent weeks over our coverage of idiosyncratic CSS techniques, so we thought we
Publication date: 2008-11-26
more
Happy Turkey Day!
Janmedia would like to wish everyone a Happy Thanksgiving!
Publication date: 2008-11-25
more
Opa! is Good Greek Food in Willow Glen
A month or so ago, the long under-construction Opa! opened its doors on Lincoln Ave in downtown Willow Glen. Wanting to try it for a while, we walked down on Friday night for dinner. And we were not disappointed.
The Good
The menu is straightforward and has a good variety of Greek food. We ordered the Keftedes (Greek Meatballs) as an appetizer. The dish consisted of two well prepared meatballs and an excellent sauce.
For the main courses, we selected a Beef Souvlaki Pita (hers) and Seafood Souvlaki (mine). Both came with the most excellent Opa! Fries. (Think: garlic fries with a twist.) The food came in a reasonable amount of time and our waitress was very friendly and helpful. It was very tasty and portions were not excessively large either.
Their drink menu contains a selection of beers and a good selection of Greek wines as well. The wine we sampled was quite good and is apparently available at Costco. Needless to say, we're going to have to verify that for ourselves. ;-)
The interior is well decorated. I especially like the large TV monitor that shows what songs are playing over the sound system.
Pricing was reasonable. Dinner for two with drinks, an appetizer, and dessert (Baklava!) was about $50. Not the sort of thing we do often, but definitely not out of line with other favorite eating establishments.
The Bad
Opa! is a small sit down restaurant with tables for 2 and 4 (mostly) that also handles to go orders. It's often very full and could definitely benefit from more space inside. As a result, the tables are fairly close together and the waitresses occasionally bump into customers. But space isn't easy to come by in Willow Glen's downtown.
More
Opa! has over 60 ratings and reviews on Yelp and is also discussed a bit on Willow Glen 2.0.
If you're looking for good Greek food in the area, I'd highly recommend giving Opa! a try. (comments)
Publication date: 2008-11-24
more
Bash Trick: Watching Multiple Background Jobs
I recently had a need to add some error checking to a bash script that runs multiple copies of a Perl script in parallel to better utilize a multi-core server. I wanted a way to run these four processes in the background and gather up their exit values. Then, if any of them failed, I'd prematurely exit the bash script and report the error.
After a bit of reading bash docs, I came across some built-ins that I hadn't previously used or even seen. First, I'll show you the code:
wait.sh
This is the bash script that runs the parallel processes and gathers up the exit values.
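#!/bin/bash
FAIL=0
echo "starting"
# Start four copies of the test script in the background.
# Arguments: seconds to sleep, then the exit value to return.
./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &
# jobs -p prints the PID of each background job. Wait on each one in
# turn; wait returns that job's exit status, so count the non-zero ones.
for job in $(jobs -p)
do
    echo "$job"
    wait "$job" || let "FAIL+=1"
done
echo "$FAIL"
if [ "$FAIL" == "0" ];
then
    echo "YAY!"
else
    echo "FAIL! ($FAIL)"
fi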
sleeper
And here's the Perl script that I wrote in order to test the functioning of wait.sh. It accepts two arguments. The first is the number of seconds to sleep (to simulate the delay associated with doing work) and the second is the exit value it should use (any non-zero value indicates a failure).
#!/usr/bin/perl -w
use strict;
my $time = $ARGV[0] || 1;
my $exit = $ARGV[1] || 0;
sleep $time;
exit $exit;
Discussion
New to me was the use of let to do math on a variable so that I can count up the number of failures. Is there a better way? Bash has no standalone ++ statement; ++ only works inside arithmetic contexts such as let or (( )). Similarly, using jobs to get a list of PIDs to wait on proved to be a very useful idiom.
The code is straightforward and works for my purposes. But since 99% of my time is spent in Perl rather than bash, I wonder what I could have done differently and/or better. Feedback welcome.
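One possible variation, sketched here as a minimal example rather than a replacement for the script above: record each child's PID from $! as it starts instead of asking jobs for the list afterwards, and count failures with the (( )) arithmetic command. The work() function is a hypothetical stand-in for ./sleeper so the snippet runs on its own.

```shell
#!/bin/bash
# A variant of wait.sh (a sketch, not the original): collect each child's PID
# from $! instead of `jobs -p`, and count failures with bash arithmetic.
# work() is a hypothetical stand-in for ./sleeper: sleep SECONDS, then
# finish with exit status STATUS.
work() {
    sleep "$1"
    return "$2"
}

# run_all: start the jobs in the background, wait on each PID, and print
# how many of them exited non-zero.
run_all() {
    local pid fail=0
    local -a pids=()
    work 1 0 & pids+=("$!")
    work 1 1 & pids+=("$!")
    work 2 0 & pids+=("$!")
    work 1 0 & pids+=("$!")
    for pid in "${pids[@]}"; do
        # `wait PID` returns that child's exit status. Pre-increment keeps
        # (( )) from returning a false status when fail goes from 0 to 1.
        wait "$pid" || (( ++fail ))
    done
    echo "$fail"
}

failures=$(run_all)
if (( failures == 0 )); then
    echo "YAY!"
else
    echo "FAIL! ($failures)"
fi
```

With one job set to exit 1, this prints FAIL! (1) after about two seconds. The original let "FAIL+=1" works just as well; (( ++fail )) is only the arithmetic-command spelling of the same thing, and the pre-increment matters mainly if the script runs under set -e.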
And, if this is at all useful to you, feel free to take it and run...
Finally, I'm starting to really dig gist.github for showing off bits of code. It's good stuff. (comments)
Publication date: 2008-11-22
more
RESTful CSS
With every web developer or agency worth their salt releasing a web application these days, it was inevitable that attention would eventually turn to how best to manage CSS within a modern MVC framework. Steve Heffernan pairs stylesheets with REST principles to present a new approach to CSS architecture.
Publication date: 2008-11-19
more
TV Watching and Happiness
In one of those "well, duh!" moments the other day, I came across a headline on Slashdot that said Unhappy People Watch More TV. Given that I mostly stopped watching TV quite some time ago and consider it to be one of the more rude devices in our culture, I clicked thru to read about how others have discovered what I'd already guessed was true...
A new study by sociologists at the University of Maryland concludes that unhappy people watch more TV, while people who describe themselves as 'very happy' spend more time reading and socializing. 'TV doesn't really seem to satisfy people over the long haul the way that social involvement or reading a newspaper does,' says researcher John P. Robinson. 'It's more passive and may provide escape--especially when the news is as depressing as the economy itself.
Imagine that... Stagnation and exposure to negative information leads to sadness. It goes on...
The data suggest to us that the TV habit may offer short-run pleasure at the expense of long-term malaise.' Unhappy people also liked their TV more: 'What viewers seem to be saying is that while TV in general is a waste of time and not particularly enjoyable, "the shows I saw tonight were pretty good.
Another shock. TV provides only a short-term reward (kind of like a drug hit).
If this resonates with you a bit, or you suspect deep down that there's more going on with the influence of TV in our culture, I highly recommend reading Amusing Ourselves To Death by Neil Postman if you have not already.
It's too bad this stuff doesn't get taught in school--where, I'm told, teachers are using PowerPoint more and more.
*sigh* (comments)
Publication date: 2008-11-18
more
Asynchronous MySQL Client in Perl
I recently found myself wishing for an async library for MySQL. My goal is to be able to fire off queries to a group of federated servers in parallel and aggregate the results in my code.
With the standard client (DBD::mysql), I'd have to query the servers one at a time. If there are 10 servers and each query takes 0.5 seconds, my code would stall for 5 seconds. But by using an async library, I could fire off all the queries and fetch the results as they become available. The overall wait time should not be much more than 0.5 seconds.
While I found little evidence of anyone doing this in practice, my search led me to the perl-mysql-async project on Google Code. It's a pure-Perl implementation of the MySQL 4.1 protocol and an asynchronous client that uses Event::Lib (and libevent) under the hood.
The code contains little in the way of documentation or examples, aside from the simple bundled test script. After a bit of mucking around with it, I managed to cobble together a working example. It looks like this:
#!/usr/bin/perl -w
use strict;
use Event::Lib;
use Data::Dumper;
use MysqlAsync;
use AsyncCaller qw/schedule/;
$Data::Dumper::Terse = 1;
$|=1;
my $expected_results = 25;
my $results = 0;
for (1..$expected_results) {
# my $secs = int(rand(5));
my $secs = rand(5);
my $query = qq[select sleep($secs)];
schedule(0.001, sub{
my $dbh = MysqlAsync->new(
database => {
host => "localhost",
port => 3306,
database => "mysql",
passwd => "xxxxxx",
user => "root",
},
connect_timeout => 1,
max_requests => 25,
db_timeout => 10,
# logfile => "/tmp/mysqllog",
);
# Pass this connection along so result() can report its error; an
# outer, never-assigned $dbh would be undef in the error branch.
$dbh->get_array($query, sub { result($dbh, @_) });
});
}
event_mainloop();
exit;
sub result
{
my ($dbh, $result) = @_;
if (defined $result) {
print "result: " . Dumper($result);
} else {
print "error: " . Dumper($dbh->error());
}
$results++;
# all done?
if ($results == $expected_results) {
exit;
}
}
__END__
Sure enough, that code runs in just a bit more time than the longest query it executes, rather than the sum of all the query times.
What still surprises me is that this code doesn't appear to get a lot of use (or at least discussion) in the real world. In the PHP world, the mysqlnd driver offers async queries.
So count this as my contribution to demonstrating that Perl can do async MySQL queries too. (comments)
Publication date: 2008-11-15
more
Post-Election Thoughts: Equal but Not
I'm happy that Barack Obama won the election. I think it's time to stir things up a bit.
What really bothers me is the fact that we still don't have equal voting in this country. We certainly have the technology to share vote counts quickly and efficiently, so why not just do that? Why screw around with an electoral college anymore?
It seems disingenuous at best and an outright lie at worst to call Obama's victory a "landslide" when the actual percentages of the popular vote (the only vote that should count) were so close. Yet the large difference in electoral vote counts is supposed to make us believe that something very different happened. And the media was more than happy to play along with that deception (what a surprise, huh?).
It should not be possible to lose by having more votes than your opponent, but it is. Why does nobody seem to care? (See: electoral college, specifically this.)
Of all the countries that have tried to copy our model of democracy in the last 200 years or so, can you name a single one that adopted the electoral college as a piece of their political infrastructure?
I'd love to have my vote count as much as everyone in all the other states.
Why is that so hard? (comments)
Publication date: 2008-11-14
more
Review: Website Optimization
Is your website firing on all cylinders? We take a look at a book that has a little something for everyone, from marketers to developers, to help you polish your pages. Andrew Stevens returns to Digital Web to review Website Optimization.
Publication date: 2008-11-12
more
Are Accessibility Statements Useful?
Leona Tomlinson is back for a second article, detailing where accessibility statements fit into Web sites today.
Publication date: 2008-11-12
more
Extract: Know Your Site
In this extract from his forthcoming book, the Website Owner
Publication date: 2008-11-06
more
Interview: Aarron Walter
This week, Digital Web
Publication date: 2008-10-29
more
Kick Ass Fonts in Ubuntu: 3 Easy Steps
A few days ago I made yet another tweak to my Ubuntu laptop to make the fonts look a little better. The result is that I'm now quite happy--impressed even. Here are the three things I've done to make my day-to-day work easy on the eye.
First, enable subpixel smoothing in the System > Appearance control panel.
For a long time that's all I had done, and I was reasonably happy. Things looked okay but not great. But I use GNU Emacs for most of my coding and wanted fonts there that looked as good as those in GNOME Terminal.
That led me to the second tip: install emacs-snapshot and use the GTK version. Then you can add this to your ~/.Xresources file:
Emacs.font: Monospace-10
And bingo! The same font that's in your terminal is in Emacs.
That made me happy in Emacs, but my Firefox fonts were still a bit sucky. So when I read Tweak Your Font Rendering for Better Appearance in Tombuntu, I had to give it a try.
I created a ~/.fonts.conf file and added this to it:
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
<match target="font">
<edit name="autohint" mode="assign">
<bool>true</bool>
</edit>
</match>
</fontconfig>
I logged out and back in and suddenly found myself staring at fonts in Firefox that looked as good as I've seen in Safari on a Mac.
That's all there was to it for me: subpixel rendering, emacs-snapshot, and enabling autohinting via a .fonts.conf file.
It's worth noting that you can go even farther with the advanced font settings, but I really haven't needed to go that far yet. (comments)
Publication date: 2008-10-23
more
Random Updates
I've got several random things to say to the interwebs but none of them merit a blog post individually...
First off, I love data. But I hate the fact that the spreadsheets in OpenOffice 2.x and Gnumeric both have row limits of 65,536. I don't know who missed the boat on 32 and 64 bit CPUs, but it's rather annoying! And, yes, twitter people, I know that 65,536 is a 16 bit limit--not 8. I was trying to make a point.
Secondly, Yahoo can haz layoffs (again). Having lived through 3 rounds of layoffs in my 8.5 years at Yahoo, I know what that feels like. :-( If you're a kick-ass Perl hacker or an excellent systems and network administrator who'd like to work at a great company in San Francisco, let me know.
Thirdly, the dumbest bugs are often the ones that have been in your code a long time and are incredibly easy to keep glossing over as you read and re-read it.
Fourthly, Tie::Syslog is pretty handy but seems to not like being used multiple times in the same app. Each instance seems to think that it has the same "identity." Anyone seen that before? I haven't dug into that yet but probably will soon.
Finally, we're out of town for a few days while the house is being fumigated for termites. And we brought all four cats with us. That's what I call an adventure.
Now back to your regularly scheduled... uh, stuff. (comments)
Publication date: 2008-10-22
more
Everything You Know About CSS Is Wrong
Digital Web running a provocative article on CSS techniques? Shurely shome mishtake! In this extract from the forthcoming Sitepoint book of the same name, Rachel Andrew explains how you can use tables for layout in modern web design with a clean conscience.
Publication date: 2008-10-22
more
Yahoo! Search Taste Test
In today's coverage of the new Yahoo! Search radio advertisements,
Erick Schonfeld at TechCrunch says:
So can an advertising campaign change any of that? Search
is not like a soft drink. People use the search engine that they think
can do the best job in helping them find things. Now, maybe Google has
brainwashed all of us to believe that it does indeed produce more
relevant results. And in a blind taste-test more people might choose
Yahoo's results. But if that is the case, I'd rather take an
interactive quiz that puts each search engine to the test and make my
own decision. That would go much farther to convince me to switch than
Yahoo's current creative.
Funny he suggests that. I remember suggesting exactly that a few
years back when I worked in the marketing group for Yahoo! Search.
I suggested we do something inspired
by Twingine but which hid the
engine identity and let users judge for themselves.
Why didn't it happen? Because some of the same people who were
convinced that Yahoo! Search was "just as good as" Google (and
better in some cases, they said) were afraid that people would
realize that this was not the case.
The cognitive dissonance was amusing, but it was also frustrating
and stupid. "Either we believe we're better or we don't... Which
is it?" is the sort of argument I tried to make.
I guess that question eventually answered itself.
Oh, well... (comments)
Publication date: 2008-10-15
more
Great HTPC Wireless Keyboard: Adesso WKB-4000US
A few weeks ago I asked for HTPC Wireless Keyboard and Mouse Recommendations and got some excellent suggestions. After reading reviews on-line and checking out the various specs, I settled on the Adesso WKB-4000US.
I decided on this keyboard partly because I've liked previous Adesso keyboards and partly because it seemed to be the right combination of price, size, and range for our use. I was not disappointed.
The keyboard feels very solid--not cheap at all. Range is excellent and the feel, while not excellent, is better than I expected. The trackpad works well and no special drivers were required for Windows XP. It Just Works. The keyboard even came with a set of batteries!
If there's any negative to mention, it's that the USB dongle is about twice as long as I'd have liked. But that's a pretty small price to pay for being able to sit on the couch and control the home theater PC without reception concerns.
If you're looking for a good wireless HTPC keyboard, I highly recommend the Adesso WKB-4000US. It's available on NewEgg.com as well as on Amazon.com. (comments)
Publication date: 2008-10-09
more
<head> Conference Q&A with Aral Balkan
Aral Balkan talks to Digital Web about the <head> conference: an experiment in online communities. It brings a collection of varied and insightful speakers from around the globe to thousands of attendees who never even have to step outside.
Publication date: 2008-10-08
more
Back Seat Flying in the Citabria: Tailwheel Fun
About a week ago I finally got the chance to work on the back seat flying with my instructor in our Citabria. I'm not new to flying from the back. I've done so in gliders for a few years now, but I knew this was going to be a bit different.
I wasn't concerned about the actual flying. Flying is pretty much the same no matter where you are. The only question is how many of the instruments you can see from the rear seat. Luckily, I found that I was usually able to see the two or three that mattered: airspeed, altimeter, and engine RPM.
What I knew would be the most interesting was the takeoff and landing--especially the landings. Being a tailwheel airplane, the nose is naturally much higher when on the ground or in a landing attitude. That means dramatically restricted visibility from the back. On takeoff it's not too bad, since you can pretty quickly get the tail flying and level out the airplane.
On landing, however, you end up using a lot of peripheral vision and a bit of faith. (This is assuming a normal three-point instead of a wheel landing. See also: Conventional Landing Gear).
But a funny thing happens after you practice it a few times: you start to get the hang of it and realize that it's not all that different than landing from the front seat. You're still trying to stay lined up on the runway and fly the airplane until it lands. In fact, you're trying to keep it from landing as long as you can so that when it finally touches down there's not enough energy for it to start flying again.
Aside from the satisfaction of learning something new and building confidence in flying your airplane, being able to fly from the back seat has another benefit.
You can now have your wife fly from the front seat and get used to the airplane that she'll be using to finish up her training too. And I may be biased, but I think she did a pretty darn good job on her first flight from the front seat. :-)
I'm not sure I'd want to put a non-pilot up front--or at least not someone who hasn't been around airplanes a lot. There are some controls that I cannot reach from the rear. But I'd feel pretty comfortable giving rides from the back now.
Publication date: 2008-10-08
Steve Fossett Wreckage in Mammoth Lakes Area
Various news sources are reporting that Steve Fossett's wreckage has been found in the vicinity of Mammoth Lakes, California. There are a few interesting bits about what I've heard and read so far, but first have a look at the terrain in that area.
There are ski runs nearby and the crest of the Sierra Nevada mountains isn't far either. The Mammoth Yosemite Airport sits at an elevation of 7,128 feet and the nearby ridges and peaks easily top 10,000.
In fact, the impact appears to have happened around 10,000 feet and was consistent with flying directly into terrain. The fuselage apparently disintegrated and the engine was found several hundred feet from the impact location.
My suspicion is that Steve had some sort of in-flight medical problem. He was a very experienced pilot and likely wouldn't have been doing any acro at that altitude (though his plane probably could have). And even if there was engine trouble, he'd have had the sense to try to get it down safely or at least slowly.
The NTSB should be able to figure out if the engine was running at the time of impact. But first they have to get all the wreckage transported to somewhere it can be studied.
The other puzzling thing is that he was supposed to be out looking at dry lakes. There aren't really any dry lakes up there. There are some 15-30 minutes away, down in the Owens Valley and beyond, but not up near the Sierra Crest.
I'm curious to hear what the NTSB and FAA are able to figure out from all of this.
Publication date: 2008-10-03
Programming Annoyance: Libraries that Exit on Me
This is something that's been bugging me for a long time now. Over the years, I've come to realize that programming time is 10% about writing the code to do the work, 70% about figuring out where failures might occur and dealing with them, 10% about documentation, and 10% about documentation. (That last 10% may be substituted with Desktop Tower Defense or something equally time wasting.)
Or something like that. The point is that writing the code to do what I want isn't hard. Dealing with everything else is--especially error conditions. There are so many weird corner cases to consider. And when you're working on code for a high-volume web site that has its servers under load 24 hours a day, it doesn't take long to encounter those odd situations.
Murphy is always watching.
Years ago, after battling similar problems at Yahoo, I began to develop certain ideas about how errors should be detected, handled, and reported. An important idea here is that the developer should always be in control of when the script/program/process dies. Aside from something truly fatal (like a segfault) library routines should detect errors and report them back to their caller in the form of a known-to-be-bad return value.
The problem is that I keep running into code I want to use that breaks that rule in multiple places. In Perl terms, that means that I'll be happily testing my code and suddenly something goes wrong and my script dies in a place I didn't expect. Upon digging into it, I find that the CPAN library I'm using has something like this lurking in it:
if (not $good) {
Carp::croak("bad stuff happened!");
}
Or...
if (not $good) {
die "badness here!";
}
Sigh!
This means I have to read the code a bit more and see if I can discern why the developer wants my script to die in some cases, but in others he's content to just do this:
if (not $good) {
$@ = "bad things happened";
return undef;
}
What is it about some errors that makes them fatal while others aren't so bad that I'm deemed able to deal with them? Why has this developer taken that decision away from me? It makes no sense at all.
What this means is that I then need to litter my code with ugly crap like this:
eval {
$object->methodThatMayDie;
};
if ($@) {
# handle error here
}
The problem with that, aside from the fact that I'm dealing with another developer's inconsistent coding, is that it pollutes my code and forces me to make yet another frustrating decision.
Do I use a small number of big eval blocks and give up knowing exactly where the code died? Or do I pollute my code with a larger number of smaller eval blocks so that I can react to specific problems with a more specific solution? That means the module developer would have had to document which methods or functions may die on me. Otherwise I have to go trudging through their code and waste my time figuring that out. Guess which is more frequent.
Or do I override the module's use of die or Carp or whatever? I can do it, but that has other side effects I probably don't want to deal with either.
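One compromise between a few big eval blocks and many small ones is a tiny wrapper that turns a die into an ordinary return value right at the call site. A minimal sketch; call_or_error and might_die are hypothetical names, not part of any CPAN module:

```perl
use strict;
use warnings;

# Hypothetical helper: run a code ref that may die and convert any
# exception into a plain return value: (1, @results) or (0, $error).
sub call_or_error {
    my ($code, @args) = @_;
    my @result;
    # The trailing "1" makes success explicit, so we don't have to trust
    # that $@ is empty afterward (destructors can clobber it).
    my $ok = eval { @result = $code->(@args); 1 };
    return (0, $@ || 'unknown error') unless $ok;
    return (1, @result);
}

# Stand-in for a library routine that croaks on bad input (hypothetical).
sub might_die {
    my ($n) = @_;
    die "negative input\n" if $n < 0;
    return $n * 2;
}

my ($ok, $value) = call_or_error(\&might_die, 21);
print "ok: $value\n" if $ok;          # prints "ok: 42"

my ($ok2, $err) = call_or_error(\&might_die, -1);
print "error: $err" unless $ok2;      # prints "error: negative input"
```

Method calls work through a closure, e.g. call_or_error(sub { $object->methodThatMayDie }), which keeps each failure point next to the code that knows how to handle it.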
Why do I even need to deal with this in the first place? Can't people provide consistent interfaces? Is there something so bad about returning an error code and leaving it up to the user of your code to decide how to handle error conditions?
Maybe they do want to exit() or die(). Maybe they want to retry the logic after waiting a bit. Maybe they want to page someone and log the failure. Maybe...
You get the idea.
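For contrast, here is a minimal sketch of the kind of interface argued for above: the routine never dies on bad input, it returns (value, undef) or (undef, message), and the caller picks the policy. parse_port is a made-up example, not a function from any real module.

```perl
use strict;
use warnings;

# Hypothetical library routine that reports errors instead of dying:
# returns ($value, undef) on success or (undef, $message) on failure.
sub parse_port {
    my ($text) = @_;
    return (undef, 'missing port') unless defined $text && length $text;
    return (undef, "not a number: $text") unless $text =~ /^\d+$/;
    return (undef, "out of range: $text") if $text < 1 || $text > 65535;
    return ($text + 0, undef);
}

# The caller decides what an error means. Here: log it and fall back
# to a default. Another caller might retry, page someone, or die.
for my $input ('8080', 'eighty', '70000') {
    my ($port, $err) = parse_port($input);
    if (defined $err) {
        warn "parse_port: $err; using 80\n";
        $port = 80;
    }
    print "$input -> $port\n";
}
```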
This whole concept of "fatal" exceptions seems wrong to me. Unless things are so bad that the kernel is going to kill my process, I should be the one in charge of deciding when my code will blow up. And I shouldn't have to do extra work to assert that authority. Should I?
I know that in the Java world, it's common to do a bunch of stuff in a big try block and then try to figure out what, if anything, blew up later. But I'm a firm believer in dealing with specific problems at the exact place they occur.
I really wish more people thought that way. It'd make my life easier.
Publication date: 2008-10-02
Concept Design Tools
Does your creative process start with the same sketch of a web page every time? Or even the same Photoshop template? You could be missing out on the most innovative solutions by not putting enough thought into the concept, says Victor Lombardi. Here he outlines three methods for pulling apart a brief to tackle the underlying concept design.
Publication date: 2008-10-01
Ubuntu Kung Fu: Best Book Cover Ever!
I just ran across news that Ubuntu Kung Fu is Shipping and happened to look at the cover. As a cat lover and technical book author myself, I felt a little slighted.
That's right. Keir Thomas got a kitten on his book.
That kicks ass.
But even better, Ubuntu Kung Fu (PDF and printed) sounds like a real winner for day-to-day Ubuntu users. As the marketing blurb says:
Award-winning Linux author Keir Thomas gets down and dirty with Ubuntu to provide over 300 concise tips that enhance productivity, avoid annoyances, and simply get the most from Ubuntu. You'll find many unique tips here that can't be found anywhere else. You'll also get a crash course in Ubuntu's flavor of system administration. Whether you're new to Linux or an old hand, you'll find tips to make your day easier.
In other words, it's a book that nearly everyone using Ubuntu could benefit from. I'm hoping to grab a copy shortly. Have a listen to Keir Thomas on Ubuntu Kung Fu in this week's Pragmatic Podcast.
Also available on Amazon.com.
Publication date: 2008-09-26
Mustard Lime Beef Steaks Recipe
A few days ago I made a new grill recipe that turned out even better than we expected, so I've reproduced it here for your grilling and eating enjoyment.
Ingredients
4 sirloin beef steaks (roughly 1" thick)
1/4 cup of dry mustard (Colman's works well)
1/4 cup Worcestershire Sauce (Lea and Perrins works well)
Lime Juice or 1 large lime
Coarse salt (sea salt is what I use)
Freshly ground white pepper
Preparation
Cover the steaks on one side with 2 tablespoons of dry mustard. Pat it down and spread evenly with the back side of a fork. Sprinkle two tablespoons of Worcestershire sauce over the steaks, allowing it to soak into the mustard--patting the steaks with the fork if necessary. Dribble a bit of lime juice over the steaks.
Season the steaks with a good amount of salt and pepper. Then flip the steaks and repeat on the other side. Let them marinate for 20-30 minutes while pre-heating the grill.
Cooking
Clean and oil the grill. Cook the steaks on high heat for roughly 4 to 6 minutes per side, aiming to keep them pink on the inside. Do not rotate the steaks to make those nice cross-hatched grill marks. Doing so may knock off some of the mustard and seasonings.
Let the steaks sit for a few minutes. Slice and enjoy. :-)
Unfortunately, I have no pictures to show. But they're most excellent to eat. Trust me.
Publication date: 2008-09-25
HTPC Wireless Keyboard and Mouse Recommendations?
Dear Lazyweb,
As of a few weeks ago, we
have a
computer hooked up to the 66 inch TV full-time. However, it
currently has a wired keyboard and mouse, both of which are less than
optimal when you'd prefer to keep your ass on the couch and pick a
movie from the server upstairs.
So I'm soliciting recommendations for a wireless keyboard and mouse
(or keyboard/mouse combo) that has decent range (20-30 feet, ideally)
and doesn't take up too much space. The keyboard doesn't need a
numeric keypad or even full-sized keys. It's only going to be used to
type a small amount: occasional hits
to IMDB and renaming a folder or
two.
The mouse should be a reliable two or three button optical that can
tolerate occasional attacks by our cats and possibly even a spilled
drink.
One option that is highly rated but also highly priced is the Logitech diNovo Edge 2-Tone 84 Normal Keys 9 Function Keys USB Bluetooth Wireless Mini Keyboard. Reviews claim it has excellent size and range. But the touchpad mouse seems a bit funky. However, I do like the idea of it being built-in so there's only one object to deal with.
Thoughts?
Thanks in advance,
Jeremy
Publication date: 2008-09-23
Understanding Disabilities when Designing a Website
It is easy to make accessibility a checklist item for your Web site launch, but Leona Tomlinson provides some insight into how users with different disabilities access websites and how simple changes can make a huge difference to their browsing experience.
Publication date: 2008-09-17
Never Buy a House from a Realtor
A couple weekends ago we embarked on a seemingly simple painting project at home. We wanted to finally paint over the wall that was torn up when I had plumbing problems a few years ago (see: The Leak, Day #2, The Leak, Day #3: Leak Found, Pictures, Showering with a 90 Foot Hose, and other Fun Tidbits, The Leak, Day #7: Still Showering with a Hose, etc.).
There were numerous cans of paint in the garage that the previous owners had left behind. And since the house had mostly white walls, it seemed like a pretty trivial task. We got out the paint, spread the plastic and sheets, stirred, poured, and started putting paint on the walls.
After a bit of painting it became apparent that we were not using the right color. Apparently there was more than one white used in the house. This wouldn't normally be a problem. But as part of the painting we decided to touch up a few other walls in other rooms of the house. It looked fine while the paint was wet. But as the paint dried, we realized that there were actually three or more different flavors of "white" in use around the house.
Grr.
Realizing what a pain in the ass this could turn into, we opted to chip a bit of paint off the affected walls, take the chips over to our neighborhood Orchard Supply Hardware, and get them to match the color.
They did an excellent job. The touched up spots look fine. And the colored paint we got for the previously repaired wall looks great. (Oh, we decided to use a non-white color after we realized the "white" was all wrong.)
So you're probably wondering what this has to do with realtors.
Realtors know what it takes to sell a house. They know where they can cut corners and get away with it. After thinking about it a bit, I realized what the previous owners of our house must have done. I suspect that they hired some cheap painters and asked them to bring along any leftover white paint from previous jobs.
They did. And they used one white for one room, a slightly different white for the next, and so on--thereby using up the extra paint and not having to spend a whopping $12/gallon to repaint the house before selling it.
I can't think of any other reason why someone would paint different rooms using shades of white that are just different enough to be different. It just doesn't make sense.
But to make matters worse, they didn't bother to label the spare cans so we'd know which room the colors applied to. At least the spare paint cans I put away after we were done have things like "living room" or "bedroom" written on them in black marker.
Damned cheap-ass realtors.
Anyone need three or four cans of partially used off-white paint?
Publication date: 2008-09-11
Cooking With Stock
Unless you
Publication date: 2008-09-10
Long Term Data Archiving Formats, Storage, and Architecture
I'm thinking about ways to store archival data for the long term
and wanted to solicit anyone who's been down this road for some
input, advice, warnings, etc.
Background
Essentially I'm dealing with a system where "live" and "recently
live" data is stored in a set of
replicated MySQL servers and queried
in real-time. As time goes on, however, older "expired" data is
moved to a smaller set of replicated "archive" servers that also
happen to be MySQL.
This is problematic for a few reasons, but rather than be all
negative, I'll express what I'm trying to do in the form of some
goals.
Goals
There are a few high-level things I'd like this archive to
handle based on current and future needs:
Be able to store data for the foreseeable future. That means
hundreds of millions of records and, ultimately, billions.
Fast access to a small set of records. In other words, I like
having MySQL and indexes that'll get me what I want in a hurry
without having to write a lot of code. The archive needs to be
able to handle real-time queries quickly. It does this today and
needs to continue to work.
Future-proof file/data format(s). One problem with simply using
MySQL is that there will be schema changes over time. A column may
be added or dropped or renamed. That change can't easily be
implemented retroactively on a larger data set in a big table or
whatnot. But if you don't, then code needs to be willing to deal
with those changes, NULLs appearing, etc.
Fault tolerance. In other words, the data has to live in more
than one place.
Support for large scans on the data. This can be to do full-text
style searches, looking for patterns that can't easily be indexed,
computing statistics, etc.
It's worth noting that data is added to the archive on a constant
basis and it is queried regularly in a variety of ways. But there
are no deletes or updates occurring. It's a write-heavy system most
of the time.
Pieces of a Possible Solution
I'm not sure that a single tool or piece of infrastructure will
ever solve all the needs, but I'm thinking there may be several open
source solutions that can be put to use.
You'll notice that this involves duplicating data, but storage is
fairly cheap. And each piece is especially good at solving one or
more of our needs.
MySQL. I still believe there's a need for having a copy of the
data
either denormalized
or in a star
schema in a set of replicated MySQL instances using MyISAM. The
transactional overhead of InnoDB isn't really needed here. To keep
things manageable one might create tables per month or quarter or
year. Down the road
maybe Drizzle makes
sense?
Sphinx. I've been
experimenting with Sphinx for indexing large amounts of textual data
(with some numeric attributes) and it works remarkably well. This
would be very useful instead of building MySQL full-text
indexes or doing painful LIKE queries.
Hadoop/HDFS
and Flat
Files or a simple record structure. To facilitate fast batch
processing of large chunks of data, it'd be nice to have everything
stored in HDFS as part of a small Hadoop cluster where one can use
Hadoop Streaming to run jobs over the entire data set. But what's
a good future-proof file format that's efficient? We could use
something like XML
(duh), JSON, or
even Protocol
Buffers. And it may make sense to compress the data with gzip
too. Maybe put a month's worth of data per file and compress?
Even Pig
could be handy down the road.
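For the flat-file piece, one format that copes with schema drift is self-describing JSON, one record per line, gzip-compressed per month. A minimal sketch using only modules that ship with modern Perl (JSON::PP, IO::Compress::Gzip, IO::Uncompress::Gunzip, File::Temp); the file naming and the category field with its default are hypothetical. Line-oriented text like this also suits Hadoop Streaming, though a plain gzip file can't be split, so each monthly file would be handled by a single map task.

```perl
use strict;
use warnings;
use JSON::PP;
use IO::Compress::Gzip qw($GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);
use File::Temp qw(tempdir);

# utf8: emit/expect UTF-8 bytes; canonical: stable key order on each line.
my $json = JSON::PP->new->utf8->canonical;

# Write one month's records as gzip-compressed JSON lines.
sub write_month {
    my ($path, @records) = @_;
    my $z = IO::Compress::Gzip->new($path)
        or die "gzip $path: $GzipError\n";
    $z->print($json->encode($_) . "\n") for @records;
    $z->close;
}

# Read them back, filling in defaults for fields older records lack.
sub read_month {
    my ($path) = @_;
    my $z = IO::Uncompress::Gunzip->new($path)
        or die "gunzip $path: $GunzipError\n";
    my @records;
    while (defined(my $line = $z->getline)) {
        my $rec = $json->decode($line);
        # Hypothetical column added after some data was already archived.
        $rec->{category} //= 'unknown';
        push @records, $rec;
    }
    $z->close;
    return @records;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/archive-2008-09.json.gz";
write_month($file,
    { id => 1, title => 'bike',  category => 'for sale' },
    { id => 2, title => 'couch' },    # written before "category" existed
);
print "$_->{id}: $_->{category}\n" for read_month($file);
```

Because every line names its own fields, adding or dropping a column never requires rewriting old files; readers just need sensible defaults.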
While it involves some data duplication, I believe these pieces
could do a good job of handling a wide variety of use cases:
real-time simple queries, full-text searching, and more intense
searching or statistical processing that can't be pre-indexed.
So what else is there to consider here? Other tools or
considerations when dealing with a growing archive of data whose
structure may grow and change over time?
I'm mostly keeping discussion of storage hardware out of this,
since it's not the piece I really deal with (a big disk is a big
disk for most of my purposes), but if you have thoughts on that,
feel free to say so. So far I'm only thinking 64-bit Linux boxes
with RAID for MySQL and non-RAID for HDFS and Sphinx.
Publication date: 2008-09-09
The Perl UTF-8 and utf8 Encoding Mess
I've been hacking on some Perl code that extracts data that comes
from web users around the world and has been stored
in MySQL (with no
real encoding information, of course). My goal is to generate
well-formed, valid XML that can be read
by another tool.
Now I'll be the first to admit that I never really took the time to
like, understand, or pay much attention to all the changes in Perl's
character and byte handling over the years. I'm one of those
developers that, I suspect, is representative of the majority (at
least in this self-centered country). I think it's all stupid and
complicated and should Just Work... somehow.
But at the same time I know it's not.
Anyway, after importing lots of data I came across my first bug.
Well, okay... not my first bug. My first bug related to this
encoding stuff. The XML parser on
the receiving end raised hell about some weird characters coming
in.
Oh, crap. That's right. This is the big bad Internet and I forgot
to do anything to scrub the data so that it'd look like the sort of
thing you can cram into XML and expect to maybe work.
A little searching around managed to jog my memory and I updated my
code to include something like this:
use Encode;
...
my $data = Encode::decode('utf8', $row->{'Stuff'});
And all was well for quite some time. I got a lot farther with
that until this weekend when Perl itself began to blow up on me,
throwing fatal exceptions like this:
Malformed UTF-8 character (fatal) ...
My first reaction, like yours probably, was WTF?!?! How on god's
green earth is this a FATAL error?
After much
swearing, a
Twitter plea, and some reading (thanks Twitter world!), I came
across a section of
the Encode manual page
from Perl.
I'm going to quote from it a fair amount here because I know you're
as lazy as I am and won't go read it if I just link here. The
relevant section is at the very end (just before SEE ALSO) and
titled UTF-8 vs. utf8.
....We now view strings not as sequences of bytes, but as
sequences of numbers in the range 0 .. 2**32-1 (or in the case of
64-bit computers, 0 .. 2**64-1) -- Programming Perl, 3rd ed.
That has been the perl's notion of UTF-8 but official UTF-8 is more
strict; Its ranges is much narrower (0 .. 10FFFF), some sequences
are not allowed (i.e. Those used in the surrogate pair, 0xFFFE, et
al).
Now that is overruled by Larry Wall himself.
From: Larry Wall
Date: December 04, 2004 11:51:58 JST
To: perl-unicode@perl.org
Subject: Re: Make Encode.pm support the real UTF-8
Message-Id:
On Fri, Dec 03, 2004 at 10:12:12PM +0000, Tim Bunce wrote:
: I've no problem with 'utf8' being perl's unrestricted uft8 encoding,
: but "UTF-8" is the name of the standard and should give the
: corresponding behaviour.
For what it's worth, that's how I've always kept them straight in my
head.
Also for what it's worth, Perl 6 will mostly default to strict but
make it easy to switch back to lax.
Larry
Do you copy? As of Perl 5.8.7, UTF-8 means strict, official UTF-8
while utf8 means liberal, lax, version thereof. And Encode version
2.10 or later thus groks the difference between "UTF-8" and "utf8".
encode("utf8", "\x{FFFF_FFFF}", 1); # okay
encode("UTF-8", "\x{FFFF_FFFF}", 1); # croaks
"UTF-8" in Encode is actually a canonical name for "utf-8-strict".
Yes, the hyphen between "UTF" and "8" is important. Without it
Encode goes "liberal"
find_encoding("UTF-8")->name # is 'utf-8-strict'
find_encoding("utf-8")->name # ditto. names are case insensitive
find_encoding("utf_8")->name # ditto. "_" are treated as "-"
find_encoding("UTF8")->name # is 'utf8'.
Got all that?
The sound you heard last night was me banging my head on a desk.
Repeatedly.
I mean, how could I have possibly noticed the massive
difference between utf8 and UTF-8?
Really. I must have been on some serious crack.
Sigh!
Needless to say my code now looks more like this:
use Encode;
...
my $data = Encode::decode('UTF-8', $row->{'Stuff'}); ## fuck!
Actually, I was kidding about the "fuck!" I
wouldn't swear
in code.
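A minimal sketch of the scrub-then-emit step, assuming the column bytes are meant to be UTF-8: decode with strict 'UTF-8' so malformed input is replaced rather than carried along, then drop the code points XML 1.0 doesn't allow. With no CHECK argument, Encode substitutes the U+FFFD replacement character for malformed sequences instead of dying. clean_for_xml is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: raw bytes from a database column in, a character
# string that is safe to embed in XML 1.0 out.
sub clean_for_xml {
    my ($bytes) = @_;
    return '' unless defined $bytes;
    # Strict "UTF-8", not lax "utf8". With no CHECK argument, Encode
    # replaces malformed sequences with U+FFFD instead of dying.
    my $text = decode('UTF-8', $bytes);
    # Remove code points XML 1.0 forbids: most C0 controls, surrogates,
    # U+FFFE and U+FFFF.
    $text =~ s/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]//g;
    return $text;
}

# A valid "é", one stray 0xFF byte, and a control character.
my $raw   = "caf\xC3\xA9 \xFF bad\x01";
my $clean = clean_for_xml($raw);
# The stray byte shows up as U+FFFD; the \x01 is gone.
print encode('UTF-8', $clean), "\n";
```

Whether to replace bad bytes or reject the whole record is a policy choice; passing Encode::FB_CROAK as the third argument to decode (inside an eval) would turn malformed input into an error the caller can handle instead.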
Publication date: 2008-09-03
Web Design by Designers
You
Publication date: 2008-08-27
Dumber is Faster with Large Data Sets (and Disk Seeks)
I remember
reading Disk
is the new Tape earlier this year and how much it resonated.
That's probably because I was working for Yahoo at the time and
hearing a lot about their use
of Hadoop for data
processing. In fact, I even did a couple videos
(1
and 2)
about that.
Anyway, I recently faced the reality of this myself. When I wrote
about The
Long Term Performance of InnoDB I'd been beating my head against
a wall trying to get millions of records out of InnoDB
efficiently.
It was taking days to get all the records. Yes, days!
After joking that it'd probably be faster to just dump the tables
out and do the work myself in Perl, I thought about Disk is the
new Tape and realized what I was doing wrong.
Allow me to offer some background and explain...
There are several tables involved in the queries I needed to run.
Two of them are "core" tables and the other two are LEFT JOINed
because they hold optional data for the rows I'm pulling. There are
well over a hundred million records to consider and I need only
about 10-15% of them.
And these records fall into roughly 500 categories. So what I'd
been doing is fetching a list of categories, running a query for
each category to find the rows I actually need, processing the
results, and writing them to disk for further processing.
The query looked something like this:
SELECT field1, field2, field3, ... field N
FROM stuff s, stuff_meta sm
LEFT JOIN stuff_attributes sa ON sm.item_id = sa.item_id
LEFT JOIN stuff_dates sd ON sm.item_id = sd.item_id
WHERE sm.item_id = s.item_id
AND sm.cat_id = ?
AND sm.status IN ('A', 'B', 'C')
That seemed, at least in theory, to be the obvious way to
approach the problem. But the idea of waiting several days for the
results led me to think a bit more about it (and to try some
InnoDB tuning along the way).
While it seems very counter-intuitive, this was sticking in my
head:
I'm still trying to get my head around this concept of
"linear" data processing. But I have found that I can do some
things faster by reading sequentially through a batch of files
rather than trying to stuff everything in a database (RDF or SQL)
and doing big join queries.
So I gave it a try. I wrote a new version of the code that
eliminated the two AND bits in the WHERE clause. Combined with
using mysql_use_result
in the client API, that meant my code had to process a stream of many tens of
millions of records, handling the status filtering and sorting
records into buckets based on cat_id (and some extra
bookkeeping).
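Roughly, the client side of that single streaming pass might look like the sketch below: one query with no cat_id or status conditions, rows pulled one at a time, filtered and bucketed in Perl. It assumes DBI with DBD::mysql, whose mysql_use_result statement attribute streams rows instead of buffering the whole result set in client memory; stream_filter and its callback shape are hypothetical. The database wiring is shown in comments so the filtering logic runs on its own.

```perl
use strict;
use warnings;

# Status filtering moved out of the WHERE clause and into the client.
my %WANTED_STATUS = map { $_ => 1 } qw(A B C);

# Pull rows from $next_row (a code ref returning a hashref, or undef when
# the stream is exhausted), keep the wanted statuses, and hand each kept
# row to $emit along with its category. Returns the number of rows kept.
sub stream_filter {
    my ($next_row, $emit) = @_;
    my $kept = 0;
    while (defined(my $row = $next_row->())) {
        next unless $WANTED_STATUS{ $row->{status} };
        $emit->($row->{cat_id}, $row);
        $kept++;
    }
    return $kept;
}

# Against MySQL (DBI + DBD::mysql), the pieces would be wired up roughly so:
#   my $sth = $dbh->prepare($sql, { mysql_use_result => 1 });
#   $sth->execute;
#   stream_filter(sub { $sth->fetchrow_hashref },
#                 sub { my ($cat, $row) = @_; ...append to $cat's file... });
# where $sql is the original query minus the cat_id and status conditions.

# In-memory demo standing in for the database cursor.
my @rows = (
    { item_id => 1, cat_id => 10, status => 'A' },
    { item_id => 2, cat_id => 10, status => 'X' },
    { item_id => 3, cat_id => 20, status => 'C' },
);
my %by_cat;
my $kept = stream_filter(sub { shift @rows },
                         sub { push @{ $by_cat{ $_[0] } }, $_[1] });
print "kept $kept rows in ", scalar(keys %by_cat), " categories\n";
```

Passing an $emit callback rather than returning the buckets lets a real run write each row to disk as it arrives, so client memory stays flat no matter how many tens of millions of rows stream past.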
As an aside, I should note that there used to be an ORDER BY on
that original query, but I abandoned that early on when I saw how
much work MySQL was doing to sort the records. While it made my
code a bit easier, it was far more efficient to track things outside
the database.
Anyway, the end result was that I was able to get all the data I
needed in merely 8 hours. In other words, treating MySQL as an SQL-powered
tape drive yielded a 12-fold improvement in
performance.
Put another way, taking the brain-dead stupid, non-SQL,
mainframe-like approach got me results 12 times faster than doing it
the seemingly "correct" way.
Now this isn't exactly what the whole disk vs. tape thing
is about but it's pretty close. I'm aware that InnoDB works with
pages (that will contain multiple records, some of which I don't
need) and that's part of the problem in this scenario. But it's a
really interesting data point. And it's certainly going to change
my thinking about working with our data in the future.
Actually, it already has. :-)
Dumber is faster.
As I've
mentioned before, craigslist is hiring. We like good Perl/MySQL
hackers. And good sysadmin and network types too.
Ping me if you're
interested in either role.
Publication date: 2008-08-22
Lembert Dome Hike in Yosemite
Last weekend afforded an opportunity to explore the Lembert Dome Hike in Yosemite National Park.
Lembert Dome is the monolithic dome that dominates the eastern end of Tuolumne Meadows in Yosemite National Park. It's a justifiably popular ascent, particularly among day hikers in the area, with the summit offering magnificent views of Tuolumne Meadows to the west, the Cathedral Range to the south, and the Sierra crest to the east.
The trail starts out a bit steep but the views are definitely worth the trek up, as is a quick side trip to Dog Lake.
Here are a few pictures.
Some rocks to mark the start of the trail...
The clouds helped keep the heat down.
Almost there!
Looking back where we came from.
Hey, it's me!
Look at all those trees...
Dog Lake
Farm near Evergreen Lodge (just outside the park)
The rest are here: Lembert Dome Hike in Yosemite on Flickr
Publication date: 2008-08-20
Open Source Queueing and Messaging Systems?
Dear Lazyweb,
I'm interested in getting an idea of what open
source message
queueing systems exist that are fast, stable, and have some good
replication (think multi-colo) and fault tolerance built-in. The
idea being, of course, that some processes want to send messages
into a queue (of work to be done) and other processes will fetch
those and do stuff with them.
Ideally, I'm looking for a system that allows for different message
priorities--meaning that I'd like to be able to mark some messages
as less important, so it's okay if we lose them in a crash. It'd
also be handy to have the ability to set expiration times on
messages.
Bonus points for stuff with good Perl libraries.
Put another way, if you wanted to run something
like Amazon's
SQS on your own infrastructure, what would you use as the
building blocks?
Stuff I already know of (some of which doesn't meet my own
criteria):
OpenAMQ
Apache ActiveMQ
ejabberd
Spread::Queue, which uses the Spread Toolkit
RabbitMQ
But surely there's more. Feel free to spew others in the comments
below...
And even if you don't know of any others, I'd love to hear about
your experience with any of the above or already commented
systems.
Publication date: 2008-08-19
Getting The Most Out Of Your Library
Whether you
Publication date: 2008-08-13
The Long Term Performance of InnoDB
The InnoDB storage engine has
done wonders for MySQL users that
needed higher concurrency than MyISAM could provide for demanding web
applications. And the automatic crash recovery is a real bonus
too.
But InnoDB's performance (in terms of concurrency, not really raw
speed) comes at a cost: disk space. The technique for achieving
this, multiversion
concurrency control, can chew up a lot of space. In fact, that
Wikipedia article says:
The obvious drawback to this system is the cost of storing
multiple versions of objects in the database. On the other hand reads
are never blocked, which can be important for workloads mostly
involving reading values from the database.
Indeed.
Imagine a set of database tables with tens of millions of rows and
a non-trivial amount of churn (new records coming in and old ones being
expired or removed all the time). You might see this in something
like a large classifieds site,
for example.
Furthermore imagine that you're using master-slave replication and
the majority of reads hit the slaves. And some of those slaves are
specifically used for longer running queries. It turns out that the
combination of versioning, heavy churn, and long running queries can
lead to a substantial difference in the size of a given InnoDB data
file (.ibd) on disk.
Just how much of a difference are we talking about? Easily a
factor of 4-5x or more. And when you're dealing with hundreds of
gigabytes, that starts to add up!
It's no secret that InnoDB isn't the best choice for data-warehouse-style
applications. But the disk bloat, fragmentation, and ongoing
degradation in performance may be an argument for having some slaves
that keep the same data in MyISAM tables.
I know, I know. I can do the ALTER TABLE trick to make InnoDB
shrink the table by copying all the rows to a new one, but that does
take time. Using InnoDB is definitely not a use-it-and-forget-about-it
choice--but what database engine is, really?
Looking at the documentation
for the InnoDB plug-in, I expect to see a real reduction in I/O
when using the new indexes and compression on a data set like this.
(Others
sure have.) But I don't yet have a sense of how stable it
is.
Anyone out there in blog-land have much experience with it?
Publication date: 2008-08-13
Fun with Network Programming, race conditions, and recv() flags
Last week I had the opportunity to do a bit of protocol hacking and found myself stymied by what seemed like a race condition. As with most race conditions, it didn't happen often--anywhere from 1 in 300 to 1 in 5,000 runs. But it did happen and I couldn't really ignore it.
So I did what I often do when faced with code that's doing seemingly odd things: insert lots of debugging (otherwise known as "print statements"). Since I didn't know if the bug was in the client (Perl) or server (C++), I had to instrument both of them. I'd changed both of them a bit, so they were equally likely in my mind.
Well, to make a long, boring, and potentially embarrassing story short, I soon figured out that the server was not at fault. The changes I made to the client were the real problem.
I had forgotten about how the recv() system call really works. I had code that looked something like this (in Perl):
recv($socket, $buffer, $length, 0);
...
if (length($buffer) != $length) {
# complain here
}
The value of $length was provided by the server as part of its response. So the idea was that the client would read exactly $length bytes and then move on. If it read fewer, we'd be stuck checking again for more data. And if we did something like this:
while (my $chunk = <$socket>) {
$buffer .= $chunk;
}
There's a good chance it could block forever and end up in a sort of deadlock, with each side waiting for the other to do something. The server would be waiting for the next request and the client would be waiting for the server to be "done."
Unfortunately for me, the default behavior of recv() is not to wait for everything you asked for. On a blocking socket it waits only until some data is available and then returns it--a best effort read. If you ask for 2048 bytes but only 1536 have arrived so far, you'll end up with 1536 bytes. And that's exactly the sort of thing that'd happen every once in a while.
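Independent of any flag, the usual portable way to cope with short reads is to keep calling recv() while asking only for the bytes still missing, and to treat a zero-byte read as the peer hanging up. Because each call is bounded by the remaining length, this loop doesn't fall into the read-until-EOF deadlock described above. Here's a minimal Perl sketch, assuming a connected stream socket; read_exactly is a hypothetical helper name, not code from the original client, and the demo uses a local socketpair() rather than a real client/server connection.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical helper: read exactly $length bytes from a connected stream
# socket, looping over short reads. Returns the data, or undef if the peer
# closes the connection before $length bytes arrive. Dies on a recv() error.
sub read_exactly {
    my ($socket, $length) = @_;
    my $buffer = '';
    while (length($buffer) < $length) {
        my $chunk;
        # Without flags, recv() hands back whatever is available right now,
        # so ask only for the bytes that are still missing.
        defined(recv($socket, $chunk, $length - length($buffer), 0))
            or die "recv failed: $!";
        return undef if length($chunk) == 0;    # zero bytes means EOF
        $buffer .= $chunk;
    }
    return $buffer;
}

# Demo: the "server" end writes its reply in two pieces.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";
syswrite($server, "hello");
syswrite($server, " world");
print read_exactly($client, 11), "\n";    # prints "hello world"
```

The caller still has to decide what an early EOF means for the protocol; here it simply gets undef back.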
The MSG_WAITALL flag turned out to be the solution. You can probably guess what it does...
This flag requests that the operation block until the full request is satisfied. However, the call may still return less data than requested if a signal is caught, an error or disconnect occurs, or the next data to be received is of a different type than that returned.
That's pretty much exactly what I wanted in this situation. I'm willing to handle the signal, disconnect, and error cases. Once I made that change, the client and server never missed a beat. All the weird debugging code and attempts to "detect and fix" the problem were promptly ripped out and the code started to look correct again.
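Applied to the snippet above, the fix amounts to passing MSG_WAITALL instead of 0 while keeping the length check for the signal, error, and disconnect cases. A minimal sketch, assuming a Linux-style stream socket where Perl's Socket module provides MSG_WAITALL; read_reply is a hypothetical name, and the forked child merely stands in for the C++ server by pausing halfway through its reply.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC MSG_WAITALL);

# Hypothetical helper: read a $length-byte reply with a single recv() call.
# MSG_WAITALL asks the kernel to block until all $length bytes have arrived,
# but the call can still come back short on a signal, an error, or a
# disconnect, so the length check stays.
sub read_reply {
    my ($socket, $length) = @_;
    my $buffer;
    defined(recv($socket, $buffer, $length, MSG_WAITALL))
        or die "recv failed: $!";
    if (length($buffer) != $length) {
        die sprintf("short read: wanted %d bytes, got %d\n",
                    $length, length($buffer));
    }
    return $buffer;
}

socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";
my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {                          # child plays the server
    close($client);
    syswrite($server, "01234");
    select(undef, undef, undef, 0.2);     # pause mid-reply
    syswrite($server, "56789");
    exit 0;
}
close($server);
print read_reply($client, 10), "\n";      # prints "0123456789"
waitpid($pid, 0);
```

Without MSG_WAITALL, the same read could return just "01234" if it happened to run during the pause, which is the intermittent behavior described in this post.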
The moral of this story is that you should never assume that the default behavior is what you want. Check those flags.
Now don't get me started about quoting and database queries...
Publication date: 2008-08-08
Review: Web Form Design by Luke Wroblewski
2008 has been a quiet year for good web design and development books. The standout thus far is Luke Wroblewski
Publication date: 2008-08-06
How Environments, Real And Virtual, Influence Us
What sort of state is your desk in right now? Clutter on your website, just as in your environment, can have a negative effect on your visitors
Publication date: 2008-08-06
I'm Thinking
In Amazing Powers of Concentration, Brad Feld says something that resonated with me.
I've never really understood the phrase "I'm thinking." It's too abstract for me. I like to think I think all the time. So "I'm thinking" doesn't feel like it applies to anything. For example, when "I'm running", it's pretty clear what I'm doing. "I'm thinking" - not so much so.
That's so true. Thinking is an ongoing and hard-to-see activity.
In fact, I know of some people who are so busy thinking at times that they find it difficult to sleep at night. I used to have that problem a lot. However, it's rare these days. I'm not sure why. Maybe I'm just more fond of sleep than I used to be.
I suppose that if you're into meditation, there is a time during the day when you force yourself not to think. But that's pretty rare, I suspect.
Oh, I almost forgot about television...
Publication date: 2008-08-05
Feline Diabetes or Living with a Diabetic Cat
About a week and a half ago, I noticed that Barnes (one of our two older cats) was thinner than he used to be--so much so that I felt his bones when I gave him the sort of back scratching that he loves so much.
Both he and his brother (Noble) are about 10 years old and have nearly always been on the heavy side. And, of course, they don't get to a vet regularly because they utterly detest cat trips.
Last Thursday we realized that it wasn't getting any better and took him over to the vet (Kirkwood Animal Hospital and Dr. Ueno) to see what was going on. Some on-line reading led me to believe that it was likely a case of hyperthyroidism, which I'd heard of and thought was somewhat common in aging cats.
However, the doctor called back on Friday morning to tell me that Barnes was diabetic. :-( Not only did that mean another trip to the vet and a 6-8 hour stay for glucose testing, it also likely meant insulin shots for the rest of his hopefully long life.
It wasn't long before I found the FelineDiabetes.com web site and began reading about what this was likely to mean: dietary changes, closer monitoring, daily shots, and so on.
To make a long story short, Barnes is doing better now. He and the other three cats are adjusting to eating a low-carb cat food (Purina DM). I have an appointment for his brother Noble to get checked out next week. If he's headed down the same path, a distinct possibility given the role that genetics can play, we'd like to catch it ASAP.
The food is more expensive and the insulin shots aren't nearly as bad as I expected. But I really wish this hadn't happened. Diabetes puts him at risk for other complications down the road--just like in humans.
What you need to know...
If you're a cat owner, here are a few suggestions from our experience:
Feed your cats a good diet--one they were designed to eat. That means avoiding the cheap foods and excessive snacking.
Help them get lots of exercise. Use cat toys, catnip, a laser pointer, whatever works for them.
Keep your cats indoors--they'll live much longer lives.
Get your cats to the vet yearly. Eventually they'll get used to it. And even if they don't, it's for their own good.
Oh, I just dug up some of the pictures I took of Barnes and Noble back in 1999 when I first adopted them. They were about 3-6 months old at the time.
Just to lighten things up a bit, if you haven't already seen it, check out An Engineer's Guide to Cats.
There's probably a lot more I could say about this, but I'll save it for another time. I'm sure we have much to learn yet. Now I'm off to get an injection ready.
Publication date: 2008-08-04
Two weeks into my new job at craigslist...
Many people have asked (via IM, email, Twitter) how my new job is going, what craigslist is like, etc. So here are a few thoughts about my first two weeks in the new job.
The Commute
Despite what folks said in the comments of my little announcement, the commute really isn't that bad. Taking I-280 from Willow Glen (San Jose) up to near Golden Gate Park is about 55 minutes from pulling out of the garage to parking in San Francisco. And I've been able to find parking on Lincoln each time I've gone up--usually within 4-6 blocks of the office.
So 55 minutes of driving plus about 10 minutes of walking (which is good for me anyway) is very manageable if you're not doing it every day. If I did, I'd be less upbeat about it, I'm sure.
Having said that, I am going to experiment with the mass transit options as well. I'd like to give all the reasonable options a fair shake.
The Hardware
My laptop, a Lenovo ThinkPad T61 running Ubuntu Linux, is performing quite well. It's had one lockup that I cannot attribute to anything in particular. But other than that, it's a joy to work on--especially with Emacs Snapshot and its most excellent font rendering. (Learning VIM is still on my to-do list...)
The biggest hassle so far has been VPN related. Every once in a while my laptop decides to reconnect to my wireless router at home and when it does it replaces the custom resolv.conf file with my "normal" home one. That results in a VPN that sort of works and sort of doesn't. I'm getting better at noticing when this happens and fixing it, but I really need to find a way to keep that from happening at all.
The Culture
In two weeks, I've only had one experience that I would come close to classifying as a "meeting." There really aren't conference rooms (yay!) but it did involve a whiteboard. However, unlike meetings I'm used to, it involved only the most essential people, had a clearly defined goal, and was very useful to me.
The engineering team has a great old-school Perl and Unix mentality (and sense of humor) to it that I really dig. Our private IRC channel is filled with a mix of useful information sharing and old fashioned joking, complaining, and ranting. It reminds me a lot of Yahoo in the 1999-2000 time frame.
The Food
Unlike Yahoo, craigslist has an abundance of eating establishments within a very short walk. I suspect it'll take months before I've sampled everything nearby.
The Work
What am I actually doing?
Well, it's a mix of things at this stage. Since I know only a little bit about how things actually work, I'm asking a lot of questions and trying to get a sense of what's what and where. That always takes time in a new environment and with a new code base. But eventually the day does come when you suddenly realize that it's not an issue anymore and you must have things mostly figured out.
I'm also playing with alternatives to our current search. I've spent a week or so getting to know Sphinx, the open source search engine by Andrew Aksyonoff. People often use it as a replacement for MySQL's full-text search capabilities.
So far I'm quite impressed with its speed and capabilities, not to mention Andrew's willingness to offer advice and suggestions. I've also been using Jon Schutz's Sphinx::Search Perl module. I've had to slightly modify the code of both to get them to perform the way we'd like, but the modifications aren't terribly extensive. As is often the case, what took the most time was figuring out what I really wanted to do and then how to do it.
I may have more to say about all this later.
Hiring
It looks like we've got a bit more room at craigslist. As Jim mentioned on the craigslist blog:
Worth mentioning that the CL tech hiring bit remains set to "1" for star LAMPerl developers, systems heavyweights, and networking wizards.
If you're a great Perl hacker, amazingly skilled networking geek, or someone who really knows systems and data center stuff, we may be waiting for you.
Ping me if you're interested and we'll get the ball rolling.
Finally...
Anyway, that's the story so far.
Am I happy in my new role? You bet.
Do I miss some of my old colleagues at Yahoo? Of course. In fact, I missed Chad's going away party due to a sick cat, which is a whole separate and sad story I need to tell.
See Also: Settling in to a New Environment at Craigslist
Publication date: 2008-08-03
Great Pancakes with the Presto Cool Touch 20 Inch Electric Griddle
We recently bought an electric griddle so that I could make more than one pancake at a time. After reading a few reviews, I confirmed that the 20 inch electric model from Presto was the way to go.
While it's available on Amazon.com (and here), I bought locally and got the chance to put it to use last weekend when some of the family were visiting. I was surprised and impressed by how quickly and evenly it heated. But that was really just the beginning.
It turns out that this little electric wonder makes pancakes that are light, fluffy, and far more evenly cooked than anything I've ever been able to do on the stove top. And to put icing on the cake, the manual provides temperature settings so that you can cook a few dozen other things: eggs, sausage, bacon, and so on.
The non-stick coating is trivial to clean and the grease catcher appears to do its job well too.
In summary, if you ever find yourself needing to make pancakes for more than a couple people at once, get yourself an electric griddle!
Damn, now I'm hungry for a pancake...
Publication date: 2008-07-31