Thursday, March 10, 2011

Advice for Google Summer of Code Students (and other prospective contributors)

Periodically, someone whose name I've never seen before contacts me -- or posts to pgsql-hackers -- to say that either (1) they've already written a great patch for PostgreSQL and they'd like to know how to get it committed or (2) they're interested in writing a patch to implement some major new feature in PostgreSQL.  Some of these people are prospective Google Summer of Code students, while others are researchers or other people who, for whatever reason, are interested in PostgreSQL.  I am always thrilled to see more people take an interest in PostgreSQL, but unfortunately I've seen a number of people who seemed very smart and promising crash and burn when it comes to actually making a successful contribution of code.  In fact, no matter how promising things seem at the outset, the failure rate - in my experience - is very close to 100%.

On the other hand, there is a constant stream of new people who do succeed in getting involved in the community and making significant contributions to PostgreSQL.  For example, Kevin Grittner was the driving force behind the recent push to implement serializable snapshot isolation in PostgreSQL.  That patch is almost 15,000 lines of new code, documentation, and test cases, and is one of the single largest patches committed to PostgreSQL 9.1.  Kevin is not new in the sense that he's been reading the mailing lists and reporting bugs for a long time, but he's never had a patch anywhere near that size committed before, and I'm pretty sure that at one point he made a comment to me disparaging his own C programming skills.  Nonetheless, he's made a valuable and LARGE code contribution to PostgreSQL.

Similarly, Noah Misch, whose name I had not seen before on the PostgreSQL mailing lists I regularly read, recently contributed a number of small patches, the coolest of which is - in my opinion - teaching ALTER TABLE that it doesn't always need to rewrite the entire table when the data type of a column is changed.  That's not going to be a headline feature, but it's really cool, and lots of people are going to get good use out of it.  So it's clearly possible to show up and get started having patches committed in a relatively short period of time.

So, what's the difference between the first category of people, who almost never manage to make a meaningful contribution, and the second, who do?

Different people might give different answers to that question, but for what it's worth, here's mine: Successful contributors understand that writing good code is hard, and getting it committed is even harder.  They understand that it's going to take a lot of work and a lot of time, that both their code and ideas will be criticized, that they will need to offer convincing evidence that their project is both well-thought-out and well-implemented, and that they may need to rewrite the whole thing multiple times.  When they get criticized, they're not surprised; that was part of the plan, and they dust themselves off and keep on trying.  Unsuccessful contributors often think that writing a major new feature for PostgreSQL will be relatively easy, and are surprised when it turns out to take a lot more time and more work than they expected.

So here's my advice.  If you're applying to Google Summer of Code, don't try to implement a major new feature.  A seasoned PostgreSQL contributor who is in a position to spend 40 hours a week on a major new feature probably still wouldn't be able to get it done over one summer, so you probably can't either.  Pick something small, and ideally something that can be broken down into multiple parts, so that if it turns out to be harder than you think, you can at least still do part of it.  Subscribe to pgsql-hackers, start participating in the discussions, and try reviewing some patches written by other people before you start writing your own.  Expect that it's going to be a lot of work - more work than you'd normally do in a whole semester for a challenging Computer Science class.

Similarly, if you're not interested in Google Summer of Code but you would like to contribute some major new feature to PostgreSQL, start small.  Even if it's not really what you're interested in, write some small patches that do uncontroversial things and get them committed.  Review some patches written by other people, and read the reviews that other people write and post.  Before you start writing the code for your major feature, spend several months thinking about the design and discussing it on the mailing list.  If there is any preliminary refactoring that can be done separately from the main patch, get that committed first.  Make sure that the design you ultimately settle on has buy-in from the community, and make sure you understand what's required to implement it.  A typical major feature requires anywhere from six months (at the very low end) to two years or even more from the time you start talking about it to the time it gets committed.

If the above sounds discouraging, it's not meant to be.  I would love to see us incorporate more big, new features into PostgreSQL more quickly, and in fact I think we're doing a better job with that now than we have in the past, partly because our processes are better, and, equally important, because our community is growing and new people are getting involved all the time.  Provided you approach the task in the right way, you, too, can be a successful PostgreSQL contributor!
