I <3 Steve McConnell*
Coding Horror
programming and human factors
by Jeff Atwood


November 19, 2009

Buy Bad Code Offsets Today!

Let's face it: we all write bad code.

But not every programmer does something about the bad code they're polluting the world with, day in and day out. There's a whole universe of possibilities:

But that's a lot of work. Really freaking hard work! Wouldn't it be nice if you could do something a bit simpler and easier to, just … say … offset the bad code you're producing?

Well, now you can -- with Bad Code Offsets.

bad-code-offset-front.jpg

bad-code-offset-back.jpg

I am a proud member of the Alliance for Code Excellence, and this is our vision:

We envision a world where software runs cleanly and correctly as it simplifies, enhances and enriches our day to day work and home lives. Mitigating the scope and negative impact of bad code on our jobs, our lives and our world is our all-consuming passion. We foresee a time when bad coding practices and their rotten fruits have been eliminated from this earth and its server farms thereby heralding a new age of software brilliance and efficacy.

Nettlesome bugs and poorly written code have been constant impediments towards realizing our full potential as programmers and engineers. Bad Code Offsets provides the vehicle for balancing the scales of poor past practice while freeing us to pursue current excellence in code development. Until the dawn of the worldwide, bug free code base, each of us can take steps towards reducing our bad code footprint and remediate the bad code that we have each individually and collectively left behind on the desktops, servers and mainframes at school, at work and at home.

Yes, this is partly tongue in cheek, but we aren't just doing it for the lulz. Bad code offsets cost real money, because the Alliance has a goal:

Q: Where does my money go?

A: The proceeds from the sale of Bad Code Offsets are donated to various worthy Open Source initiatives that are carrying the fight against bad code on a daily basis. These organizations include:

This is the awesome part: the money you spend on Bad Code Offsets really does offset bad code!

All the money spent on bad code offsets goes directly to open source projects that actively make programmers' lives better. For every ten thousand lines of mind-bendingly bad code produced, we hope to subsidize a thousand lines of quality open source code.

So, please -- buy bad code offsets today. It is, quite literally, the least you could do.

Posted by Jeff Atwood

November 15, 2009

Parsing Html The Cthulhu Way

Among programmers of any experience, it is generally regarded as A Bad Idea™ to attempt to parse HTML with regular expressions. How bad of an idea? It apparently drove one Stack Overflow user to the brink of madness:

You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML.

Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions.

Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes.

That's right, if you attempt to parse HTML with regular expressions, you're succumbing to the temptations of the dark god Cthulhu's … er … code.

kraken-cthulhu.jpg

This is all good fun, but the warning here is only partially tongue in cheek, and it is born of a very real frustration.

I have heard this argument before. Usually, I hear it as justification for seeing something like the following code:

 # pull out data between <td> tags
($table_data) = $html =~ /<td>(.*?)<\/td>/gis;

"But, it works!" they say.
"It's easy!"
"It's quick!"
"It will do the job just fine!"

I berate them for not being lazy. You need to be lazy as a programmer. Parsing HTML is a solved problem. You do not need to solve it. You just need to be lazy. Be lazy, use CPAN and use HTML::Sanitizer. It will make your coding easier. It will leave your code more maintainable. You won't have to sit there hand-coding regular expressions. Your code will be more robust. You won't have to bug fix every time the HTML breaks your crappy regex.
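To make the complaint concrete, here is a minimal sketch in Python rather than Perl. It assumes nothing beyond the standard library; `CellCollector`, `cells_with_parser`, `cells_with_regex`, and the sample markup are hypothetical names invented for illustration, not anything from the quoted post. The regex is a rough translation of the one above, and it quietly skips any cell whose opening tag carries an attribute.

```python
import re
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text inside each <td> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []    # finished cell texts, in document order
        self._depth = 0    # greater than zero while inside a <td>
        self._buffer = []  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TD> and <td> both arrive as "td".
        if tag == "td":
            self._depth += 1
            if self._depth == 1:
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.cells.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def cells_with_parser(html):
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


def cells_with_regex(html):
    # Rough Python equivalent of the quoted Perl pattern.
    return re.findall(r"<td>(.*?)</td>", html, flags=re.IGNORECASE | re.DOTALL)


sample = '<table><tr><td class="name">Ada</td><td>36</td></tr></table>'
print(cells_with_regex(sample))   # ['36']
print(cells_with_parser(sample))  # ['Ada', '36']
```

The parser version is longer, but it keeps working when somebody adds a class attribute to the table, which is exactly the kind of change that breaks the one-liner.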

For many novice programmers, there's something unusually seductive about parsing HTML the Cthulhu way instead of, y'know, using a library like a sane person. Which means this discussion gets reopened almost every single day on Stack Overflow. The above post from five years ago could be a discussion from yesterday. I think we can forgive a momentary lapse of reason under the circumstances.

Like I said, this is a well understood phenomenon in most programming circles. However, I was surprised to see a few experienced programmers in MetaFilter comments actually defend the use of regular expressions to parse HTML. I mean, they've heeded the Call of Cthulhu … and liked it.

Many programs will neither need to, nor should, anticipate the entire universe of HTML when parsing. In fact, designing a program to do so may well be a completely wrong-headed approach, if it changes a program from a few-line script to a bullet-proof commercial-grade program which takes orders of magnitude more time to properly code and support. Resource expenditure should always (oops, make that very frequently, I about overgeneralized, too) be considered when creating a programmatic solution.

In addition, hard boundaries need not always be an HTML-oriented limitation. They can be as simple as "work with these sets of web pages", "work with this data from these web pages", "work for 98% users 98% of the time", or even "OMG, we have to make this work in the next hour, do the best you can".

We live in a world full of newbie PHP developers doing the first thing that pops into their collective heads, with more born every day. What we have here is an ongoing education problem. The real enemy isn't regular expressions (or, for that matter, goto), but ignorance. The only crime being perpetrated is not knowing what the alternatives are.

So, while I may attempt to parse HTML using regular expressions in certain situations, I go in knowing that:

  • It's generally a bad idea.
  • Unless you have discipline and put very strict conditions on what you're doing, matching HTML with regular expressions rapidly devolves into madness, just how Cthulhu likes it.
  • I had what I thought to be good, rational, (semi) defensible reasons for choosing regular expressions in this specific scenario.

It's considered good form to demand that regular expressions be considered verboten, totally off limits for processing HTML, but I think that's just as wrongheaded as demanding every trivial HTML processing task be handled by a full-blown parsing engine. It's more important to understand the tools, and their strengths and weaknesses, than it is to knuckle under to knee-jerk dogmatism.

So, yes, generally speaking, it is a bad idea to use regular expressions when parsing HTML. We should be teaching neophyte developers that, absolutely. Even though it's an apparently never-ending job. But we should also be teaching them the very real difference between parsing HTML and the simple expedience of processing a few strings. And how to tell which is the right approach for the task at hand.
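For what "processing a few strings" under strict conditions might look like, here is one possible sketch in Python. The contract is an assumption made up for this example: every page comes from a single known template and contains exactly one `<title>` element with no markup inside it. `TITLE_PATTERN` and `extract_title` are hypothetical names.

```python
import re

# Assumed, deliberately narrow contract: one known template, exactly one
# <title> element, and no nested markup inside it.
TITLE_PATTERN = re.compile(r"<title>([^<]*)</title>", re.IGNORECASE)


def extract_title(page):
    """Return the page title, or raise if the page breaks our assumptions."""
    matches = TITLE_PATTERN.findall(page)
    if len(matches) != 1:
        # Fail loudly instead of guessing; this is the "discipline" part.
        raise ValueError(
            f"expected exactly one simple <title>, found {len(matches)}"
        )
    return matches[0].strip()


print(extract_title("<html><head><title> Coding Horror </title></head></html>"))
```

The point of the guard is that the regex never pretends to understand HTML. The moment a page stops matching the narrow contract, the function raises, which is the cue to reach for a real parser.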

Whatever method you choose -- just don't leave the <cthulhu> tag open, for humanity's sake.

Posted by Jeff Atwood

November 9, 2009

Whitespace: The Silent Killer

Ever have one of those days where everything you check into source control is wrong?

Also, how exactly is that day different from any other? But seriously.

Code that is visible is code that can be wrong. No surprise there. But did you know that even the code you can't see may be wrong, too?

These are the questions that drive young programmers to madness. Take this perfectly innocent code, for example.

code-whitespace-invisible.png

Looks fine, doesn't it? But hold on. Wait a second. Let's take another, closer look.

code-whitespace-visible.png

OH. MY. GOD!

If you're not a programmer, you may be looking at these two images and wondering what the big deal is. That's fine. But I humbly submit that, well, you're not one of us. You don't appreciate what it's like to spend every freaking minute of every freaking day agonizing over the tiniest details of the programs you write. Not because we want to, you understand, but because the world explodes when we don't.

I mean that literally. Well, almost. If one semicolon is out of place, everything goes sideways. That's how programming works. It's fun! Sometimes! I swear!

We got into this industry because, quite frankly, we are control freaks. It's who we are. It's what we do. Now imagine, to our dismay, that there's all this stupid, useless whitespace at the ends of our lines. Stuff that's there, but we can't see it. Well, those are the nightmares OCD horror movies are made of. I have a full-body itchiness just talking about it.

Depending on how far down the rabbit-hole you want to go, there's any number of things you could do here:

  • Have a post-build step, perhaps something with a regular expression like \s*?$ in it, that auto-cleans extra spaces checked into source control
  • Execute a local macro which removes whitespace from ends of lines
  • Have a special rule to highlight extra spaces
  • Run your IDE in whitespace-always-visible mode, or toggle it frequently
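The first two options can be sketched in a few lines of Python. This is only an illustration, not a tool anyone mentioned above: `strip_trailing_whitespace` and `lines_with_trailing_whitespace` are hypothetical helpers, the pattern `[ \t]+$` is a narrower stand-in for the `\s*?$` idea so that newlines are never touched, and the code assumes LF line endings.

```python
import re

# Spaces or tabs immediately before the end of a line.
# re.MULTILINE makes $ match at every line end, not just the end of the text.
TRAILING_WS = re.compile(r"[ \t]+$", re.MULTILINE)


def strip_trailing_whitespace(source):
    """Return source with spaces and tabs removed from the end of every line."""
    return TRAILING_WS.sub("", source)


def lines_with_trailing_whitespace(source):
    """Return the 1-based numbers of lines that end in a space or tab."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


dirty = "int x = 1;  \nreturn x;\t\n"
print(lines_with_trailing_whitespace(dirty))    # [1, 2]
print(repr(strip_trailing_whitespace(dirty)))   # 'int x = 1;\nreturn x;\n'
```

The reporting function is the gentler choice for a check that runs before a commit, since it only points at the offending lines. The rewriting function is what a cleanup macro or post-build step would call.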

OK, fine, so maybe the world won't explode if there are a few extra bits of whitespace in my code.

But all the same, I think I'll go back and make extra double plus sure no more of that pesky whitespace has accumulated in my code when I wasn't looking. Just because I can't see it doesn't mean it's not out to get me.

Posted by Jeff Atwood

November 5, 2009

Preserving Our Digital Pre-History

I've spent a significant part of my life online. Not just on the internet, I mean, but on modems and early, primitive online communities. Today's internet is everything we couldn't have possibly dared to imagine twenty-five years ago, but there is a real risk of these early, tentative digital artifacts -- and for some, the beginnings of our Hacker Odyssey -- being lost forever in the relentless deluge of online progress. Sure, every single thing that happened in 2004 is documented exhaustively online. But 1994? 1984? Not so much.

That's where Jason Scott comes in.

You may know Jason Scott from BBS The Documentary. Or, perhaps you're familiar with textfiles.com, his massive (and growing) archive of what passed for blogs and forums in the earliest online era.

A wonderful thing happened in the 1980s: Life started to go online. And as the world continues this trend, everyone finding themselves drawn online should know what happened before, to see where it all really started to come together and to know what went on, before it's forgotten.

When a historian or reporter tries to capture the feelings and themes that proliferated through the BBS Scene of the early 1980's, the reader nearly always experiences a mere glimpse of what went on. This is probably true of most any third-party reporting, but when the culture is your own, and when the experiences were your own, the gap between story and reality is that much wider, and it's that much harder to sit back and let the cliche-filled summary become "The Way It Was." You want to do something, anything so that the people who stumble onto the part of history that was yours know what it was like to grow up through it, to meet the people you did, to do the things you enjoyed doing. Maybe, you hope, they might even see the broader picture and the conclusions that you yourself couldn't see at the time. This is history the way the chronicled want it to be.

Jason is nothing less than our generation's digital historian in residence. When GeoCities went permanently offline a week ago, he was there to help preserve it for posterity.

bbs-documentary.png

BBS: The Documentary was a major milestone in his ongoing effort to document our digital pre-history. But it's only the beginning; there's also a huge documentary on text adventures, Get Lamp, that's been in the works for a few years now. Unfortunately, progress has been slow. Because while being a digital historian is great, it's not exactly something you get paid to do.

But maybe we can change that. Witness Jason's kickstarter proposal:

Throughout all this, I had a day job - computer administration. It paid well, but I paid for it with my health. When my most recent employer and I parted ways, I decided I'd take this time to finish some of the bigger projects I've been working on.

I suddenly thought back to Kickstarter and got this crazy idea - what if I simply asked the world and fans to contribute a bit of money towards keeping me somewhat solvent, and give me the opportunity to go full-time with computer history? If I was able to get all these things done over the years, what if I just asked people to subscribe or give me some patronage and in return I fill their free time with cool stuff to look at, learn from, and enjoy?

There are so many people whose online presences I greatly admire. But very few of them will go on to become part of the permanent written history of this era. I have no doubt whatsoever that Jason Scott is one of those people who will, thanks to his tireless efforts to preserve the flotsam and jetsam of our digital past, stuff that would otherwise be overlooked by the mainstream and lost forever.

I've pledged $100. It is an honor to support his ongoing work of preserving our shared digital pre-history. His history, is my history, is our history. A history of geeks, dorks, dweebs, nerds, and generally computer-obsessed misfits, but nonetheless -- it's something we all share.

If this is something you believe in, I urge you to pledge as well.

Posted by Jeff Atwood

November 3, 2009

Stack Overflow Careers: Amplifying Your Awesome

That Stack Overflow thing we launched a year ago? It's been going pretty well so far.

Of course, everyone knows you could code Stack Overflow in a long weekend. It's trivial. Assembling a worldwide community of smart, engaged software developers? That's a whole different ball of wax. Stack Overflow is a site by programmers, for programmers; it's only as good as the programmers who choose to participate.

Stack Overflow isn't about me. Or anybody else on the Stack Overflow team for that matter.

Stack Overflow is you.

This is the scary part, the great leap of faith that Stack Overflow is predicated on: trusting your fellow programmers. The programmers who choose to participate in Stack Overflow are the "secret sauce" that makes it work. You are the reason I continue to believe in developer community as the greatest source of learning and growth. You are the reason I continue to get so many positive emails and testimonials about Stack Overflow. I can't take credit for that. But you can.

I learned the collective power of my fellow programmers long ago writing on Coding Horror. The community is far, far smarter than I will ever be. All I can ask – all any of us can ask – is to help each other along the path.

I am continually humbled by the skill and expertise of the programmers who volunteer time to Stack Overflow. These programmers graciously donate tiny slivers of their day to help us -- and themselves -- become better programmers. These 5 and 10 minute slices of effort, across hundreds of thousands of questions and answers, become a permanently archived (and creative commons wiki licensed) bread crumb content trail for future programmers to follow, edit, and contribute to themselves over time.

I'm thrilled to see Stack Overflow working so well for both askers and answerers; the "pay it forward" model of programmers helping their peers is exactly what we were shooting for. We'll never change the world, but it sure is nice to be able to improve our small corner of it just a little bit. Remember: bad code that isn't written, is bad code that another poor programmer won't have to debug. If we don't reach out to help new programmers and teach them the lessons we learned the hard way, who will? I'm only exaggerating a little when I say that the future of our entire profession depends on it.

If you're actively participating on Stack Overflow, we now have another way to convert those slices of effort into something that actively furthers your professional goals – Stack Overflow Careers.

Stack Overflow Careers

What is careers.stackoverflow.com? It's a few things:

  • a completely free, public CV hosting service for programmers, to share the cool stuff you've coded and created with the world.
  • a way to explicitly link your Stack Overflow profile with your CV, to provide concrete examples of your communication skills and individual expertise to anyone who is interested.
  • a better way to connect great programmers with the best programming jobs, for those who opt into the small annual listing fee.

In short, Stack Overflow Careers amplifies your awesome.

I won't lie to you. This is also a business. That's why there are nominal opt-in listing fees for those programmers interested in seeking employment, and substantial fees for hiring managers who want to tap into the smart developers who grok Stack Overflow.

update: I apologize if I wasn't clear. It is 100% free, forever, to create a public CV, put whatever HTML content you want in it, and link it to your Stack Overflow profile. Like so:

These are of course freely indexable and searchable on the web.

Beyond the free public component, there is a private (and completely optional) subscription component. For those programmers actively seeking employment, a small annual subscription fee allows inclusion in a private employer search UI. This is also explained in the faq and about.

That said, we're also trying to do something a bit different here. Something better than the endless, mind-numbing acronym sea of monster.com, dice.com, et al. Joel and I believe current hiring practices for programmers are incredibly broken. We think we can do better.

dilbert-interview.png

We love our work, and so should you. Our goal isn't to put warm bodies in front of interviewers. Our goal is to create love connections. Instead of avid programmers pursuing disinterested and distracted companies, it's the other way around -- savvy companies who understand the competitive advantages of having the best programmers will pursue you. We connect smart, engaged hiring managers who "get it" with top programmers who love to code.

computer-engineers-number-puzzle.jpg

If you love to code, too, I encourage you to create your own Stack Overflow CV. Keep it private, or make it public via the URL of your choice -- it's completely free either way. If you think you might be actively looking for a job in the next 3 years, take advantage of our outrageously low promotional pricing of $29 for a 3 year filing. That way, at any point in those 3 years, you can flip a switch and become visible to hiring managers. Or not. It's totally up to you.

(also, if you're hiring, and your company appreciates top software engineers -- and you think you can convince our tough audience of that -- email us)

Posted by Jeff Atwood
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.