Saturday, June 18, 2016

The Strobl Interjection

The other day I was giving a Ruby programming tutorial to the team at Razor Risk, since we're introducing a suite of tools to effect automated wholesale code transmogrification of our codebase, in part to bring up-to-date the very large C++ codebase with C++ 11 / C++ 14. The element in question resides within a chunk of Ruby code that is a pattern I've been using for well over a decade, involving two loops:

  • An outer loop, that searches a given directory for a given set of patterns, using the recls.Ruby library; and
  • An inner loop, that processes the contents of a given file on a per-line basis.
In each case, we're applying each.

I've been applying this pattern for almost as long as I've been using Ruby, and it looks something like this:

Recls::FileSearch.new(root_dir, patterns, Recls::RECURSIVE | Recls::FILE).each do |fe|

  IO::readlines(fe.path).each_with_index do |line, index0|

    index = 1 + index0
    line = line.chomp

    if line =~ / . . . some regex that finds something interesting .../

      . . . do something interesting to log or change the line, ...
    end
  end
end

The first two lines of the inner-loop have been part of my idiom forever, and are explained thusly:

  • Since humans and IDEs use 1-based numbering, I always obtain a 1-based index from the 0-based one, which I name index0, obtained as the second parameter to each_with_index()'s block. So far, so good;
  • Since lines come with their end-of-line sequence - usually "\n" - this needs to be chomped off. Since it's bad practice to change objects that are held by other objects (in this case it doesn't matter, but making a habit of good practice is, er, good practice) I don't call chomp! to change the actual line, but instead obtain a chomped copy from it.
All seems quiet on the western front. Alas, as I was showing this to the team, little did I know I was about to be hoist on my own petard.

Some of the things I'm always banging on about wrt the C++ codebase include being totally unambiguous and making absolutely everything immutable that can possibly be so. One of the consequences of these two exhortations to good practice is that variables should be initialised (rather than declared and then assigned) and should be made immutable (by using const), as in:

  size_t const number_to_process = container.size() - offset_length;

rather than, say, 

  size_t number_to_process;

  number_to_process = container.size();
  number_to_process -= offset_length;

In the former case the compiler will stop you from mistakenly changing or reusing the variable; in the latter all bets are off.

Anyway, one of our implementation consultants, Bernard Strobl, has a keen interest in programming and pays a lot of attention during code reviews and tutorial sessions. As is his wont, Bernie jumped immediately and with great relish upon my transgression of good practice in my reuse of the variable line. Notwithstanding my love of my own opinion and my ability to lever bombast at high speed with almost no warm-up, when you're wrong, you're wrong.

And so I learned something, or at least remembered something that now informs a modified idiom:


Recls::FileSearch.new(root_dir, patterns, Recls::RECURSIVE | Recls::FILE).each do |fe|

  IO::readlines(fe.path).each_with_index do |line0index0|

    index = 1 + index0
    line = line0.chomp

    if line =~ / . . . some regex that finds something interesting .../

      . . . do something interesting to log or change the line, ...
    end
  end
end

As of version 0.4.12, our code transmogrification tool suite now contains corrected forms for every tool, and the commit bears the message "The Strobl Interjection" lest I forget my error.

Thanks, Bernie! :-)

Monday, June 13, 2016

Important semantic effects of small syntax errors : C++ conditional expression

Here's a typo that had dramatic effects:

if(processed_count =! container.size())
{
  . . .

Result: the code always thought that the whole container had been processed, even when it wasn't.

Had the codebase been running at maximum warnings, and treating warnings as errors, this would never have happened. Those compilation conditions are now active.

Always compile at maximum warnings and treat warnings as errors.

Saturday, May 21, 2016

Revisibility in Node declarations

One of the things that strikes me about Node.js is how variable declarations are done messily in the canon. Consider:

  var express = require('express'),
      app = express();

For some reason, Node programmers (or is that JS programmers) don't want to use too many var keywords. Perhaps there is a shortage.

Problem is, when it comes time to add in a new variable, we're up against revisibility problems. To add another variable thing,

  var express = require('express'),
      app = express(),
      thing = 'thing';

involves changing two lines, because of the need to add a comma at the end of the second line to continue the variable declarations.

This is just nuts. Working with extant code - reading it, changing it - is always more important than writing it. The remedy is simple: use one var per line (as has been advocated strongly for pretty much every other C-family language since the dawn of time). Now, changing

  var express = require('express');
  var app = express();

to

  var express = require('express');
  var app = express();
  var thing = 'thing';

is at maximum revisibility, and your revisiology (which is something you will do a lot if you are writing code professionally, or in teams, or in professional teams) will be much eased.

Saturday, December 19, 2015

Important semantic effects of small syntax errors : Ruby regex

Here's a recent typo that had interestingly important effects:

Meant to write

 if line =~ /some-reg-expression/

Actually wrote:

 if line = ~/some-reg-expression/

Result: found nothing!

FYI, the type of ~/some-reg-expression/ is NilClass