Semantic linewrapping

February 21, 2014

Sometimes when editing a Markdown file, I wrap the lines semantically. Instead of inserting a newline at 70 columns (or whatever), or making paragraphs one long line, I put in newlines at a point that seems logical to me. For example:

Sometimes when editing a Markdown file, I wrap the lines semantically.
Instead of inserting a newline at 70 columns (or whatever),
or making paragraphs one long line,
I put in newlines at a point that seems logical to me.

This may seem silly, but it produces better diffs. Let’s say I want to change the word “newline” in that first paragraph to “carriage return”. With semantic linewrapping, the diff makes it easy to see what changed (and what didn’t):

@@ -1,4 +1,4 @@
 Sometimes when editing a Markdown file, I wrap the lines semantically.
-Instead of inserting a newline at 70 columns (or whatever),
+Instead of inserting a carriage return at 70 columns (or whatever),
 or making paragraphs one long line,
 I put in newlines at a point that seems logical to me.

Compare non-semantic linewrapping:

@@ -1,4 +1,4 @@
 Sometimes when editing a Markdown file, I wrap the lines semantically.
-Instead of inserting a newline at 70 columns (or whatever), or making
-paragraphs one long line, I put in newlines at a point that seems
-logical to me.
+Instead of inserting a carriage return at 70 columns (or whatever), or
+making paragraphs one long line, I put in newlines at a point that
+seems logical to me.

More lines change than need to, and there’s a higher probability of edit conflicts. The longer the paragraph, the worse this effect gets.
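
To put a number on that, here is a small Python sketch that counts the changed lines for both wrapping styles. It uses only the standard library's `difflib` and `textwrap`. The helper name `changed_lines` is made up for this illustration, and `textwrap.wrap` at width 70 stands in for whatever hard-wrapping an editor would do.

```python
import difflib
import textwrap

SEMANTIC = [
    "Sometimes when editing a Markdown file, I wrap the lines semantically.",
    "Instead of inserting a newline at 70 columns (or whatever),",
    "or making paragraphs one long line,",
    "I put in newlines at a point that seems logical to me.",
]
OLD, NEW = "inserting a newline", "inserting a carriage return"


def changed_lines(old, new):
    """Count the '-' and '+' lines in a unified diff of two line lists."""
    diff = list(difflib.unified_diff(old, new, lineterm=""))
    body = diff[2:]  # skip the '---' and '+++' file headers
    return sum(1 for line in body if line.startswith(("+", "-")))


# Semantic wrapping: the edit stays inside the one line it touches.
semantic_new = [line.replace(OLD, NEW) for line in SEMANTIC]

# Hard wrapping: re-wrap the whole paragraph at 70 columns before and after.
paragraph = " ".join(SEMANTIC)
hard_old = textwrap.wrap(paragraph, width=70)
hard_new = textwrap.wrap(paragraph.replace(OLD, NEW), width=70)

print(changed_lines(SEMANTIC, semantic_new))  # 2
print(changed_lines(hard_old, hard_new))      # 6
```

The hard-wrapped version touches every line after the edit, because the extra characters push words across each following line break.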

Semantic linewrapping also makes editing snappier. I can delete, edit or insert sentences easily using linewise operations. Code-oriented text editors like Vim and Emacs are really good at this kind of manipulation. For example:

Here is a paragraph with some stuff in it.
This is the second sentence.
This sentence is really long, and ugly,
and the truth is that it basically says nothing at all.
This is the last sentence of the paragraph;
thanks for reading!

In Vim, if my cursor is on the ‘H’ at the beginning, I can just hit jj2dd to remove the third sentence. Move down two lines, delete two lines. Conversely, with something like this:

Here is a paragraph with some stuff in it. This is the second
sentence. This sentence is really long, and ugly, and the truth is
that it basically says nothing at all. This is the last sentence
of the paragraph; thanks for reading!

Thinking it out, I would probably do jWDj^df.xgqip. That’s probably incomprehensible to a non-Vim user, but if you’re thinking it looks like more work, you’ve got the idea.

A program I don’t have time to write

It would be cool if there were a program to semantically linewrap text automatically in a deterministic way using properties of the English language.

By “in a deterministic way” I mean that given two Markdown files that differ only in their within-paragraph newlines, the program would produce identical output for both. By “semantically linewrap” I mean that all the lines, where possible, would fit within some set number of characters (say 70), and the position of newlines would approximate where a human would put them.
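
As a very rough illustration of those two properties, here is a Python sketch of one possible approach. It is nowhere near the real program: the function name `semantic_wrap` and its rules are assumptions made up for this example. It breaks after sentence-ending punctuation, then after commas, semicolons and colons if a sentence is still too long, and falls back to plain hard-wrapping as a last resort. It knows nothing about abbreviations like “e.g.” or about Markdown structure such as lists and code blocks.

```python
import re
import textwrap


def semantic_wrap(paragraph, width=70):
    """Naively wrap one paragraph at sentence and clause boundaries."""
    # Collapse all whitespace first, so existing newlines cannot
    # influence the result. This is what makes the output deterministic.
    text = " ".join(paragraph.split())
    if not text:
        return []
    lines = []
    # First choice: break after sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?]) ", text):
        if len(sentence) <= width:
            lines.append(sentence)
            continue
        # Second choice: break after clause punctuation.
        for clause in re.split(r"(?<=[,;:]) ", sentence):
            # Last resort: hard-wrap a clause that is still too long,
            # leaving over-long words (URLs, say) intact.
            lines.extend(textwrap.wrap(
                clause, width=width,
                break_long_words=False, break_on_hyphens=False))
    return lines


one_long_line = (
    "Sometimes when editing a Markdown file, I wrap the lines semantically. "
    "Instead of inserting a newline at 70 columns (or whatever), "
    "or making paragraphs one long line, "
    "I put in newlines at a point that seems logical to me."
)
hard_wrapped = "\n".join(textwrap.wrap(one_long_line, width=70))

# Both inputs differ only by within-paragraph newlines,
# so they wrap to the same lines.
assert semantic_wrap(one_long_line) == semantic_wrap(hard_wrapped)
print("\n".join(semantic_wrap(hard_wrapped)))
```

On this particular paragraph the sketch happens to reproduce the hand-wrapped version from the top of the post. Real prose would need much smarter rules about where a human would break.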

This program would do for prose what gofmt does for Go code: ensure sane, consistent formatting.

It would also be analogous to using a Rabin fingerprint or rsync’s rolling checksum to split data (in this case text) consistently into chunks (in this case lines) for more efficient diffing.
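
For readers who haven’t met that technique, here is a minimal Python sketch of content-defined chunking. It uses a simple polynomial rolling hash as a stand-in, not a true Rabin fingerprint and not rsync’s actual checksum, and the name `chunk` and its parameters are invented for this example. A chunk ends wherever the hash of the last few bytes has its low bits all zero, so each boundary depends only on nearby content.

```python
def chunk(data, window=16, mask=0x3F):
    """Split bytes wherever a rolling hash of the last `window` bytes
    has its low bits (selected by `mask`) all zero."""
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % mod  # drop the oldest byte
        h = (h * base + byte) % mod                 # add the newest byte
        if i + 1 >= window and h & mask == 0:
            chunks.append(data[start:i + 1])        # boundary after byte i
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                 # trailing partial chunk
    return chunks


text = ("Here is a paragraph with some stuff in it. " * 20).encode()
pieces = chunk(text)
assert b"".join(pieces) == text  # chunking never loses or reorders bytes
```

Because a boundary is decided by the preceding `window` bytes alone, inserting a few bytes early in the input only disturbs the chunks near the edit. Later chunks come out byte-for-byte the same, which is the property semantic linewrapping gives to lines of prose.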

I don’t have time to write this program at the moment, but because it’s on my mind, I thought I would share.

Acknowledgement

Brandon Rhodes wrote a blog post making a similar proposal, which I found by searching the web for “semantic linewrapping”. Parts of Brandon’s post look vaguely familiar. I’m not sure if I read his post years ago and it planted this idea in my head, but that wouldn’t surprise me.

Creative Commons BY-NC-SA
Original text and images (not attributed to others) on this page are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.