TJ-2: A Very Early Word Processor

Introduction


This is an annotated copy of a 1963 memo describing TJ-2. TJ stood for "Type Justifier." TJ-2 ran on a Digital PDP-1 computer with 4096 eighteen-bit words of core memory with a 5 microsecond cycle time; in modern terms, 9K of RAM and a 200 KHz (kilohertz, not megahertz) clock. Pete Samson demonstrated TJ-2 to me and gave me the impression he was the author; the TJ-2 memo does not credit any individuals.

The text below was keyed in by hand. To the best of my proofreading ability and understanding of HTML, it is a precise copy of the original 1963 memo, faithfully reproducing the exact places where TJ-2 inserted spaces for justification, with the following provisos:

  1. This document uses the Netscape text color attribute TEXT="#400060" to suggest the light purplish color of the actual memo, which was reproduced by means of a spirit duplicator.

  2. Numbers in square brackets in the right margin refer to my own notes, which follow.

  3. Material that is underlined in the original has been given the HTML attributes em and ul and will have the following appearance on your browser:
         Normal text; text that is underlined in the original.
    
  4. The PDP-1 character set included a nonspacing overbar. This key sequence
         ¯X
    
    would print as an overbar directly above the X, like the "macron" diacritical mark. Wherever you see this sequence it should be understood that in the original the overbar is printed directly above the character that here is shown as following it.

 


                     PDP-1 COMPUTER                             [1]
         MASSACHUSETTS INSTITUTE OF TECHNOLOGY
              CAMBRIDGE 39, MASSACHUSETTS                       [2]











                        PDP-9-1







                          TJ-2                                  [3]                                  



                TYPE JUSTIFYING PROGRAM                         [4]
                                         May 9, 1963





     TJ-2  accepts English  text from  typewriter or reader,    [5]
and reproduces it at any  line length via typewriter  and/or
punch.   So much as possible both left and right margins are    [6]
aligned in  the  output.   To accomplish  this  the  program
doubles  some  of the  spaces in  the  output line,  and may
hyphenate words, getting hyphenation data from its  diction-
ary or from the operator via the display.

     Normal Mode:  All hyphens in the input text are assumed    [7]
to mark  compound  words,  and will  be  reproduced  in  the
output.  Carriage return is treated as a space, except:  (a)    [8]
a string of carriage returns is ignored immediately  follow-
ing a hyphen; (b) non-ignored carriage returns are translat-
ed  into  the  same  number  of  carriage  returns  if  they
immediately precede a tab.  Any number of adjacent spaces is
treated as a single space.  Backspace and the unused Concise    [9]
codes are illegal.  The program simulates tabs by spacing to
the nearest imagined tab stop,  as determined from the  Test    [10]
Word.

     Quote  Mode:   The  justification process  may  be sup-
pressed for a portion of the input by the appearance of  "¯q"   [11]
(overbar-q) which puts TJ-2 into the Quote Mode.  Subsequent
characters from the input are copied exactly into the output    [12]
until  the occurrence of  "¯e".  It is  wise to have carriage   [13]
returns as  the  first and  last  characters in  the  quoted
material.   All  100 octal  Concise codes  are legal  in the
Quote Mode, but  overbar should be  used with great  caution
because  it  is  a signal  for  some special  action  by the
program.

     Centering Mode:  At the appearance  of "¯c" the text  up
to  the next carriage return is  centered on a separate line
in the output.  Should quoted material appear in the text to
be  centered,  it is  reproduced in  the  output but  is not
counted for the centering process.

     Figure  Mode:   The  sequence  overbar,  f,  a  decimal
number,  overbar, is  taken  by TJ-2 as  a command  to leave
blank lines for a figure  in the output.  The number  given,
minus one, blank lines are inserted in the output immediate-
ly if there is  room on the  current output page,  otherwise
the space is left at the beginning of the next page.  If the
"f" is omitted, the space is always left immediately.

     Inverted Indenting Mode:  This  mode is entered by  the
occurrence of "¯i".  A counter c is set to zero when the ¯i is
encountered.  While in the Inverted Indenting Mode all input
lines  not  preceded  by  a  multiple  carriage  return  are
indented c tab stops in the output.  Each line in the  input
which  is preceded by a multiple carriage return is preceded
by that many  carriage returns  in the output.   Also it  is



	
indented in the output by as many tab stops as in the input,
and it sets c equal to that number of tabs plus one.  The
normal paragraphing mode is restored by "¯n".

     Quoted Overbar:  An overbar is copied into the output
if ¯/ (overbar-slash) appears in the input.

     The following describes usage of the Sense Switches.       [14]

     Sense Switch 1:  on  -- input from typewriter              [15]
                      off -- input from reader
     Sense Switch 2:  on  -- output to typewriter               [16]
                      off -- no typed output
     Sense Switch 3:  on  -- output to punch
                      off -- no punched output
     Sense Switch 4:  on  -- prohibit hyphenation
                      off -- allow hyphenation as per
                             dictionary or display.
     Sense Switch 5   used by individual error halts
     Sense Switch 6   turn on at end of run to indicate
                      no further input

     The Test Word is divided into four fields, with use as
follows:

     Bits      Meaning
     11-17     length of justified line
     7-10      output tab spacing (equal, starting at left margin)
     1-6       number of lines per page (0 means no pagination)
     0         if on, stop after current output page.

     To hyphenate a word that appears on the display, point     [17]
the light pen at the + figure between letters; it will          [18]
become a small =.  If an erroneous + was hit, all = marks
can be restored to + by penning the "OOPS" dot.  When all       [19]
the satisfactory hyphenation-points are marked, pen "SAVE"
to preserve the word in the dictionary or "FORGET" to accept    [20]
the hyphenation but not to retain it.  (There will be no
SAVE dot if the dictionary is full.)  The Console version of    [21]
TJ-2 starts with a blank dictionary.

     Error halt aspects, indications, and recovery proced-
ures are tabulated below.

     AC   IO    Meaning             SS5 down,        SS5 up,          [22]
                                    Continue         Continue

     -0   0     output line too     insert carriage  (likewise)       [23]
                long                return
     +0   +0    run completed       start new run    (likewise)
     1    char  illegal Concise     ignore           accept
                code
     2    char  illegal parity      ignore           accept
     3    0     word too long       forget word      insert space
     4    ...   word to be hyphen-  take it anyhow   don't hyphenate
                ated ends in non-
                spacing character
     6    page  end of output       continue         (likewise)
          no.   page
     10   ...   dictionary full     clobber program   don't save      [24]
     12   ...   too much stretch-   stretch as much  don't stretch
                ing needed          as possible
     17   ...   TW calls for very   start over       accept
                short line

NOTES: Following notes are copyright 1993, 1997 by Daniel P. B. Smith. All rights reserved. Permission granted for Internet distribution and nonprofit use.

Was TJ-2 a word processor?

TJ-2 does not resemble modern word processors with "on-screen formatting" or "WYSIWYG editing." It somewhat resembles earlier word processors like RUNOFF, TROFF or WordStar in which the editor and formatter are separate. Comparing TJ-2 to contemporary "word processors," TJ-2 had:

  1. rudimentary paragraph formatting
  2. centering
  3. word wrap
  4. justification.
TJ-2 lacked:
  1. page numbers
  2. headers
  3. footers
  4. any kind of text attributes or text emphasis (not even underlining).
The hyphenation feature was surprisingly sophisticated and used a sort of graphic user interface.

Notice that none of the following terms are used anywhere in the memo: application, document, editor, font, format, software, word processor, word wrap.

(I first encountered the term "word processing" in an IBM advertisement of the mid-seventies, where the term was understood to encompass manual typewriters and dictating machines as well as magnetic-card Selectrics.)

The general writing style of the memo is similar to that of, for example, the Digital PDP-1 Handbook. The capitalization of terms such as Test Word, Inverted Indenting Mode, etc. is similar to that in the PDP-1 Handbook, which contains phrases such as "the Light Pen status bit is set to one," "the contents of the extended Program Counter in bits 2 through 17," etc.

[1] PDP-1 COMPUTER: The memos in this series were all attributed simply to the "PDP-1 COMPUTER." In actuality, they were produced at a facility on the second floor of Building 26, which comprised two large rooms containing the one-of-a-kind TX-0, originally built by MIT, and a commercially built PDP-1, donated by Digital Equipment Corporation. The PDP-1 was used as an experimental testbed by the Electrical Engineering department for experiments in timesharing, and was also generally available to undergraduates.

[2] CAMBRIDGE 39, MASSACHUSETTS: The United States Post Office introduced ZIP codes on July 1, 1963, less than two months after this memo was written.

[3] TJ-2: I don't know if there ever was a TJ-1.

[4] The title line, "TYPE JUSTIFYING PROGRAM," is underscored in the original.

No name is credited as either author or programmer. The program was demonstrated to me by Pete Samson. It was my impression that he wrote it but others may have contributed.

This memo was obviously produced with TJ-2, but most of the other memos in the same series were not.

[5] I've tried to duplicate the precise locations at which TJ-2 placed spaces, hyphens, etc.

"Typewriter:" the PDP-1 console typewriter. This was an IBM electric typewriter to which a company called Soroban Engineering had added switches and solenoids capable of sensing and initiating keystrokes.

This IBM typewriter was a traditional design with typewriter-style typebars (not a "golfball" Selectric). As was customary in typewriters, each typebar carried a pair of characters, one uppercase and one lowercase. The shift key raised or lowered the entire heavy type basket, selecting which of the two characters struck the ribbon. The Soroban mechanism was quite unreliable, and, in particular, often missed a case shift. However, it provided what in eighties terminology would have been called "true letter quality" printing. An additional feature of this unit was a two-color red-and-black ribbon. The PDP-1 debugger, for example, printed user commands in black and debugger responses in red.

"Reader:" high-speed 400 "line"/second photoelectric punched paper-tape reader.

"Punch:" 63 character/second paper tape punch manufactured by Teletype.

[6] The period ending a sentence is always followed by either two spaces or three.

[7] "Normal mode:" in the eighties, this would have been called "word wrap." At the time, not only was there no succinct name for the feature, but the concept itself was unfamiliar enough to need a precise explanation. As I write this in 1997, word wrap is so taken for granted in GUI text entry that there is no longer any need to have a name for it, and the term is falling into disuse.
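
The word wrap and space-doubling that the memo describes can be sketched in a few lines of modern Python. This is only an illustration, not a reconstruction of TJ-2: the function names are my own, the choice of which spaces to double (the rightmost ones) is an assumption the memo does not settle, and hyphenation, tabs, and the extra spaces after a sentence are all left out.

```python
def justify_line(words, width):
    """Pad a line to `width` by doubling some of its single spaces."""
    line = " ".join(words)
    extra = width - len(line)
    gaps = len(words) - 1
    if extra < 0 or extra > gaps:
        # Doubling spaces alone cannot fix this line; TJ-2 would
        # hyphenate or halt here. This sketch leaves the line ragged.
        return line
    # Assumption: the rightmost gaps are the ones that get doubled.
    seps = [" "] * (gaps - extra) + ["  "] * extra
    out = words[0]
    for sep, word in zip(seps, words[1:]):
        out += sep + word
    return out


def fill_and_justify(text, width):
    """Greedy word wrap, then justify every line except the last."""
    lines, current = [], []
    # split() treats any run of spaces or line ends as a single space.
    for word in text.split():
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)
    justified = [justify_line(ws, width) for ws in lines[:-1]]
    return justified + [" ".join(ws) for ws in lines[-1:]]


for row in fill_and_justify("the quick brown fox jumps over the lazy dog", 16):
    print(row)
```

With a line length of 16 the first two lines come out exactly 16 characters wide, and the last line is left as typed.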

[8] "Carriage return:" both returned the carriage and advanced the line on the typewriter mechanisms used on the PDP-1. There was no distinct "line feed" operation to be concerned with.

[9] "Concise code:" the character set used internally on the PDP-1, a six-bit code formed by truncating the full "FIO-DEC" codes, which included a parity bit. A portion of the table is reproduced here; notice that every code has both a "lower" and "upper" character; notice too the upper-case mappings of the numerals.


[10] "Test word:" bank of eighteen toggle switches on the PDP-1 console, often playing the same role that a command-line argument would play today. There was no provision for varying margins within a document.
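
To make the four Test Word fields listed in the memo concrete, here is a small Python sketch that unpacks them from an 18-bit value. It assumes the PDP-1 convention of numbering bits from 0 at the most significant end to 17 at the least significant; the function name, dictionary keys, and sample setting are hypothetical.

```python
def decode_test_word(tw):
    """Split an 18-bit Test Word into the four fields the memo lists.

    Assumes bit 0 is the most significant bit and bit 17 the least.
    """
    if not 0 <= tw < (1 << 18):
        raise ValueError("Test Word must fit in 18 bits")
    return {
        "line_length": tw & 0o177,            # bits 11-17 (7 bits)
        "tab_spacing": (tw >> 7) & 0o17,      # bits 7-10 (4 bits)
        "lines_per_page": (tw >> 11) & 0o77,  # bits 1-6 (6 bits)
        "stop_after_page": bool(tw >> 17),    # bit 0
    }


# Hypothetical setting: 60-column lines, tab stops every 5 columns,
# 50 lines per page, and no stop after the current page.
example = (50 << 11) | (5 << 7) | 60
print(decode_test_word(example))
```

Seven bits for the line length allow values up to 127, and a lines-per-page field of zero would mean no pagination, as the memo states.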

[11] Letters preceded by an overbar here, e.g. ¯q, appear in the original with an overbar above them. A nonspacing (dead key) overbar was part of the PDP-1 character set and frequently used as a syntactic convention to identify commands.

On lines containing these characters I have chosen to transcribe the spaces inserted for justification precisely as they appear in the original. I have not succeeded in finding a good way in HTML to display an overbar above a character, so lines containing these characters are now longer than the others.

[12] Note that the terms used here are simply "input" and "output," not "input stream," "input file," etc.

[13] The period is outside the quotation marks. As The New Hacker's Dictionary, 1991, comments, "Hackers tend to use quotes as balanced delimiters like parentheses... This is incorrect according to standard American usage (which would put the ... final period inside the string quotes); however, it is counter-intuitive to hackers to mutilate literal strings with characters that don't belong to them."

[14] "Sense Switch:" a bank of six toggle switches on the PDP-1 console, often playing the same role that a command-line switch would play today. At the time, sense switches were common on computers, even large IBM computers.

[15] Since there was no text editor, "input from typewriter" was not much use except for testing, demonstrations, or tutorials.

[16] Irregular vertical spacing here is accurately transcribed from original.

[17] It's typical of the computer documentation style of the time that this is described verbally, with no picture of the screen. When hyphenating the word "justification," the display would initially show

J+U+S+T+I+F+I+C+A+T+I+O+N

The user would point the pen at hyphenation points until the display showed

J+U+S=T+I+F+I=C+A=T+I+O+N
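
The two display states above can be generated by a short Python sketch. The function name and the idea of identifying each gap by a zero-based index are my own illustration; nothing here is taken from TJ-2's actual code.

```python
def hyphenation_display(word, marked=()):
    """Render a word as on the display: '+' between letters, '=' at marked gaps.

    `marked` holds gap indices; gap i lies between letters i and i + 1.
    """
    letters = word.upper()
    if not letters:
        return ""
    marked = set(marked)
    parts = [letters[0]]
    for i in range(1, len(letters)):
        parts.append("=" if (i - 1) in marked else "+")
        parts.append(letters[i])
    return "".join(parts)


print(hyphenation_display("justification"))
print(hyphenation_display("justification", {2, 6, 8}))
```

Penning the "OOPS" dot corresponds to calling the function again with no marked gaps.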


[18] "Light pen:" a pointing device. Unlike a mouse, it could not sense position. It merely signaled the presence of a point of light directly under it. Somewhat complicated strategies were needed to acquire and track the pen.


PDP-1 Type 30 Precision CRT Display, with light pen

[19] The word "all" is underscored in the original.

[20] The "dictionary" was a temporary dictionary, kept in core (RAM) only for the duration of the session. This is not explained because it was taken for granted. There was no permanent online storage for user files.

[21] "Console version:" near the console was a device like a set of very small cubbyholes, designed to store fanfolded paper tapes. Programs for general use were stored in these cubbyholes and were referred to as the "console copies." There were about forty or fifty of them, including Expensive Typewriter, TECO, TJ-2, the MACRO and MIDAS assemblers, machine diagnostics, and SPACEWAR.

[22] AC, IO -- the two main active registers of the PDP-1, whose contents were displayed in a set of indicator lights on the front panel.

[23] -0 -- the PDP-1 was a one's complement machine. -0 was distinct from +0 and would have displayed as all lights on.
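
For readers unfamiliar with one's complement, this short Python sketch shows why -0 lights every indicator. Negation in one's complement inverts every bit, so negating an all-zeros word gives an all-ones word. The names are my own; only the 18-bit word size comes from the machine.

```python
WORD_BITS = 18
MASK = (1 << WORD_BITS) - 1  # eighteen one bits, octal 777777


def ones_complement_negate(word):
    """Negate an 18-bit one's complement word by inverting every bit."""
    return ~word & MASK


plus_zero = 0
minus_zero = ones_complement_negate(plus_zero)
print(oct(minus_zero))              # 0o777777
print(bin(minus_zero).count("1"))   # 18
```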

[24] The slight misalignment in the last column -- i.e. the "d" in "don't save" aligns with the "l" in "likewise," rather than the left parenthesis -- is an accurate copy of the original.

Additional key phrases: computer history; computer folklore; computer nostalgia; history of computing; DEC; Digital Equipment Corporation; TMRC.


Date created: 1/25/97
Last modified: 1/25/97
Copyright © 1997, Daniel P. B. Smith; All Rights Reserved
Maintained by: Daniel P. B. Smith
dpbsmith@world.std.com