Thinkage Ltd.
85 McIntyre Drive
Kitchener, Ontario
Canada N2R 1H6
Copyright © 1998 by Thinkage Ltd.
1. Introduction
1.1 Overview
2. Constants
2.1 Escape Sequences
2.2 Decimal Constants
2.3 Octal Constants
2.4 Floating Point Decimal Constants
2.5 ASCII Character Constants
2.6 BCD Character Constants
2.7 String Constants
3. Data Objects
3.1 Identifiers
3.1.1 Keywords
3.2 Simple Variables
3.3 Vectors
3.4 Manifest Constants
4. Functions
4.1 External Variables
4.2 Function Definition
4.3 Mechanics of Function Calls
4.3.1 Use of the Stack
5. Statements
5.1 Comments
5.2 Null Statement
5.3 Expression Statement
5.4 Storage Declaration
5.4.1 The extrn Statement
5.4.2 The auto Statement
5.4.3 Labels
5.5 Transfer of Control
5.5.1 The goto Statement
5.5.2 The return Statement
5.5.3 The break Statement
5.5.4 The next Statement
5.6 The if Statement
5.7 Iterative Statements
5.7.1 The repeat Statement
5.7.2 The while Statement
5.7.3 The do-while Statement
5.7.4 The for Statement
5.8 The switch Statement
6. Expressions
6.1 Primary Expressions
6.1.1 Function Calls
6.2 Rvalues and Lvalues
6.3 Unary Operators
6.4 Binary Operators
6.4.1 Shift Operators
6.4.2 Bitwise Operators
6.4.3 Multiplicative Operators
6.4.4 Additive Operators
6.4.5 Relational Operators
6.4.6 Logical And
6.4.7 Logical Or
6.5 Query Operator
6.6 Assignment Operators
7. The B Library
7.1 Useful I/O Functions
7.1.1 Stream I/O Functions
7.2 String Operations
7.2.1 Random String Processing
7.2.2 String Utilities
7.3 Dynamic Memory Allocation
8. Introduction to Input/Output
8.1 Units
8.2 Unit Opening
8.2.1 Access Actions
8.2.2 Mode Actions
8.2.3 Error Actions
8.2.4 Default Units
8.2.5 File Access Conventions
8.3 Unit Closing
8.4 Unit Switching
8.5 .BSET - Redirection of I/O
8.6 Sequential Stream I/O
8.6.1 Terminal vs. File
8.7 Random File I/O
8.8 String I/O
9. Using B
9.1 Compiling and Running
9.1.1 Using the B Command
9.1.2 Debug Tables
9.1.3 Random Libraries
9.1.4 B Command Options
9.2 Compiler/Loader Interface
9.3 Line Numbers
9.4 Source File Inclusion
9.5 Compiler Directives
9.6 Readability
9.7 Some Pitfalls
9.8 Library Explain Files
9.8.1 BOFF
Appendix A: Escape Sequences
A.1 String and Character Constant Escapes
A.2 Source Code Escapes
Appendix B: Binding Strength of Operators
Appendix C: B Compiler Error Messages
C.1 Diagnostics
C.2 TSS Loader Warning Messages
Appendix D: Partial Index of B Library Routines
Appendix E: Interface with FORTRAN Subroutines
Appendix F: DRLs and MMEs
This manual describes the B programming language as accepted by the GCOS8 B compiler.
B is a descendant of the programming language BCPL. B was first designed and implemented by D.M. Ritchie and K.L. Thompson of Bell Telephone Laboratories, Inc., Murray Hill, N.J. The original implementation of the run-time package was done by S.C. Johnson, also of Bell Labs.
There are several differences between this version of B and Bell Laboratories versions. The switch statement has been extended. Floating point operators and proper logical operators have also been added. Finally, the order in which operators are evaluated has been changed.
The current run-time package works in both TSS and accommodation mode batch; it reads almost any "media code" found in the GCOS8 environment. The run-time package runs under GCOS8.
Although this manual makes an effort to explain most of the concepts used here, we assume that the average B user is already familiar with some computer terminology and that the user knows at least one other computing language (Fortran, Cobol, PL/I, etc.). New B users are warned that this is a Reference Manual and not a Beginner's Guide. Beginners may wish to start with the B Tutorial Guide before reading this manual.
The manual begins with a discussion of the various types of constants used in B. From there it moves on to the use of variables and vectors in the language. Next is a chapter on functions, followed by chapters on the various types of statements accepted by B, and the wide variety of operators in the language.
The final chapters deal with the B Library routines, I/O, and a number of miscellaneous details about working with the compiler. You should read these last chapters as carefully as you read the chapters about the basics of the language. It is virtually impossible to use B without a working knowledge of the run-time library routines. For one thing, B itself has no constructs for input or output; all I/O is done by function calls to library routines.
Before you begin your first major program in B, you should try a few smaller programs just to get the feel of the language. It is also very helpful to have a look at programs written by a more experienced B programmer, to get an idea of how the features of the language can be put together effectively.
However carefully we explain things in this manual, it's inevitable that we'll leave some of your questions unanswered. One of the best ways to get the answers you want is simply by experimentation. If you don't know how a given feature works, write a small program and try things out. This may take a little time, but it can also save a lot of the headaches which arise from guessing about how the language behaves.
B lets you define octal, decimal, floating point, ASCII character, BCD character, and string constants. All but the last of these are stored internally in a single machine word. On GCOS8 systems, a machine word is 36 bits (which is equivalent to four 9-bit ASCII characters or six 6-bit BCD characters).
Because some terminals do not have all the characters recognized by B, the compiler lets you "manufacture" these missing characters using two characters which do appear on your keyboard. For example, B translates "$(" into the single curly brace bracket "{". Pairs of characters that can be translated into single characters are known as escape sequences.
Escape sequences may be used in character constants and string constants for characters which may be difficult or impossible to enter any other way. For example, "*n" may be used in character and string constants to represent the "new-line" character, "*b" is a backspace character, and so on. Remember that these are stored as a single character internally, even if they are typed as a pair of characters.
Appendix A gives a complete list of escape sequences.
A decimal constant consists of an ordinary integer number with no leading zeroes, as in
25 4737 981 32
An octal constant consists of an integer number formed only from the octal digits zero through seven. To distinguish an octal constant from a decimal constant, an octal constant must begin with at least one leading zero, as in
01 077 026 0004000 0777777777777
A floating point constant is any number containing a decimal point. It must not begin with the decimal point, but it may have leading zeroes. It may also be followed by the letter 'e' and a signed integer exponent. (If the sign of the exponent is positive, the "+" may be omitted.) Here are some sample floating point constants:
3.2 1. 0.5 1.e5 001.5 4.987e-2
An ASCII character constant consists of one to four ASCII characters inside single quotes. The result is a word that contains the internal form of the ASCII characters. If there are fewer than four characters specified, the characters are right-justified in the word and padded on the left with ASCII null characters (000, escape sequence '*e'). Thus 'a' is the same as '*e*e*ea'. Here are some sample ASCII character constants:
'a' 'abc' 'abcd' 'ab*'*n'
In the last example above, the constant is "ab" followed by a single quote followed by a new-line character. This is because "*'" is the escape sequence for a single quote in character constants, and "*n" is the escape sequence for the new-line character. Thus the constant is only four characters long, even though it is typed with six characters.
The compiler counts the number of characters inside character constants and issues an error message if there are more than four.
A BCD constant consists of one to six characters enclosed by grave accent characters. The result is a word containing the characters transliterated to BCD, right-justified, and left-padded with BCD zeroes (zero bits). Thus `a` is equivalent to `00000a`. Characters which do not have exact equivalents in the BCD set are converted to BCD blanks. Here are some sample BCD character constants:
`a` `ot` `123456`
Note that lowercase characters are equivalent to uppercase characters in BCD constants.
If your terminal does not have the grave accent character, you may write a BCD constant as an ASCII character constant preceded by a dollar sign, as in
$'a' $'ot' $'123456'
The run-time package provides functions to convert BCD to ASCII and vice versa.
A string constant is any sequence of ASCII characters enclosed in double quotes, as in
"this is a string"
""
"the above is the null string"
"this is a line*nand another line*nand another"
The "*n" new-line characters break up the last string into three lines.
When processing a string, B packs the string four characters to a word. The compiler marks the end of the string by appending one extra character, an ASCII null (000, string escape sequence '*e'). Thus the string "abc" is stored internally as "abc*e".
The value of all other constants (decimal, octal, floating point, ASCII and BCD characters) is a single word containing the internal representation of the given constant. Obviously, the same thing cannot be true of string constants since most strings are too long to fit in a single four-character word. For this reason, the value of a string constant is a single machine word containing the memory address of the actual string. This is very important. If you were to say
A = "this string";
in a B program, the variable A receives the address in memory where "this string" is stored. A does not get the string itself. Thus if you printed A without being careful, you would be printing an essentially meaningless address, not the string you wanted. Chapter 4 explains this in more detail.
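When constructing a string, the compiler gobbles all the characters it sees, translating escape sequences if necessary, until it finds an unescaped double quote to end the string. If the compiler encounters a true new-line character (as opposed to a '*n' escape sequence) before then, what happens depends on the character just before it. If the new-line immediately follows a '*', the compiler drops both and continues the string on the next line. Otherwise the compiler keeps the new-line in the string and issues a warning.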
a = "this is a very *
long string without a new-line";
b = "this is a very*n*
long string that contains a new-line";
c = "this will get
a warning";
When a string is broken over several lines as in the examples above, the compiler skips over any blanks and tab characters which appear at the beginning of each new line. Thus c above is equal to "this will get*na warning"; the spaces before "a warning" are not included in the string being collected. If you want to include spaces at the beginning of one of these lines, use the '* ' escape sequence for the first space in the sequence. For example,
d = "this string is two lines*n*
* with spaces";
points d towards the string "this string is two lines*n with spaces".
In a language like FORTRAN, variables are always of one data type or another and you can only do certain operations with certain types of variables. For example, FORTRAN won't let you add a floating point number to a character, or perform logical operations on integers.
B, on the other hand, is a typeless language. The compiler doesn't keep track of whether variables refer to integers, characters, octal numbers, and so on; that is left up to you. You can subtract the letter 'a' from 1.5 without getting an error in B...of course, the answer won't mean much, but you can do the subtraction.
This typeless nature of B gives you a great deal of freedom. Naturally, it gives you a certain amount of responsibility too; it's up to you to make sure that the operations you are asking B to perform make sense. For example, when you are multiplying two variables with the floating point multiplication operator, you are responsible for ensuring that the variables really are floating point numbers. If they aren't, the multiplication still takes place, but the result is bound to be useless.
B really only has two types of data objects: words and vectors of words. The rest of this chapter describes these different objects, as well as a special kind of construct called a manifest. First though, we must talk about the names that data objects may have.
In B, an identifier or name is formed from the characters a-z, A-Z, 0-9, the underscore (_), and the dot or period (.). The first character of an identifier cannot be a digit. Since the run-time package uses names which contain one or more dots, you can prevent name conflicts by avoiding the use of dot in any names defined in your programs.
Names in B may be arbitrarily long, but only the first eight characters are significant. Thus the compiler believes the names function1 and function2 refer to the same thing. Furthermore, the GCOS and TSS loaders only pay attention to the first six characters of a name. For this reason, the first six letters of all function names and external variable names should be different from one another. (Chapter 4 discusses functions and external variables.)
Normally the compiler ignores whether letters are in upper or lower case, so the identifiers SUM, sum, Sum, etc. all refer to the same thing. When you compile a program, you can tell B to pay attention to the case of letters by specifying the proper option in the command line which invokes compilation (see Section 9.1). Even with this option, however, B ignores case distinctions in external names (external names are discussed in Chapter 4).
There are 15 keywords in the B language. These may not be used as identifiers in B programs. Be especially careful that you don't inadvertently use next as one of your variable names.
The keywords of B may be listed in groups as follows:
auto extrn
goto return break next
if else
repeat while do for
switch case default
If you are using the option which tells B to pay attention to the case of letters, the keywords above must be written in lowercase. Otherwise, the case of the keywords makes no difference to the compiler.
A simple variable is a single word of memory with an associated name. As we have pointed out before, B does not keep track of what kind of data you have stored in this word. Thus you can say
a = 'm';
b = 5;
c = a + b;
without getting an error. The above code places the value of the constant 'm' in the memory location referred to by the name a, places the integer 5 in the memory location of b, and then adds the letter to the integer. The result is put into the memory location of c and can be thought of as the character 'r', the octal number 0162, or the decimal integer 114, depending on how you want to use it.
If you have experience with other programming languages, you are probably familiar with the idea of a vector. However, the actual implementation of vectors in B is quite different from vectors in most other languages, and so we will go to some effort to explain the concepts involved.
At its simplest, a vector is just a collection of consecutive words in memory along with an associated name. The name of the vector does not refer directly to the vector's data; instead, the vector name refers to a single word of memory which in turn contains the address of the first word of the vector. (We sometimes call this address the vector pointer.) It is very important to keep this distinction clear in your mind. If, for example, vec is the name of a vector, the statement
vec = 5;
does not set any of the words in the vector to the value 5. Instead, the statement puts the integer 5 into the single word of storage which previously held the address of the vector. Unless you've saved this address somewhere else, the statement above has just written over the only way you had of finding your vector.
The simplest way to access an element of a vector is to use the name of the vector followed by an expression in square brackets. This is known as subscripting. For example,
vec[10]
refers to the word ten places past the start of the vector vec (the eleventh word, since numbering starts at zero). B calculates this address by adding the number 10 to the address of the vector contained in the vector pointer vec. Thus vec[0] refers to the start of the memory pointed to by vec, vec[1] refers to the next word in memory, and so on.
Whenever B sees a subscript in square brackets, it adds the value of the subscript to the value of the vector pointer and uses the result as the address of a vector element. If your subscript is in the variable i you can get the ith element of a vector a by writing
a[i]
Surprising though it may seem to those used to other languages, you can write exactly the same thing as
i[a]
since in either case B adds the contents of i and the contents of a to get an address. Because B is typeless, it makes no effort to check whether a or i is actually the name of a vector. As far as the compiler is concerned, they are the names of single words in memory.
B does not check to see if an index has gone past the end of a vector. For example, even if you have only defined vector vec as 20 words long, you can still talk about vec[21], vec[30], or vec[100] without getting an error message. Of course, this is a dangerous practice, since it's hard to tell what lies in memory beyond the end of a vector. You may be looking at garbage, you may be looking at storage for another variable or vector, or you may be looking at some of the internal workings of your program. If you go too far past the end of your vector, you may even try to look at memory that hasn't been allocated to your program and end up with a memory fault error from the system. In B, it is the programmer's responsibility to make sure that indices don't run off the end of a vector.
Generally speaking, B treats every word in a vector as a separate element. Sometimes though you want to simulate an array of records, each of which is several words long. It is usually very simple to perform this simulation. For example, suppose xxx points to a vector which is made up of "records" which are three words long. Then
xxx[3 * i]
accesses the first word of the ith record in the vector. B simply multiplies i by three and adds this to the address in memory location xxx. To access the second word of the ith record you could type
xxx[1 + 3 * i]
since B performs the multiplication before the addition.
B does not explicitly support arrays of higher dimension than one. However, you can use as many subscripts with a vector name as you want. For example, if x is the name of a vector, the construct
x[i][j]
is interpreted as follows. The value of i is added to the vector pointer x to get the address of a word in memory. The contents of that word are then added to j to get a second address and that second address is the one to which x[i][j] refers. Note that in this way, B is assuming that the elements of the vector x are such that adding j to them gives a meaningful address. In Section 4.1, we describe a way in which the elements of a vector can be initialized as pointers to other vectors. Thus x[i][j] would be the jth element of the vector pointed to by x[i]. Using constructs like this, you can manipulate an array of any dimension.
A manifest constant (or more briefly, a manifest) is not really a data object at all. A manifest is just a symbol which can be used in the compilation process to stand for a string of characters.
A manifest constant is defined in a statement of the form
name = text;
where name is any valid identifier, and text is simply a collection of any characters except ';'. Some examples of manifests are
SIX7 = 0777777;
VECSIZE = 10;
B = VECSIZE + VECSIZE;
TWICE = 2 * ;
As in all identifiers, the letters in manifest names may be upper or lowercase. However, the common convention is to write all manifest names in uppercase. This convention helps you to distinguish manifests from variables when you are looking at the source of your program.
When a manifest is defined, the compiler enters the name in its symbol table and stores the associated text in an internal buffer. Absolutely no processing is done on the text at the time of definition.
When the compiler reads an identifier during compilation, it checks to see if the identifier is a manifest. If so, the identifier is replaced by the text associated with it. This replacement does not take place if the manifest name occurs inside a string or character constant.
Because manifests are changed into text in this way, manifests cannot be redefined. If you were to attempt
VECSIZE = 10;
...
VECSIZE = 20;
the second statement would be changed into
10 = 20;
which is a syntax error. The compiler replaces manifest names with their corresponding text values before analyzing the syntax of the line.
Manifests may be used anywhere in a B program, including inside other manifest definitions. In general, a manifest must be defined before it is used in any other statement. The exception is when a manifest occurs in another manifest definition. Since B stores the text associated with a manifest name without analyzing the text, other manifests appearing in the text need not be defined at the time. However, if one manifest contains a second manifest in its text, the second manifest must be defined by the time the first manifest is used in a B statement (otherwise B doesn't know what to replace the second manifest with). The safest approach is to define all your manifests at the very beginning of your B programs. The B compiler lets you nest up to 10 levels of manifests inside other manifests.
Manifests are just names for strings of text and don't really represent data objects which have memory space allocated to them. For example, consider
A = 10;
B = A + A;
C = B * B;
When C is used in a program, it is expanded to
C = A + A * A + A;
and substituting for A
C = 10 + 10 * 10 + 10;
Thus C is actually the number 120, not 400 as you might have thought if you said to yourself that B was equal to 20 and C equals B * B. Because of this kind of problem, it is usually a good idea to put parentheses around most manifest constants so that they are evaluated in the way you expect; thus you might say
B = (A + A);
Manifests are used as convenient shorthand symbols in B programs. For example, SIX7 is easier to type than the octal constant 0777777 (and easier to read too). With TWICE standing for 2*, the expression
TWICE i
stands for 2* i. It is very common to use a manifest like VECSIZE for the length of a vector. In this way, you can change the length of a vector just by changing the manifest VECSIZE, rather than going through your source and changing every occurrence of your vector length. You can also use manifests to define structures in a vector. For example, if vec is a vector whose records are three words long, you can define
WORD1 = 3 * ;
WORD2 = 1 + 3 * ;
WORD3 = 2 + 3 * ;
In this way, vec[WORD1 i] is the first word of record i, vec[WORD2 i] is the second word, and vec[WORD3 i] is the third.
To end this section, we give an example where manifests are used in a program that prints a binary tree. The tree is represented by a vector of records which are each three words long. The first word of each element gives the contents of a node, the second word is the address of the node's left descendant, and the third word is the address of the node's right descendant. If a node doesn't have a left or right descendant, the descendant pointer is set to -1.
(Note: This example contains a number of statements and operators which have not yet been discussed in this manual. Still, you should be able to appreciate what this program does, even if you aren't familiar with all the details.)
/* binary list structure */
NULL = -1;
CONTENTS = 0;
LEFT_PTR = 1;
RIGHT_PTR = 2;
...
printree( ptr )
    if( ptr != NULL ) {
        printree( LEFT_PTR[ptr] );
        print_contents( CONTENTS[ptr] );
        printree( RIGHT_PTR[ptr] );
    }
/* end printree */
A program written in B can contain three kinds of components: manifest constant definitions, external variable definitions, and function definitions.
A program can contain any number of each kind of component, and components can occur in any order, provided that manifests are defined before they are used in other program components.
Chapter 3 discussed manifest constant definitions. An external variable definition defines a simple variable or vector which can be used by any function in your program, provided the function explicitly declares its intention to use the external variable with an extrn statement. (extrn is described in Section 5.4.1). An external definition automatically allocates sufficient static memory storage for the simple variable or vector.
A function definition defines a piece of the executable code of a program. Since all executable code must appear inside a function body, functions can be thought of as the building blocks of any B program. A function definition includes the name of the function, the arguments it accepts, and the statements which determine what the function actually does.
External variables are the only form of "global" variables in B. We begin with the possible forms of external variable definitions, then look at several examples.
In the following external definition forms, ival stands for "initialization value". This may be any valid constant, constant expression, or manifest. (A constant expression may be a string constant or any valid combination of character or numeric constants, binary operators, unary operators, and parentheses. Chapter 6 gives the rules for forming constant expressions from constants and operators.)
The initialization value ival may also be the name of an external variable or function. In this case, the value of ival is taken to be the memory address of the variable or function. If the name given is the name of a vector, the variable is initialized to the address of the vector pointer, not the address of the vector itself. Thus if a variable is initialized to vec and if vec is a vector, the variable holds the address of the single word vec which in turn holds the address of the actual vector.
Here are the possible forms of external definitions:
In effect, this defines a vector which does not have a word set aside as a vector pointer. The symbol &name stands for the address of name. Consequently, (&name)[0] refers to the contents of the zeroth word of the vector, (&name)[1] the contents of the first word, and so on. name on its own refers to the first ival given. This is the way a vector is set up in FORTRAN, but it is not the same as a B vector. (Note that the external definition form name { ival }; is just a degenerate case of this kind of external definition.)
vec [10];
allocates eleven words of storage for vec, so that your indices can run from zero to ten.
The const-expr in brackets may be any valid combination of numeric or character constants, unary operators, binary operators, and parentheses. You must make sure the value of the expression is reasonable, since B accepts absurdities here like floating constants and negative numbers. For practical purposes, const-expr must give an integer result.
For compatibility with a previous version of the compiler, B also accepts an ival or ival list which is not surrounded by braces. In this case, the compiler does not permit a constant expression to appear. Only a numeric, character, or string constant is acceptable, although an integer constant may be prefixed by a minus sign.
Here are some examples of external definitions:
z = (&c)[0];
also sets z to 'ab' and
z = (&c)[1];
sets z to 'abc'.
z = &c;
gives z the address of c and therefore makes z a pointer to the vector which begins at c (so that z[0] is 'ab' and z[1] is 'abc').
z = f;
gives z the same pointer to "a string".
As mentioned in the last chapter, B has no explicit facilities for handling arrays of more than one dimension. Usually if you need more than one dimension, you build it during execution by calling the library function GETMATRIX. GETMATRIX obtains storage, constructs the necessary edge vectors, and returns a pointer to the array (see Chapter 7 for more on array-handling library functions).
Although you cannot explicitly define a multi-dimensioned array, you can still construct one as an initialized external. The secret is that any ival (inside braces) may be replaced by an ival or ival list surrounded by braces. The compiler then constructs the ival list and places a pointer to it in the original ival list. For example,
x[ ] {
{ 00, 01, 02 },
{ 10, 11, 12 },
{ 20, 21, 22 }
};
puts this feature to use. In this case, x is initialized as a pointer to a vector containing three pointers. Each pointer points to a vector of three words. In an expression, the value of x[0] is a pointer to the first vector of three words, while the value of x[1][2] is 12.
The maximum depth to which these array-like initializations can be nested is seven levels.
B functions are similar in purpose to subroutines in FORTRAN and procedures in PL/I. Every working B program must contain one function called main: the function where execution of the program begins. Most B programs contain a number of other functions as well, since B is designed to encourage structured or "modular" programming.
The general form of a function definition is
name( arg1, arg2, ... ) statement
The name must be a valid identifier and is automatically defined by the compiler as an external (and therefore "global") symbol. Any function in a B program may call any other function; you can even call main. A function may also call itself recursively (discussed later in this chapter).
The argument list shown above need not contain any arguments at all. If it does contain arguments, they are given as a list of legal identifiers separated by commas. These arguments are "local" or auto variables; in other words, storage is allocated to these variables for the duration of the function's operation. (Section 5.4.2 discusses such variables in more detail). Once the function is finished, this storage is released and the local variables "disappear".
(To be precise, these variable values are not automatically wiped out when a function is finished. However, the storage they have been using is made available for other purposes so you can't really count on the storage remaining the way it was when the function was still executing.)
The statement in the function definition above defines what actions the function takes. Most of the time, this is a compound statement consisting of a number of statements enclosed by brace brackets. Chapter 5 gives the rules for forming statements.
When one function calls another, the arguments (if any) are always passed by value. This means that altering an argument inside a function has no effect on the value of the argument passed by the calling function. However, if one of the arguments passed is the address of a variable, the function which receives the address can use it as a pointer to where the variable is actually stored and thereby affect the caller's variable. For example, consider a call like
func (vec);
where func is a function and vec is a vector. func receives the value of vec which is the address of the beginning of the actual vector. Using this address, func can change the values of the vector elements. However, func cannot change vec itself, since func receives only the value of vec, not the location in memory where that value is stored.
A function may return a one-word value to its caller any time during its operation using the return statement (see Section 5.5.2). The caller and the callee do not have to agree on whether or not a function returns a value. If a value is returned but not expected, the value is ignored. If a value is expected but not returned, the value received by the caller is garbage.
A function can determine how many arguments it has been passed by its caller by invoking the library routine nargs. For example, the statement
x = nargs();
sets the variable x to the number of argument words used to call this invocation of the function. Because a function has this way to determine how many arguments it has been passed, B lets you write functions that take a variable number of arguments. Most of the time, this means that a function is called with fewer arguments than are defined for it in the function definition. In this case, the function has to determine the actual number of arguments it has received and establish default values for the arguments it does not have. If a caller specifies more arguments than a function needs, the surplus arguments are ignored.
Function calls in B are handled by use of an internal stack. This is a large block of memory which is used for storing various types of information.
When a function is called, internal information about the call is stored on the stack. Storage for the local or auto variables is also allocated on the stack. If a function calls itself recursively, the auto variables of the new invocation are again stored on the stack, so that the new function has its own set of local variables to work with. (Naturally there is only one copy kept of any external variable, and storage for external variables is not allocated on the stack.) Because B operates this way, you can nest functions (recursively or otherwise) to as many levels as your stack space allows. If you perform too many function calls, you fill up your stack area and run into memory used by your program for other purposes. Obviously, this can be a dangerous situation.
By default, the compiler allocates 500 words of memory as stack space. This is ample storage to handle a reasonable number of function calls. If this stack size is not sufficient (or if you want to reduce it to lower your memory requirements), you can specify a different stack size in the command that compiles your program (see Chapter 9).
When a function returns to its caller, the stack space that was used by that function is released. The next time a function is called, it uses the stack space that belonged to the previous function (or at least a portion of it). For this reason, you cannot expect to be able to use the contents of an auto variable after the function that used the variable has finished operation: even if you save the variable's address, the value at that address is bound to be written over the next time a function is called.
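The following sketch contrasts the mistake with one possible remedy. The function names are hypothetical, and the calls to GETVEC and RLSEVEC assume the forms getvec(size) and rlsevec(vector, size); check Section 7.3 before relying on them.

```b
/* WRONG: buf lives on the stack and is released when bad() returns. */
bad() {
    auto buf[10];
    buf[0] = 1;
    return( buf );      /* the caller receives a soon-to-be-overwritten address */
}

/* Safer: storage from the free storage area survives the return. */
good() {
    auto p;
    p = getvec( 10 );   /* assumed form: size in words */
    p[0] = 1;
    return( p );        /* the caller should eventually call rlsevec( p, 10 ) */
}
```

The second version costs a library call, but the vector it returns is not disturbed by later function calls.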
This section has been a very brief introduction to the mechanics of function calling, but it should be enough for the average user. Those who wish to know more about this process should read the B Environment Manual ("expl b environment manual"). This manual is of a highly technical nature; it requires a knowledge of GMAP and other GCOS8 features.
Statements are used to define the actions to be taken by a B function. Statements may only appear in the body of a function definition. In certain cases, a complete statement may occur inside another statement.
Many types of statements may contain expressions. Since the rules for formulating expressions are discussed in the next chapter, we will merely note here that an expression may be a statement, but a statement may not appear inside an expression.
In every case where a statement is permitted, it may be replaced by a compound statement consisting of one or more statements enclosed in curly braces, as in
{
statement1
statement2
...
}
The compiler does not permit a null compound statement like
{ }
All statements, except compound statements, must end with a semicolon.
When B is compiling statements (and everything else for that matter), it treats its input as an unbroken stream of characters. Formfeed, tab, and new-line characters are all converted to space characters, except where they occur inside string or character constants. New-line characters are counted so that the compiler can tell you the line number of a line in which an error occurred. Lines may be of any length. Unlike some other compilers, B does not recognize any part of a line as a "sequence field" (although it does accept line numbers as described in Section 9.3).
In the formal definitions which follow, keywords are printed in bold face. When parentheses are shown in a definition, they are required. By convention, the word "statement" stands for a B statement ended by a semi-colon, or else a compound statement enclosed in braces as described above.
Comments are not true statements: they can be included anywhere in a B program that a space could be used, inside or outside of function definitions. The beginning of a comment is signalled by /* in the input stream. The compiler ignores everything in the input stream until it encounters */ to end the comment. For example,
/*
* This is a comment.
*/
is a comment that takes up three lines of input.
Comments may not be nested. For example,
/* This comment /* has a comment inside it. */ */
causes a syntax error.
;
The null statement does absolutely nothing. It is typically used to supply a null body to a while statement, as in
while( putchar( getchar() ) );
or to provide a convenient place on which to hang a label as in
label: ;
expression;
Any valid B expression followed by a semi-colon is acceptable as a statement. To be meaningful, the expression usually involves an assignment operation or a function call, as in
x = min(a,b) + x;
open( "/myfile", "r" );
++i;
However, the compiler happily accepts statements which do absolutely nothing, such as
a < b;
open;
i;
Remember that in B, assignment is an operator in an expression, not a statement.
Before discussing the statements which pertain to storage allocation or memory referencing, we must briefly review what we have said about the various types of data objects used in a B program.
External variables are stored in a global pool of memory which is accessible to any function in your B program. If a function intends to use a particular external variable, it must state its intentions by naming the variable in an extrn statement.
The local or auto variables of a function are allocated storage on the stack each time the function is called. When the function returns, the storage allocated for local variables is released for use in future function calls.
Constants used inside functions are allocated storage in the area of memory used by the executable code of the function. The compiler does not accept constructs that would result in directly changing a constant's value. Thus the storage used by constants should be considered "read-only".
Labels are also stored in the body of a function's executable code, and are not directly accessible to the user.
Finally, there is a pool of free storage which can be dynamically allocated by the library function GETVEC and dynamically released by the library function RLSEVEC (described in Section 7.3). This free storage area automatically grows as required up to the limits imposed by the operating system.
Identifiers used in the body of a function must be formal arguments of the function, labels, or variables which have been explicitly declared in extrn or auto statements within the function itself. The only exception is a function name used in a function call. The compiler automatically classifies any name immediately followed by a left parenthesis as external unless the name has already been classified in some other way.
auto and extrn statements may appear anywhere in a function body. However, it is highly recommended that you group both types of statement at the beginning of the function.
extrn name1, name2, ... ;
This statement declares the names of one or more external variables which the function intends to reference. Once the statement has been issued, the function may use any of the external variables named. Although an identifier may be externally declared as a vector pointer, you should not indicate this in the extrn statement. All the function needs to know is that the one-word vector pointer is an external variable. The actual elements of the vector can be obtained using subscripts with this single word, as in
extrn vec;
vec[1] = 1;
auto name1, name2[const-expr], ... ;
The auto statement declares local storage unique to each invocation of a function, as in
auto x;
auto i, j, x[10];
To create a local vector, you must specify the vector's name and length in an auto statement as shown above. The length that you specify for the vector must be a constant expression since it is established at compile time. Because auto vectors are allocated storage on the stack, auto vectors should not be overly large; otherwise, you may use up all your stack space in just a few function calls. For vectors longer than 64 words, it is general practice to use GETVEC to obtain storage from the free storage area, rather than declaring the vectors as auto and taking up room on the stack.
The const-expr above is any valid combination of numeric or character constants, unary operators, binary operators and parentheses. Make sure that what you use is sensible, because the compiler accepts nonsense constructs like
auto x[-1];
which lead to unwanted results. Usually, array dimensions are either simple integer constants or expressions involving a manifest, as in
MAX = 10;
func() {
    auto x[MAX*2], y[MAX + 7];
    ...
}
All auto statements should appear at the beginning of a function because they actually do a certain amount of work. An auto statement points the variables towards the stack storage allocated for them. Thus if you have a loop of the form
loop: auto a[10];
...
a = "string";
...
goto loop;
the word a is first set as a pointer to a vector on the stack, and later as a pointer to the string "string". When the function loops back to loop, the auto statement points a back to the vector. Putting all your auto statements at the beginning of a function avoids problems like this where you may be surprised by a sudden re-assignment.
The initial contents of an auto vector or auto variable are always undefined when a function begins execution. Thus you must make sure that your functions always initialize their local variables explicitly.
Any unique identifier followed by a colon and preceding a statement is defined as a label, as in
again: ;
nxt: x = getchar();
A statement may be preceded by as many labels as you find necessary, as in
lab1: lab2: lab3: printf("hi there");
There are four kinds of statements which transfer control from one part of a program to another. The goto statement jumps to a specified label. The return statement exits from a function. The next and break statements simplify loop control.
goto identifier ;
transfers control to the statement which has the label identifier. If identifier has not already appeared in the function, B assumes it is a label.
You may transfer to any location inside a function body, even into or out of a compound statement; however, it is almost always a bad idea to transfer into a compound statement. Not only is the action confusing to understand when reading code, it can lead to unpleasant surprises.
Because B is a typeless language, the compiler has no way of knowing whether the identifier you supply in a goto statement will really turn out to be a valid label. It is valid, but probably erroneous, to say
extrn b;
...
goto b;
Never try to pass a label as an argument to a function and then use that label to transfer to another function. The program will end up in the destination function, but with the previous function's stack pointer. This is bound to result in disaster eventually.
return;
return ( expression ) ;
The return statement ends the execution of a function, returning to the function's caller. Upon return, all temporary storage in use by the particular invocation of the function is released for future use by other functions.
The first form of the return statement merely returns control to the calling function. The second form passes back a one word value.
The construct
return();
is considered a syntax error. If you do not want to return a value, omit the parentheses entirely.
If there is no return statement at the end of a function definition, B implicitly inserts one. This means that control returns to the caller at the end of a function, whether or not there is an explicit return statement there.
Normally, a B program terminates immediately after executing the last statement of main. The library function EXIT can terminate execution of your program at some other point (see "expl b lib exit"). Several other run-time functions can also terminate your program, including ERROR and .ABORT.
break;
break drops out of the innermost enclosing while, for, switch, repeat, or do-while statement. The compiler generates a fatal error if a break statement is not inside one of these.
next;
next skips all further statements in the innermost enclosing while, for, or do-while loop, and transfers to the test which determines whether looping should continue. Inside a repeat statement, next transfers to the top of the repeat block. Note that next is only valid in a switch statement when the switch itself lies inside one of these looping statements.
if ( expression ) statement
If the result of the expression is non-zero, if executes the statement. Note that the expression must be enclosed in parentheses.
A more complicated form of the if statement is
if ( expression ) statement else statement
If the result of the expression is non-zero, the first statement is executed; otherwise, the second statement is executed.
If a nested if statement has fewer elses than ifs, the compiler associates each else with the closest if at the same level of nesting. For example,
if ( ex1 ) if ( ex2 ) stmt1 else stmt2 else stmt3
resolves to
if ( ex1 ) {
if ( ex2 ) stmt1
else stmt2
}
else stmt3
Think of ifs and elses being placed on a pushdown stack as they appear. In this way, an else is paired with the if immediately preceding it, and both are popped off the stack together at the end of the else. If a new else then occurs, it is pushed onto the stack and paired with the next if that is still on the stack.
Here are some examples of if statements:
if( a ) y = x;
if( a < 2 ) y = a; else y = 0;
if( a != b )
    z = g( y );
else {
    a += x;
    b -= y;
}
Iterative statements repeat zero or more other statements until something stops the looping.
repeat statement
repeat merely executes the given statement forever unless a break statement is encountered, or a goto passes control to a statement outside the loop. The statement in a repeat statement is almost invariably compound. next and break statements are valid inside a repeat.
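A minimal sketch of a repeat loop that ends with break follows. The function name pow2 is hypothetical, and the sketch assumes n is at least 1 and small enough that the shift does not overflow.

```b
/* Smallest power of two that is not less than n. */
pow2( n ) {
    auto p;
    p = 1;
    repeat {
        if( p >= n )
            break;      /* the only way out of the loop */
        p = p << 1;
    }
    return( p );
}
```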
while ( expression ) statement
If the expression is non-zero, the statement associated with the while is executed. After execution of the statement, the expression is re-evaluated. If the expression is again non-zero, the statement is executed again. In other words, while the result of the expression is non-zero, the statement is executed. When the result of the expression is zero, control passes to the next statement following the while statement.
If the given expression is initially zero, the statement is not executed.
break and next statements are valid in a while statement.
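As an illustration, here is a sketch of Euclid's algorithm written with while. The function name gcd is hypothetical, and both arguments are assumed to be non-negative integers.

```b
/* Greatest common divisor of a and b. */
gcd( a, b ) {
    auto t;
    while( b != 0 ) {   /* body is skipped entirely if b starts at zero */
        t = a % b;
        a = b;
        b = t;
    }
    return( a );
}
```

If b is zero on entry, the test fails immediately and the function returns a unchanged.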
do statement while ( expression ) ;
The do-while provides a loop with a test at the bottom of the loop. It is equivalent to
repeat {
statement
if( !expression ) break;
}
Thus, if the given expression is zero, the statement is still executed once.
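The fact that the body always runs once is sometimes exactly what is wanted. The sketch below, with the hypothetical name ndigits, counts the decimal digits of a non-negative integer.

```b
/* Number of decimal digits in n, for n >= 0. */
ndigits( n ) {
    auto count;
    count = 0;
    do {
        ++count;        /* executed at least once, so 0 has one digit */
        n /= 10;
    } while( n != 0 );
    return( count );
}
```

Written with a plain while, the same loop would report that zero has no digits.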
break and next statements are valid in a do-while statement.
for ( expr1; expr2; expr3 ) statement
The for statement may be used to set, test, and increment a variable in order to control a loop. The for statement is equivalent to
expr1;
while ( expr2 ) {
statement
expr3;
}
The first expression (generally the initialization of a controlling variable) is evaluated. Then, while the result of the second expression (usually a test) is non-zero, the statement is executed. Before returning to re-evaluate the second expression, the third expression, (often incrementing the controlling variable) is evaluated.
Both break and next are legal in a for statement. The effect of next is to pass control to the evaluation of the third expression.
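The two statements can be seen side by side in the following sketch. The function name show and the convention that a zero element marks the end of the data are hypothetical.

```b
/* Print the positive elements of v, stopping at the first zero. */
show( v, n ) {
    auto i;
    for( i = 0; i < n; ++i ) {
        if( v[i] == 0 )
            break;      /* leave the loop entirely */
        if( v[i] < 0 )
            next;       /* go straight to ++i, then the test */
        printf( "%d*n", v[i] );
    }
}
```

Because next passes control to the third expression, i is still incremented for the elements that are skipped.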
Any or all of the expressions in the for statement may be null. Furthermore, they need not involve the same controlling variable, if a controlling variable is even involved. Note that the second expression is always treated as a logical expression. Here are some examples:
for( i = 0; i < 10; ++i )
x[i] = j[i];
for( i = 10; i <= x; i += 2 )
for( j = 1; j < y; ++j )
g[i][j] = f( i + j );
for( ; i < n; ++i )
y[i] = z[n - i];
NULL = 0;
NEXT = 1;
DATA = 0;
...
for( p = startlist; p != NULL; p = p[NEXT] )
    if( p[DATA] >= x ) break;
The switch statement provides a conditional branch depending on the one-word result of an expression. The switch has the following formal syntax:
switch ( expression ) statement
The statement is always compound, and hence can never be null. Special labels are allowed inside the statement to indicate where processing starts for a given case, as in
switch ( expression ) {
case const-expr: statement
case const-expr :: const-expr: statement
break;
case op const-expr: statement
/* op is one of <, <=, >=, > */
default : statement
}
The switch evaluates the expression and compares the result with the constant or constant bounds in each case label. It selects a case, if there is one that matches the calculated result, and begins executing the compound statement at the statement immediately following the appropriate case label. If the expression result fits no case, execution continues at the label default (if supplied) or at the next statement following the switch, if default is not supplied.
Once a case is selected, execution always falls through into the next case, unless the program finds a statement that alters the control flow. break is often used to jump out of a switch statement after executing the code for a particular case.
A statement may have more than one label or case label, just as a label or case label may be followed by more than one statement.
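As shown above, a case may be satisfied by a single constant value, by any value in a range written as two constants separated by ::, or by any value that stands in the given relation (<, <=, >=, or >) to a constant.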
Overlapping bounds draw a fatal diagnostic from the compiler.
As usual, we use const-expr to denote any legal combination of numeric or character constants, unary operators, binary operators and parentheses which can be evaluated at compile time as some constant value. String constants are not permitted in this context.
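The three kinds of case label can be combined in one switch, provided their bounds do not overlap. The following is only a sketch; the function name grade and the cut-off values are hypothetical.

```b
/* Map an integer score to a letter. */
grade( score )
    switch( score ) {
    case < 50:
        return( 'F' );
    case 50 :: 79:      /* any value from 50 through 79 */
        return( 'P' );
    case >= 80:
        return( 'A' );
    }
```

Each case ends in a return, so no break statements are needed to prevent falling through.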
Attempts to switch with floating point values may have unusual results, since the generated code performs integer comparisons. There is no problem with exact floating point comparisons, but if a range of floating point numbers is specified, the odds are that things won't work the way you want.
The compiler is capable of generating code for a switch statement in several different forms. It chooses which form to use on the basis of efficiency considerations.
As an example of switch, here is a function which determines whether a character is valid in a B identifier. The function also converts uppercase letters to lowercase.
alphnum( c )
    switch( c ) {
    case 'A' :: 'Z' :
        /* converts upper case to lower */
        /* and falls through to return */
        c += 'a' - 'A';
    case 'a' :: 'z' :
    case '0' :: '9':
    case '.':
    case '_':
        return( c );
        /* would use break if return not used */
    default:
        return( 0 );
    }
/* end of alphnum */
Expressions in B are constructed according to rules that govern combinations of operators, identifiers, square brackets, and parentheses. B has a large set of operators, which are described in this section.
Because B is typeless, the compiler always assumes a given operation on a word is appropriate. Although this forces you to do more checking yourself, it also gives you the scope to do almost anything you want. This typeless characteristic often causes trouble for beginning users of B, because the compiler happily accepts questionable operations, such as adding one to a function name, or using a pointer as a function call. Such is the price of freedom.
The compiler takes no responsibility for the validity of expressions. There is no run-time monitoring of possible arithmetic overflows or faults. In the B run-time environment, overflow faults are inhibited (unlike the Pascal environment where they are not). A divide error (such as dividing by zero) leads to an error message and program termination.
Expressions are evaluated according to a precise order of binding. This includes both the hierarchy of evaluation (which determines the order in which different types of operators are evaluated), and grouping (which determines the order in which operators of the same type are evaluated). This chapter lists the hierarchy of evaluation from the operators ranked highest (and so evaluated first) to those ranked lowest. The hierarchy of evaluation and the grouping rules are summarized in Appendix A.
Note: The rules of binding and grouping do not completely dictate the order in which expressions are evaluated. In particular, arguments to functions may be evaluated in any order that the compiler chooses. This may make a difference, as in
func(i,i++)
Suppose i begins with the value 1. If the compiler chooses to evaluate the first argument before the second argument, the call turns into
func(1,1)
However, if the compiler chooses to evaluate the second argument before the first, the call turns into
func(2,1)
because the value of i has already been incremented by the time the compiler evaluates the first argument. Since there is no way to guarantee the order in which the compiler evaluates function arguments, you must avoid function calls that have this kind of ambiguity.
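One way to remove the ambiguity is to perform the increment in a statement of its own, so that every argument is a plain value by the time the call is made. The variable name old is hypothetical, and PRINTF stands in for the function being called.

```b
main() {
    auto i, old;
    i = 1;
    old = i++;          /* old is 1, i is now 2 */

    /* If both arguments should be the value before the increment: */
    printf( "%d %d*n", old, old );

    /* If the first argument should be the value after the increment: */
    printf( "%d %d*n", i, old );
}
```

Neither call depends on the order in which the compiler evaluates its arguments.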
In the same way, consider an expression like
func1(a) + func2(b)
The compiler may evaluate either operand first; it may call func1 then func2, or vice versa. Therefore, you should avoid code where the order of operation makes a difference. This applies to all operators where the order of operand evaluation is not stated specifically. (Do not confuse order of operand evaluation with grouping. Grouping states that
A + B + C
is evaluated as
(A + B) + C
In other words, the left addition takes place before the right. However, there is no guarantee of the order in which individual operands are evaluated.)
The primary expression is the basic building block used in constructing more complex expressions. Primary expressions are defined recursively as follows:
Here are some examples of simple primary expressions:
x getchar() (a + b) 6 x[i] 6[x]
In cases where a primary expression is composed of other primary expressions, grouping occurs from left to right. Consider, for example,
x[i][j] x[i]()
In the first case, x is treated as a pointer to a vector of vectors. In the second case, x is treated as a pointer to a vector of functions, and the expression calls one of those functions. In both cases, x[i] is evaluated first and placed in a temporary storage location; for the purpose of illustration, we will call this temporary storage location y. The two expressions above are evaluated as y[j] and y(), respectively.
The general form of a function call is:
primary( expr1, expr2, ..., exprn )
Most commonly, primary is just the name of the function you wish to call, but the generality of the construction also lets you use vectors or lists of function addresses.
B gives no error if a caller expects a function to return a value but no value is returned. In this case, the caller receives an undefined value, which of course can lead to errors. It is up to you to make sure that functions return values when the caller expects such values.
You must make sure that a function is called with as many arguments as it needs. It is possible to write a function that can cope with receiving too many or too few arguments, but if the function is not ready to handle variable numbers of arguments, specifying the wrong number of arguments will lead to errors.
The compiler recognizes function calls by the parentheses around the argument list; thus these parentheses must always be present. For example, suppose you have a function named proc which requires no arguments. To call it, you must write
proc()
If you wrote
proc
the compiler would not recognize this as a function call.
When dealing with operators and expressions, it is important to distinguish between the contents of a word and the address of a word.
The term Rvalue refers to the contents of a word or the value of an expression. The term comes from the fact that Rvalues frequently appear on the Right hand side of an assignment (though they can certainly appear in other instances as well). Any expression in B may be evaluated for an Rvalue. For example, the Rvalue of a subscripting operation is the contents of the word addressed by the sum of the vector pointer and the index.
Everywhere in this manual where we say "expression", we mean an expression whose result is some Rvalue.
The term Lvalue refers to the address of a word. The term comes from the fact that Lvalues frequently appear on the Left hand side of an assignment (though they can also appear in other places as well). Only a name, a subscripting operation, or a primary expression prefixed by the unary indirection operator '*' may be evaluated for an Lvalue. The Lvalue of a subscripting operation is the address calculated by adding the vector pointer and the index.
Context determines whether an expression is evaluated for its Rvalue or its Lvalue. For example, consider the assignment
a[3] = 2 + x
The expression on the right yields an Rvalue which is the sum of the contents of x and the constant 2. a[3] adds the index 3 to the contents of the vector pointer a and yields an Lvalue which is the address of a memory location to store the Rvalue 2+x.
The expressions
6 = x
(a + b) = x
are both invalid because the expressions on the left of the assignment operator do not yield a proper Lvalue. If they did yield Lvalues, you could change the value of a constant in the first case, and make a meaningless assignment in the second.
A unary operator acts upon a unary expression to change it in some manner. A unary expression is either a primary expression or a primary expression already modified by one or more unary operators. In the definitions below, "Rvalue" or "Lvalue" must be a unary expression. Unary operators are applied from left to right. The unary indirection operator '*' operates on an Rvalue to produce an Lvalue; all other unary operators produce Rvalues.
The unary operators recognized by B are:
*6 = x
stores the contents of x in memory location six.
y = *x
sets y to the contents of the word pointed to by x.
There is a fundamental relationship between the '*' operator and subscripting which should help you understand how addressing works in B. Both generate an address from one or more Rvalues. In the case of '*', the Rvalue is used directly as the address; in the case of subscripts, an offset or index is first added to the Rvalue of the vector pointer. The following are exactly equivalent everywhere:
a[b] *(a+b) b[a]
For example, a[0], *a, and 0[a] are completely equivalent in B. To be able to write or understand B programs, it is vital that you understand the validity of this relationship.
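A short sketch makes the equivalence concrete. The vector name v and the value stored are arbitrary.

```b
main() {
    auto v[4];
    v[2] = 5;
    /* all three arguments refer to the same word */
    printf( "%d %d %d*n", v[2], *(v + 2), 2[v] );
}
```

Each of the three expressions adds 2 to the contents of v and fetches the word at the resulting address, so the same number is printed three times.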
Here are some examples involving unary expressions:
i = i + 1
*(*(a + b) + c)
a + i
The equivalence can break down if one of a, i, or a+i is too big to be represented in 18 bits. Since addresses in B are only 18 bits long, treating longer numbers as addresses may lead to unexpected consequences. When one of a, i or the sum exceeds 18 bits, the value in the upper 18 bits is undefined.
All other operators are binary operators: they require both a left and a right operand. Each operand must be an Rvalued expression. The result of any binary operation is also an Rvalue.
With one exception, the order in which the two operands are evaluated is arbitrary, so the evaluation of one side should not depend on a side effect of the evaluation of the other side (a function call, for instance). Logical operators are the only exception. Their operands are always evaluated strictly from left to right.
The code that B generates for floating point operations is correct, but not especially efficient, since the compiler generates a separate load and store each time a floating point operand is used. This means that floating point capability is available if you need it, but for intensive use of floating point, you should probably call a Fortran or Pascal routine to do the job.
The binary operators that follow are listed in the order in which they are evaluated: the operators described first are evaluated first.
Shift operators group from left to right.
The chart summarizes the results of bitwise operations. These operators are described in more detail below. The table shows the effect of each operation on one bit.
  Operands             Results
  A    B    A&B (and)   A|B (or)   A^B (exor)
  0    0        0           0           0
  0    1        0           1           1
  1    0        0           1           1
  1    1        1           1           0
The three bitwise operators group from left to right.
These operators perform multiplication and division.
Multiplicative operators group from left to right.
These provide integer or floating point addition and subtraction.
Additive operators group left to right.
The result is 1 if the given relation between two integer operands is true; otherwise, the result is zero.
The following operators perform the same function for floating point operands:
#== #!= #< #<= #> #>=
The left hand operand is always evaluated first. If its result is zero, the result of the expression is zero and the right hand operand is not evaluated.
The left hand operand is always evaluated first. If the result is non-zero, the result of the total expression is non-zero and the right hand operand is not evaluated.
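Because the right hand operand is skipped once the result is known, a logical expression can guard a reference that would otherwise be unsafe. The sketch below assumes that the logical and operator is written && and uses the hypothetical function name inrange.

```b
/* Non-zero only if i is a valid index AND v[i] is non-zero. */
inrange( v, n, i )
    return( i >= 0 && i < n && v[i] != 0 );
```

When i lies outside the vector, one of the first two tests yields zero and v[i] is never fetched.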
expr1 ? expr2 : expr3
The first operand is evaluated. If the result is non-zero, the second operand is evaluated and returned, while the third expression is ignored. If the result of the first operand is zero, the third operand is evaluated and returned, while the second is ignored. In both cases, the value that is returned is an Rvalue.
This is analogous to
if( expr1 ) expr2; else expr3;
but has the advantage that it may be used in an expression. For example, a function to calculate the maximum of two numbers might be coded as
max( a, b ) return( a > b ? a : b );
Grouping is right to left, so that
a ? b : c ? d : e
is equivalent to
a ? b : (c?d:e)
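Right to left grouping lets you chain several tests without extra parentheses. As a sketch, a function with the hypothetical name sign might be written as follows.

```b
/* Returns 1, -1, or 0 according to the sign of n. */
sign( n )
    return( n > 0 ? 1 : n < 0 ? -1 : 0 );
```

The second query is evaluated only when the first test fails, exactly as if it had been enclosed in parentheses.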
Lvalue = Rvalue op ( expr )
where op can be any one of
* / % + - << >> & ^ |
and Rvalue refers to the contents of the word addressed by Lvalue. Note that neither floating point nor relational operators may be used in compound assignments.
As an example of compound assignment,
x *= a + b;
is the same as
x = x*(a + b);
In all cases, the right hand operand is evaluated first, even though the operator in the assignment may have higher binding strength than an operator in the expression.
Assignments group right to left.
x = y = 0;
is taken as
x = (y = 0);
This points out that the value of an assignment expression is the value that is being assigned.
Remember that assignment is an operation, not a statement, and so is valid almost anywhere, including conditional expressions, such as
if( (x = y[i++]) == z )...
Parentheses are used in this case to alter the order of precedence. These are required here because the assignment operators have the lowest evaluation precedence, which means they are evaluated last. If the parentheses were removed, x would be assigned either zero or one, depending on the outcome of the comparison
y[i++] == z
One of the advantages of programming in B is the large library of functions available to the language. These simplify your programming problems and also supply a reasonable interface with the GCOS/TSS environment.
Every B library function available for public use has an expl file under "expl b lib". There is also an index of all documented routines in "expl b lib index".
This chapter only discusses the routines you need to get started using B. Furthermore, we do not discuss all the features of the functions mentioned here. The "explain" files are more up to date and detailed than this manual can be.
The B library functions were written to run quickly in as small a space as possible. For this reason, most library functions do no error checking. It is up to the programmer to make sure that library functions are passed valid arguments. If invalid arguments are passed to a library function, the function could very well give meaningless results or cause your program to abort. In particular note that a large number of library routines may "lockup fault" if called with the wrong number of arguments.
For historical reasons, the library is known as the B library, but in fact, the library supports several languages. Many library routines are sensitive to the format of pointer arguments and must have the upper 18 bits zero in order for the routine to function correctly. If the upper 18 bits are non-zero, the function takes the lower 18 bits as a machine pointer and the upper 18 bits as the actual word address. In addition, non-zero values in the upper 18 bits of a pointer may cause the function to behave in slightly different ways. In particular, you must remember to mask off the upper 18 bits when dealing with values passed in the argv vector to the routine MAIN.
Some functions may return a value. This chapter indicates the nature of return values using sample assignments. Also, some functions are called with a variable number of arguments. If you want to use an optional argument, you must usually specify all the optional arguments which precede it in the function's argument list. Throughout this chapter, optional arguments are shown in square brackets. For example, if a function is shown as
return = func( arg1 [, arg2, arg3] );
and you want to specify a value for arg3, then you must also specify a value for arg2.
Sometimes, the first argument may be optional. This often occurs in I/O functions where the first argument is a unit. In this case, the function examines its first argument to see if it is a valid unit number; if the number is not valid, the function adjusts argument references accordingly.
Chapter 8 describes I/O in detail. However, since all I/O in B is performed through library functions, we will describe the most frequently used I/O functions in this chapter.
Chapter 8 also provides a detailed discussion of read and write units. For the time being, all you need to know is that a unit is a way to specify the physical device used in a read or write operation.
extrn .float;
to force the loading of the floating point output routines.
Note that PRINTF strips off all null ('*e') characters wherever they appear and does not print them. This is a common feature of most B output functions.
Below we give some examples of uses of the PRINTF routine.
printf( "%c", 'a');
prints the character constant 'a'.
printf( "The number is %d.", num );
prints the contents of the variable num as a decimal integer. If num contains the integer five, the print line has the form "The number is 5."
printf("Floating %f*nis octal %o.*n", x, x);
prints two lines because of the '*n' new-line escape in the format string. The first line contains the floating point value of x and the second line contains the octal equivalent of this value. Note that the second line ends in another new-line character.
If a character in the format string is not part of a recognized format, it is printed as it appears. If a format does not have a corresponding argument, it is printed as a literal string. To print out '%', you must use '%%'.
There is more to PRINTF than is described here; for full details, see the explain file "expl b lib printf".
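As a further illustration, the short program below combines the '%%' rule with a floating point format. It is only a sketch: the variable names are invented for the example, and it assumes that the extrn .float declaration shown earlier is what a program needs in order to print with %f.

```b
main() {
    extrn .float;       /* force loading of the floating point output routines */
    auto pct, x;

    pct = 75;
    x = 2.5;
    printf( "%d%% complete*n", pct );   /* '%%' prints a single '%' */
    printf( "x is %f*n", x );
}
```

Under the rules described above, the first call should print the line "75% complete".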
There are a few other routines which should be mentioned, but which will not be described in detail here:
The compiler itself does not provide any string-manipulation abilities; all operations on strings are handled by function calls. Recall that a string is a sequence of characters packed four to a word in a vector and terminated by the ASCII null '*e'.
char("the string",2)
is 'e'.
For BCD strings, you should use the function CHARB instead of CHAR and LCHARB instead of LCHAR. The calling sequences are the same, but they take and return BCD characters, not ASCII.
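Because every ASCII string ends with '*e', CHAR can be used to walk a string one character at a time. The following sketch counts the characters in a string; the function name slen is hypothetical and chosen only for this example.

```b
slen( s ) {
    auto n;

    n = 0;
    /* positions start at zero; stop at the terminating null */
    while( char( s, n ) != '*e' )
        ++n;
    return( n );
}
```

With this definition, slen("the string") would return 10, since the terminating null is not counted.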
print(x,"The answer is %d.",2)
assigns the string "The answer is 2." to x. The print string created by PRINT is always a string of ASCII characters. Because of this, PRINT can be used to convert from BCD to ASCII; for example,
print(x,"%b",y);
takes y as a pointer to a string of BCD characters, converts this string into ASCII characters, and places the ASCII string in the memory location indicated by x. PRINT returns a pointer to the created print string as its value.
concat( xstr, ystr )
CONCAT copies one string into another. CONCAT returns its first argument as its value (i.e. the output string).
compare( "string1", "string2" )
is -7.
stopstr is a string which contains a list of acceptable delimiters to end the string being collected, while skipstr is a string containing a list of characters to be skipped before beginning to collect characters for the string. For this reason,
scan(x,y,0," *n"," ;.")
begins at position zero in string y, skips over any spaces and new-line characters until it finds a different character, then collects characters in x until SCAN finds a space, a semi-colon, or a period. SCAN implicitly assumes that stopstr contains the end-of-string character '*e'. If the stopstr argument is omitted, the default is the same string of characters as in skipstr (plus '*e'); if skipstr is omitted, the default is " *t" (spaces and tabs). The value newpos which SCAN returns is the position of the character which ended the scanning. In this way, SCAN can be called repeatedly to obtain arguments from a command line.
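Repeated calls to SCAN might be arranged as in the sketch below, which counts the blank-separated words in a line. The function name nwords and the size of the word buffer are assumptions made for the example, and the argument order is inferred from the sample call above; see the explain file for SCAN before relying on it.

```b
nwords( line ) {
    auto word[20], pos, n;

    n = 0;
    pos = 0;
    while( char( line, pos ) != '*e' ) {
        /* skip blanks and tabs, then collect up to the next blank or tab */
        pos = scan( word, line, pos, " *t", " *t" );
        if( char( word, 0 ) != '*e' )   /* ignore an empty trailing word */
            ++n;
    }
    return( n );
}
```

Each call resumes at the position returned by the previous one, so the delimiter that ended one word is skipped at the start of the next call.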
There are a number of other functions which perform string operations, including:
All these functions have explain files.
The library supplies several functions which let you obtain or release memory dynamically. All such memory is located in a free storage pool called the heap. Dynamic allocation from the heap is useful when you need storage for a vector, but won't be able to decide how much space you need until run-time. (If you did know the amount of space, you could just declare the correct length at compilation time.)
x = getvec( 63 );
x[1] = 3;
All memory allocation is done by manipulating a linked list of free memory called the free list. The free list initially includes the so-called "core hole". You can use RLSEVEC to return any area of memory which is not on the free list, as long as the address of the memory is greater than the load address of RLSEVEC. If you attempt to release memory which is already on the free list, in whole or in part, RLSEVEC immediately aborts your program.
GETVEC obtains more storage from the operating system if it is required. In TSS, a subsystem is aborted with the message "not enough core to run job" if a request for memory cannot be satisfied. In batch, GETVEC aborts with a "0K" abort code if the system denies a request for memory. Whether or not your program aborts, indiscriminate use of GETVEC without corresponding RLSEVECs can dramatically build up "garbage" storage and increase the size of your program unnecessarily.
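To avoid building up garbage storage, each GETVEC should be matched by an RLSEVEC once the vector is no longer needed. The sketch below shows the pattern. Note that the calling sequence used for RLSEVEC (the vector followed by the size originally requested) is an assumption not stated in this chapter; check "expl b lib rlsevec" for the actual form.

```b
main() {
    auto v, n, i;

    n = 63;
    v = getvec( n );        /* storage obtained from the heap at run-time */
    for( i = 0; i < n; ++i )
        v[i] = 0;
    /* ... use v ... */
    rlsevec( v, n );        /* ASSUMED calling sequence: vector, then size */
}
```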
Finally, here are three useful routines, each described fully by an explain file, which call GETVEC and RLSEVEC.
The largest class of functions in the B library are those concerned with input and output. Sequential input routines read any sequential file in standard system format, including media 0, 2 or 3 BCD, media 5, 6 or 7 ASCII and media 1 compressed source decks (comdks); in the process, the input routines convert the input to ASCII if necessary. Output is ASCII (media 6) unless specified otherwise.
The I/O package creates output files if necessary and "grows" them as required up to their maximum size or to the limit of the file space quota for a userid.
A B program may have several files open for reading or writing at the same time. Each file is associated with a number called a unit, and every I/O function refers implicitly or explicitly to the I/O unit number.
There are five units whose function is predefined and may not be altered by the program.
Unit 0 is initialized as the standard input unit. In TSS, this is the terminal but input may come from a file if you redirect the standard input on the command line that invokes the program. In batch, reading from unit 0 uses filecode I* if the filecode is defined and if input has not been redirected. If I* is not present and there is no input redirection, unit 0 is placed in the end-of-file condition.
Similarly, unit 1 is initialized as the standard output unit. In TSS, the default standard output is the terminal, but this may be redirected on the command line that invokes the program. In batch, output to unit 1 goes to the filecode P*, which is associated with the printer unless redirected.
Units 2 through 49 may be assigned by or to you, using various library functions. These units are usually associated with permanent or temporary disk files. It is permissible to open units 0 or 1, but the usual practice is to leave them alone, so they may be redirected.
Before any I/O may be performed on a unit, it must be initialized by a call to OPEN. OPEN has the form
ret = open( [unit,] filename, action );
where:
Only a few of the various situations accepted by OPEN are dealt with in this section. For a full description of OPEN, see "expl b lib open".
Most of the time, the only thing you need to specify with OPEN is how you want to access the unit. Here are the alternatives:
For ordinary sequential file processing, you usually only have to specify one of the above actions.
OPEN offers three ways to specify the mode of the unit being opened in the action string. They are listed below.
If the action does not contain any of these modes, OPEN assumes that the I/O unit is a sequential file.
Normally, OPEN never returns an error status, since the default action is to abort the program with a reasonably understandable error message.
The OPEN function lets you specify options which let you handle file opening errors, file I/O errors, or both. These options are in the form of characters which may appear in the action string:
As an example, suppose you write
open( "/myfile", "rfm" );
and /myfile cannot be opened with read permission. OPEN prints an error message, then returns a negative value to indicate that the open operation failed.
For most errors, OPEN returns the negative of the file system error status. For example, OPEN returns -5 for "permissions denied".
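A program that asks OPEN to return on failure should test the result before doing any I/O. The following is a minimal sketch based on the example above; what OPEN returns on success is not described here, so the code tests only for a negative value.

```b
main() {
    auto ret;

    ret = open( "/myfile", "rfm" );
    if( ret < 0 ) {
        /* OPEN has already printed its own error message */
        printf( "open failed with status %d*n", -ret );
        return;
    }
    /* ... read from the file ... */
}
```

Negating the return value recovers the file system error status for most errors, as described above.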
At any given time, there is a default read unit and a default write unit, used by functions like GETCHAR and PUTCHAR. When you use OPEN to open a unit for reading, the unit which is opened is usually made the new default input unit. Similarly, using OPEN to open a unit for writing makes that unit the new default output unit. If you wish to prevent a unit from becoming the new default input/output unit, you can specify a "u" in the options for OPEN. Thus
open( "/myfile", "wu" );
opens /myfile for writing, but does not make it the new default output unit.
In TSS, OPEN follows a number of file access/create conventions.
A quick access name is one which contains no slashes or dollar signs and does not have an altname. It must also be less than nine characters long; if not, it is considered to be in error. If you are opening a file and the name is a quick access name (for example, b.out), the file accessor first searches the AFT for a file of that name. If such a file is not found, the file accessor searches for a file of that name in the current catalog. If the specified file name is not of the quick access form, it is assumed to be the name of a permanent file.
If the search fails and the file is being opened for write or append, OPEN tries to create the file. If the file name is of the quick access form, OPEN tries to create a temporary file; otherwise, it tries to create a permanent file.
If the file was already in the AFT when accessed or if the file is a temporary file created during the access, the file is left in the AFT when the unit is closed. If the file was not in the AFT initially and the file is permanent, it is removed from the AFT when the unit is closed. You may override these defaults by including appropriate characters in the action string. The character 't' (for transient) forces a file to be removed from the AFT when it is closed, whether or not it was in the AFT to begin with; and the character 'k' (for keep) keeps the file in the AFT even if it would normally be removed.
When you are through with a unit, you should close it explicitly by calling the CLOSE function. This has the form:
close( unit );
For sequential stream output u