Notes on C Language

C is one of the most widely used high-level programming languages. The purpose of a high-level language is to allow programs to be written in a logical, algebraic form that is more suitable for humans than assembly language.

The function of a compiler is to translate the C program into assembly language for a particular processor, and then ultimately to a binary executable program that can be run.

In Linux this process of translation into machine code is done by either gcc or g++. gcc compiles C, including more modern versions such as C99 (selected with -std=c99); g++ compiles C++. There are actually several distinct steps, and at each step an intermediate file can be produced.

  1. The preprocessor handles both #include and #define directives, producing a complete C file.
  2. The compiler turns the C language into assembly language. (*.s)
  3. The assembler produces a binary object file (*.o)
  4. The linker combines the object file(s) with system libraries to create the executable file (x-bit set, usually no "extension" in its name; the default name is a.out).
Things can go wrong, particularly at compile and link.

Preprocessor directives

#define
defines a constant by associating a value with it. The preprocessor replaces all occurrences of the constant by its value; this is simply text substitution.
By convention, constant names are usually written in UPPERCASE.
#include
includes a file, usually a header file, which is expected to contain more defines and "prototypes," which give the compiler information about the names and arguments expected by functions contained in libraries (or other .c files).

Some examples.

#define		SIZE	24
#define QUOTE "Neither a borrower nor a lender be."

#include <stdio.h> // standard input/output library functions
#include "myother.h" // header file in user's same directory

Some useful header files:

man pages - getting information

You can get information about any Linux command, C function, or header file with the command:  man
for example,

The main program

The main program should be declared with type int and return an int; 0 means normal termination. The command line invoking a program may have arguments, which are separated by spaces. The operating system passes them to main as argc, the number of arguments, and argv, an array of strings.
argv[0] is the name of the program, so argc is at least 1. A basic C program (which does nothing, successfully) is:

int main (int argc, char * argv[]) {
    return 0;
}

This program writes its arguments on separate lines:

#include <stdio.h>

int main (int argc, char * argv[]) {

    int i;
    for (i=0; i<argc; i++)      // for loop goes through array
        printf ("%d: %s\n", i, argv[i]);
    return 0;                   // normal termination
}

[jensen@lindgren asm]$ gcc example1.c # compile the program
[jensen@lindgren asm]$ ./a.out war is peace # run the program
0: ./a.out
1: war
2: is
3: peace
[jensen@lindgren asm]$

Types

Every variable has a type. These include

Arrays, and loops

An array is a sequence of successive elements, all of the same type. In C, the name of an array denotes the address of its first element. So, if you declare  int arr[10]; then arr is its address, considered a constant, arr[0] is the contents of the first element, and arr[9] the contents of the last. You will have to remember the size you declared, since C does not check array bounds.

strings

A string is a special case of an array: in C it is an array of char terminated by a null byte, '\0', since this is not a character that can be produced by a keyboard. Thus you don't have to store its length separately, but this is also error prone, as one can easily construct a string that is longer than the space allocated, causing a "buffer overflow." In C++, the type string has a more complex structure, designed to avoid a number of such problems.

int arr[10];            // space for 10 ints
char prompt[] = "please enter an integer: ";    // enough space for all the characters, plus the '\0'
prompt[0] = 'P';        // change first letter to upper
int i;

// a while loop to store i-squared in each element of arr
i = 0;                  // initialize i
while (i<10) {          // more to do?
    arr[i] = i*i;       // store i-squared
    i++;                // add 1 to i
}
// the for loop initializes; tests; increments just before repeating
// the statements about i are collected together
for (i=0; i<10; i++)
    arr[i] = i*i;

pointers: * and &   (see also notes on loops and pointers)

A pointer is a variable that holds an address. Pointers are generally declared to point to a certain type. There are two ways to obtain an address to assign to a pointer: an array name is the address of the array, and for a simple variable the & operator gives its address. Since pointers are themselves variables, they can change, by incrementing, for example. A pointer is dereferenced by putting a * in front of it, to refer to the value pointed to. Since you are studying assembly language, the example is commented with the equivalent MIPS instructions (spaces around the * are optional):

int * ip = &i;          // la   $t0, i
*ip;                    // lw   $t1, ($t0)      # the value 10 (after the loop)
int * follow = arr;     // la   $t2, arr
*ip = *follow++;        // lw   $t3, ($t2)
                        // addi $t2, $t2, 4     # post-increment pointer by sizeof(int)
                        // sw   $t3, ($t0)      # i = arr[0]

Functions

Functions are the basic modular units of a C program. (main is simply the top-level function, in fact.) A function is declared to return a value of a specified type and to take arguments, each with a specified type and name. The argument names have the status of local variables. When a function is called, values are copied into these local variables, which may change within the function without affecting the caller's variables.

If the value passed is a pointer (or an address obtained with &), the memory pointed to may be changed. Putting const in front of the declaration declares that this will not in fact happen, and instructs the compiler to prevent it.

function prototypes

The compiler needs to know, for each function you call, its name, its return type, and the types of its arguments. A prototype takes the form of the function definition line ending with a semicolon, without the function body. Giving argument names is unnecessary, although names may be given to suggest what the arguments are for.  man pages (and header files, but don't look at them) include such prototypes, such as:

char *strncpy(char *dest, const char *src, size_t n);	//copy at most n chars from src to dest
int mysum (int a, int, int); // returns sum of its 3 arguments
int sumarray(const int* array, int n); // sum first n elements of array

In strncpy, the string (pointed to by) dest may be changed, but src will not be changed.

Functions you write, or their prototypes, should precede the main program. If only the prototype precedes it, the full definition may follow.

In the above, note that int* array is the same as int array[], and also that array[i] is the same as *(array+i): add i words to the address, and dereference.
In the function definition, we can calculate the sum using array notation or pointer notation. Pointer notation is more cryptic and may be faster with a simple compiler, although an optimizing compiler will usually produce much the same code for both. (Use one or the other, but not both.) In the function, array is a pointer variable, so it can be incremented.
int sumarray(const int array[], int n)  // array syntax
{
    int sum = 0;                // local variable, must initialize!
    int i;
    for (i=0; i<n; i++)
        sum += array[i];
    return sum;
}

int sumarray(const int* array, int n)   // pointer syntax
{
    int sum = 0;
    while (n--)                 // until n iterations completed
        sum += *array++;        // successive array values
    return sum;
}

Recursion

There is no restriction on a function calling itself; in fact this is a powerful tool. Factorial, for example, is defined recursively. Later in this course, we shall see how function calls can be implemented to allow functions to call other functions, including recursively. (In the early days, FORTRAN explicitly forbade recursion because the compilers could not handle it.)

double factorial (int n)        // double, because they get very big
{
    if (n<2) return 1.0;        // 0! = 1! = 1
    return n * factorial(n-1);  // n! = n*(n-1)!
}

Notes for cs 216, or Lin Jensen,