reading notes


tisini


Pointers

(2005-01-23 12:46:22)

Section 3.3 Pointers

We have already seen how variables are memory cells that we can access by an identifier. But these variables are stored in concrete places of the computer memory. For our programs, the computer memory is only a succession of 1 byte cells (the minimum size for a datum), each one with a unique address.

A good simile for computer memory is a street in a city. All the houses on a street are numbered consecutively, each with a unique number, so if we talk about 27 Sesame Street we will be able to find that place without trouble: there can be only one house with that number and, in addition, we know that it stands between houses 26 and 28.

In the same way that houses on a street are numbered, the operating system organizes memory with unique and consecutive numbers, so if we talk about location 1776 in memory, we know that there is only one location with that address and also that it lies between addresses 1775 and 1777.

Address-of operator (`&`)

At the moment in which we declare a variable it must be stored in a concrete location in this succession of cells (the memory). We generally do not decide where the variable is to be placed - fortunately that is something automatically done by the compiler and the operating system at runtime, but once the operating system has assigned an address there are some cases in which we may be interested in knowing where the variable is stored.

This can be done by preceding the variable identifier by an ampersand sign (&), which literally means "address of". For example:

ted = &andy;

would assign to variable ted the address of variable andy, since when preceding the name of the variable andy with the ampersand (&) character we are no longer talking about the content of the variable, but about its address in memory.

We are going to suppose that andy has been placed in the memory address 1776 and that we write the following:

andy = 25;
fred = andy;
ted = &andy;

the result is as follows: andy, at address 1776, holds 25; fred holds a copy of that value (25); and ted holds the address 1776.

We have assigned to fred the content of variable andy, as we have done on many other occasions in previous sections of this tutorial, but to ted we have assigned the address in memory where the value of andy is stored, which we have imagined to be 1776 (it could be any address; I have just invented this one). The reason is that in the assignment to ted we have preceded andy with an ampersand (&) character.

A variable that stores the address of another variable (like ted in the previous example) is what we call a pointer. Pointers are a powerful feature of C++ and are used very often. Further on we will see how this type of variable is declared.

Dereference operator (`*`)

Using a pointer we can directly access the value stored in the variable it points to, just by preceding the pointer identifier with the dereference operator, an asterisk (*), which can be literally translated as "value pointed by". Therefore, continuing with the values of the previous example, if we write:

beth = *ted;

(that we could read as: "beth equal to value pointed by ted") beth would take the value 25, since ted is 1776, and the value pointed by 1776 is 25.

You must clearly differentiate that ted stores 1776, whereas *ted (with an asterisk * before it) refers to the value stored at address 1776, which is 25. Notice the difference between including and omitting the dereference asterisk (I have included an explanatory comment on how each expression could be read):

beth = ted;    // beth equal to ted ( 1776 )
beth = *ted;   // beth equal to value pointed by ted ( 25 )

Address-of operator (&)
It is used as a variable prefix and can be translated as "address of"; thus &variable1 can be read as "address of variable1".

Dereference operator (*)
It indicates that what has to be evaluated is the content pointed to by the expression, which is treated as an address. It can be translated as "value pointed by".
*mypointer can be read as "value pointed by mypointer".

At this point, and following with the same example initiated above where:

andy = 25;
ted = &andy;

you should be able to clearly see that all the following expressions are true:

andy == 25
&andy == 1776
ted == 1776
*ted == 25

The first expression is quite clear considering that the assignment was andy = 25;. The second one uses the address-of operator (&), which returns the address of the variable andy, which we imagined to be 1776. The third one follows from the second, since ted was assigned with ted = &andy;. The fourth expression uses the dereference operator (*), which, as we have just seen, yields the value stored at the address held by ted, that is 25.

So, after all that, you may also infer that while the address pointed by ted remains unchanged the following expression will also be true:

*ted == andy

Declaring variables of type pointer

Because a pointer can directly access the value that it points to, it is necessary to specify, when declaring it, which data type the pointer points to. Pointing to a char is not the same as pointing to an int or a float.

Therefore, the declaration of pointers follows this form:

type * pointer_name;

where type is the type of the data pointed to, not the type of the pointer itself. For example:

int * number;
char * character;
float * greatnumber;

These are three declarations of pointers. Each one points to a different data type, but all three are pointers and, on typical platforms, all three occupy the same amount of space in memory (the size of a pointer depends on the platform, that is, on the compiler and the architecture it targets). The data to which they point, however, neither occupy the same amount of space nor are of the same type: one is int, another is char and the other float.

I emphasize that the asterisk (*) that we use when declaring a pointer means only that it is a pointer, and should not be confused with the dereference operator that we have seen a bit earlier, which is also written with an asterisk (*). They are simply two different tasks represented with the same sign.

// my first pointer
#include <iostream>

int main ()
{
  int value1 = 5, value2 = 15;
  int * mypointer;

  mypointer = &value1;
  *mypointer = 10;     // value1 is now 10
  mypointer = &value2;
  *mypointer = 20;     // value2 is now 20
  std::cout << "value1==" << value1 << " / value2==" << value2;
  return 0;
}

value1==10 / value2==20

Notice how the values of value1 and value2 have changed indirectly. First we assigned to mypointer the address of value1 using the address-of operator, the ampersand sign (&). Then we assigned 10 to the value pointed by mypointer, which at that moment is pointing to value1, so we have modified value1 indirectly.

To show that a pointer may take several different values during the same program, we have repeated the process with value2 and the same pointer.

Here is a slightly more complicated example:

// more pointers
#include <iostream>

int main ()
{
  int value1 = 5, value2 = 15;
  int *p1, *p2;

  p1 = &value1;     // p1 = address of value1
  p2 = &value2;     // p2 = address of value2
  *p1 = 10;         // value pointed by p1 = 10
  *p2 = *p1;        // value pointed by p2 = value pointed by p1
  p1 = p2;          // p1 = p2 (value of pointer copied)
  *p1 = 20;         // value pointed by p1 = 20
  std::cout << "value1==" << value1 << " / value2==" << value2;
  return 0;
}

value1==10 / value2==20

I have included as a comment on each line how the code can be read: ampersand (&) as "address of" and asterisk (*) as "value pointed by". Notice that there are expressions with pointers p1 and p2 both with and without the asterisk. The meaning differs greatly: a pointer preceded by an asterisk (*) refers to the place the pointer points to, whereas a pointer without an asterisk refers to the value of the pointer itself, that is, the address it is pointing to.

Another thing that may catch your attention is the line:

int *p1, *p2;

which declares the two pointers of the previous example, putting an asterisk (*) before each one. The reason is that the type specifier shared by all the declarations on the line is int (and not int*). In a declaration the asterisk is not an operator being evaluated; it is part of the declarator and applies only to the name that follows it. So int * p1, p2; would declare p1 as a pointer to int but p2 as a plain int. It is enough that you know clearly that you have to put an asterisk (*) before each pointer that you declare on the same line.

Pointers and arrays

The concept of array is closely bound to that of pointer. In most expressions the identifier of an array is converted to the address of its first element, just as a pointer holds the address of the element it points to, so the two can often be used in the same way, although they are not the same thing. For example, supposing these two declarations:

int numbers [20];
int * p;

the following assignment would be valid:

p = numbers;

At this point p and numbers refer to the same element and can be used in much the same way. The difference is that we can assign another value to the pointer p, whereas numbers will always designate the array of 20 elements of type int with which it was defined. So, unlike p, which is an ordinary pointer variable, numbers is an array: it converts to a pointer to its first element, but it is not itself a pointer variable and cannot be assigned to (sizeof(numbers), for instance, gives the size of the whole array, not of a pointer). Therefore, although the previous expression was valid, the following assignment is not:

numbers = p;

because numbers is an array, and no value can be assigned to an array as a whole.

Because p is a pointer variable, all the expressions that include pointers in the following example are perfectly valid:

// more pointers
#include <iostream>

int main ()
{
  int numbers[5];
  int * p;
  p = numbers;      *p = 10;
  p++;              *p = 20;
  p = &numbers[2];  *p = 30;
  p = numbers + 3;  *p = 40;
  p = numbers;      *(p+4) = 50;
  for (int n=0; n<5; n++)
    std::cout << numbers[n] << ", ";
  return 0;
}

10, 20, 30, 40, 50,

In the chapter "Arrays" we used bracket signs [] several times in order to specify the index of the array element to which we wanted to refer. Well, the bracket operator [] is known as the offset (or subscript) operator, and it is equivalent to adding the number within the brackets to a pointer and dereferencing the result. For example, the two following expressions:

a[5] = 0;       // a [offset of 5] = 0
*(a+5) = 0;     // pointed by (a+5) = 0

are equivalent and valid whether a is a pointer or an array.

Pointer initialization

When declaring pointers we may want to explicitly specify which variable we want them to point to:

int number;
int *tommy = &number;

this is equivalent to:

int number;
int *tommy;
tommy = &number;

When a pointer is assigned, we are always assigning the address it points to, never the value pointed to. Keep in mind that in the declaration of a pointer the asterisk (*) indicates only that it is a pointer; it is in no case the dereference operator (*). Remember, they are two different things, although they are written with the same sign. Thus, we must take care not to confuse the previous code with:

int number;
int *tommy;
*tommy = &number;

which would not make sense here: it tries to store an address in the int that tommy points to, and a C++ compiler rejects the assignment.

As in the case of arrays, the compiler allows the special case that we want to initialize the content at which the pointer points with constants at the same moment as declaring the variable pointer:

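const char * terry = "hello";

In this case static storage is reserved for containing "hello", and a pointer to the first char of this memory block (which corresponds to 'h') is assigned to terry. The pointer is declared const char * because a string literal is constant: standard C++ (since C++11) does not allow it to be assigned to a plain char *. If we imagine that "hello" is stored at addresses 1702 and following, the characters 'h', 'e', 'l', 'l', 'o' and the terminating '\0' occupy addresses 1702 to 1707, and terry holds 1702.

It is important to indicate that terry contains the value 1702 and not 'h' nor "hello", although 1702 points to these characters.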

The pointer terry points to a string of characters and can be read exactly as if it were an array. For example, either of the following two expressions yields the fifth character, 'o':

terry[4]
*(terry+4)

Remember that writing terry[4] is just the same as writing *(terry+4), although the first form is the more usual. Note that the characters of a string literal must not be modified through such a pointer: an assignment like terry[4] = '!' has undefined behavior if terry is a plain char *, and is rejected by the compiler if terry is a const char *. To replace the 'o' by a '!' sign we would first need a modifiable copy, for example an array declared as char terry[] = "hello";.

Arithmetic of pointers

Performing arithmetical operations on pointers is a little different from performing them on integer data types. To begin with, only addition and subtraction are allowed; the other operations make no sense in the world of pointers. But both addition and subtraction behave differently with pointers according to the size of the data type to which they point.

When we saw the different data types that exist, we saw that some occupy more or less space in memory than others. The exact sizes depend on the platform; for this example let us assume that char occupies 1 byte, short occupies 2 bytes and long occupies 4.

Let's suppose that we have 3 pointers:

char *mychar;
short *myshort;
long *mylong;

and that we know that they point to memory locations 1000, 2000 and 3000 respectively.

So if we write:

mychar++;
myshort++;
mylong++;

mychar, as you may expect, would contain the value 1001. Nevertheless, myshort would contain the value 2002, and mylong would contain 3004. The reason is that when adding 1 to a pointer we make it point to the following element of the type with which it has been defined, and therefore the size in bytes of the type pointed to is added to the pointer.

This is applicable both when adding and subtracting any number to a pointer. It would happen exactly the same if we write:

mychar = mychar + 1;
myshort = myshort + 1;
mylong = mylong + 1;

It is important to warn you that the postfix increase (++) and decrease (--) operators have higher precedence than the dereference operator asterisk (*), therefore the following expressions may lead to confusion:

*p++;
*p++ = *q++;

The first one is equivalent to *(p++): it increases p (the address it points to, not the value stored there).
In the second, because both increase operators (++) are written after the expressions and not before, the value of *q is first assigned to *p and then both q and p are increased by one. It is equivalent to:

*p = *q;
p++;
q++;

As always, I recommend you use parentheses () in order to avoid unexpected results.

Pointers to pointers

C++ allows the use of pointers that point to pointers, which in turn point to data. In order to do that we only need to add an asterisk (*) for each level of indirection:

char a;
char * b;
char ** c;
a = 'z';
b = &a;
c = &b;

this, supposing the randomly chosen memory locations of 7230, 8092 and 10502 for a, b and c respectively, could be described thus: a is located at 7230 and contains 'z'; b is located at 8092 and contains 7230; c is located at 10502 and contains 8092.

The new thing in this example is variable c, which we can talk about in three different ways, each of which corresponds to a different value:

c is a variable of type (char **) with a value of 8092
*c is an expression of type (char *) with a value of 7230
**c is an expression of type (char) with a value of 'z'

void pointers

The void pointer is a special type of pointer. void pointers can point to any data type, from an integer value or a float to a string of characters. Their sole limitation is that the pointed data cannot be accessed directly (we cannot use the dereference operator * on them), since the size and type of that data are unknown; for that reason we always have to resort to a type cast or an assignment to turn our void pointer into a pointer of a concrete data type that we can dereference.

One of its uses may be for passing generic parameters to a function:

// integer increaser
#include <iostream>

// size tells the function which integer type data really points to
void increase (void* data, int size)
{
  switch (size)
  {
    case sizeof(char) : (*((char*)data))++; break;
    case sizeof(short): (*((short*)data))++; break;
    case sizeof(long) : (*((long*)data))++; break;
  }
}

int main ()
{
  char a = 5;
  short b = 9;
  long c = 12;
  increase (&a,sizeof(a));
  increase (&b,sizeof(b));
  increase (&c,sizeof(c));
  std::cout << (int) a << ", " << b << ", " << c;
  return 0;
}

6, 10, 13

sizeof is an operator integrated in the C++ language that returns a constant value with the size in bytes of its parameter, so, for example, sizeof(char) is 1, because char type is 1 byte long.

Pointers to functions

C++ allows operations with pointers to functions. The greatest use of this is for passing a function as a parameter to another function, since a function itself cannot be passed by value. In order to declare a pointer to a function we must declare it like the prototype of the function, except that the name is enclosed in parentheses () and a pointer asterisk (*) is inserted before the name. It might not be a very handsome syntax, but that is how it is done in C++:

// pointer to functions
#include <iostream>

int addition (int a, int b)
{ return (a+b); }

int subtraction (int a, int b)
{ return (a-b); }

int (*minus)(int,int) = subtraction;

int operation (int x, int y, int (*functocall)(int,int))
{
  int g;
  g = (*functocall)(x,y);
  return (g);
}

int main ()
{
  int m,n;
  m = operation (7, 5, addition);
  n = operation (20, m, minus);
  std::cout << n;
  return 0;
}

8

In the example, minus is a global pointer to a function that has two parameters of type int; it is immediately initialized to point to the function subtraction, all in a single line: