Linked Lists: Understanding Pointer Manipulation and Building Linked List Algorithms, Study notes of Data Structures and Algorithms

An introduction to linked lists, explaining why they are used and how they differ from arrays. It covers the basics of linked list structures, including pointer manipulation and linked list algorithms. Examples in c syntax and explanations that avoid c specifics. It also discusses common features of linked lists, such as pointer assignment and iterating over lists, and provides code examples for building linked lists and changing pointers.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

koofers-user-ftn-1
koofers-user-ftn-1 🇺🇸

8 documents

1 / 26

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Linked List
Basics
By Nick Parlante Copyright © 1998-99, Nick Parlante
Abstract
This document introduces the basic structures and techniques for building linked lists
with a mixture of explanations, drawings, sample code, and exercises. The material is
useful if you want to understand linked lists or if you want to see a realistic, applied
example of pointer-intensive code. A separate document, Linked List Problems
(http://cslibrary.stanford.edu/105/), presents 18 practice problems covering a wide range
of difficulty.
Linked lists are useful to study for two reasons. Most obviously, linked lists are a data
structure which you may want to use in real programs. Seeing the strengths and
weaknesses of linked lists will give you an appreciation of the some of the time, space,
and code issues which are useful to thinking about any data structures in general.
Somewhat less obviously, linked lists are excellent advanced practice area for any
programmer. Linked list code has a real complexity in its algorithms, code, and memory
use. Introductory "made up" problems never quite push you the way a nice linked list
problem can. Linked list code is a classic area for any programmer to explore and practice
their pointer code and memory skills, even if they never plan to use a linked list.
Audience
The article assumes a basic understanding of programming and pointers. The article uses
C syntax for its examples where necessary, but the explanations avoid C specifics as
much as possible — really the discussion is oriented towards the important concepts of
pointer manipulation and linked list algorithms.
Other Resources
Link List Problems (http://cslibrary.stanford.edu/105/) Lots of linked
list problems, with explanations, answers, and drawings. The "problems"
article is a companion to this "explanation" article.
Pointers and Memory (http://cslibrary.stanford.edu/102/) Explains all
about how pointers and memory work. You need some understanding of
pointers and memory before you can understand linked lists.
Essential C (http://cslibrary.stanford.edu/101/) Explains all the basic
features of the C programming language.
This is document #103, Linked List Basics, in the CS Education Library at Stanford. This
and other free educational materials are available at http://cslibrary.stanford.edu/. This
document is free to be used, reproduced, or retransmitted so long as this notice is clearly
reproduced at its beginning.
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a

Partial preview of the text

Download Linked Lists: Understanding Pointer Manipulation and Building Linked List Algorithms and more Study notes Data Structures and Algorithms in PDF only on Docsity!

Linked List

Basics

By Nick Parlante Copyright © 1998-99, Nick Parlante

Abstract This document introduces the basic structures and techniques for building linked lists with a mixture of explanations, drawings, sample code, and exercises. The material is useful if you want to understand linked lists or if you want to see a realistic, applied example of pointer-intensive code. A separate document, Linked List Problems (http://cslibrary.stanford.edu/105/), presents 18 practice problems covering a wide range of difficulty.

Linked lists are useful to study for two reasons. Most obviously, linked lists are a data structure which you may want to use in real programs. Seeing the strengths and weaknesses of linked lists will give you an appreciation of the some of the time, space, and code issues which are useful to thinking about any data structures in general.

Somewhat less obviously, linked lists are excellent advanced practice area for any programmer. Linked list code has a real complexity in its algorithms, code, and memory use. Introductory "made up" problems never quite push you the way a nice linked list problem can. Linked list code is a classic area for any programmer to explore and practice their pointer code and memory skills, even if they never plan to use a linked list.

Audience The article assumes a basic understanding of programming and pointers. The article uses C syntax for its examples where necessary, but the explanations avoid C specifics as much as possible — really the discussion is oriented towards the important concepts of pointer manipulation and linked list algorithms.

Other Resources

  • Link List Problems (http://cslibrary.stanford.edu/105/) Lots of linked list problems, with explanations, answers, and drawings. The "problems" article is a companion to this "explanation" article.
  • Pointers and Memory (http://cslibrary.stanford.edu/102/) Explains all about how pointers and memory work. You need some understanding of pointers and memory before you can understand linked lists.
  • Essential C (http://cslibrary.stanford.edu/101/) Explains all the basic features of the C programming language.

This is document #103, Linked List Basics, in the CS Education Library at Stanford. This and other free educational materials are available at http://cslibrary.stanford.edu/. This document is free to be used, reproduced, or retransmitted so long as this notice is clearly reproduced at its beginning.

Contents Section 1 — Basic List Structures and Code 2 Section 2 — Basic List Building 11 Section 3 — Linked List Code Techniques 17 Section 3 — Code Examples 22

Edition This is the second major version of this document —Jan 17, 1999. The author may be reached at [email protected]. The CS Education Library may be reached at [email protected].

Dedication This document is distributed for the benefit and education of all. That a person seeking knowledge should have the opportunity to find it. May you learn from it in the spirit in which it is given — to make efficiency and beauty in your designs, peace and fairness in your actions.

Section 1 —

Linked List Basics

Why Linked Lists? Linked lists and arrays are similar since they both store collections of data. The terminology is that arrays and linked lists store "elements" on behalf of "client" code. The specific type of element is not important since essentially the same structure works to store elements of any type. One way to think about linked lists is to look at how arrays work and think about alternate approaches.

Array Review Arrays are probably the most common data structure used to store collections of elements. In most languages, arrays are convenient to declare and the provide the handy [ ] syntax to access any element by its index number. The following example shows some typical array code and a drawing of how the array might look in memory. The code allocates an array int scores[100], sets the first three elements set to contain the numbers 1, 2, 3 and leaves the rest of the array uninitialized...

void ArrayTest() { int scores[100];

// operate on the elements of the scores array... scores[0] = 1; scores[1] = 2; scores[2] = 3; }

- Pointer/Pointee A "pointer" stores a reference to another variable sometimes known as its "pointee". Alternately, a pointer may be set to the value NULL which encodes that it does not currently refer to a pointee. (In C and C++ the value NULL can be used as a boolean false). - Dereference The dereference operation on a pointer accesses its pointee. A pointer may only be dereferenced after it has been set to refer to a specific pointee. A pointer which does not have a pointee is "bad" (below) and should not be dereferenced. - Bad Pointer A pointer which does not have an assigned a pointee is "bad" and should not be dereferenced. In C and C++, a dereference on a bad sometimes crashes immediately at the dereference and sometimes randomly corrupts the memory of the running program, causing a crash or incorrect computation later. That sort of random bug is difficult to track down. In C and C++, all pointers start out with bad values , so it is easy to use bad pointer accidentally. Correct code sets each pointer to have a good value before using it. Accidentally using a pointer when it is bad is the most common bug in pointer code. In Java and other runtime oriented languages, pointers automatically start out with the NULL value, so dereferencing one is detected immediately. Java programs are much easier to debug for this reason. - Pointer assignment An assignment operation between two pointers like p=q; makes the two pointers point to the same pointee. It does not copy the pointee memory. After the assignment both pointers will point to the same pointee memory which is known as a "sharing" situation. - malloc() malloc() is a system function which allocates a block of memory in the "heap" and returns a pointer to the new block. The prototype for malloc() and other heap functions are in stdlib.h. The argument to malloc() is the integer size of the block in bytes. Unlike local ("stack") variables, heap memory is not automatically deallocated when the creating function exits. malloc() returns NULL if it cannot fulfill the request. (extra for experts) You may check for the NULL case with assert() if you wish just to be safe. Most modern programming systems will throw an exception or do some other automatic error handling in their memory allocator, so it is becoming less common that source code needs to explicitly check for allocation failures. - free() free() is the opposite of malloc(). Call free() on a block of heap memory to indicate to the system that you are done with it. The argument to free() is a pointer to a block of memory in the heap — a pointer which some time earlier was obtained via a call to malloc().

What Linked Lists Look Like An array allocates memory for all its elements lumped together as one block of memory. In contrast, a linked list allocates space for each element separately in its own block of memory called a "linked list element" or "node". The list gets is overall structure by using pointers to connect all its nodes together like the links in a chain.

Each node contains two fields: a "data" field to store whatever element type the list holds for its client, and a "next" field which is a pointer used to link one node to the next node. Each node is allocated in the heap with a call to malloc(), so the node memory continues to exist until it is explicitly deallocated with a call to free(). The front of the list is a

pointer to the first node. Here is what a list containing the numbers 1, 2, and 3 might look like...

The Drawing Of List {1, 2, 3}

Stack Heap

A “head” pointer local to BuildOneTwoThree() keeps the whole list by storing a pointer to the first node.

Each node stores one data element (int in this example).

Each node stores one next pointer.

The overall list is built by connecting the nodes together by their next pointers. The nodes are all allocated in the heap.

The next field of the last node is NULL.

head

BuildOneTwoThree()

This drawing shows the list built in memory by the function BuildOneTwoThree() (the full source code for this function is below). The beginning of the linked list is stored in a "head" pointer which points to the first node. The first node contains a pointer to the second node. The second node contains a pointer to the third node, ... and so on. The last node in the list has its .next field set to NULL to mark the end of the list. Code can access any node in the list by starting at the head and following the .next pointers. Operations towards the front of the list are fast while operations which access node farther down the list take longer the further they are from the front. This "linear" cost to access a node is fundamentally more costly then the constant time [ ] access provided by arrays. In this respect, linked lists are definitely less efficient than arrays.

Drawings such as above are important for thinking about pointer code, so most of the examples in this article will associate code with its memory drawing to emphasize the habit. In this case the head pointer is an ordinary local pointer variable, so it is drawn separately on the left to show that it is in the stack. The list nodes are drawn on the right to show that they are allocated in the heap.

The Empty List — NULL The above is a list pointed to by head is described as being of "length three" since it is made of three nodes with the .next field of the last node set to NULL. There needs to be some representation of the empty list — the list with zero nodes. The most common representation chosen for the empty list is a NULL head pointer. The empty list case is the one common weird "boundary case" for linked list code. All of the code presented in this article works correctly for the empty list case, but that was not without some effort. When working on linked list code, it's a good habit to remember to check the empty list case to verify that it works too. Sometimes the empty list case works the same as all the cases, but sometimes it requires some special case code. No matter what, it's a good case to at least think about.

Length() Function

The Length() function takes a linked list and computes the number of elements in the list. Length() is a simple list function, but it demonstrates several concepts which will be used in later, more complex list functions...

/* Given a linked list head pointer, compute and return the number of nodes in the list. / int Length(struct node head) { struct node* current = head; int count = 0;

while (current != NULL) { count++; current = current->next; }

return count; }

There are two common features of linked lists demonstrated in Length()...

1) Pass The List By Passing The Head Pointer The linked list is passed in to Length() via a single head pointer. The pointer is copied from the caller into the "head" variable local to Length(). Copying this pointer does not duplicate the whole list. It only copies the pointer so that the caller and Length() both have pointers to the same list structure. This is the classic "sharing" feature of pointer code. Both the caller and length have copies of the head pointer, but they share the pointee node structure.

2) Iterate Over The List With A Local Pointer The code to iterate over all the elements is a very common idiom in linked list code....

struct node* current = head; while (current != NULL) { // do something with *current node

current = current->next; }

The hallmarks of this code are...

  1. The local pointer, current in this case, starts by pointing to the same node as the head pointer with current = head;. When the function exits, current is automatically deallocated since it is just an ordinary local, but the nodes in the heap remain.

  2. The while loop tests for the end of the list with (current != NULL). This test smoothly catches the empty list case — current will be NULL on the first iteration and the while loop will just exit before the first iteration.

  3. At the bottom of the while loop, current = current->next; advances the local pointer to the next node in the list. When there are no more links, this sets the pointer to NULL. If you have some linked list

code which goes into an infinite loop, often the problem is that step (3) has been forgotten.

Calling Length() Here's some typical code which calls Length(). It first calls BuildOneTwoThree() to make a list and store the head pointer in a local variable. It then calls Length() on the list and catches the int result in a local variable.

void LengthTest() { struct node* myList = BuildOneTwoThree();

int len = Length(myList); // results in len == 3 }

Memory Drawings The best way to design and think about linked list code is to use a drawing to see how the pointer operations are setting up memory. There are drawings below of the state of memory before and during the call to Length() — take this opportunity to practice looking at memory drawings and using them to think about pointer intensive code. You will be able to understand many of the later, more complex functions only by making memory drawings like this on your own.

Start with the Length() and LengthTest() code and a blank sheet of paper. Trace through the execution of the code and update your drawing to show the state of memory at each step. Memory drawings should distinguish heap memory from local stack memory. Reminder: malloc() allocates memory in the heap which is only be deallocated by deliberate calls to free(). In contrast, local stack variables for each function are automatically allocated when the function starts and deallocated when it exits. Our memory drawings show the caller local stack variables above the callee, but any convention is fine so long as you realize that the caller and callee are separate. (See cslibrary.stanford.edu/102/, Pointers and Memory, for an explanation of how local memory works.)

Drawing 2: Mid Length Here is the state of memory midway through the execution of Length(). Length()'s local variables head and current have been automatically allocated. The current pointer started out pointing to the first node, and then the first iteration of the while loop advanced it to point to the second node.

Stack Heap

myList

LengthTest()

head

Length()

current

len -

Notice how the local variables in Length() (head and current) are separate from the local variables in LengthTest() (myList and len). The local variables head and current will be deallocated (deleted) automatically when Length() exits. This is fine — the heap allocated links will remain even though stack allocated pointers which were pointing to them have been deleted.

Exercise Q: What if we said head = NULL; at the end of Length() — would that mess up the myList variable in the caller? A: No. head is a local which was initialized with a copy of the actual parameter, but changes do not automatically trace back to the actual parameter. Changes to the local variables in one function do not affect the locals of another function.

Exercise Q: What if the passed in list contains no elements, does Length() handle that case properly? A: Yes. The representation of the empty list is a NULL head pointer. Trace Length() on that case to see how it handles it.

Section 2 —

List Building

BuildOneTwoThree() is a fine as example of pointer manipulation code, but it's not a general mechanism to build lists. The best solution will be an independent function which adds a single new node to any list. We can then call that function as many times as we want to build up any list. Before getting into the specific code, we can identify the classic 3-Step Link In operation which adds a single node to the front of a linked list. The 3 steps are...

1) Allocate Allocate the new node in the heap and set its .data to whatever needs to be stored. struct node* newNode; newNode = malloc(sizeof(struct node)); newNode->data = data_client_wants_stored;

2) Link Next Set the .next pointer of the new node to point to the current first node of the list. This is actually just a pointer assignment — remember: "assigning one pointer to another makes them point to the same thing." newNode->next = head;

3) Link Head Change the head pointer to point to the new node, so it is now the first node in the list. head = newNode;

3-Step Link In Code The simple LinkTest() function demonstrates the 3-Step Link In...

void LinkTest() { struct node* head = BuildTwoThree(); // suppose this builds the {2, 3} list struct node* newNode;

newNode= malloc(sizeof(struct node)); // allocate newNode->data = 1;

newNode->next = head; // link next

head = newNode; // link head

// now head points to the list {1, 2, 3} }

WrongPush() is very close to being correct. It takes the correct 3-Step Link In and puts it an almost correct context. The problem is all in the very last line where the 3-Step Link In dictates that we change the head pointer to refer to the new node. What does the line head = newNode; do in WrongPush()? It sets a head pointer, but not the right one. It sets the variable named head local to WrongPush(). It does not in any way change the variable named head we really cared about which is back in the caller WrontPushTest().

Exercise Make the memory drawing tracing WrongPushTest() to see how it does not work. The key is that the line head = newElem; changes the head local to WrongPush() not the head back in WrongPushTest(). Remember that the local variables for WrongPush() and WrongPushTest() are separate (just like the locals for LengthTest() and Length() in the Length() example above).

Reference Parameters In C We are bumping into a basic "feature" of the C language that changes to local parameters are never reflected back in the caller's memory. This is a traditional tricky area of C programming. We will present the traditional "reference parameter" solution to this problem, but you may want to consult another C resource for further information. (See Pointers and Memory (http://cslibrary.stanford.edu/102/) for a detailed explanation of reference parameters in C and C++.)

We need Push() to be able to change some of the caller's memory — namely the head variable. The traditional method to allow a function to change its caller's memory is to pass a pointer to the caller's memory instead of a copy. So in C, to change an int in the caller, pass a int* instead. To change a struct fraction, pass a struct fraction* intead. To change an X, pass an X. So in this case, the value we want to change is struct node, so we pass a struct node** instead. The two stars () are a little scary, but really it's just a straight application of the rule. It just happens that the value we want to change already has one star (), so the parameter to change it has two (*). Or put another way: the type of the head pointer is "pointer to a struct node." In order to change that pointer, we need to pass a pointer to it, which will be a "pointer to a pointer to a struct node".

Instead of defining WrongPush(struct node* head, int data); we define Push(struct node** headRef, int data);. The first form passes a copy of the head pointer. The second, correct form passes a pointer to the head pointer. The rule is: to modify caller memory, pass a pointer to that memory. The parameter has the word "ref" in it as a reminder that this is a "reference" (struct node*) pointer to the head pointer instead of an ordinary (struct node) copy of the head pointer.

Correct Push() Code Here are Push() and PushTest() written correctly. The list is passed via a pointer to the head pointer. In the code, this amounts to use of '&' on the parameter in the caller and use of '*' on the parameter in the callee. Inside Push(), the pointer to the head pointer is named "headRef" instead of just "head" as a reminder that it is not just a simple head pointer..

/* Takes a list and a data value. Creates a new link with the given data and pushes it onto the front of the list. The list is not passed in by its head pointer. Instead the list is passed in as a "reference" pointer to the head pointer -- this allows us to modify the caller's memory. / void Push(struct node* headRef, int data) { struct node* newNode = malloc(sizeof(struct node));

newNode->data = data; newNode->next = headRef; // The '' to dereferences back to the real head *headRef = newNode; // ditto }

void PushTest() { struct node* head = BuildTwoThree();// suppose this returns the list {2, 3}

Push(&head, 1); // note the & Push(&head, 13);

// head is now the list {13, 1, 2, 3} }

they appear in the source, which is the most convenient for the programmer. So In C++, Push() and PushTest() look like...

/* Push() in C++ -- we just add a '&' to the right hand side of the head parameter type, and the compiler makes that parameter work by reference. So this code changes the caller's memory, but no extra uses of '' are necessary -- we just access "head" directly, and the compiler makes that change reference back to the caller. / void Push(struct node& head, int data) { struct node newNode = malloc(sizeof(struct node));

newNode->data = data; newNode->next = head; // No extra use of * necessary on head -- the compiler head = newNode; // just takes care of it behind the scenes. }

void PushTest() { struct node* head = BuildTwoThree();// suppose this returns the list {2, 3}

Push(head, 1); // No extra use & necessary -- the compiler takes Push(head, 13); // care of it here too. Head is being changed by // these calls.

// head is now the list {13, 1, 2, 3} }

The memory drawing for the C++ case looks the same as for the C case. The difference is that the C case, the *'s need to be taken care of in the code. In the C++ case, it's handled invisibly in the code.

Section 3 —

Code Techniques

This section summarizes, in list form, the main techniques for linked list code. These techniques are all demonstrated in the examples in the next section.

1) Iterate Down a List A very frequent technique in linked list code is to iterate a pointer over all the nodes in a list. Traditionally, this is written as a while loop. The head pointer is copied into a local variable current which then iterates down the list. Test for the end of the list with current!=NULL. Advance the pointer with current=current->next.

// Return the number of nodes in a list (while-loop version) int Length(struct node* head) { int count = 0; struct node* current = head;

while (current != NULL) { count++; current = current->next }

return(count); }

Alternately, some people prefer to write the loop as a for which makes the initialization, test, and pointer advance more centralized, and so harder to omit...

for (current = head; current != NULL; current = current->next) {

2) Changing a Pointer With A Reference Pointer Many list functions need to change the caller's head pointer. To do this in the C language, pass a pointer to the head pointer. Such a pointer to a pointer is sometimes called a "reference pointer". The main steps for this technique are...

  • Design the function to take a pointer to the head pointer. This is the standard technique in C — pass a pointer to the "value of interest" that needs to be changed. To change a struct node, pass a struct node*.
  • Use '&' in the caller to compute and pass a pointer to the value of interest.
  • Use '*' on the parameter in the callee function to access and change the value of interest.

The following simple function sets a head pointer to NULL by using a reference parameter....

// Change the passed in head pointer to be NULL // Uses a reference pointer to access the caller's memory void ChangeToNull(struct node** headRef) { // Takes a pointer to // the value of interest

from NULL to point to the new node, such as the tail variable in the following example of adding a "3" node to the end of the list {1, 2}...

Stack Heap

head^1

tail (^3)

newNode

This is just a special case of the general rule: to insert or delete a node inside a list, you need a pointer to the node just before that position, so you can change its .next field. Many list problems include the sub-problem of advancing a pointer to the node before the point of insertion or deletion. The one exception is if the node is the first in the list — in that case the head pointer itself must be changed. The following examples show the various ways code can handle the single head case and all the interior cases...

5) Build — Special Case + Tail Pointer Consider the problem of building up the list {1, 2, 3, 4, 5} by appending the nodes to the tail end. The difficulty is that the very first node must be added at the head pointer, but all the other nodes are inserted after the last node using a tail pointer. The simplest way to deal with both cases is to just have two separate cases in the code. Special case code first adds the head node {1}. Then there is a separate loop that uses a tail pointer to add all the other nodes. The tail pointer is kept pointing at the last node, and each new node is added at tail->next. The only "problem" with this solution is that writing separate special case code for the first node is a little unsatisfying. Nonetheless, this approach is a solid one for production code — it is simple and runs fast.

struct node* BuildWithSpecialCase() { struct node* head = NULL; struct node* tail; int i;

// Deal with the head node here, and set the tail pointer Push(&head, 1); tail = head;

// Do all the other nodes using 'tail' for (i=2; i<6; i++) { Push(&(tail->next), i); // add node at tail->next tail = tail->next; // advance tail to point to last node }

return(head); // head == {1, 2, 3, 4, 5}; }

6) Build — Dummy Node Another solution is to use a temporary dummy node at the head of the list during the computation. The trick is that with the dummy, every node appear to be added after the .next field of a node. That way the code for the first node is the same as for the other nodes. The tail pointer plays the same role as in the previous example. The difference is that it now also handles the first node.

struct node* BuildWithDummyNode() { struct node dummy; // Dummy node is temporarily the first node struct node* tail = &dummy; // Start the tail at the dummy. // Build the list on dummy.next (aka tail->next) int i;

for (i=1; i<6; i++) { Push(&(tail->next), i); tail = tail->next; }

// The real result list is now in dummy.next // dummy.next == {1, 2, 3, 4, 5}; return(dummy.next); }

Some linked list implementations keep the dummy node as a permanent part of the list. For this "permanent dummy" strategy, the empty list is not represented by a NULL pointer. Instead, every list has a dummy node at its head. Algorithms skip over the dummy node for all operations. That way the dummy node is always present to provide the above sort of convenience in the code. Some of the solutions presented in this document will use the temporary dummy strategy. The code for the permanent dummy strategy is extremely similar, but is not shown.

7) Build — Local References Finally, here is a tricky way to unifying all the node cases without using a dummy node. The trick is to use a local "reference pointer" which always points to the last pointer in the list instead of to the last node. All additions to the list are made by following the reference pointer. The reference pointer starts off pointing to the head pointer. Later, it points to the .next field inside the last node in the list. (A detailed explanation follows.)

struct node* BuildWithLocalRef() { struct node* head = NULL; struct node** lastPtrRef= &head; // Start out pointing to the head pointer int i;

for (i=1; i<6; i++) { Push(lastPtrRef, i); // Add node at the last pointer in the list lastPtrRef= &((*lastPtrRef)->next); // Advance to point to the // new last pointer }

// head == {1, 2, 3, 4, 5}; return(head); }

This technique is short, but the inside of the loop is scary. This technique is rarely used. (Actually, I'm the only person I've known to promote it. I think it has a sort of compact charm.) Here's how it works...