CS107 Handout 10
Spring 2008 April 14, 2008
Section Handout
Problem 1: Meet The Flintstones
Consider the following C-style struct definitions:
typedef struct rubble { // need tag name for self-reference
   int betty;
   char barney[4];
   struct rubble *bammbamm;
} rubble;

typedef struct {
   short *wilma[2];
   short fred[2];
   rubble dino;
} flintstone;
Accurately diagram what computer memory looks like after the following seven lines of
code have executed:
rubble *simpsons;
flintstone jetsons[4];

simpsons = &jetsons[0].dino;
jetsons[1].wilma[3] = (short *) &simpsons;
strcpy(simpsons[2].barney, "Bugs Bunny");
((flintstone *)(jetsons->fred))->dino.bammbamm = simpsons;
*(char **)jetsons[4].fred = simpsons->barney + 4;
Problem 2: Scheme
Scheme is a language whose primary built-in data structure is the linked list. Unlike any of
the lists you’ve dealt with in C, Scheme lists are fully heterogeneous—that is, the entries
needn’t all be the same type.
Some example lists are:
i. (2 3 5 7)
ii. (House at Pooh Corner)
iii. (Yankees 2 Diamondbacks 1)
iv. (4 calling birds 3 French hens 2 turtle doves 1 partridge)
These linked lists are so flexible, individual elements might themselves be lists. If that’s the
case, then lists can be nested to any depth.
v. ((1 2) (buckle my shoe))
vi. (one (2 (three 4)) 5 six)
vii. (how (nested (can (u (go)))) how (nested (can (u (go)))))
We can provide heterogeneous lists in C, but they don’t come easy. In order for them to
work, the individual elements of the list must carry their own type information. The idea is
to tag each list node with some enumerated type that tells us what the rest of the node
contains.
We’ll just pretend that integers and strings are the only atomic types of interest. The third
list above (if bound to the stack variable gameThree) would be structured as follows:
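[Figure: gameThree addresses a chain of four list nodes terminated by a nil node. The four
list nodes lead, in order, to a str node holding "Yankees", an int node holding 2, a str
node holding "Diamondbacks", and an int node holding 1.]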
The ConcatAll function takes a well-formed list and returns the ordered concatenation of all
of the list’s strings (including those in nested sublists.) Integers should just be skipped, and
shouldn’t contribute to the return value at all. Your implementation shouldn’t orphan any
memory.
/**
 * Traverses a properly structured list, and returns the ordered
 * concatenation of all strings, including those in nested sublists.
 * When applied to the two lists drawn above, the following strings
 * would be returned:
 *
 *    ConcatAll(gameThree) would return "YankeesDiamondbacks"
 *    ConcatAll(nestedNumbers) would return "onethreesix"
 */
typedef enum { Integer, String, List, Nil } nodeType;

char *ConcatAll(nodeType *list)
{
   // to be completed
}
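One way ConcatAll might be written is sketched below. The node layout is an assumption, since the text above does not spell it out: every node starts with a nodeType tag; a String node stores its characters immediately after the tag, an Integer node stores an int there, and a List node stores two node addresses there (the first element, then the rest of the list). CopyString and the Make* helpers are hypothetical names used only to build test lists.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { Integer, String, List, Nil } nodeType;

/* Returns a heap-allocated copy of s, so every result can be freed. */
static char *CopyString(const char *s) {
  char *copy = malloc(strlen(s) + 1);
  assert(copy != NULL);
  strcpy(copy, s);
  return copy;
}

char *ConcatAll(nodeType *list) {
  switch (*list) {
    case Integer:
    case Nil:
      return CopyString("");                       /* contributes nothing */
    case String:
      return CopyString((const char *)(list + 1)); /* chars follow the tag */
    default:
      break;                                       /* List: handled below */
  }
  nodeType *parts[2];                      /* [0] = first element, [1] = rest */
  memcpy(parts, list + 1, sizeof(parts));  /* memcpy avoids alignment worries */
  char *front = ConcatAll(parts[0]);
  char *back = ConcatAll(parts[1]);
  char *result = malloc(strlen(front) + strlen(back) + 1);
  assert(result != NULL);
  strcpy(result, front);
  strcat(result, back);
  free(front);                             /* intermediate strings are not orphaned */
  free(back);
  return result;
}

/* Hypothetical builders for the assumed layout. */
nodeType *MakeNil(void) {
  nodeType *node = malloc(sizeof(nodeType));
  assert(node != NULL);
  *node = Nil;
  return node;
}

nodeType *MakeInteger(int value) {
  nodeType *node = malloc(sizeof(nodeType) + sizeof(int));
  assert(node != NULL);
  *node = Integer;
  memcpy(node + 1, &value, sizeof(int));
  return node;
}

nodeType *MakeString(const char *str) {
  nodeType *node = malloc(sizeof(nodeType) + strlen(str) + 1);
  assert(node != NULL);
  *node = String;
  strcpy((char *)(node + 1), str);
  return node;
}

nodeType *MakeList(nodeType *first, nodeType *rest) {
  nodeType *parts[2] = { first, rest };
  nodeType *node = malloc(sizeof(nodeType) + sizeof(parts));
  assert(node != NULL);
  *node = List;
  memcpy(node + 1, parts, sizeof(parts));
  return node;
}
```

The caller owns the returned string and is expected to free it. The recursion frees both partial results after joining them, which is what keeps the function from orphaning memory.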
CS107 Handout 10S
Spring 2008 April 15, 2008
Section Solution
Problem 1: Meet The Flintstones
typedef struct rubble {
   int betty;
   char barney[4];
   struct rubble *bammbamm;
} rubble;

typedef struct {
   short *wilma[2];
   short fred[2];
   rubble dino;
} flintstone;

rubble *simpsons;
flintstone jetsons[4];

simpsons = &jetsons[0].dino;
jetsons[1].wilma[3] = (short *) &simpsons;
strcpy(simpsons[2].barney, "Bugs Bunny");
((flintstone *)(jetsons->fred))->dino.bammbamm = simpsons;
*(char **)jetsons[4].fred = simpsons->barney + 4;
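[Memory diagram: the pointer simpsons alongside the array jetsons[0] through jetsons[3].
The strcpy call lays down the bytes 'B' 'u' 'g' 's' ' ' 'B' 'u' 'n' 'n' 'y' 0, starting
at simpsons[2].barney.]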
CS107 Handout 14
Spring 2008 April 18, 2008
Section Handout
Handout written by Jerry.
The Generic Sorted Set
Binary search trees (BST) save the day whenever both search and insertion are high-
priority operations. BST structures have the advantage over both sorted arrays and
sorted linked lists in that insertion takes logarithmic time on the average, whereas
insertion into sorted lists and arrays generally takes linear time. For this problem, we
are going to implement a fully generic sortedset container using a packed binary search
tree as the underlying representation.
A more traditional binary search tree storing 7 client elements might look as follows:
[Figure: a binary search tree of seven separately allocated nodes.]
Each node of the tree consists of the binary representation of a client element
immediately followed by two addresses, one for the left subtree, and one for the right
subtree. Satisfying the binary search tree property mandates that the elements, visited
in order from left to right, be a strictly increasing sequence according to whatever
function the client uses to compare elements of this type.
The primary disadvantage here is that the heap gets fragmented with many nodes,
potentially hundreds of thousands of them. A more compact, efficient allocation
technique packs all of the above nodes into one contiguous block:
Each entry in this packed binary search tree represents a node. Notice that each entry is
an element followed by two integers. These two integers replace the two pointers of the
traditional BST node. In some ways, they function as links in the same way the pointers
do, because they store the indices of the left and right entries that would hang from that
element in the normal BST drawing. In the case of node 0, the presence of 1 and then 2
implies that node 1 is the root of the left subtree and node 2 is the root of the right
subtree. This pair-of-integers scheme is applied recursively, from root to leaf along
every path. The -1 is the integer equivalent of NULL: To store -1 is to state that no subtree
exists.
Notice the one integer to the left of node 0 in the picture. That's more or less the index of
the root of the tree. This value is always either -1 or 0, depending on whether or not the
BST is empty. We could survive without this extra integer, but we more easily exploit
the recursive nature of the data structure when every single node, including the root, can
be identified in precisely the same way.
This week’s discussion section features the sortedset. You’ll spend the time completing
the sortedset declaration and implementing four functions. (Notice that we’re not
bothering with SetFree or SetRemove; that’s just too much for one hour.)
typedef struct {
   // to be completed
} sortedset;

void SetNew(sortedset *set, int elemSize, int (*cmpfn)(const void *, const void *));
bool SetAdd(sortedset *set, const void *elemPtr);
void *SetSearch(sortedset *set, const void *elemPtr);
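[Figure: the packed binary search tree. A leading root index (0) is followed by node 0,
node 1, node 2, and so on. Each node is a client element followed by the indices of its
left and right children, with -1 marking a missing subtree. The space at the end of the
block is perhaps allocated but unused.]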
CS107 Handout 14S
Spring 2008 April 22, 2008
Section Solution
This sortedset thing should try to conform as much as possible to the pictures. There
should be a slab of memory to store kInitialCapacity nodes in addition to that
admittedly annoying int all the way to the left. We need to remember how much space
to allocate and how much of that allocated space is in use. We need to know how large
client elements are so we can get the node allocations right. We also need to store the
comparison function (since only it and not the sortedset implementation knows how
to compare the mystery bytes so the set functions can travel left or right accordingly.)
Once that is done, the SetNew implementation can more or less allocate everything and
initialize the thing to behave as an empty tree.
typedef struct {
   int *root;           // points to the offset index of the root.
   int logicalSize;     // number of active client elements currently stored.
   int allocatedSize;   // number of elements needed to saturate memory.
   int (*cmp)(const void *, const void *);
   int elemSize;        // client element size.
} sortedset;
Clearly there is an array flavor to this implementation. In particular, we need to manage
an array (called root above) that identifies the location of the slate of bytes that store
the nodes and the integers. I could have typed this as a void * instead, but I chose
int * because the smallest entity that resides at this address is the integer that is
initially -1 but becomes 0 the moment that anything is added. Our doubling strategy will often
leave us with raw, unused memory, so we'll need to track how many nodes there are
and how many are being used. cmp is there so we remember how to maintain the BST
property, and elemSize is there so we know how much storage is wedged in between
the integer index pairs.
/**
 * Function: SetNew
 * Usage: SetNew(&stringSet, sizeof(char *), StringPtrCompare);
 *        SetNew(&constellations, sizeof(pointT), DistanceCompare);
 * ----------------
 * SetNew allocates the requisite space needed to manage what
 * will initially be an empty sorted set. More specifically, the
 * routine allocates space to hold up to 'kInitialCapacity' (currently 4)
 * client elements.
 */
#define NodeSize(clientElem) ((clientElem) + 2 * sizeof(int))
static const int kInitialCapacity = 4;
void SetNew(sortedset *set, int elemSize, int (*cmpfn)(const void *, const void *))
{
   assert(elemSize > 0);
   assert(cmpfn != NULL);
   set->root = malloc(sizeof(int) + kInitialCapacity * NodeSize(elemSize));
   assert(set->root != NULL);
   *set->root = -1;                          // set it empty
   set->logicalSize = 0;                     // still empty
   set->allocatedSize = kInitialCapacity;
   set->cmp = cmpfn;
   set->elemSize = elemSize;
}
The bottom line here is that we build an empty set structure that behaves like an empty
set. Here's the picture I'm hoping we both have in mind:
A good amount of code is shared by the SetSearch and the SetAdd functions. This
shared code is consolidated to the FindNode function.
Motivation: Spelled Out
Both lookup and insertion require the same type of binary search. In the search case, we
need to identify the index of the matching element, and in the insertion case, we need to
identify where in the BST the element could reside if it doesn't reside there already.
90% of SetSearch and SetAdd is identical, so FindNode is an attempt to consolidate
the shared code to one helper function.
typedef struct {
   char code[4];
   char *city;
} Airport;

int AirportCmp(const void *v1, const void *v2)
{
   const Airport *ap1 = v1;
   const Airport *ap2 = v2;
   return strcmp(ap1->code, ap2->code);
}
[Figure: the sortedset named set, configured to store Airport elements, with its elemSize
field holding sizeof(Airport).]
Just to be clear about FindNode 's behavior, you'd benefit from seeing how FindNode
should operate on a tree storing double values:
Assume that the sortedset * variable set is initialized as above, and assume that the
following array of doubles has been declared as well.

double testCases[] = { 5.245, 3.141, 6.001 };
int *returnValues[sizeof(testCases)/sizeof(testCases[0])];
int i;

for (i = 0; i < sizeof(testCases)/sizeof(testCases[0]); i++)
   returnValues[i] = FindNode(set, &testCases[i]);
At the end of snippet execution, you'd expect to see returnValues initialized as
follows:
int DoubleCmp(const void *v1, const void *v2)
{
   const double *dp1 = v1;
   const double *dp2 = v2;
   double diff = *dp1 - *dp2;

   if (diff > 0.0) return 1;
   if (diff < 0.0) return -1;
   return 0;
}
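[Figure: the sortedset named set (field values shown: sizeof(double), 6, 4), whose root
field addresses the following packed block.]

   root index: 0
   node 0:  5.245     left  3   right  1
   node 1:  8.214     left  2   right -1
   node 2:  6.001     left -1   right -1
   node 3:  2.81828   left -1   right -1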
Note that each pointer addresses not the double, but the int that indexes the double.
If the search element exists, then you expect a pointer to some nonnegative integer. If
the search element does not exist, then the pointer points to the integer that would be
updated if the element in question were added.
Back To Code
Using FindNode, let's provide implementations for SetSearch and SetAdd.
Fortunately, one of them is easy... we can just check to see whether or not a valid
FindNode call returned the address of an int that stores a -1. At this point, we're
thrilled that we wrote FindNode before we wrote this.
/**
 * Function: SetSearch
 * Usage: if (SetSearch(&staffSet, &lecturer) == NULL)
 *           printf("musta been fired");
 * -------------------
 * SetSearch searches for the specified client element according
 * to whatever comparison function was provided at the time the
 * set was created. A pointer to the matching element is returned
 * for successful searches, and NULL is returned to denote failure.
 */
void *SetSearch(sortedset *set, const void *elemPtr)
{
   int *node = FindNode(set, elemPtr);
   if (*node == -1) return NULL;
   return (char *) (set->root + 1) + *node * NodeSize(set->elemSize);
}
Actually, SetAdd isn't horrendous either. Assessing whether the element was
previously added is easy. The tough part is the part that's tough with any type of
generic programming. Should the element need to be added, we need to claim the next
node in the array (which might require a realloc call), copy in the element’s bit
pattern, and update three indices. Next page, please:
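The FindNode and SetAdd code itself does not appear in the text above, so what follows is only a sketch of how they might look given the struct, SetNew, and SetSearch already shown. NodeAt and IntCmp are hypothetical helpers introduced here. Two further assumptions: capacity grows by doubling (the "doubling strategy" mentioned earlier), and elemSize is a multiple of sizeof(int) so that the index pairs stay aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  int *root;          /* root index, followed by the packed nodes */
  int logicalSize;    /* nodes in use */
  int allocatedSize;  /* nodes allocated */
  int (*cmp)(const void *, const void *);
  int elemSize;
} sortedset;

#define NodeSize(clientElem) ((clientElem) + 2 * sizeof(int))
static const int kInitialCapacity = 4;

void SetNew(sortedset *set, int elemSize, int (*cmpfn)(const void *, const void *)) {
  assert(elemSize > 0);
  assert(cmpfn != NULL);
  set->root = malloc(sizeof(int) + kInitialCapacity * NodeSize(elemSize));
  assert(set->root != NULL);
  *set->root = -1;
  set->logicalSize = 0;
  set->allocatedSize = kInitialCapacity;
  set->cmp = cmpfn;
  set->elemSize = elemSize;
}

/* Hypothetical helper: address of the node stored at the given index. */
static void *NodeAt(sortedset *set, int index) {
  return (char *)(set->root + 1) + index * NodeSize(set->elemSize);
}

/* Returns the address of the int that indexes the matching element, or of
 * the int (holding -1) that would be updated if the element were added. */
static int *FindNode(sortedset *set, const void *elemPtr) {
  int *link = set->root;
  while (*link != -1) {
    void *curr = NodeAt(set, *link);
    int comp = set->cmp(elemPtr, curr);
    if (comp == 0) break;
    link = (int *)((char *)curr + set->elemSize);  /* left child index */
    if (comp > 0) link++;                          /* right child index */
  }
  return link;
}

void *SetSearch(sortedset *set, const void *elemPtr) {
  int *node = FindNode(set, elemPtr);
  if (*node == -1) return NULL;
  return NodeAt(set, *node);
}

bool SetAdd(sortedset *set, const void *elemPtr) {
  int *link = FindNode(set, elemPtr);
  if (*link != -1) return false;                   /* already present */
  if (set->logicalSize == set->allocatedSize) {
    /* realloc may move the block, so remember link as a byte offset. */
    size_t linkOffset = (size_t)((char *)link - (char *)set->root);
    set->allocatedSize *= 2;
    set->root = realloc(set->root,
                        sizeof(int) + set->allocatedSize * NodeSize(set->elemSize));
    assert(set->root != NULL);
    link = (int *)((char *)set->root + linkOffset);
  }
  *link = set->logicalSize++;                      /* index 1: the parent's link */
  void *dest = NodeAt(set, *link);
  memcpy(dest, elemPtr, set->elemSize);            /* copy the bit pattern */
  int *children = (int *)((char *)dest + set->elemSize);
  children[0] = -1;                                /* indices 2 and 3: no subtrees yet */
  children[1] = -1;
  return true;
}

/* Hypothetical comparison function for int elements. */
int IntCmp(const void *v1, const void *v2) {
  int a = *(const int *)v1, b = *(const int *)v2;
  return (a > b) - (a < b);
}
```

A client would call SetNew(&set, sizeof(int), IntCmp), then SetAdd(&set, &value) for each element. The one subtle point is that the pointer returned by FindNode addresses memory inside the block, so it must be recomputed after any realloc.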
CS107 Handout 18
Spring 2008 April 25, 2008
Section Handout
Problem 1: The sparsestringarray
A sparsestringarray is an array-like data structure that provides constant time access to its
elements, and constant time insertion and deletion. It layers array semantics over an
ordered collection of C strings, with the understanding that most of the strings are empty.
The sparsestringarray is different from C arrays and other array-like data structures,
because it’s aggressively miserly in its use of memory. Each empty string requires just one
bit of storage, which is less than 3% of the memory cost incurred by the allocation of a full
char *. The implementation is slower than true C arrays and our vector from Assignment
3, but it’s a wise choice when memory is at a premium and an overwhelmingly large
fraction of the strings being stored are the empty string.
Our sparsestringarray is backed by an array of groups, where each group is responsible
for managing a contiguous subset of array indices. The programmer specifies not only the
logical length of the sparsestringarray, but also the group size. If the logical length of the
full sparsestringarray is, for instance, established as 50,000, and the group size is
established to be 100, then group 0 would manage indices 0 through 99, group 1 would
manage indices 100 through 199, group 2 would manage indices 200 through 299, and so
forth. All search, insert, and delete operations are passed on to the appropriate group.
typedef struct {
   group *groups;    // dynamically allocated array of structs, defined below
   int numGroups;    // number of groups
   int arrayLength;  // logical length of the full sparsestringarray
   int groupSize;    // number of strings managed by each group
} sparsestringarray;
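The index arithmetic behind "passed on to the appropriate group" can be sketched as follows. The helper names are hypothetical, and rounding the group count up when the length is not a multiple of the group size is an assumption rather than something the handout states.

```c
#include <assert.h>

/* Which group manages the given index? */
int GroupIndexFor(int index, int groupSize) {
  assert(index >= 0 && groupSize > 0);
  return index / groupSize;
}

/* Position of the given index within its own group. */
int OffsetWithinGroup(int index, int groupSize) {
  assert(index >= 0 && groupSize > 0);
  return index % groupSize;
}

/* Number of groups needed to cover arrayLength indices (rounds up). */
int NumGroupsFor(int arrayLength, int groupSize) {
  assert(arrayLength >= 0 && groupSize > 0);
  return (arrayLength + groupSize - 1) / groupSize;
}
```

With a group size of 100, index 250 lands in group 2 at offset 50, which agrees with the ranges listed above.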
Each group contains a bitmap, which is an array of bools whose length is equal to the group
size, and a C vector (yes, Assignment 3’s vector) to store nonempty, dynamically allocated
C strings. Search for a particular element amounts to search within a particular group at
index i. If bitmap[i] is false, then the string at the ith position is understood to be the
empty string. If instead bitmap[i] is true, then the group needs to find the corresponding
string in the C vector.
typedef struct {
   bool *bitmap;    // set to be of size 'groupSize'
   vector strings;  // vector of dynamically allocated, nonempty C strings
} group;
Each true in a group’s bitmap corresponds to some char * (addressing a dynamically
allocated C string) in the same group’s vector. The true at the lowest index in the bitmap
corresponds to the 0th entry in the vector; the true at the second lowest index in the bitmap
corresponds to the 1st entry in the vector, and so forth; the total number of trues should be
equal to the logical length of the accompanying vector. (In practice, the bool array would
be compressed to use just one bit of memory for each Boolean value, but for our purposes
we won’t implement that optimization, since it requires advanced C directives not covered
in CS107.)
Here are the prototypes of the four functions you’ll be implementing:
void SSANew(sparsestringarray *ssa, int arrayLength, int groupSize);
bool SSAInsert(sparsestringarray *ssa, int index, const char *str);
void SSAMap(sparsestringarray *ssa, SSAMapFunction mapfn, void *auxData);
void SSADispose(sparsestringarray *ssa);
Of course, the sparsestringarray gives the illusion that all strings, both empty and nonempty,
are stored in an array-like manner. You know as the implementer the internal representation
is such that only nonempty strings are really stored. Your job is to implement these four
functions in a way that’s consistent with the description outlined on the previous page. Here’s
a test program that illustrates how a client can interact with a sparsestringarray.
static void CountEmptyPrintNonEmpty(int index, const char *str, void *auxData)
{
   if (strcmp(str, "") != 0) {
      printf("Oooo! Nonempty string at index %d: \"%s\"\n", index, str);
   } else {
      (*(int *)auxData)++;
   }
}

int main(int argc, char **argv)
{
   sparsestringarray ssa;
   SSANew(&ssa, 70000, 35);

   SSAInsert(&ssa, 33001, "need");
   SSAInsert(&ssa, 58291, "more");
   SSAInsert(&ssa, 33000, "Eye");
   SSAInsert(&ssa, 33000, "I");
   SSAInsert(&ssa, 67899, "cowbell");

   int numEmptyStrings = 0;
   SSAMap(&ssa, CountEmptyPrintNonEmpty, &numEmptyStrings);
   printf("%d of the strings were empty strings.\n", numEmptyStrings);

   SSADispose(&ssa);
   return 0;
}
Here’s the output of that test program:
Oooo! Nonempty string at index 33000: "I"
Oooo! Nonempty string at index 33001: "need"
Oooo! Nonempty string at index 58291: "more"
Oooo! Nonempty string at index 67899: "cowbell"
69996 of the strings were empty strings.
c) Finally, implement the SSAMap routine, which applies the specified mapping function to
every single index/string pair held by the specified sparsestringarray. Note that the
mapping function is called on behalf of all strings, empty and nonempty. The specified
auxiliary data is channeled through as the third argument to every single call.
/**
 * Function: SSAMap
 * ----------------
 * Applies the specified mapping routine to every single index/string pair
 * (along with the specified auxiliary data). Note that the mapping routine
 * is called on behalf of all strings, both empty and nonempty.
 */
typedef void (*SSAMapFunction)(int index, const char *str, void *auxData);
void SSAMap(sparsestringarray *ssa, SSAMapFunction mapfn, void *auxData);
Problem 2: Serializing Lists of Packed Character Nodes
Write a function serializeList to convert a linked list to a single stream of null-delimited
character arrays.
The linked list consists of a series of variably sized nodes, where each node packs the
address of the next node and all of the characters of a C string into one contiguous block.
[Figure: list addresses a chain of five nodes storing "Red", "Yellow", "Pink", "Green", and
"Purple"; the pointer in the last node is NULL.]
Notice that each node stores a four-byte pointer, followed by the individual characters of
the string, followed by the null character (represented as a shaded box). Each of these
four-byte pointers stores the address of the next node in the list, unless there is no next
node, in which case the four-byte pointer is equal to NULL.
serializeList synthesizes a dynamically allocated serialization of such a list. The
serialization starts off with a sizeof(int)-byte figure storing the number of C strings. The
serialization then continues with each of the C strings laid down side by side, one after
another in their original order. The individual strings are separated by the null characters,
and the final string in the character array is null-terminated as well. If handling the above
list, serializeList would build and return the base address of the int storing the 5:
[Figure: the serialization, which is the int 5 followed by "Red", "Yellow", "Pink", "Green",
and "Purple", each terminated by a null character.]
serializeList takes a const void * and constructs the corresponding serialization. Your
implementation:
- should be implemented iteratively in one single pass over the list.
- should create a serialization using the exact number of bytes needed.
- should not free the nodes of the original list.
- should be written in straight C, using no C++ whatsoever.
- should return the base address of the entire figure, expressed as an int *.
- should properly handle the empty list.
- needn't perform any error checking of any sort.
Relevant function prototypes:
- size_t strlen(const char *str);
The strlen function returns the number of bytes in str, not including the
terminating null character.
- char *strcpy(char *destination, const char *source);
The strcpy function copies string source to destination, including the
terminating null character, stopping after the null character has been copied.
int *serializeList(const void *list);
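A sketch of one implementation that satisfies the constraints listed above follows. It grows the serialization with realloc as each node is visited, so the list is traversed exactly once and no extra bytes are left over. It uses sizeof(void *) rather than a hard-coded four bytes so that it also runs on 64-bit machines. MakeNode is a hypothetical helper for building test lists and is not part of the problem.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

int *serializeList(const void *list) {
  size_t length = sizeof(int);                  /* bytes used so far */
  int *serialization = malloc(length);
  int count = 0;
  const void *curr = list;
  while (curr != NULL) {
    /* The string's characters sit right after the node's next pointer. */
    const char *str = (const char *)curr + sizeof(void *);
    size_t size = strlen(str) + 1;              /* include the null character */
    serialization = realloc(serialization, length + size);
    strcpy((char *)serialization + length, str);
    length += size;
    count++;
    curr = *(const void *const *)curr;          /* follow the next pointer */
  }
  *serialization = count;                       /* written last: realloc may move the block */
  return serialization;
}

/* Hypothetical helper: packs a next pointer and a copy of str into one node. */
void *MakeNode(const char *str, void *next) {
  void *node = malloc(sizeof(void *) + strlen(str) + 1);
  assert(node != NULL);
  memcpy(node, &next, sizeof(void *));
  strcpy((char *)node + sizeof(void *), str);
  return node;
}
```

For the empty list the loop never runs, so the result is a single int storing 0. Repeated realloc calls trade some speed for the single-pass requirement; counting the bytes first would need a second traversal.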
Problem 3: The multitable
The multitable allows a client to associate keys (of any type) with one or more values (of
any type). It operates somewhat like the C++ map class, except that it’s written in C and it
allows multiple values to be bound to a single key.
The multitable shouldn’t re-implement the hashset and the vector, but instead should be
layered on top of them. A single key’s collection of values should be stored in a C vector,
and each key/vector-of-values pair will be stored in a C hashset. The pair itself is a
manually managed chunk of memory, the size being determined by the size of the key and
the size of a vector.
I’ve designed the multitable struct for you, but you’ll be implementing three functions to
demonstrate your understanding of all the low-level C functions we’ve been studying.
Here’s the reduced .h file outlining the signatures of those three functions.
typedef int (*MultiTableHashFunction)(const void *keyAddr, int numBuckets);
typedef int (*MultiTableCompareFunction)(const void *keyAddr1, const void *keyAddr2);
typedef void (*MultiTableMapFunction)(void *keyAddr, void *valueAddr, void *auxData);

typedef struct {
   hashset mappings;
   int keySize;
   int valueSize;
} multitable;
[Figure: each key/vector pair is laid out as keySize bytes for the key followed by
sizeof(vector) bytes for the vector of values.]