















































In the preceding chapters, we have used variables to store single values of a given type. It is sometimes convenient to store multiple values of a given type in a single collection variable. Programming languages use arrays for this purpose. It is also convenient to store such values in files rather than by hard-coding them in the program itself or by expecting the user to enter them manually. Languages use files for this purpose. This chapter introduces the use of arrays and files in Java and Processing.
As in previous chapters, the running example is implemented in Processing, but the remaining examples on arrays, array topics, and multi-dimensional arrays work in either Processing or Java. The material on files is Processing-specific; Java files are treated in a later chapter.
Computers are powerful tools both for collecting and storing large amounts of data and for analyzing and presenting the patterns and trends in that data. These patterns and trends are commonly called information, and the computer is commonly used as a tool for deriving information from data. For example, computers have been used to collect and store thousands of statistics on human life and health for the United Nations, millions of customer records for multinational corporations, and billions of data points for the Human Genome Project. Computers have also been used to mine useful information from these data sets. Processing provides all of these capabilities, with a particular emphasis on data visualization, whose goal is to present data in such a way as to allow humans to see the informational “big picture” that is so easily lost in the volumes of raw data.
Note that data representation and visualization are not easy tasks. Collecting and managing large data sets is challenging because of the myriad ways in which the data can be corrupted or lost. Processing large data sets requires considerable computing power and careful programming. Presenting data accurately requires careful extraction of data abstractions that are faithful to the original data. The entire field of information systems, a sub-field of computing, has arisen to address these issues.
In this chapter, our vision is to build an application that can display an appropriate set of data as a bar chart such as the one shown in the rough sketch in Figure 8-1. This is a standard bar chart in which each labeled bar represents a single data value; we’d also like to add some aggregate statistics at the bottom. Bar charts such as this one allow the human visual system to perceive the relative values in this data set. Our goal is to display the average life expectancy in years of a newborn child in the five permanent members of the UN Security Council for the year 2007.
Figure 8-1. A bar chart created from a list of data values
Building a visualization such as this one requires that the application be able to:
Represent the data, including the life expectancy values and the corresponding country names;
Analyze the data and derive aggregate statistics (e.g., average, minimum and maximum values);
Store the data permanently;
Present the data in a visual bar chart.
We can achieve the last element of the vision, presenting the data as text and bars of appropriate sizes, using techniques discussed in the previous chapters. This chapter focuses on the first three elements. We will use arrays to represent the data, array processing techniques to analyze the data, and files to store the data permanently.
The first element of the chapter example vision is to represent the five data values shown in Figure 8-1. In previous chapters, we would do this using five separate variables:
float expChina = 72.961, expFrance = 80.657, expRussia = 65.475, expUK = 79.425, expUSA = 78.242;
This approach could work, but consider the problem of computing the average of these data values. This would require the use of the following expression:
(expChina + expFrance + expRussia + expUK + expUSA) / 5
Now, consider the fact that the International Organization for Standardization (ISO) officially recognizes over 200 countries. This means that working with data for all the countries would require over 200 separate float variables and expressions with separate operands to match.
As an alternative to the simple variable, which stores exactly one value, Java provides a data structure that stores multiple values of the same type. We have already seen an example of this sort of structure; Java represents variables of type String as lists of char values that can be accessed using an index. For example, if aString is a variable of type String, then aString.charAt(i) will return the char value in aString with index i. This section describes how to declare, initialize and work with indexed structures.
Java represents indexed data structures using arrays. Arrays are objects, which means that they must be accessed via handles. To define an array, we must declare an array handle and initialize the value referred to by that handle. To declare an array handle, we use the following pattern:
type[] name;
data structure by allocating a fixed amount of adjacent memory locations appropriate for representing the number and type of the elements. Once initialized, the length of the array cannot be modified.
Array Indexes
Each array value has an assigned index running from 0 to 4, shown in the figure using square brackets. A program can access an individual array element using the subscript operator ([]), which requires the array’s name and the item’s index value. The pattern for using this operator to access the element of the array anArray of index i is shown here:
anArray[i]
In Java, array indexing uses a zero-based scheme, which means that the first item in the expectancyValues array can be accessed using the expression expectancyValues[0], the second value using the expression expectancyValues[1], and so forth. These subscript expressions are known as indexed variables because a program can use them as it uses any other variable. For example, a program can set the value of the first expectancy variable using this assignment statement:
expectancyValues[0] = 72.961;
When the program is running, Java’s subscript operation tests the value of the index and throws an error if the index value is out of bounds. For example, evaluating either expectancyValues[-1] or expectancyValues[6] throws an error, because the only valid indexes are 0 through 4.
Array Length
The number of elements in an array is known as the length of the array and can be accessed using the array length property.^1 For example, the length of the expectancyValues array can be accessed using expectancyValues.length, which returns the integer value 5, and the last element of the array can be accessed using expectancyValues[expectancyValues.length - 1].
Array Initializers
Java supports a way to initialize array values using array initializers. The following code initializes an array with the values shown in Figure 8-1.
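The bounds check can be observed directly. The following is a minimal sketch in plain Java, not code from the chapter: the class name BoundsDemo and the helper isOutOfBounds() are hypothetical, and the float literals carry an f suffix because plain Java (unlike Processing) requires it.

```java
// A sketch of Java's run-time index check; BoundsDemo is a hypothetical name.
class BoundsDemo {
    // Returns true if reading values[index] is rejected by the bounds check.
    static boolean isOutOfBounds(float[] values, int index) {
        try {
            float unused = values[index];   // the index is tested at run time
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;                    // thrown for any index outside 0..length-1
        }
    }

    public static void main(String[] args) {
        float[] expectancyValues = {72.961f, 80.657f, 65.475f, 79.425f, 78.242f};
        System.out.println(isOutOfBounds(expectancyValues, 0));   // first valid index
        System.out.println(isOutOfBounds(expectancyValues, 4));   // last valid index
        System.out.println(isOutOfBounds(expectancyValues, 5));   // one past the end
        System.out.println(isOutOfBounds(expectancyValues, -1));  // negative index
    }
}
```

Catching the exception is done here only to make the check visible; a program would normally avoid the error by keeping its indexes within 0 and length - 1.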
float[] expectancyValues = {72.961, 80.657, 65.475, 79.425, 78.242};
This code initializes the values in the array to the literal float values specified in the braces ({}). In this case, Java allocates the size of the array data structure to fit the number of values found in the array initializer expression. This array initializer cannot be used as an array literal in other contexts; it must be used to initialize the array as shown here.
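To illustrate the restriction, here is a small plain-Java sketch. The class InitializerDemo and the method sum() are hypothetical names introduced for this example. It shows that a bare initializer is accepted only in a definition, while an array creation expression of the form new float[] {...} can be used where a value is needed, such as a method argument.

```java
// A sketch contrasting an array initializer with an array creation expression.
class InitializerDemo {
    // Adds up the elements of the given array.
    static float sum(float[] values) {
        float total = 0.0f;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // An initializer is allowed as part of an array definition.
        float[] expectancyValues = {72.961f, 80.657f};
        System.out.println(expectancyValues.length);

        // sum({1.0f, 2.0f}) would not compile: a bare initializer is not an expression.
        // An array creation expression supplies the element type explicitly.
        float total = sum(new float[] {1.0f, 2.0f});
        System.out.println(total);
    }
}
```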
Java does not require arrays to store only values of primitive types. Arrays can store reference types as well. This statement defines an array of string objects:
String[] expectancyCountries = {"China", "France", "Russia", "UK", "USA"};
(^1) In Java, a program accesses an array’s length using the length property, e.g., anArray.length, whereas it accesses a string’s length using the length() method, e.g., aString.length().
This data structure can be visualized as follows:
Here the array elements are not primitive values, but handles for String reference objects.
Array definitions in Java have the following general pattern:

Array Definition Pattern
ElementType[] arrayName;
or
ElementType[] arrayName = new ElementType[length];
or
ElementType[] arrayName = arrayInitializer;

ElementType is any type (including an array type);
arrayName is the handle for the array object being defined; if there is no assignment clause in the statement, the handle value is set to null;
length is an expression specifying the number of elements in the array;
arrayInitializer is the list of literal values of type ElementType, enclosed in curly braces ({ }).

Because Java implements arrays as fixed-length, indexed structures, the counting for loop provides an effective way to work with array elements. For example, the following code prints the names of the five countries represented by expectancyCountries:
Code:
for (int i = 0; i < expectancyCountries.length; i++) {
    println(i + ": " + expectancyCountries[i]);
}
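The three forms of the definition pattern, together with the counting for loop, can be sketched in plain Java as follows. The class DefinitionDemo and the method describe() are hypothetical. One assumption worth stating: the null default applies to fields; a local variable declared without an assignment cannot be read until it has been assigned, so the sketch assigns null explicitly.

```java
// A sketch of the three array definition forms; all names are illustrative.
class DefinitionDemo {
    // Builds a one-line description of an array using a counting for loop.
    static String describe(String[] names) {
        if (names == null) {
            return "null handle";
        }
        String result = "";
        for (int i = 0; i < names.length; i++) {
            result += i + ":" + names[i] + " ";
        }
        return result.trim();
    }

    public static void main(String[] args) {
        String[] declaredOnly = null;                 // handle only, no array object yet
        String[] allocated = new String[2];           // two elements, each initially null
        String[] initialized = {"China", "France"};   // length taken from the initializer

        System.out.println(describe(declaredOnly));
        System.out.println(describe(allocated));
        System.out.println(describe(initialized));
    }
}
```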
    float[] result = new float[arrayLength];
    for (int i = 0; i < arrayLength; i++) {
        result[i] = 0.0;
    }
    return result;
}
This method receives an integer representing the length of the desired array, verifies that it is at least 0, constructs the array, fills the array with values of 0.0, and finally returns the array. Note that Java itself initializes numeric array elements to 0, but many other languages do not, so, as discussed in the previous section, it is generally not a good idea for a program to rely on this.
Reference Types as Parameters
In Chapter 3 we discussed the distinction between primitive types and reference types, where primitive types store simple values and reference types store references, or pointers, to values. Arrays are reference types. The following diagram illustrates the difference:
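A complete version of such a method might look like the sketch below. The method name createZeroArray(), the class name, and the choice to throw IllegalArgumentException for a negative length are all assumptions made for illustration; the description above does not say how an invalid length is reported.

```java
// A sketch of a method that builds a zero-filled array; names are hypothetical.
class ZeroArrayDemo {
    // Returns a new float array of the given length with every element set to 0.0.
    static float[] createZeroArray(int arrayLength) {
        // Assumed behavior: reject a negative length with an exception.
        if (arrayLength < 0) {
            throw new IllegalArgumentException("length must be at least 0: " + arrayLength);
        }
        float[] result = new float[arrayLength];
        for (int i = 0; i < arrayLength; i++) {
            result[i] = 0.0f;   // explicit, rather than relying on the default value
        }
        return result;
    }

    public static void main(String[] args) {
        float[] zeros = createZeroArray(3);
        System.out.println(zeros.length);
        System.out.println(zeros[0]);
    }
}
```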
On the left, we declare an integer i, whose value is the primitive integer value 1 as shown; on the right, we declare an integer array a whose value is a reference to the two-valued array as shown.
This distinction is important when using arrays and other reference types as parameters. Consider the following code, which initializes an integer variable x to the value 1 and passes that primitive value as an argument to the changeValue() method.
Code:
public static void main(String[] args) {
    int x = 1;
    changeValue(x);
    System.out.println("In main(), x == " + x);
}

public static void changeValue(int x) {
    x = 2;
    System.out.println("In changeValue(), x == " + x);
}

Output:
In changeValue(), x == 2
In main(), x == 1
This code behaves as we would expect given our discussion of parameter passage in Chapter 4:
In this parameter passage technique, called pass-by-value, the value of the argument is passed to the parameter. Java passes all of its parameters by value. However, because an array is a reference object, the pass-by-value technique leads to potentially unexpected results. Consider the following code, which initializes an integer array variable a to the array initializer value {1, 1} and passes that reference value as an argument to the changeArray() method:
Code:
public static void main(String[] args) {
    int[] a = { 1, 1 };
    changeArray(a);
    System.out.println("In main(): " + "{" + a[0] + ", " + a[1] + "}");
}

public static void changeArray(int[] a) {
    a[0] = 3;
    a[1] = 4;
    System.out.println("In changeArray(): " + "{" + a[0] + ", " + a[1] + "}");
}

Output:
In changeArray(): {3, 4}
In main(): {3, 4}
Note that the output of this code is different from the example given earlier. Here, changeArray() changes the values in the array permanently, which is why both calls to println() print the new values (3, 4). This code behaves as follows:
So while this is still pass-by-value behavior, the nature of the reference value being passed allows the original value of the argument to be accessed and changed by reference.
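One way to see that the handle itself is still passed by value is to compare a method that assigns a new array to its parameter with one that changes elements through the parameter. The sketch below is illustrative; the class ParameterDemo and the methods replaceArray() and changeElements() are hypothetical names.

```java
// A sketch contrasting reassignment of a parameter with mutation through it.
class ParameterDemo {
    // Rebinds only the local copy of the handle; the caller's array is untouched.
    static void replaceArray(int[] a) {
        a = new int[] {3, 4};
    }

    // Follows the handle and changes the caller's array object.
    static void changeElements(int[] a) {
        a[0] = 3;
        a[1] = 4;
    }

    public static void main(String[] args) {
        int[] a = {1, 1};

        replaceArray(a);
        System.out.println("{" + a[0] + ", " + a[1] + "}");   // still {1, 1}

        changeElements(a);
        System.out.println("{" + a[0] + ", " + a[1] + "}");   // now {3, 4}
    }
}
```

The first call has no visible effect because only the copied handle was changed; the second call changes the shared array object that both handles refer to.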
int[] original = {11, 22, 33, 44, 55};
int[] copy;
and we want copy to be a distinct copy of original, we can write:
copy = original.clone();
The clone() method makes a distinct copy of the array original. We can picture the result as follows:
The clone() method can thus be used to make a distinct copy of an array.
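A short sketch can confirm that the clone of a primitive array is independent of the original. The class CloneDemo and the helper distinctCopy() are hypothetical names used only for this illustration.

```java
// A sketch showing that cloning an int array yields an independent array.
class CloneDemo {
    // Returns a distinct copy of an array of primitive values.
    static int[] distinctCopy(int[] original) {
        return original.clone();
    }

    public static void main(String[] args) {
        int[] original = {11, 22, 33, 44, 55};
        int[] copy = distinctCopy(original);

        copy[0] = 99;                              // change the copy only
        System.out.println(original[0]);           // prints 11
        System.out.println(copy[0]);               // prints 99
        System.out.println(original == copy);      // prints false: two array objects
    }
}
```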
It is important to note, however, that, for the sake of efficiency, clone() makes a simple copy of the object’s memory. For arrays of primitive types such as original, this produces a completely distinct copy, but not for arrays of reference types. To illustrate, consider the following code segment, which manipulates an array of StringBuffer objects:
StringBuffer[] names = { new StringBuffer("Abby"),
                         new StringBuffer("Bob"),
                         new StringBuffer("Chris") };
StringBuffer[] copy = names.clone();
Here, the clone() method does make a copy of the array names, but it is not a completely distinct copy. The reason is that names is an array of StringBuffer values, meaning its elements are StringBuffer handles. StringBuffer is similar to String except that where modifications to String objects result in the creation of a completely new string object, modifications of StringBuffer objects modify the existing StringBuffer object. When names is cloned, it makes a copy of itself by a simple copy of its memory. This creates a second array whose elements are copies of its elements, and since those elements are StringBuffer handles containing addresses, the StringBuffer handles in this copy contain the same addresses. Put differently, the elements of names and the elements of copy are different handles for the same sequence of values. Because it copies handles without copying the objects to which they refer, the clone() method’s operation is sometimes referred to as a shallow copy operation.
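[Figure: original and copy drawn as two separate arrays, each holding the values 11, 22, 33, 44, 55 at indexes [0] through [4].]
[Figure: names and copy drawn as two arrays whose elements [0] through [2] refer to the same three StringBuffer objects: Abby, Bob, Chris.]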
In some situations, shallow copying can lead to a problem. The most common problem occurs if we change the objects to which the handles in a shallow copy refer. For example, if we use names to change the 'o' in "Bob" to 'u',
names[1].setCharAt(1, 'u');
then copy[1] also reads "Bub", because names[1] and copy[1] are handles for the same StringBuffer object.
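The effect of this statement on a shallow copy can be checked with a short sketch. The class name ShallowCopyDemo and the helper shallowCopy() are hypothetical; the array contents are the ones used above.

```java
// A sketch showing that a cloned array of StringBuffer handles shares its objects.
class ShallowCopyDemo {
    // Returns a shallow copy: a new array holding the same handles.
    static StringBuffer[] shallowCopy(StringBuffer[] names) {
        return names.clone();
    }

    public static void main(String[] args) {
        StringBuffer[] names = { new StringBuffer("Abby"),
                                 new StringBuffer("Bob"),
                                 new StringBuffer("Chris") };
        StringBuffer[] copy = shallowCopy(names);

        names[1].setCharAt(1, 'u');                  // modify the shared object

        System.out.println(names[1]);                // prints Bub
        System.out.println(copy[1]);                 // prints Bub as well
        System.out.println(names == copy);           // prints false: two arrays
        System.out.println(names[1] == copy[1]);     // prints true: one object
    }
}
```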
To avoid this problem, we can write a method that makes a deep copy, copying the objects as well as the handles:
public static StringBuffer[] deepCopy(StringBuffer[] original) {
    StringBuffer[] result = new StringBuffer[original.length];

    for (int i = 0; i < original.length; i++) {
        // StringBuffer has no public clone() method, so build a new
        // object from the characters of the old one.
        result[i] = new StringBuffer(original[i].toString());
    }

    return result;
}
There are many situations in which the clone() method’s shallow copy is perfectly adequate, however. For example, if we assign names[1] a new StringBuffer containing "Bill", copy[1] will still refer to "Bob":
Array Equality
Java’s Object class defines an equals() message that can be sent to an array object:
if (a1.equals(a2)) // ...
Unfortunately, this method simply compares the addresses in the handles a1 and a2. If they refer to the same object, then it returns true; otherwise it returns false. To actually compare the elements of two arrays, we can write our own method (the standard library also provides java.util.Arrays.equals() for this purpose). To illustrate, the following class method equals() can be used to compare the elements of two arrays of double values, array1 and array2:
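An element-by-element comparison for two arrays of double values can be sketched as follows. This is a reconstruction under assumptions, not the chapter's own listing: the class name ArrayEqualityDemo and the method name elementsEqual() are hypothetical, as is the decision to treat two null handles as equal. The standard method java.util.Arrays.equals() is shown alongside for comparison.

```java
import java.util.Arrays;

// A sketch of comparing arrays by their elements rather than by their handles.
class ArrayEqualityDemo {
    // Returns true if both arrays have the same length and equal elements.
    static boolean elementsEqual(double[] array1, double[] array2) {
        if (array1 == array2) {
            return true;                      // same object, or both null (assumed equal)
        }
        if (array1 == null || array2 == null) {
            return false;                     // exactly one handle is null
        }
        if (array1.length != array2.length) {
            return false;
        }
        for (int i = 0; i < array1.length; i++) {
            if (array1[i] != array2[i]) {
                return false;                 // first mismatch decides the result
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] a1 = {1.5, 2.5};
        double[] a2 = {1.5, 2.5};
        System.out.println(a1.equals(a2));            // compares handles: false
        System.out.println(elementsEqual(a1, a2));    // compares elements: true
        System.out.println(Arrays.equals(a1, a2));    // library comparison: true
    }
}
```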
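[Figure: after the assignment, names[1] refers to a new object "Bill", while copy[1] still refers to "Bob"; the other elements of names and copy still share "Abby" and "Chris".]
[Figure: after names[1].setCharAt(1, 'u'), the elements of names and copy refer to the same three objects: Abby, Bub, Chris.]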
We must also be able to print our data in a consistent manner. Our ultimate goal is to produce a bar chart such as the one in our original sketch in Figure 8-1. In this iteration, we’ll satisfy ourselves by simply printing the names and values for each country without the bars. We’ll also include the aggregate statistics. To achieve this preliminary goal, we can use the following algorithm.
Given:
expectancyCountries is declared as an array of strings and is initialized with a list of country names. Its index values correspond with those of the expectancy value array.
expectancyValues is declared as an array of floats and is initialized with a list of expectancy values. Its index values correspond with those of the country name array.
Algorithm:
This algorithm combines four basic tasks all in one loop. The main task is that of printing the table, which the algorithm does using a counting for loop that goes through the countries one at a time, printing one table row on each pass (steps 5 and 5.a). Each time through the loop, the algorithm refers to the “current” country name or expectancy value; this refers to the ith name or value in the respective arrays.
The loop also computes statistics as it goes through. It computes the average expectancy value using the same algorithm shown in the section above (see the computeAverage() method). It also searches for the maximum and the minimum life expectancy values. It does this by maintaining a maximum (and minimum) value “seen so far”. Each time through the loop, it updates these values based on whether the current value is larger (or smaller) than the value seen so far. All three of these accumulator algorithms assume that their accumulators have been initialized properly before the loop starts. The sum accumulator must be initialized to 0, which ensures that the sum accumulated by the loop is accurate. The maximum accumulator must be set to a number smaller than any possible data value, which ensures that the value seen the first time through the loop will always be larger than the maximum value seen so far. The computation of the minimum value is handled similarly.
The following code implements this algorithm.
/**
 * Prints a table of life expectancy values followed by aggregate statistics.
 */
String[] expectancyCountries = {"China", "France", "Russia", "UK", "USA"};
float[] expectancyValues = {72.961, 80.657, 65.475, 79.425, 78.242};
int year = 2007;
String source = "GapMinder.com, 2009";

void setup() {
    // Print the table header.
    println("Average Life Expectancy in Years (" + year + ")");

    // Initialize the aggregator values. Float.MIN_VALUE is the smallest
    // positive float, so the maximum starts at -Float.MAX_VALUE instead.
    float sum = 0.0, maximum = -Float.MAX_VALUE, minimum = Float.MAX_VALUE;

    for (int i = 0; i < expectancyCountries.length; i++) {
        // Print the next table row.
        print(expectancyCountries[i] + ": " + expectancyValues[i] + "\n");

        // Accumulate the sum of the expectancy values.
        sum += expectancyValues[i];

        // Update the maximum value seen so far.
        if (expectancyValues[i] > maximum) {
            maximum = expectancyValues[i];
        }

        // Update the minimum value seen so far.
        if (expectancyValues[i] < minimum) {
            minimum = expectancyValues[i];
        }
    }

    // Print the aggregate statistics.
    println("Average: " + sum / expectancyCountries.length);
    println("Maximum Value: " + maximum);
    println("Minimum Value: " + minimum);
    println("Data Source: " + source);
}

This program prints the following simple table in the text output panel.
Average Life Expectancy in Years (2007)
China: 72.
France: 80.
Russia: 65.
UK: 79.
USA: 78.
Average: 75.
Maximum Value: 80.
Minimum Value: 65.
Data Source: GapMinder.com, 2009

Code:
int[] list = { 7, 1, 9, 5, 11 };
if (linearSearch(list, 100) > -1) {
    System.out.println("Item found");
} else {
    System.out.println("Item not found");
}

Output:
Item not found
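The choice of starting values for the maximum and minimum accumulators deserves care. In Java, Float.MIN_VALUE is the smallest positive float rather than the most negative one, so it is a safe starting maximum only when every data value is positive. The sketch below, with the hypothetical class ExtremesDemo and methods maximum() and minimum(), starts from the infinities instead; -Float.MAX_VALUE would work equally well.

```java
// A sketch of "seen so far" accumulators with safe starting values.
class ExtremesDemo {
    // Returns the largest element, or negative infinity for an empty array.
    static float maximum(float[] values) {
        float maximum = Float.NEGATIVE_INFINITY;   // smaller than any data value
        for (int i = 0; i < values.length; i++) {
            if (values[i] > maximum) {
                maximum = values[i];
            }
        }
        return maximum;
    }

    // Returns the smallest element, or positive infinity for an empty array.
    static float minimum(float[] values) {
        float minimum = Float.POSITIVE_INFINITY;   // larger than any data value
        for (int i = 0; i < values.length; i++) {
            if (values[i] < minimum) {
                minimum = values[i];
            }
        }
        return minimum;
    }

    public static void main(String[] args) {
        float[] expectancyValues = {72.961f, 80.657f, 65.475f, 79.425f, 78.242f};
        System.out.println(maximum(expectancyValues));
        System.out.println(minimum(expectancyValues));

        // With only negative data, a starting maximum of Float.MIN_VALUE would be wrong.
        float[] negatives = {-3.0f, -1.5f};
        System.out.println(maximum(negatives));
        System.out.println(Float.MIN_VALUE > 0);   // prints true
    }
}
```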
Note that the algorithm and implementing code assume that the list to be searched is not null. Passing a null list, as in linearSearch(null, 100), results in a null-pointer exception. Given that this search method cannot control how it is called, it would be wise to modify the search method as follows:
public static int linearSearch(int[] list, int value) {
    if (list == null) {
        return -1;
    }
    for (int i = 0; i < list.length; i++) {
        if (value == list[i]) {
            return i;
        }
    }
    return -1;
}
This version of the method checks the validity of the list before starting the search and, if the list is null, indicates that the value is not found by returning -1. This is more robust because it anticipates potentially bad input and responds appropriately.
Binary Search
If a list has been sorted, binary search can be used to search for an item more efficiently than linear search. Linear search can require up to n comparisons to locate a particular item, but binary search examines at most floor(log2 n) + 1 elements. For example, for a list of 1024 (= 2^10) items, binary search locates an item after examining at most 11 elements, whereas linear search may need to examine all 1024.
In the binary search method, we first examine the middle element in the list, and if this is the desired element, the search is successful. Otherwise we determine whether the item being sought is in the first half or in the second half of the list and then repeat this process, using the middle element of that list.
To illustrate, suppose the list to be searched is as shown here in the left-most column:
If we are looking for 1995, we would first examine the middle number 1898 in the sixth position. Because 1995 is greater than 1898, we can disregard the first half of the list and concentrate on the second half (see column two). The middle number in this sub-list is 2335, and the desired item 1995 is less than 2335, so we discard the second half of this sub-list and concentrate on the first half (see column three). Because there is no middle number in this sub-list, we examine the number immediately preceding the middle position (the number 1995) and locate our number. Note that this approach only works if the list is sorted.
The following algorithm specifies this binary search approach for a list of n elements stored in an array, list[0], list[1], ..., list[n - 1], that has been ordered so the elements are in ascending order. If value is found, its location in the array is returned; otherwise the value -1 is returned.
Binary Search Algorithm:
Note that this algorithm adds the safety check for a null list in step 3. The following code implements this algorithm in Java:
public static int binarySearch(int[] list, int value) {
    if (list == null) {
        return -1;
    }
    int first = 0;
    int last = list.length - 1;
    while (first <= last) {
        int middle = (first + last) / 2;
        if (value < list[middle]) {
            last = middle - 1;
        } else if (value > list[middle]) {
            first = middle + 1;
        } else {
            return middle;
        }
    }
    return -1;
}
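To make the difference in cost concrete, the following sketch counts how many elements each search examines when looking for the last item in a sorted list of 1024 values. The class SearchCostDemo and the methods linearProbes() and binaryProbes() are hypothetical and exist only to count steps. The midpoint is computed as first + (last - first) / 2, a common variation that avoids integer overflow on very large arrays.

```java
// A sketch that counts the elements examined by linear and binary search.
class SearchCostDemo {
    // Counts elements examined by a linear search for value.
    static int linearProbes(int[] list, int value) {
        int probes = 0;
        for (int i = 0; i < list.length; i++) {
            probes++;
            if (list[i] == value) {
                break;
            }
        }
        return probes;
    }

    // Counts elements examined by a binary search for value in a sorted list.
    static int binaryProbes(int[] list, int value) {
        int probes = 0;
        int first = 0;
        int last = list.length - 1;
        while (first <= last) {
            int middle = first + (last - first) / 2;
            probes++;
            if (value < list[middle]) {
                last = middle - 1;
            } else if (value > list[middle]) {
                first = middle + 1;
            } else {
                break;
            }
        }
        return probes;
    }

    public static void main(String[] args) {
        int[] list = new int[1024];
        for (int i = 0; i < list.length; i++) {
            list[i] = 2 * i;                      // ascending order: 0, 2, 4, ...
        }
        int target = list[list.length - 1];       // worst case for linear search
        System.out.println("Linear search examined " + linearProbes(list, target));
        System.out.println("Binary search examined " + binaryProbes(list, target));
    }
}
```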
In cases where the input file includes more than one atomic value on a given line, the program must split up the input line. In the following example, the country names are listed on a single line in the file.
This program uses loadStrings() again but because the input file includes all the country names on one line, loadStrings() produces an array of strings with only one element at index 0 whose value is the string:
"China, France, Russia, UK, USA"
To work with the individual country names, the program must split this one string value into separate country names. It does this using the split() method, which takes as arguments: (1) the line to be split, countryLines[0]; and (2) a string specifying the characters used to separate the country names, ", ". The separating string is known as a delimiter. This method creates an array of five strings, one for each country with the delimiting characters removed. The result is the same array of country names produced in the last example.
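The splitting step can be sketched in plain Java. Note the assumptions: Processing's split(line, delimiter) function is replaced here by Java's String.split(), which treats its argument as a regular expression (harmless for ", "), and the one-element array countryLines stands in for the result of loadStrings(). The class SplitDemo and the method splitCountries() are hypothetical names.

```java
// A sketch of splitting one delimited line into separate country names.
class SplitDemo {
    // Splits a line such as "China, France" at each ", " delimiter.
    static String[] splitCountries(String line) {
        return line.split(", ");
    }

    public static void main(String[] args) {
        // Stand-in for the array that loadStrings() would produce from the file.
        String[] countryLines = {"China, France, Russia, UK, USA"};

        String[] countryNames = splitCountries(countryLines[0]);
        for (int i = 0; i < countryNames.length; i++) {
            System.out.println(i + ": " + countryNames[i]);
        }
    }
}
```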
Because these input files are text files, the only type of data that Processing can read from them is string data. To read numeric values from a text file, a program must convert the string value it reads from the file, say “72.961”, into the corresponding numeric value for use in numeric computations, 72.961. The following example reads the lines from the file and constructs arrays for the country names and the numeric values for those countries.
This program declares three arrays, a string array for the lines in the file (countryLines), a second string array for the country names (countryNames) and a float array for the expectancy values (countryValues). As with the previous examples, it starts by calling loadStrings() to read the lines of the file into an array of strings. This results in an array of 5 strings, the first of which has the following value:
"China 72.961"
In order to work with the name as a string and the expectancy value as a float, the program must now separate the name string from the float value. This process is called parsing, and the individual elements on the lines being parsed are called tokens. In this example, the parsing process will produce data for five countries, with two tokens for each country: a name string and a float value. The program uses the new operator to create an empty array for the names and an empty array for the values. Both of these arrays have a length set to the number of lines read from the file. This way, we can add or remove countries from the file and the program will automatically handle the changed number of country lines.
The program then loops through the input and divides each line into the country name portion and the numeric value portion. It does this using the split() method discussed in the previous example. In this case, split() returns an array of two tokens (tokens), the first of which is a name string (e.g., “China”) and the second of which is the expectancy value for that country (e.g., “72.961”). The program stores the name directly into the array of country names. It must then convert the string version of the numeric value (e.g., “72.961”) into the corresponding floating point value (e.g., 72.961); it does this using the float() conversion method and then loads that converted value into the array of expectancy values.
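The parsing loop described above can be sketched in plain Java under a few assumptions: the lines are supplied in an array rather than read with loadStrings(), each line holds a name and a value separated by a single space, and Java's Float.parseFloat() stands in for Processing's float() conversion. The class ParseDemo and its method names are hypothetical; the sample lines pair the country names and values used earlier in the chapter.

```java
// A sketch of parsing "name value" lines into parallel arrays.
class ParseDemo {
    // Extracts the first token (the country name) from each line.
    static String[] parseNames(String[] lines) {
        String[] names = new String[lines.length];
        for (int i = 0; i < lines.length; i++) {
            String[] tokens = lines[i].split(" ");
            names[i] = tokens[0];
        }
        return names;
    }

    // Converts the second token (the expectancy value) on each line to a float.
    static float[] parseValues(String[] lines) {
        float[] values = new float[lines.length];
        for (int i = 0; i < lines.length; i++) {
            String[] tokens = lines[i].split(" ");
            values[i] = Float.parseFloat(tokens[1]);
        }
        return values;
    }

    public static void main(String[] args) {
        // Stand-in for the lines of the input file.
        String[] countryLines = {"China 72.961", "France 80.657", "Russia 65.475",
                                 "UK 79.425", "USA 78.242"};
        String[] countryNames = parseNames(countryLines);
        float[] countryValues = parseValues(countryLines);
        for (int i = 0; i < countryLines.length; i++) {
            System.out.println(countryNames[i] + ": " + countryValues[i]);
        }
    }
}
```

Because both result arrays take their length from the number of input lines, adding or removing a line changes the output without any change to the code.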
The program finishes by printing the data on the text output window. This output data looks very much like the input file, but the big difference is that the program has parsed tokens on each line in the file into