Character and String Processing: Manipulating Characters and Strings in Java, Study notes of Computer Science

An overview of character and string processing in Java. It covers the use of character constants and special characters, the Character class and its methods, and the handling of strings as complex objects. The document also includes examples of implementing methods such as isDigit(), randomLetter(), and toUpperCase().

Typology: Study notes

Pre 2010

Uploaded on 08/16/2009

koofers-user-hiz-1

Mat 2170
Week 11
Characters and Strings
Spring 2009

Student Responsibilities

- Reading: Textbook, Sections 8.2–8.
- Lab: Character and String processing
- Attendance

Chapter Eight Overview

Surely you don’t think that numbers are as important as words. —King Azaz to the Mathemagician; Norton Juster, The Phantom Tollbooth, 1961

  1. Characters - the primitive type char
  2. Strings as an abstract idea
  3. Using methods in the String class
  4. A case study in string processing

ASCII

- The first widely adopted character encoding was ASCII (American Standard Code for Information Interchange).
- ASCII defines only 128 characters, and even its 8-bit extensions allow at most 256 (the number of bit combinations in a byte), so it proved inadequate to represent the many alphabets in use throughout the world.
- It has therefore been superseded by Unicode, which allows for a much larger number of characters.

The ASCII Subset of Unicode

The first 128 characters, with codes written in octal (base 8). Each character's code is the row label plus the column digit.

         0     1     2     3     4     5     6     7
    000  \000  \001  \002  \003  \004  \005  \006  \007
    010  \b    \t    \n    \013  \f    \r    \016  \017
    020  \020  \021  \022  \023  \024  \025  \026  \027
    030  \030  \031  \032  \033  \034  \035  \036  \037
    040  space !     "     #     $     %     &     '
    050  (     )     *     +     ,     -     .     /
    060  0     1     2     3     4     5     6     7
    070  8     9     :     ;     <     =     >     ?
    100  @     A     B     C     D     E     F     G
    110  H     I     J     K     L     M     N     O
    120  P     Q     R     S     T     U     V     W
    130  X     Y     Z     [     \     ]     ^     _
    140  `     a     b     c     d     e     f     g
    150  h     i     j     k     l     m     n     o
    160  p     q     r     s     t     u     v     w
    170  x     y     z     {     |     }     ~     \177
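To connect the octal table to the decimal codes used elsewhere in these notes, here is a minimal sketch that prints a character's code in both bases. The class name CharCodes and the helper octalCode are illustrative names, not part of the course material.

```java
// Sketch: inspecting character codes. CharCodes and octalCode are
// hypothetical names chosen for this example.
class CharCodes {
    // Returns the octal representation of a character's code.
    static String octalCode(char ch) {
        return Integer.toOctalString(ch);
    }

    public static void main(String[] args) {
        System.out.println((int) 'A');       // 65 (decimal)
        System.out.println(octalCode('A'));  // 101 (row 100, column 1)
        System.out.println(octalCode('a'));  // 141 (row 140, column 1)
    }
}
```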

Properties of Unicode

Two properties of the Unicode table are worth special notice:

- The character codes for the digits are consecutive.
- The letters in the alphabet are divided into two ranges: one for the uppercase letters and one for the lowercase letters. Within each range of alphabetic letters, the Unicode values are consecutive.
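These two properties are what make simple range tests and subtractions work. The sketch below is one possible illustration; the class and method names are assumptions made for this example, and the methods only handle the ASCII digits and lowercase letters.

```java
// Sketch: exploiting consecutive character codes. All names here are
// hypothetical; only the ASCII ranges are handled.
class ConsecutiveCodes {
    // True if ch is one of '0'..'9'. Works because digit codes are consecutive.
    static boolean isAsciiDigit(char ch) {
        return ch >= '0' && ch <= '9';
    }

    // Numeric value of a digit character, e.g. '7' gives 7.
    static int digitValue(char ch) {
        return ch - '0';
    }

    // Zero-based position of a lowercase letter in the alphabet, e.g. 'c' gives 2.
    static int letterIndex(char ch) {
        return ch - 'a';
    }

    public static void main(String[] args) {
        System.out.println(isAsciiDigit('5'));  // true
        System.out.println(isAsciiDigit('x'));  // false
        System.out.println(digitValue('7'));    // 7
        System.out.println(letterIndex('c'));   // 2
    }
}
```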

Special Characters

- Most of the characters in the Unicode table are familiar ones that appear on the keyboard.
- These characters are called printing (or printable) characters.
- The table also includes several special characters that are typically used to control formatting.
- Special characters are indicated in the Unicode table by an "escape" sequence, which consists of a backslash followed by a character or sequence of digits.
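As a small illustration of escape sequences in Java source code, the sketch below stores three of them in constants and prints their codes. The class and constant names are assumptions for this example.

```java
// Sketch: escape sequences as char literals. EscapeDemo and its constants
// are hypothetical names.
class EscapeDemo {
    static final char NEWLINE = '\n';    // code 10 (octal 012)
    static final char TAB = '\t';        // code 9 (octal 011)
    static final char LETTER_A = '\101'; // octal escape for code 65

    public static void main(String[] args) {
        System.out.println("col1" + TAB + "col2"); // two words separated by a tab
        System.out.println((int) NEWLINE);         // 10
        System.out.println((int) TAB);             // 9
        System.out.println(LETTER_A);              // A
    }
}
```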

The Character Class

- The Character class is defined in the java.lang package and is therefore available in any Java program without an import statement.
- The Character class provides several useful methods for manipulating char values.
- Character methods are static: they belong to the class rather than to an object of the class.
- Hence, these methods are not sent to a Character object. They take chars as arguments, much as Math class methods take numeric arguments.
- It is good programming practice to use these library methods rather than writing our own.
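The static calling style can be sketched as follows. The helper countDigits and the class name are illustrative only; Character.isDigit, Character.isLetter, Character.isWhitespace, and Math.sqrt are standard library methods.

```java
// Sketch: calling static Character methods, in the same style as Math methods.
// CharacterStaticDemo and countDigits are hypothetical names.
class CharacterStaticDemo {
    // Counts the digit characters in s using the static Character.isDigit.
    static int countDigits(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(Math.sqrt(16.0));              // 4.0
        System.out.println(Character.isLetter('x'));      // true
        System.out.println(Character.isWhitespace(' '));  // true
        System.out.println(countDigits("Mat 2170"));      // 4
    }
}
```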

Why to Use Character Class Methods

- They are standard: programmers recognize them and know what they mean.
- Library methods have been tested by millions of client programmers, so it is reasonable to expect that they are correct.
- Implementations in the Character class are able to convert characters in alphabets other than our Roman (English) alphabet.
- Library methods are typically more efficient than methods we write ourselves.
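The point about other alphabets can be checked with a short sketch. The characters below are written as Unicode escapes so the example does not depend on the source file's encoding; the class and helper names are assumptions for this example.

```java
// Sketch: Character methods on characters outside the English alphabet.
// NonRomanDemo and sameLetterIgnoringCase are hypothetical names.
class NonRomanDemo {
    // True if a and b are the same letter once case is ignored.
    static boolean sameLetterIgnoringCase(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        char eAcuteLower = '\u00e9';  // U+00E9, e with acute accent
        char eAcuteUpper = '\u00c9';  // U+00C9, E with acute accent
        char greekPi = '\u03c0';      // U+03C0, Greek small letter pi
        char arabicThree = '\u0663';  // U+0663, Arabic-Indic digit three

        System.out.println(Character.toUpperCase(eAcuteLower) == eAcuteUpper);    // true
        System.out.println(sameLetterIgnoringCase(eAcuteLower, eAcuteUpper));     // true
        System.out.println(Character.isLetter(greekPi));                          // true
        System.out.println(Character.isDigit(arabicThree));                       // true
    }
}
```

A hand-written range test such as ch >= 'a' && ch <= 'z' would return false for all four of these characters.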

Two Methods That Return char Rather Than boolean

static char toLowerCase(char ch) Returns a copy of ch converted to its lowercase equivalent, if any. Otherwise, the value of ch is returned unchanged.

static char toUpperCase(char ch) Returns a copy of ch converted to its uppercase equivalent, if any. Otherwise, the value of ch is returned unchanged.

Neither of these methods modifies the char argument.
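A short sketch of both methods follows, including a check that the argument keeps its original value. The class name and the helper upperCopy are illustrative names only.

```java
// Sketch: toUpperCase and toLowerCase return a converted copy.
// CaseConversionDemo and upperCopy are hypothetical names.
class CaseConversionDemo {
    // Returns {original, converted} to show the argument is left unchanged.
    static char[] upperCopy(char ch) {
        char converted = Character.toUpperCase(ch);
        return new char[] { ch, converted };
    }

    public static void main(String[] args) {
        char[] pair = upperCopy('q');
        System.out.println(pair[0]);                      // q
        System.out.println(pair[1]);                      // Q
        System.out.println(Character.toLowerCase('Q'));   // q
        System.out.println(Character.toUpperCase('7'));   // 7 (no uppercase equivalent)
    }
}
```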

Character Arithmetic

- The fact that characters have underlying integer representations allows us to use them in arithmetic expressions.
- For example, if you evaluate the expression 'A' + 1, Java will convert the character 'A' into the integer 65 and then add 1 to get 66, which is the character code for 'B'.
- As an example, the following method returns a randomly chosen uppercase letter. Why is the (char) cast needed in the return statement?

    public char randomLetter() {
        // rgen is a random-number generator declared elsewhere (not shown here)
        return (char) rgen.nextInt('A', 'Z');
    }
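The slide's rgen object is not defined in this excerpt, so the sketch below swaps in java.util.Random from the standard library to make the idea runnable. The class name and the RNG field are assumptions for this example. Arithmetic on a char produces an int, which is why the result has to be cast back to char.

```java
import java.util.Random;

// Sketch: character arithmetic with java.util.Random standing in for the
// slide's rgen. RandomLetterDemo and RNG are hypothetical names.
class RandomLetterDemo {
    private static final Random RNG = new Random();

    // Returns a random uppercase letter in the range 'A'..'Z'.
    static char randomLetter() {
        // nextInt(26) yields 0..25; 'A' + int is an int, so cast back to char.
        return (char) ('A' + RNG.nextInt(26));
    }

    public static void main(String[] args) {
        System.out.println('A' + 1);           // 66 (an int)
        System.out.println((char) ('A' + 1));  // B
        System.out.println(randomLetter());    // some letter from A to Z
    }
}
```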

An Exercise in Character Arithmetic

We wish to implement a method, toHexDigit(), that takes an integer and returns the corresponding hexadecimal (base 16) digit as a character:

  1. If the argument is between 0 and 9, the method should return the corresponding character between '0' and '9'.
  2. If the argument is between 10 and 15, the method should return the appropriate letter in the range 'A' through 'F' (A = 10, B = 11, C = 12, D = 13, E = 14, F = 15).
  3. If the argument is outside this range, the method should return a question mark, '?'.

A Solution

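    public char toHexDigit(int n) {
        if (0 <= n && n <= 9) {
            // Digit codes are consecutive, so offset from '0'
            return (char) ('0' + n);
        } else if (10 <= n && n <= 15) {
            // Uppercase letter codes are consecutive, so offset from 'A'
            return (char) ('A' + (n - 10));
        } else {
            return '?';
        }
    }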
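To try the solution, it can be wrapped in a small class, as sketched below. The class name HexDigitDemo is an assumption, and the method is made static here only so it can be called from main. For comparison, the standard library's Character.forDigit(n, 16) does a similar job but returns lowercase letters, and returns the null character rather than '?' for out-of-range input.

```java
// Sketch: exercising toHexDigit. HexDigitDemo is a hypothetical class name.
class HexDigitDemo {
    // Same logic as the slide's solution, made static for easy calling.
    static char toHexDigit(int n) {
        if (0 <= n && n <= 9) {
            return (char) ('0' + n);
        } else if (10 <= n && n <= 15) {
            return (char) ('A' + (n - 10));
        } else {
            return '?';
        }
    }

    public static void main(String[] args) {
        System.out.println(toHexDigit(7));   // 7
        System.out.println(toHexDigit(11));  // B
        System.out.println(toHexDigit(16));  // ?
        // Library alternative: forDigit yields 'b', so convert to uppercase.
        System.out.println(Character.toUpperCase(Character.forDigit(11, 16)));  // B
    }
}
```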

- For most applications, this abstract view of strings is precisely the right one. On the inside, however, strings are surprisingly complicated objects.
- Java supports a high-level view of strings by making String a class whose methods hide the underlying complexity.

Using Methods in the String Class

- Java defines many useful methods in the String class.
- Before trying to use those methods individually, it is important to understand how they work at a more general level.
- String methods use the receiver syntax: in Java, we send a message to a String object. This is unlike the static methods of the Character class, which take chars as arguments.
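The difference between the two calling styles can be seen side by side in a short sketch. The class name and the helper capitalize are illustrative names; length, charAt, substring, isEmpty, and toUpperCase are standard String methods.

```java
// Sketch: receiver syntax for String methods versus static Character methods.
// ReceiverSyntaxDemo and capitalize are hypothetical names.
class ReceiverSyntaxDemo {
    // Returns word with its first character converted to uppercase.
    static String capitalize(String word) {
        if (word.isEmpty()) {
            return word;
        }
        char first = Character.toUpperCase(word.charAt(0)); // static call: char passed as argument
        return first + word.substring(1);                   // receiver call: message sent to word
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.length());        // 5
        System.out.println(s.charAt(1));       // e
        System.out.println(s.toUpperCase());   // HELLO
        System.out.println(capitalize("java")); // Java
    }
}
```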