










programming language tree parse expression











Introduction to Programming Languages
A programming language is a language designed to communicate instructions to a computer. Programming languages are used to create programs that control the behavior of a machine. A programming language is a notation for writing programs, which are specifications of a computation or algorithm. Some authors, however, restrict the term "programming language" to those languages that can express all possible algorithms. In short, a programming language is a set of commands, strings, or characters that is readable by programmers and readily translatable to machine code. It has a syntax, a grammar, and semantics.
Attributes of a Good Programming Language
Clarity, simplicity, and unity: A programming language provides both a conceptual framework for planning algorithms and a means of expressing them. It should provide a clear, simple, and unified set of concepts that can be used as primitives in developing algorithms.
Orthogonality: Orthogonality is one of the most important features of a programming language. It is the property that "changing A does not change B".
Support for abstraction: There is always a substantial gap between the abstract data structures and operations that characterize the solution to a problem and the particular data structures and operations built into a language.
Programming environment: An appropriate programming environment (reliable documentation and testing packages) adds utility and makes the language easier to work with.
Portability of programs: A programming language should be portable, meaning that it should be easy to transfer a program from the computer on which it was developed to another computer. Only a language whose definition is independent of the features of a particular machine can support portability. Examples: Ada, FORTRAN, C, C++, Java.
actually in use. Compared to natural languages that developed and evolved independently, programming languages are far more similar to each other because:
Structured programming
Structured programming (sometimes known as modular programming) is a programming paradigm that facilitates the creation of programs with readable code and reusable components. All modern programming languages support structured programming, but the mechanisms of support, like the syntax of the languages themselves, vary.
Where modules or elements of code can be reused from a library, it may also be possible to build structured code from modules written in different languages, as long as they obey a common module interface or application program interface (API) specification. However, reusing modules can compromise data security and governance, so it is important to define and enforce a privacy policy controlling the use of modules that bring implicit data access rights with them.
Structured programming encourages dividing an application program into a hierarchy of modules or autonomous elements, which may, in turn, contain other such elements. Within each element, code may be further structured using blocks of related logic designed to improve readability and maintainability. These may include case, which tests a variable against a set of values, and repeat, while, and for, which construct loops that continue until a condition is met. In structured programming languages, an unconditional transfer of control, or goto statement, is deprecated and sometimes not even available.
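As a rough illustration of the ideas above, the following Python sketch divides a small task into a hierarchy of functions and uses only selection and loop constructs, with no goto. The function names and the grading thresholds are invented for this example and do not come from the notes.

```python
def classify(score):
    """Selection: test a value against a set of ranges (a case-like construct)."""
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    elif score >= 50:
        return "C"
    return "F"


def average(scores):
    """Iteration with a for loop over a sequence."""
    total = 0
    for s in scores:
        total += s
    return total / len(scores)


def first_failing(scores):
    """Iteration with a while loop that stops when a condition is met."""
    i = 0
    while i < len(scores) and scores[i] >= 50:
        i += 1
    return i if i < len(scores) else None


def report(scores):
    """Top-level element built from the smaller ones."""
    return {
        "grades": [classify(s) for s in scores],
        "average": average(scores),
        "first_failing_index": first_failing(scores),
    }


print(report([95, 80, 40, 60]))
```

Each function has a single entry and a single purpose, so it can be read, tested, and reused on its own.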
In the definitions, the symbol “::=” means that the name on the left-hand side is defined by the expression on the right-hand side. A name in a pair of angle brackets “<>” is a nonterminal, which means that the name needs to be further defined. The vertical bar “|” represents an “or” relation. The boldfaced names are terminals, which means that the names need not be further defined. They form the vocabulary of the language.
We can use the sentence definition to check whether the following sentences are syntactically correct.
fast high big computer is good table        (1)
the high table is a good table              (2)
a fast table makes the high horse           (3)
the fast big high computer is good          (4)
good table is high                          (5)
a table is not a horse                      (6)
is fast computer good                       (7)
The first sentence is syntactically correct, although it does not make much sense. Three adjectives in the sentence are correct because the definition of an adjective recursively allows any number of adjectives to be used in the subject and the object of a sentence. The second and third sentences are also syntactically correct according to the definition. The fourth and fifth sentences are syntactically incorrect because a noun is missing in the object of the sentences. The sixth sentence is incorrect because “not” is not a terminal. The last sentence is incorrect because the definition does not allow a sentence to start with a verb.
After we have a basic understanding of BNF, we can use it to define a small programming language. The first five lines define the lexical structure, and the rest defines the syntactic structure of the language.
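The seven sentence checks discussed above can be reproduced with a small recognizer. The BNF definition itself is not shown in these notes, so the sketch below assumes a grammar that is consistent with the discussion: a sentence is a subject, a verb, and an object, where subject and object are each an optional article, any number of adjectives, and a noun. The word lists and function names are assumptions made for the example.

```python
# Assumed vocabulary (terminals), inferred from the example sentences.
ARTICLES = {"the", "a"}
ADJECTIVES = {"fast", "high", "big", "good"}
NOUNS = {"computer", "table", "horse"}
VERBS = {"is", "makes"}


def parse_noun_phrase(words, pos):
    """Return the position after a noun phrase starting at pos, or None."""
    if pos < len(words) and words[pos] in ARTICLES:
        pos += 1  # optional article
    while pos < len(words) and words[pos] in ADJECTIVES:
        pos += 1  # any number of adjectives
    if pos < len(words) and words[pos] in NOUNS:
        return pos + 1  # the required noun
    return None


def is_sentence(text):
    """Check <noun phrase> <verb> <noun phrase> against the assumed grammar."""
    words = text.split()
    pos = parse_noun_phrase(words, 0)
    if pos is None or pos >= len(words) or words[pos] not in VERBS:
        return False
    pos = parse_noun_phrase(words, pos + 1)
    return pos == len(words)


SENTENCES = [
    "fast high big computer is good table",
    "the high table is a good table",
    "a fast table makes the high horse",
    "the fast big high computer is good",
    "good table is high",
    "a table is not a horse",
    "is fast computer good",
]

for number, sentence in enumerate(SENTENCES, start=1):
    verdict = "correct" if is_sentence(sentence) else "incorrect"
    print(number, sentence, "->", verdict)
```

Under this assumed grammar the recognizer accepts sentences 1 to 3 and rejects sentences 4 to 7, which matches the verdicts given in the text.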
The General Problem of Describing Syntax
A language, whether natural (such as English) or artificial (such as Java), is a set of strings of characters from some alphabet. The strings of a language are called sentences or statements. The syntax rules of a language specify which strings of characters from the language’s alphabet are in the language. English, for example, has a large and complex collection of rules for specifying the syntax of its sentences. By comparison, even the largest and most complex programming languages are syntactically very simple.
Formal descriptions of the syntax of programming languages, for simplicity’s sake, often do not include descriptions of the lowest-level syntactic units. These small units are called lexemes. The description of lexemes can be given by a lexical specification, which is usually separate from the syntactic description of the language. The lexemes of a programming language include its numeric literals, operators, and special words, among others. One can think of programs as strings of lexemes rather than of characters.
Lexemes are partitioned into groups. For example, the names of variables, methods, classes, and so forth in a programming language form a group called identifiers. Each lexeme group is represented by a name, or token. So, a token of a language is a category of its lexemes. For example, an identifier is a token that can have lexemes, or instances, such as sum and total. In some cases, a token has only a single possible lexeme. For example, the token for the arithmetic operator symbol + has just one possible lexeme.
Consider the following Java statement:
index = 2 * count + 17;
The lexemes and tokens of this statement are:
Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
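The pairing of lexemes with tokens for the Java statement above can be sketched with a small regular-expression tokenizer. The token names identifier, equal_sign, and int_literal come from the table; the names mult_op, plus_op, and semicolon are assumptions, since the table in these notes stops after the first three rows.

```python
import re

# (token name, regular expression for its lexemes); order matters.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),    # assumed token name
    ("plus_op", r"\+"),    # assumed token name
    ("semicolon", r";"),   # assumed token name
    ("skip", r"\s+"),      # whitespace separates lexemes and is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, token) pairs for the source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            pairs.append((m.group(), m.lastgroup))
        pos = m.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<8}{token}")
```

Note that index and count are different lexemes of the same token, identifier, while a token such as plus_op has exactly one lexeme.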
In general, languages can be formally described in two distinct ways, by recognition and by generation, although neither provides a definition that is practical by itself for people trying to learn or use a programming language.
Grammars and Derivations
A grammar is a generative device for defining languages. The sentences of the language are generated through a sequence of applications of the rules, beginning with a special nonterminal of the grammar called the start symbol. This sequence of rule applications is called a derivation. In a grammar for a complete programming language, the start symbol represents a complete program and is often named <program>. The simple grammar shown in the example below is used to illustrate derivations.
Example: Grammar for a small language
The language described by the grammar of the example above has only one statement form: assignment. A program consists of the special word begin, followed by a list of statements separated by semicolons, followed by the special word end. An expression is either a single variable or two variables separated by either a + or - operator. The only variable names in this language are A, B, and C.
A derivation of a program in this language follows:
<program> => begin <stmt_list> end
          => begin <stmt> ; <stmt_list> end
          => begin <var> = <expression> ; <stmt_list> end
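A derivation like the one above can be traced mechanically. The sketch below is written under the assumption that the small language has the rules listed in GRAMMAR; they are reconstructed from the prose description (begin, a semicolon-separated statement list, end, assignment only, variables A, B, and C), and the nonterminal names are assumptions. At each step the leftmost nonterminal is replaced using the chosen rule.

```python
# Assumed grammar, reconstructed from the description in the text.
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expression>"]],
    "<var>": [["A"], ["B"], ["C"]],
    "<expression>": [
        ["<var>", "+", "<var>"],
        ["<var>", "-", "<var>"],
        ["<var>"],
    ],
}


def leftmost_derivation(start, choices):
    """Apply one rule per step to the leftmost nonterminal.

    choices[i] is the index of the rule alternative used at step i.
    Returns every sentential form, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# Rule choices that derive: begin A = B + C ; B = C end
steps = leftmost_derivation("<program>", [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2])
for step in steps:
    print("=>", step)
```

Every line printed is a sentential form; the last one contains only terminals and is therefore a sentence of the language.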
The grammar of the example above describes assignment statements whose right sides are arithmetic expressions with multiplication and addition operators and parentheses. For example, the statement A = B * ( A + C ) is generated by the leftmost derivation:
Parse Trees
One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define. These hierarchical structures are called parse trees. For example, the parse tree in the following diagram shows the structure of the assignment statement derived previously.
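The parse tree for A = B * ( A + C ) can also be built in code. The grammar is not reproduced in these notes, so the sketch below assumes the rules given in the docstrings: an assignment is an identifier, an equals sign, and an expression, and an expression grows only on the right. The tuple representation and the function names are choices made for this example.

```python
def parse_id(tokens, pos):
    """<id> -> A | B | C"""
    if pos < len(tokens) and tokens[pos] in ("A", "B", "C"):
        return ("<id>", [tokens[pos]]), pos + 1
    raise SyntaxError(f"expected identifier at position {pos}")


def parse_expr(tokens, pos):
    """<expr> -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>"""
    if pos < len(tokens) and tokens[pos] == "(":
        inner, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return ("<expr>", ["(", inner, ")"]), pos + 1
    id_node, pos = parse_id(tokens, pos)
    if pos < len(tokens) and tokens[pos] in ("+", "*"):
        op = tokens[pos]
        right, pos = parse_expr(tokens, pos + 1)
        return ("<expr>", [id_node, op, right]), pos
    return ("<expr>", [id_node]), pos


def parse_assign(tokens):
    """<assign> -> <id> = <expr>"""
    id_node, pos = parse_id(tokens, 0)
    if pos >= len(tokens) or tokens[pos] != "=":
        raise SyntaxError("expected '='")
    expr_node, pos = parse_expr(tokens, pos + 1)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return ("<assign>", [id_node, "=", expr_node])


def leaves(node):
    """Collect the terminal symbols of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1] for leaf in leaves(child)]


def show(node, depth=0):
    """Print one node per line, indented by its depth in the tree."""
    if isinstance(node, str):
        print("  " * depth + node)
    else:
        print("  " * depth + node[0])
        for child in node[1]:
            show(child, depth + 1)


tree = parse_assign("A = B * ( A + C )".split())
show(tree)
```

Internal nodes carry nonterminal labels and leaves carry terminals, so reading the leaves from left to right gives back the original statement.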
Diagram: Parse tree for the structure of the assignment statement
Every internal node of a parse tree is labeled with a nonterminal symbol, every leaf is labeled with a terminal symbol, and every subtree of a parse tree describes one instance of an abstraction in the sentence.
Ambiguity
A grammar that generates a sentential form for which there are two or more distinct parse trees is said to be ambiguous. Consider the grammar shown in the example below, which is a minor variation of the grammar shown above:
Rather than allowing the parse tree of an expression to grow only on the right, this grammar allows growth on both the left and the right.
Syntactic ambiguity of language structures is a problem because compilers often base the semantics of those structures on their syntactic form. Specifically, the compiler chooses the code to be generated for a statement by examining its parse tree. If a language structure has more than one parse tree, then the meaning of the structure cannot be determined uniquely. This problem is discussed in two specific examples in the following subsections.
There are several other characteristics of a grammar that are sometimes useful in determining whether a grammar is ambiguous. They include the following: (1) if the grammar generates a sentence with more than one leftmost derivation and (2) if the grammar generates a sentence with more than one rightmost derivation.
Some parsing algorithms can be based on ambiguous grammars. When such a parser encounters an ambiguous construct, it uses nongrammatical information provided by the designer to construct the correct parse tree. In many cases, an ambiguous grammar can be rewritten to be unambiguous but still generate the desired language.
Operator Precedence
When an expression includes two different operators, for example, x + y * z, one obvious semantic issue is the order of evaluation of the two operators (for example, in this expression is it add and then multiply, or vice versa?). This semantic question can be answered by assigning different precedence levels to operators. For example, if * has been assigned higher precedence than + (by the language designer), multiplication will be done first, regardless of the order of appearance of the two operators in the expression.
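The link between ambiguity and evaluation order can be made concrete. The sketch below assumes an ambiguous expression grammar in which an expression is either an identifier or two expressions joined by + or *, so that trees may grow on both sides. It enumerates every parse tree of x + y * z and evaluates each one. The function names and the sample values are assumptions made for the example.

```python
def all_trees(tokens):
    """Every parse tree of tokens under <expr> -> <expr> op <expr> | <id>.

    tokens alternates operands and operators, e.g. ["x", "+", "y", "*", "z"].
    A tree is either an identifier or a (left, operator, right) tuple.
    """
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Any operator (odd positions) may be the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def evaluate(tree, env):
    """Evaluate a tree bottom-up, so operators lower in the tree run first."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a * b


trees = all_trees(["x", "+", "y", "*", "z"])
values = {"x": 2, "y": 3, "z": 4}
for tree in trees:
    print(tree, "=", evaluate(tree, values))
```

The sentence has two parse trees, and with these sample values they evaluate to 14 and 20. Giving * higher precedence than + amounts to always selecting the tree in which * sits lower than +, that is, x + (y * z).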
A grammar can be written for the simple expressions we have been discussing that is both unambiguous and specifies a consistent precedence of the + and * operators, regardless of the order in which the operators appear in an expression. The correct ordering is specified by using separate nonterminal symbols to represent the operands of the operators that have different precedence. This requires additional nonterminals and some new rules. Instead of using <expr> for both operands of both + and *, we could use three nonterminals to represent operands, which allows the grammar to force different operators to different levels in the parse tree. If <expr> is the root symbol for expressions, + can be forced to the top of the parse tree by having <expr> directly generate only + operators, using the new nonterminal, <term>, as the right operand of +. Next, we can define <term> to generate