Grammar
Definition
A grammar specifies a language by describing structural rules which can be used to accept or reject a sentence. So, each grammar is associated with a language.
It is fully described by (Variables, Terminal chars/ alphabet, Start symbol, Production rules).
Production rules are of the form:
string including variable V
A given string is accepted if, starting with the start symbol, using a finite number of production rules, one can arrive at the string.
Parsing connection
The rules involved in generating a given string (sentence or word) is closely connected to the semantic meaning underlying that string.
Characteristics
There are conflicting demands on grammar design/ development.
Due to the parsing connection, unambiguity is desirable.
A good grammar should define a language with which a wide variety of semantics can be expressed. So, expressiveness is desirable.
At the same time, if the resultant language is being used for communicaiton, brevity is desirable.
Context free grammars
(CFG) constrain production rules so that their left hand side (LHS) has exactly one variable/ start symbol.
Normal forms
Chomsky normal form for CFG: only
Other traits
Pumping Lemma for CFG (How could a sufficiently long string be produced by a CFG?); use in checking if language is CFG.
Syntax trees and ambiguity.
Regular expression
This is a symbolic representation of a certain type of languages (defined by its syntax). This language consists of strings which may be produced by the application of production rules specified by the regular expression. THe production rules may also be viewed as producing a set of strings.
Production rules
Fundamental production rules include concatenation (cartesian product of string sets), alternation (union of k sets) represented by \verb | , and the ‘kleene star’ represented by * (union with empty set of the concatenation-closure of a certain set).
Grouping
To clarify operator precedence, expressions are grouped together. Generally, groupings are specified by enclosing them within ().
Example
abc(def|hig)*uvw.
Also see programming languages’ use of regular expressions for string search and for the associated richer syntax.
Finite state automata
This is a class of boolean functions over strings that can be formed using a finite alphabet.
States and transitions
Finite state automaton is a collection of states and transitions between the states. The states include special states such as exactly 1 start-state and
Every transition between two states is labeled with a (input) character from the alphabet. Depending on the number of transitions corresponding to a given (state, letter) pair, the transition is said to be deterministic or non-deterministic.
As a directed graph
One can use a directed graph where nodes and edges correspond to states and transitions. The edges are labelled with appropriate input characters.
Acceptance
A given string is accepted or rejected based on whether one can reach a ‘final-state’ by making transitions appropriate to characters in the string.
Resources
Uses constant memory.
Expressiveness
Accept regular languages (describable by a regular expression). Pumping lemma for regex: Middle part of strings in regular languages can be pumped.
Strictly more powerful than decision trees. \why
(Non) determinism
Determinsitic (DFA) and non deterministic representations.
Pushdown automata
Determinsitic Push-down automata and stack memory. Non deterministic push-down automata; accepts context-free languages (use Greibach).