Lexical Analysis

Purpose

A lexical analyzer is responsible for tasks such as:

  • Stripping out newlines and unnecessary whitespace
  • Ignoring comments, which are recognized by a start symbol (for example, # in Python)
  • Keeping track of line numbers
  • Generating tokens
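The four tasks above can be sketched in Python with the standard re module. This is a minimal illustration, not a production lexer: it assumes a hypothetical toy language with identifiers, integers, single-character operators, and # comments. The token names and the TOKEN_SPEC table are invented for the example. Unlike real Python, where newlines and indentation are significant, this toy language treats newlines purely as line separators.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # abstract token name, e.g. "ID" or "NUMBER"
    value: str  # the lexeme that matched
    line: int   # line on which the token appears


# Hypothetical token specification for a toy language. Python's re tries
# alternatives left to right, so earlier entries win when several could
# match at the same position.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),            # '#' up to the end of the line
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),                # spaces and tabs
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[+\-*/=()]"),
    ("MISMATCH", r"."),                 # any other single character
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens, dropping whitespace and comments and tracking line numbers."""
    line = 1
    for m in MASTER_RE.finditer(source):
        kind = m.lastgroup  # name of the alternative that matched
        lexeme = m.group()
        if kind == "NEWLINE":
            line += 1       # keep track of line numbers
        elif kind in ("SKIP", "COMMENT"):
            continue        # strip whitespace and ignore comments
        elif kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r} on line {line}")
        else:
            yield Token(kind, lexeme, line)
```

For example, list(tokenize("x = 1  # set x\ny = x + 2\n")) produces tokens for x, =, 1 on line 1 and y, =, x, +, 2 on line 2; the comment and the whitespace never reach the parser.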

Terminology

Lexeme

A lexeme is a sequence of characters in the source program that matches the pattern for a token; the lexical analyzer identifies it as an instance of that token.

Pattern

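A pattern is a description of the form that lexemes must take to constitute a token. For a keyword token, for example, the pattern is simply the sequence of characters that spells the keyword. For an identifier token, the pattern describes a whole class of strings, such as a letter followed by any number of letters or digits.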
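Patterns are commonly written as regular expressions. The sketch below assumes three hypothetical tokens (IF, ID, NUMBER) and checks which patterns match a given lexeme in full. The token names and regular expressions are illustrative choices, not taken from any particular language specification.

```python
import re
from typing import List

# Illustrative patterns: a keyword's pattern is the keyword itself,
# while an identifier's pattern describes a whole class of lexemes.
PATTERNS = {
    "IF": re.compile(r"if"),
    "ID": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
    "NUMBER": re.compile(r"[0-9]+"),
}


def matching_tokens(lexeme: str) -> List[str]:
    """Return the names of all tokens whose pattern matches the whole lexeme."""
    return [name for name, pat in PATTERNS.items() if pat.fullmatch(lexeme)]


print(matching_tokens("if"))     # ['IF', 'ID']
print(matching_tokens("count"))  # ['ID']
print(matching_tokens("42"))     # ['NUMBER']
```

Note that the lexeme "if" matches both the keyword pattern and the identifier pattern. Lexers therefore need a tie-breaking rule; a common choice is to give keywords priority over identifiers.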

Token

A token can be thought of as a lexical unit: an abstract symbol, such as a keyword or an identifier, that may carry an optional attribute value. Tokens are the input symbols that a parser processes.
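A token can be modeled as a name plus an optional attribute. In this minimal sketch, the attribute of an identifier is assumed to be its spelling. Real compilers often store a reference to a symbol-table entry instead. The token names ID, PLUS, and NUM are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Token:
    name: str                                    # abstract symbol the parser sees
    attribute: Optional[Union[int, str]] = None  # optional attribute value


# Hypothetical token stream for the source text "count + 42":
stream = [
    Token("ID", "count"),  # attribute: the identifier's spelling
    Token("PLUS"),         # no attribute needed; the name is enough
    Token("NUM", 42),      # attribute: the numeric value
]

# A parser typically decides what to do by looking at token names alone.
print([t.name for t in stream])  # ['ID', 'PLUS', 'NUM']
```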

Implementation

To implement a lexical analyzer, we will make use of finite automata.
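As a rough illustration of this idea, the following sketch hand-builds a small deterministic finite automaton (DFA) as a transition table. It assumes only two token kinds, identifiers and unsigned integers; the state names and character classes are invented for the example. The scanner keeps consuming characters while a transition exists and remembers the last accepting state it reached. This is the "longest match" rule that many lexers use.

```python
import string
from typing import Optional, Tuple

# States of a small hand-built DFA (hypothetical; for illustration only).
START, IN_ID, IN_NUM = 0, 1, 2
# Accepting states and the token each one produces.
ACCEPTING = {IN_ID: "ID", IN_NUM: "NUMBER"}

LETTERS = set(string.ascii_letters + "_")
DIGITS = set(string.digits)

# Transition table: (state, character class) -> next state.
TRANSITIONS = {
    (START, "letter"): IN_ID,
    (START, "digit"): IN_NUM,
    (IN_ID, "letter"): IN_ID,
    (IN_ID, "digit"): IN_ID,
    (IN_NUM, "digit"): IN_NUM,
}


def char_class(c: str) -> str:
    """Map a character to the input class used by the transition table."""
    if c in LETTERS:
        return "letter"
    if c in DIGITS:
        return "digit"
    return "other"


def longest_match(text: str, pos: int) -> Optional[Tuple[str, int]]:
    """Run the DFA from text[pos]; return (token name, end index) or None."""
    state = START
    last_accept = None
    i = pos
    while i < len(text):
        nxt = TRANSITIONS.get((state, char_class(text[i])))
        if nxt is None:
            break  # no transition: the DFA stops here
        state = nxt
        i += 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], i)  # remember the longest match so far
    return last_accept
```

For instance, longest_match("abc123+x", 0) returns ("ID", 6), and longest_match("42abc", 0) returns ("NUMBER", 2) because the automaton has no transition from IN_NUM on a letter.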