Thursday, 21 April 2011

Writing an math expression evaluator: Lexical Analysis

Introduction

I've always been interested by compilers and specifically the effort that goes in to processing a string of code in to something a computer can understand. The most complete but simple example is a math expression parser.

At first thought, it doesn't sound all that complicated, but giving it a bit more thought, you can't just split an expression like "1 + 2 * (6 / 2)" on spaces and then work out what each part is, what if they enter "5+5"?

Then you think of actually evaluating the expression... operator precedence, processing the deepest expression parts first, etc.. also, how do you even know it's valid? you may be able to enter "+ + 2 4 ( hello 1" but you would have trouble evaluating that!

While writing these posts, I've learnt the basic methods for translating and evaluating mathematical expressions and implemented it as a ruby application. Please let me know if there are any improvements I can make, as I'm always looking to improve things :)

Lexical Analysis

To make this easier and cleaner, evaluation is split in to 2 phases, Lexical analysis is the first phase, taking the string and converting it in to a stream of symbols which are fed in to the second phase, the parser. The lexer should also raise errors indicating invalid characters and anything it can't find a symbol for.

As an example "1 + 1 * 3.14159", it would create a stream of symbols like: IntegerLiteral(1) AdditionOperator IntegerLiteral(1) MultiplicationOperator FloatLiteral(3.14159)

The method I chose was to hold a list of "matchers" that basically match a symbol at the current position in the input string.

So it'd start out with "1 + 1 * 3.14159" and the first request for a token would return IntegerLiteral(1) using a simple regular expression like: /^[0-9]*/. return IntegerLiteral and advance the string the number of characters that were matched.

now we are at " + 1 * 3.14159" a regular expression could match /\s+/ and just skip it because we aren't interested in that. it'll still advance the string the number of whitespace characters it matched.

We get to the plus, a simple literal check will match this so if code[position] is "+" then add one to the position and and return a AdditionOperator.

And so on

I have implemented this in a generic a way as possible to allow for other symbols and matchers to be lexed without changing the code code.

class Lexer
 def initialize(str, matchers)
  @str = str
  @position = 0
  @matchers = matchers
 end

 def consume
  return Symbol.new(nil, 0, :end) if @position >= @str.length
  part = @str[@position..-1]
  @matchers.each do |m|
   result = m.match(part)
   if result
    @position += result.length
    return self.send :next if m.ignore?
    return result
   end
  end
  raise "Unable to lex: #{part}"
 end
end

To create a Lexer class that would give us the correct symbols for a math expression, you just create a matcher for each type of symbol. This code isn't very pretty, but some nice DSL will be written later!

matchers = []
matchers << RegexMatcher.new(/(\s)+/, :whitespace, :ignore => true)
matchers << LiteralMatcher.new('+', :add)
matchers << LiteralMatcher.new('-', :subtract)
matchers << LiteralMatcher.new('*', :multiply)
matchers << LiteralMatcher.new('/', :divide)
matchers << LiteralMatcher.new('(', :left_paren)
matchers << LiteralMatcher.new(')', :right_paren)
matchers << RegexMatcher.new(/(\s)+/, :whitespace, :ignore => true, :)

lexer = Lexer.new("1 + 1 * 3.14159", matchers)

Well - I forgot about this post!


Since I started writing this I've now written a gem for lexical analysis with the pretty DSL I said could be done later.. now it looks like this:

ExpressionLexer = Lexr.that {
 ignores /\s+/ => :whitespace
 matches /[-+]?[0-9]*\.?[0-9]+/ => :number, :convert_with => lambda { |v| Float(v) }
 matches "+" => :addition
 matches "-" => :subtraction
 matches "*" => :multiplication
 matches "/" => :division
 matches "(" => :left_parenthesis
 matches ")" => :right_parenthesis
}

then can be used like this:

lexer = ExpressionLexer.new("1 * 12.5 / (55 + 2 - 56)")
until lexer.end?
 puts lexer.next
end

giving you an output of:

number(1.0)
multiplication(*)
number(12.5)
division(/)
left_parenthesis(()
number(55.0)
addition(+)
number(2.0)
subtraction(-)
number(56.0)
right_parenthesis())
end()

Ok, so this has kind of turned in to a shameless plug for my own gems.. Here is the details :)

Lexr - A simple but powerful lexical analyser for Ruby

There is also another gem I wrote that uses this called math_engine. It's a recursive decent parser which parses math expressions and gives you the result. It was a great experience writing these and I learnt a lot. Hopefully you can learn from them too.

No comments:

Post a Comment