## Introduction

I've always been interested by compilers and specifically the effort that goes in to processing a string of code in to something a computer can understand. The most complete but simple example is a math expression parser.At first thought, it doesn't sound all that complicated, but giving it a bit more thought, you can't just split an expression like "1 + 2 * (6 / 2)" on spaces and then work out what each part is, what if they enter "5+5"?

Then you think of actually evaluating the expression... operator precedence, processing the deepest expression parts first, etc.. also, how do you even know it's valid? you may be able to enter "+ + 2 4 ( hello 1" but you would have trouble evaluating that!

While writing these posts, I've learnt the basic methods for translating and evaluating mathematical expressions and implemented it as a ruby application. Please let me know if there are any improvements I can make, as I'm always looking to improve things :)

## Lexical Analysis

To make this easier and cleaner, evaluation is split in to 2 phases, Lexical analysis is the first phase, taking the string and converting it in to a stream of symbols which are fed in to the second phase, the parser. The lexer should also raise errors indicating invalid characters and anything it can't find a symbol for.As an example "1 + 1 * 3.14159", it would create a stream of symbols like: IntegerLiteral(1) AdditionOperator IntegerLiteral(1) MultiplicationOperator FloatLiteral(3.14159)

The method I chose was to hold a list of "matchers" that basically match a symbol at the current position in the input string.

So it'd start out with "1 + 1 * 3.14159" and the first request for a token would return IntegerLiteral(1) using a simple regular expression like: /^[0-9]*/. return IntegerLiteral and advance the string the number of characters that were matched.

now we are at " + 1 * 3.14159" a regular expression could match /\s+/ and just skip it because we aren't interested in that. it'll still advance the string the number of whitespace characters it matched.

We get to the plus, a simple literal check will match this so if code[position] is "+" then add one to the position and and return a AdditionOperator.

And so on

I have implemented this in a generic a way as possible to allow for other symbols and matchers to be lexed without changing the code code.

class Lexer def initialize(str, matchers) @str = str @position = 0 @matchers = matchers end def consume return Symbol.new(nil, 0, :end) if @position >= @str.length part = @str[@position..-1] @matchers.each do |m| result = m.match(part) if result @position += result.length return self.send :next if m.ignore? return result end end raise "Unable to lex: #{part}" end end

To create a Lexer class that would give us the correct symbols for a math expression, you just create a matcher for each type of symbol. This code isn't very pretty, but some nice DSL will be written later!

matchers = [] matchers << RegexMatcher.new(/(\s)+/, :whitespace, :ignore => true) matchers << LiteralMatcher.new('+', :add) matchers << LiteralMatcher.new('-', :subtract) matchers << LiteralMatcher.new('*', :multiply) matchers << LiteralMatcher.new('/', :divide) matchers << LiteralMatcher.new('(', :left_paren) matchers << LiteralMatcher.new(')', :right_paren) matchers << RegexMatcher.new(/(\s)+/, :whitespace, :ignore => true, :) lexer = Lexer.new("1 + 1 * 3.14159", matchers)

## Well - I forgot about this post!

Since I started writing this I've now written a gem for lexical analysis with the pretty DSL I said could be done later.. now it looks like this:

ExpressionLexer = Lexr.that { ignores /\s+/ => :whitespace matches /[-+]?[0-9]*\.?[0-9]+/ => :number, :convert_with => lambda { |v| Float(v) } matches "+" => :addition matches "-" => :subtraction matches "*" => :multiplication matches "/" => :division matches "(" => :left_parenthesis matches ")" => :right_parenthesis }

then can be used like this:

lexer = ExpressionLexer.new("1 * 12.5 / (55 + 2 - 56)") until lexer.end? puts lexer.next end

giving you an output of:

number(1.0) multiplication(*) number(12.5) division(/) left_parenthesis(() number(55.0) addition(+) number(2.0) subtraction(-) number(56.0) right_parenthesis()) end()

Ok, so this has kind of turned in to a shameless plug for my own gems.. Here is the details :)

Lexr - A simple but powerful lexical analyser for Ruby

There is also another gem I wrote that uses this called math_engine. It's a recursive decent parser which parses math expressions and gives you the result. It was a great experience writing these and I learnt a lot. Hopefully you can learn from them too.

## No comments:

## Post a Comment