Over the past several months I've been reading various materials on parsing, lexical analysis, and the like. It has been an on-and-off interest for some time, but I started playing with writing my own PEG grammars using a pretty easy-to-use utility called PEG.js.
Unfortunately I've hit a snag in my rather noobishly written grammar, and I'm having trouble getting around it. Most solutions I've seen fall back on context-based implementations: several grammars, one per context, selected and applied by business logic. I'd like to avoid that as much as possible and get as close as I can to a single context-free grammar that produces an intuitive JSON result that is easy to traverse.
The original snippet is here, but I'll post it below as well (everyone hates hitting a broken link in a forum post when they land on it from a search result months or years down the road).
The Problem
The following code produces a very intuitive AST:
Code:
./quux.sh foobar --baz="hello world"
However, the parser chokes on this:
Code:
./appleSauce.sh 2&>1
The issue plaguing me right now is redirection that begins with an integer. I know why it happens: arg* in commandText greedily consumes the '2' as a number value before cmdDivider is ever tried, and a PEG repetition never gives back what it has matched. What I'm having trouble with is rethinking the definitions so that the grammar understands the '2' belongs to the redirection operation.
It produces the following:
Code:
[
  {
    "cmd": "./appleSauce.sh",
    "args": [
      2
    ],
    "div": {
      "redirect": "&>",
      "addend": 1
    }
  }
]
It should produce this:
Code:
[
  {
    "cmd": "./appleSauce.sh",
    "args": [],
    "div": {
      "redirect": "&>",
      "addend": 1,
      "augend": 2
    }
  }
]
You can copy the grammar below into the Parser Generator to test it. Paste the grammar into the left text area, then type a single-line shell command in the top right box; the resulting AST is displayed in the grey box in the bottom right corner of the page.
PEG.js Parser Generator -- Used For Testing Grammars
Grammar
Code:
commands "command"
  = cmd:commandText* { return cmd; }

commandText "command"
  = ws identifier:identifier args:arg* nbws div:cmdDivider ws { return {
      cmd: identifier,
      args: args,
      div: div
    };}

cmdDivider "command separator"
  = redir:redirection? { return redir; }

nbws "whitespace" = [ \t]*
ws "whitespace" = [ \t\n\r]*
rws "whitespace" = [ \t\n\r]+

redirection "redirection"
  = ";" { return ";" }
  / "|" { return "|" }
  / dig:number op:">&" dig2:number { return { redirect: op, augend: dig, addend: dig2 } }
  / dig:number op:">&" add:"-" { return { redirect: op, augend: dig, addend: add } }
  / dig:number op:"<&" dig2:number { return { redirect: op, addend: dig2, augend: dig } }
  / dig:number op:"<&" add:"-" { return { redirect: op, augend: dig, addend: add } }
  / dig:number op:">>" { return { redirect: op, augend: dig } }
  / dig:number op:">&" { return { redirect: op, augend: dig } }
  / dig:number op:">|" { return { redirect: op, augend: dig } }
  / dig:number op:">" { return { redirect: op, augend: dig } }
  / dig:number op:"<<-" { return { redirect: op, augend: dig } }
  / dig:number op:"<<" { return { redirect: op, augend: dig } }
  / dig:number op:"<&" { return { redirect: op, augend: dig } }
  / dig:number op:"<>" { return { redirect: op, augend: dig } }
  / dig:number op:"<" { return { redirect: op, augend: dig } }
  / op:">&" dig:number { return { redirect: op, addend: dig } }
  / op:">&" add:"-" { return { redirect: op, addend: add } }
  / op:"<&" dig:number { return { redirect: op, addend: dig } }
  / op:"<&" add:"-" { return { redirect: op, addend: add } }
  / op:">>" { return { redirect: op } }
  / op:">&" { return { redirect: op } }
  / op:">|" { return { redirect: op } }
  / op:">" { return { redirect: op } }
  / op:"<<-" { return { redirect: op } }
  / op:"<<" { return { redirect: op } }
  / op:"<&" { return { redirect: op } }
  / op:"<>" { return { redirect: op } }
  / op:"<" { return { redirect: op } }
  / op:"&>" dig:number { return { redirect: op, addend: dig } }
  / op:"&>" { return { redirect: op } }

// ----- 3. Values -----

arg "argument"
  = nbws val:(value/variable) { return val; }

value "value"
  = false
  / null
  / true
  / number
  / string
  / identifier

false = "false" { return false; }
null = "null" { return null; }
true = "true" { return true; }

variable "variable"
  = "-"+ varname:variablename ws val:variablesetter? { return { name: varname, value: val }; }

variablename "variable"
  = [A-Za-z][A-Za-z0-9_-]* { return text(); }

variablesetter "variable assignment"
  = "="? ws val:value { return val; }

// ----- 6. Numbers -----

number "number"
  = minus? int frac? exp? { return parseFloat(text()); }

decimal_point "decimal point"
  = "."

digit1_9 "digit"
  = [1-9]

e "e"
  = [eE]

exp "expression"
  = e (minus / plus)? DIGIT+

frac "fraction"
  = decimal_point DIGIT+

int "integer"
  = zero / (digit1_9 DIGIT*)

minus "minus"
  = "-"

plus "plus"
  = "+"

zero "zero"
  = "0"

// ----- 7. Strings -----

identifier "identifier"
  = [~./A-Za-z][A-Za-z0-9_./~-]* { return text(); }

string "string"
  = quotation_mark chars:char* quotation_mark { return chars.join(""); }

char "character"
  = unescaped
  / escape
    sequence:(
        '"'
      / "\\"
      / "/"
      / "b" { return "\b"; }
      / "f" { return "\f"; }
      / "n" { return "\n"; }
      / "r" { return "\r"; }
      / "t" { return "\t"; }
      / "u" digits:$(HEXDIG HEXDIG HEXDIG HEXDIG) {
          return String.fromCharCode(parseInt(digits, 16));
        }
    )
    { return sequence; }

escape "escape"
  = "\\"

quotation_mark "quotation mark"
  = '"'

unescaped "unescaped character"
  = [^\0-\x1F\x22\x5C]

// ----- Core ABNF Rules -----

// See RFC 4234, Appendix B (http://tools.ietf.org/html/rfc4234).
DIGIT "digit" = [0-9]
HEXDIG "hexadecimal digit" = [0-9a-f]i
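
One possible direction, offered as an untested sketch rather than a definitive fix: keep arg from consuming anything that redirection could match, using a PEG negative lookahead (!redirection), and give redirection an alternative for a number followed by "&>", which the grammar above does not have. The key order in the new action is only chosen to mirror the desired output; every other rule is assumed to stay exactly as posted.

```pegjs
// Change 1 (assumption: inserted into the existing "redirection" rule,
// directly after the "|" alternative so it is tried before the other
// number-prefixed forms):
//
//   / dig:number op:"&>" dig2:number { return { redirect: op, addend: dig2, augend: dig } }

// Change 2: replace the existing "arg" rule. The !redirection predicate
// consumes no input; it only makes arg fail when the text after the
// whitespace would parse as a redirection, so arg* stops there and
// cmdDivider gets to see the leading number.
arg "argument"
  = nbws !redirection val:(value/variable) { return val; }
```

With both changes in place, ./appleSauce.sh 2&>1 should leave args empty and hand 2&>1 to cmdDivider, while ./quux.sh foobar --baz="hello world" should parse as before, since neither argument starts with something redirection accepts. The trade-off is that a bare number immediately followed by an operator (for example 2>file) can never be an argument, which is also how a shell reads it.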