What is Context free grammar?
A Context-Free Grammar (CFG) is a set of rules that define what is and is not a valid sequence of “words” of a language. The rules aren’t limited to the production of text and can be applied in all fields where a fixed number of rules structures the production of the work. CFG is not sufficient however for generating meaningful natural language, for the reason that they have no context. But they generate grammatically correct sentences.
In a next post I’ll combine them with other mythologies to overcame this weakness.
How Context free grammars work?
In CFG language is generated via production rules. Production rules are in the form:
V -> w
Where V is a single “non-terminal” which mean that it needs to be defined by production rule. And w is a sequence of “terminals” which mean that they don’t need to be defined by production rules. It is also often the case where w contain “non terminal” nodes.
for example
s -> a b
in this rule s , a and b are non-terminals because they need to be defined further.
a -> 'the'
b -> c 'dog'|'fox'
now a is a terminal as it don’t need any further definition while b still need to be defined by specifying c. lets define c:
c -> 'lazy' | 'quick'
The pipe is an ‘or‘ meaning that we can use either values.
Now the rule a->a b can generate the following sentences:
the lazy dog
the lazy fox
the quick fox
the quick dog
each sentence having 25% of chance of being generated.
A simple implementation
The best way toi start using CFG is to use an already developed library. But in this section we will build a simple engine using python.
Let’s first encode the production rules. We will use a dictionary to for this purpose.
d = {} d["s"]=[["a", "b"]] d["a"]=[["the"]] d["b"]=[["c", "d"]] d["c"]=[["lazy"], ["quick"]] d["d"]=[["dog"], ["fox"]]
Note how the rules are encoded as array of arrays, in this way we can encode the “non-terminal” rules and the various options for “terminal” nodes.
expansion = [] seed = "s" def expand(start, expansion): if start in d: #Grab one possible expansion possible = d[start] random_expansion = random.choice(possible) # Call this method again with the current element for i in range(len(random_expansion)): expand(random_expansion[i], expansion); # if the rule wasn't found, then it's a terminal: # append the string to the expansion else: expansion.append(start)</pre>
You can see that the code is quite simple which shows the power of the simple idea of CFG.
The function receives both an array and the seed and calls itself recursively if a rule that matches that seed String is present. Note that there is no specific encoding for the non-terminal nodes. In this simple code every string that match a key in the dictionary is considered as “non-terminal”.
To test the program run it using the seed.
expand(seed, expansion) print(expansion) expansion = [] expand(seed, expansion) print(expansion)
Every run will give a random sentence generated by the rule, so you may need to run several times to generate other sentences.
This website was… how do I say it? Relevant!! Finally I’ve found something that helped me. Thank you!
An impressive share! I’ve just forwarded this onto a colleague who had been conducting a little homework on this. And he actually ordered me dinner because I stumbled upon it for him… lol. So allow me to reword this…. Thanks for the meal!! But yeah, thanx for spending some time to talk about this matter here on your website.
Oh my goodness! Amazing article dude! Thank you so much, However I am encountering troubles with your RSS. I don’t understand why I can’t join it. Is there anybody having similar RSS problems? Anyone who knows the solution will you kindly respond? Thanks!!
Hi just wanted to give you a quick heads up and let you know a few of the images aren’t loading properly. I’m not sure why but I think its a linking issue. I’ve tried it in two different browsers and both show the same results.
Hi, Can’t see right now. Are the images that do not load form this post?
I really appreciate your writing, this is great stuff! I am going to bookmark your site and keep checking for new information.