Here's a really simple example to show why you'd want to do this:
(1) if X > 2 then ...
(2) if X > 5 then ...
Now we can create two "dummy variables" from these rules, which are set to 1 if the rule matches, else set to 0:
(1) if X > 2 then V_1=1, else V_1=0
(2) if X > 5 then V_2=1, else V_2=0
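As a minimal sketch (the function name is just for illustration), the two rules above map a value of X to the pair of dummies:

```python
# Hypothetical sketch: build the two dummy variables from a value of X.
def dummies(x):
    v1 = 1 if x > 2 else 0  # rule (1)
    v2 = 1 if x > 5 else 0  # rule (2)
    return (v1, v2)

# Whenever rule (2) fires, rule (1) fires too, so the dummies are correlated:
# x=1 -> (0, 0); x=3 -> (1, 0); x=7 -> (1, 1)
```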
The problem with this is that the two dummy variables are highly correlated, and any iterative statistical and/or machine learning algorithm will take far more iterations to converge because of this (i.e. it's much harder for the algorithm to set the weights for the two dummy variables independently: it has to wait for one weight to get close to optimal before working on the next weight, and so on).
So the idea is to decorrelate the dummy variables like so:
By using the fact that "(2) being true implies (1) being true" (X > 5 implies X > 2), we get:
(1) if X > 2 AND NOT X > 5 then V_1=1, else V_1=0
(2) if X > 5 then V_2=1, else V_2=0
So now you have two decorrelated dummy variables which carry the same information.
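Continuing the sketch above (again, names are illustrative), the decorrelated version looks like this:

```python
# Hypothetical sketch: the decorrelated dummy variables.
def decorrelated_dummies(x):
    v1 = 1 if (x > 2 and not x > 5) else 0  # rule (1) AND NOT rule (2)
    v2 = 1 if x > 5 else 0                  # rule (2)
    return (v1, v2)

# Now at most one dummy fires for any X: x=3 -> (1, 0); x=7 -> (0, 1).
# No information is lost, since (v1 or v2) recovers the original "X > 2"
# indicator and v2 alone is still the "X > 5" indicator.
```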
In practice it's much simpler than adding a load of "AND NOT" clauses as above, because:
(A) being true implies (B), (C), ..., being true
simply means that if (A) is true you set the dummy variables for (B), (C), ..., to zero (assuming there are no cyclic implications, which I don't think can occur here anyway).
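The zeroing recipe above can be sketched in a few lines, assuming the implications are given as a map from each rule to the rules it implies (the function and data layout here are my own assumptions, not from the original):

```python
# Hypothetical sketch: zero out implied dummies.
# `dummy_values` maps rule id -> 0/1; `implies` maps rule id -> list of
# rule ids that are implied by it (assumed acyclic).
def decorrelate(dummy_values, implies):
    out = dict(dummy_values)
    for rule, implied_rules in implies.items():
        if dummy_values[rule]:  # rule fired, so its implied rules carry no extra info
            for r in implied_rules:
                out[r] = 0
    return out

# For the running example, rule 2 (X > 5) implies rule 1 (X > 2):
# decorrelate({1: 1, 2: 1}, {2: [1]}) -> {1: 0, 2: 1}
```

Note that the check reads from the original dummy_values, not the partially zeroed output, so chained implications (A implies B, B implies C) are still handled correctly as long as there are no cycles.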
This is the simplest example I can think of, but in practice there will be thousands of rules, using hundreds of variables, with rule depths ranging from a single comparison to a few hundred comparisons.
Hope this makes sense.
Juk