The cross entropy measures the average number of bits needed to identify an event drawn from the set when a coding scheme optimized for an assumed distribution $q$ is used rather than the true distribution $p$; the KL divergence measures the difference between the two probability distributions, or, more precisely, the information gained when the prior $q$ is revised in light of the posterior $p$ (in other words, the amount of information lost when $q$ is used to approximate $p$). It is defined as