Finding Top-n Emerging Sequences to Contrast Sequence Sets

  • Technical report TR07-03. Comparing groups or sets is a central concern in statistics, and data mining research has likewise focused on automatically identifying values and instances that differ significantly across groups, known as contrast sets. In both traditional statistics and the work on contrast sets, the comparison is made on nominal data; there is very little work on contrasting sets of event sequences. In this paper we introduce the notion of emerging sequences: sequences that, when taken from a set of sequences A and placed in a set of sequences B, would be considered abnormal outliers in B and thus distinguish set A from set B. We present approaches for finding such emerging sequences efficiently and introduce an algorithm for discovering the top-n most emerging sequences.

