Outliers

This topic explains why chapters may be marked as outliers and how to understand the outlier score.

After you run an experiment and Tiberias has analyzed your text classes, it reports the expected accuracy and any associated outliers. Outliers are chapters that Tiberias did not classify correctly while testing its classification model.

When Tiberias analyzes the text classes you defined, it builds a classification model that assigns a weight to every feature in the text classes. Tiberias then tests this model internally using a process known as cross-validation. (Learn more about cross-validation.)

Expected accuracy scores indicate the following:

Expected accuracy = 100%: During testing, Tiberias correctly classified every chapter in every text class you defined, in every test.
Expected accuracy < 100%: During testing, Tiberias misclassified at least one chapter in the text classes.
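
To illustrate the difference between these two cases, the following Python sketch computes an accuracy figure as the share of correct classifications across all test rounds. The function name, the data layout, and the formula itself are assumptions made for illustration only. This topic does not state how Tiberias actually calculates expected accuracy.

```python
def expected_accuracy(test_rounds, true_classes):
    """Return the percentage of correct classifications over all rounds.

    test_rounds:  list of dicts, one per test round, mapping a chapter
                  name to the class predicted for it in that round
                  (hypothetical layout, not a Tiberias data format).
    true_classes: dict mapping a chapter name to its defined text class.
    """
    correct = 0
    total = 0
    for round_predictions in test_rounds:
        for chapter, predicted in round_predictions.items():
            total += 1
            if predicted == true_classes[chapter]:
                correct += 1
    if total == 0:
        raise ValueError("no test results supplied")
    return 100.0 * correct / total


# Hypothetical chapters and classes, used only for this example.
true_classes = {"ch1": "A", "ch2": "A", "ch3": "B", "ch4": "B"}

# Every chapter is classified correctly in both rounds.
all_correct = [dict(true_classes), dict(true_classes)]

# In the second round, ch2 is assigned to the wrong class.
one_miss = [
    dict(true_classes),
    {"ch1": "A", "ch2": "B", "ch3": "B", "ch4": "B"},
]

print(expected_accuracy(all_correct, true_classes))  # 100.0
print(expected_accuracy(one_miss, true_classes))     # 87.5
```

In this sketch, a single misclassification in any round is enough to pull the figure below 100%, which matches the second case described above.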

Outlier scores indicate the following:

100% – the chapter was misclassified in all of the cross-validation tests.
50% – the chapter was misclassified in half of the cross-validation tests.
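
The scores above can be read as the percentage of cross-validation tests in which a chapter was misclassified. The following Python sketch shows that calculation under stated assumptions: the function name and the data layout are hypothetical, and the code is not taken from Tiberias itself.

```python
def outlier_scores(test_rounds, true_classes):
    """Return, per chapter, the percentage of rounds in which it was misclassified.

    test_rounds:  list of dicts, one per cross-validation test, mapping a
                  chapter name to the class predicted for it in that test
                  (hypothetical layout, not a Tiberias data format).
    true_classes: dict mapping a chapter name to its defined text class.
    """
    misses = {}
    appearances = {}
    for round_predictions in test_rounds:
        for chapter, predicted in round_predictions.items():
            appearances[chapter] = appearances.get(chapter, 0) + 1
            if predicted != true_classes[chapter]:
                misses[chapter] = misses.get(chapter, 0) + 1
    return {
        chapter: 100.0 * misses.get(chapter, 0) / count
        for chapter, count in appearances.items()
    }


# Hypothetical chapters and classes, used only for this example.
true_classes = {"ch1": "A", "ch2": "A", "ch3": "B"}
test_rounds = [
    {"ch1": "A", "ch2": "B", "ch3": "A"},  # ch2 and ch3 misclassified
    {"ch1": "A", "ch2": "A", "ch3": "A"},  # only ch3 misclassified
]

scores = outlier_scores(test_rounds, true_classes)
print(scores["ch1"])  # 0.0   (never misclassified)
print(scores["ch2"])  # 50.0  (misclassified in half of the tests)
print(scores["ch3"])  # 100.0 (misclassified in all of the tests)
```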

This information can help you refine the definitions of the text classes that you want to investigate. It can also help you develop hypotheses about a chapter and its relationship to the other chapters in the text class you defined.