SubIMDB is a structured corpus of subtitles that captures everyday language.

It includes 38,102 subtitles, compiled using OpenSubtitles and IMDb.

It contains 225,847,810 words in 38,643,849 lines.

It has been used in state-of-the-art solutions to tasks such as:

Lexical Simplification
Complex Word Identification
Psycholinguistic Word Feature Inference