SubIMDB is a structured corpus of subtitles that captures everyday language.
It includes 38,102 subtitles extracted with the help of OpenSubtitles and IMDb.
It contains 225,847,810 words in 38,643,849 lines.
It has been used in state-of-the-art solutions to various tasks, such as:
- Lexical Simplification
- Complex Word Identification
- Psycholinguistic Word Feature Inference
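
To give a feel for the scale of the figures reported above, the following minimal sketch derives a few averages from them. The constant names and the `corpus_averages` helper are hypothetical and introduced here only for illustration; they are not part of SubIMDB or any tooling distributed with it.

```python
# Corpus figures exactly as reported in the description above.
NUM_SUBTITLES = 38_102
NUM_LINES = 38_643_849
NUM_WORDS = 225_847_810


def corpus_averages(subtitles: int, lines: int, words: int) -> dict[str, float]:
    """Return simple per-line and per-subtitle averages (illustrative helper)."""
    return {
        "words_per_line": words / lines,
        "lines_per_subtitle": lines / subtitles,
        "words_per_subtitle": words / subtitles,
    }


averages = corpus_averages(NUM_SUBTITLES, NUM_LINES, NUM_WORDS)
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

Under these figures, a line averages a little under six words and a subtitle runs to roughly a thousand lines, which is consistent with short, conversational utterances.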