phrasplit Documentation

A Python library that uses spaCy to split text into sentences, clauses, or paragraphs. It is designed for audiobook creation and text-to-speech processing.

Features

  • Sentence splitting: Intelligent sentence boundary detection using spaCy

  • Clause splitting: Split sentences at commas for natural pause points

  • Paragraph splitting: Split text at double newlines

  • Long line splitting: Break long lines at sentence/clause boundaries

  • Abbreviation handling: Correctly handles Mr., Dr., U.S.A., etc.

  • Ellipsis support: Preserves ellipses without incorrect splitting
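To make the simpler rules in the list above concrete, here is a minimal standard-library sketch of paragraph splitting at blank lines, clause splitting at commas, and greedy long-line packing. It is an illustration only, not phrasplit's implementation: the function names (split_paragraphs_sketch, split_clauses_sketch, pack_lines_sketch) and the max_length parameter are hypothetical, and the sketch does none of the spaCy-based sentence detection, abbreviation handling, or ellipsis handling.

```python
import re


def split_paragraphs_sketch(text):
    """Split text wherever a blank (or whitespace-only) line occurs."""
    parts = re.split(r"\n\s*\n", text)
    # Drop empty fragments and trim surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]


def split_clauses_sketch(sentence):
    """Split after each comma, keeping the comma with the preceding clause."""
    parts = re.split(r"(?<=,)\s+", sentence)
    return [p for p in parts if p]


def pack_lines_sketch(pieces, max_length):
    """Greedily join pieces into lines of at most max_length characters.

    A single piece longer than max_length is kept whole on its own line.
    """
    lines, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_length or not current:
            current = candidate
        else:
            lines.append(current)
            current = piece
    if current:
        lines.append(current)
    return lines


clauses = split_clauses_sketch("I like coffee, and I like tea.")
print(clauses)
print(pack_lines_sketch(clauses, max_length=20))
print(split_paragraphs_sketch("First paragraph.\n\nSecond paragraph."))
```

Breaking only at clause boundaries keeps each line ending at a natural pause, which is the property the features above aim for in text-to-speech input.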

Installation

Install phrasplit using pip:

pip install phrasplit

You’ll also need to download a spaCy language model:

python -m spacy download en_core_web_sm
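
Because spaCy models are installed as ordinary Python packages, one way to confirm the download succeeded is to check whether the package can be found. The helper below is a hedged sketch using only the standard library; the function name model_installed is hypothetical and not part of phrasplit or spaCy.

```python
import importlib.util


def model_installed(name="en_core_web_sm"):
    """Return True if a top-level package called `name` is importable."""
    return importlib.util.find_spec(name) is not None


if model_installed():
    print("en_core_web_sm is available.")
else:
    print("Model missing; run: python -m spacy download en_core_web_sm")
```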

Quick Start

from phrasplit import split_sentences, split_clauses, split_paragraphs

# Split text into sentences
text = "Dr. Smith is here. She has a Ph.D. in Chemistry."
sentences = split_sentences(text)
# ['Dr. Smith is here.', 'She has a Ph.D. in Chemistry.']

# Split sentences into comma-separated parts
text = "I like coffee, and I like tea."
clauses = split_clauses(text)
# ['I like coffee,', 'and I like tea.']

# Split text into paragraphs
text = "First paragraph.\n\nSecond paragraph."
paragraphs = split_paragraphs(text)
# ['First paragraph.', 'Second paragraph.']
