phrasplit Documentation
A Python library for splitting text into sentences, clauses, or paragraphs using spaCy. It is designed for audiobook creation and text-to-speech processing.
Features
Sentence splitting: Intelligent sentence boundary detection using spaCy
Clause splitting: Split sentences at commas for natural pause points
Paragraph splitting: Split text at double newlines
Long line splitting: Break long lines at sentence/clause boundaries
Abbreviation handling: Correctly handles Mr., Dr., U.S.A., etc.
Ellipsis support: Preserves ellipses without incorrect splitting
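To make the idea of long line splitting concrete, here is a minimal standard-library sketch that greedily packs sentence and clause pieces into chunks under a length limit. The function name `split_long_line` and the `max_length` parameter are hypothetical and chosen for illustration; this is not phrasplit's implementation, and the naive regular expression would wrongly split after abbreviations such as "Dr.", which is the kind of case spaCy-based boundary detection is meant to handle.

```python
import re


def split_long_line(line: str, max_length: int = 80) -> list[str]:
    """Greedily pack sentence/clause pieces into chunks of at most max_length characters.

    Hypothetical helper for illustration only; not part of phrasplit.
    """
    # Split after sentence-ending punctuation, commas, or semicolons,
    # keeping the punctuation attached to the preceding piece.
    pieces = [p for p in re.split(r"(?<=[.!?,;])\s+", line.strip()) if p]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_length or not current:
            # The piece fits, or it is the first piece of a new chunk
            # (a single oversized piece is kept whole rather than cut mid-clause).
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


line = (
    "The rain had stopped, the streets were quiet, "
    "and nobody was outside. We walked home."
)
for chunk in split_long_line(line, max_length=40):
    print(chunk)
```

Breaking only at punctuation keeps each chunk a natural unit for a text-to-speech engine, at the cost of occasionally exceeding the limit when a single clause is longer than `max_length`.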
Installation
Install phrasplit using pip:
pip install phrasplit
You’ll also need to download a spaCy language model:
python -m spacy download en_core_web_sm
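If you want to confirm the model is available before processing text, one option is to check whether its package can be imported. The sketch below relies on the fact that models fetched with `spacy download` are installed as regular Python packages; the helper name `model_installed` is hypothetical and not part of phrasplit or spaCy.

```python
import importlib.util


def model_installed(name: str = "en_core_web_sm") -> bool:
    """Return True if a package with this name is importable in the current environment.

    Hypothetical helper for illustration only.
    """
    return importlib.util.find_spec(name) is not None


if not model_installed():
    print("Model missing; run: python -m spacy download en_core_web_sm")
```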
Quick Start
from phrasplit import split_sentences, split_clauses, split_paragraphs
# Split text into sentences
text = "Dr. Smith is here. She has a Ph.D. in Chemistry."
sentences = split_sentences(text)
# ['Dr. Smith is here.', 'She has a Ph.D. in Chemistry.']
# Split a sentence into clauses at commas
text = "I like coffee, and I like tea."
clauses = split_clauses(text)
# ['I like coffee,', 'and I like tea.']
# Split text into paragraphs
text = "First paragraph.\n\nSecond paragraph."
paragraphs = split_paragraphs(text)
# ['First paragraph.', 'Second paragraph.']
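For intuition about what the paragraph and clause outputs above represent, the following standard-library sketch approximates both rules with regular expressions. The names `naive_paragraphs` and `naive_clauses` are hypothetical, and this is an assumption about the observable behavior shown in the examples, not a description of how phrasplit works internally.

```python
import re


def naive_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty results (illustrative only)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def naive_clauses(sentence: str) -> list[str]:
    """Split after each comma, keeping the comma with the preceding part (illustrative only)."""
    return [c for c in re.split(r"(?<=,)\s+", sentence.strip()) if c]


print(naive_paragraphs("First paragraph.\n\nSecond paragraph."))
print(naive_clauses("I like coffee, and I like tea."))
```

Sentence splitting is deliberately left out of this sketch: a punctuation-only rule would break on "Dr." and "Ph.D.", which is why the library delegates sentence boundaries to spaCy.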