API Reference
=============

This page contains the complete API reference for phrasplit.

Main Functions
--------------

.. module:: phrasplit

split_sentences
^^^^^^^^^^^^^^^

.. autofunction:: phrasplit.split_sentences

**Example:**

.. code-block:: python

   from phrasplit import split_sentences

   text = "Dr. Smith is here. She has a Ph.D. in Chemistry."
   sentences = split_sentences(text)
   # ['Dr. Smith is here.', 'She has a Ph.D. in Chemistry.']

   # Disable colon splitting
   text = "Note: This is important."
   sentences = split_sentences(text, split_on_colon=False)
   # ['Note: This is important.']

split_clauses
^^^^^^^^^^^^^

.. autofunction:: phrasplit.split_clauses

**Example:**

.. code-block:: python

   from phrasplit import split_clauses

   text = "I like coffee, and I like tea."
   clauses = split_clauses(text)
   # ['I like coffee,', 'and I like tea.']

split_paragraphs
^^^^^^^^^^^^^^^^

.. autofunction:: phrasplit.split_paragraphs

**Example:**

.. code-block:: python

   from phrasplit import split_paragraphs

   text = "First paragraph.\n\nSecond paragraph."
   paragraphs = split_paragraphs(text)
   # ['First paragraph.', 'Second paragraph.']

split_text
^^^^^^^^^^

.. autofunction:: phrasplit.split_text

**Example:**

.. code-block:: python

   from phrasplit import split_text, Segment

   text = "First sentence. Second sentence.\n\nNew paragraph."
   segments = split_text(text, mode="sentence")

   for seg in segments:
       print(f"P{seg.paragraph} S{seg.sentence}: {seg.text}")
   # P0 S0: First sentence.
   # P0 S1: Second sentence.
   # P1 S0: New paragraph.

   # Clause mode for finer granularity
   text = "Hello, world.\n\nGoodbye, friend."
   segments = split_text(text, mode="clause")
   # Returns clauses with paragraph and sentence indices

split_long_lines
^^^^^^^^^^^^^^^^

.. autofunction:: phrasplit.split_long_lines

**Example:**

.. code-block:: python

   from phrasplit import split_long_lines

   text = "This is a very long sentence that needs to be split into smaller parts."
   lines = split_long_lines(text, max_length=40)

Data Types
----------

Segment
^^^^^^^

.. autoclass:: phrasplit.Segment
   :members:
   :undoc-members:

A named tuple representing a text segment with position information.

**Fields:**

- ``text`` (str): The text content of the segment
- ``paragraph`` (int): Paragraph index (0-based) within the document
- ``sentence`` (int | None): Sentence index (0-based) within the paragraph. None for paragraph mode.

**Example:**

.. code-block:: python

   from phrasplit import split_text, Segment

   segments = split_text("Hello world.", mode="sentence")
   seg = segments[0]

   # Access by name
   print(seg.text)       # "Hello world."
   print(seg.paragraph)  # 0
   print(seg.sentence)   # 0

   # Access by index
   print(seg[0])  # "Hello world."
   print(seg[1])  # 0
   print(seg[2])  # 0

   # Unpack
   text, para, sent = seg

Module Contents
---------------

splitter module
^^^^^^^^^^^^^^^

.. automodule:: phrasplit.splitter
   :members:
   :undoc-members:
   :show-inheritance:
   :exclude-members: _get_nlp, _protect_ellipsis, _restore_ellipsis, _split_sentence_into_clauses, _split_at_clauses, _hard_split, _split_at_boundaries

Type Information
----------------

phrasplit is fully typed and includes a ``py.typed`` marker file for PEP 561 compliance.
You can use it with mypy and other type checkers.

Function signatures:

.. code-block:: python

   from typing import NamedTuple

   class Segment(NamedTuple):
       text: str
       paragraph: int
       sentence: int | None = None

   def split_sentences(
       text: str,
       language_model: str = "en_core_web_sm",
       apply_corrections: bool = True,
       split_on_colon: bool = True,
   ) -> list[str]: ...

   def split_clauses(
       text: str,
       language_model: str = "en_core_web_sm",
   ) -> list[str]: ...

   def split_paragraphs(text: str) -> list[str]: ...

   def split_text(
       text: str,
       mode: str = "sentence",
       language_model: str = "en_core_web_sm",
       apply_corrections: bool = True,
       split_on_colon: bool = True,
   ) -> list[Segment]: ...

   def split_long_lines(
       text: str,
       max_length: int,
       language_model: str = "en_core_web_sm",
   ) -> list[str]: ...
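
Because ``split_text`` returns a flat ``list[Segment]``, callers may want to regroup
segments by paragraph. The following is a minimal sketch of one way to do that, not part
of the phrasplit API: ``group_by_paragraph`` is a hypothetical helper, and ``Segment`` is
redefined locally (mirroring the signature above) only so the snippet runs without
phrasplit or a spaCy model installed. In real code, import ``Segment`` from ``phrasplit``
and pass in the result of ``split_text``.

.. code-block:: python

   from collections import defaultdict
   from typing import NamedTuple, Optional


   class Segment(NamedTuple):
       # Local stand-in mirroring the documented signature (assumption:
       # used here only to keep the example self-contained).
       text: str
       paragraph: int
       sentence: Optional[int] = None


   def group_by_paragraph(segments):
       """Hypothetical helper: map each paragraph index to its segment texts."""
       grouped = defaultdict(list)
       for seg in segments:
           grouped[seg.paragraph].append(seg.text)
       return dict(grouped)


   # Hand-built segments matching the sentence-mode output shown in the
   # split_text example.
   segments = [
       Segment("First sentence.", 0, 0),
       Segment("Second sentence.", 0, 1),
       Segment("New paragraph.", 1, 0),
   ]

   grouped = group_by_paragraph(segments)
   print(grouped)
   # {0: ['First sentence.', 'Second sentence.'], 1: ['New paragraph.']}

Grouping on ``seg.paragraph`` works for every mode, since that field is always an
``int``; ``seg.sentence`` may be ``None`` and should be checked before use as a key.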