Understanding text processing using formal definitions, Lecture notes of Computer Communication Systems


Typology: Lecture notes

2021/2022

Available from 12/29/2024

gret-lee

Text processing
Understanding text processing using formal definitions
Submitted by:
M Essa Khan S022
M Tayyab S016
M Sohail S025
Ali Shair S048

Introduction

Text processing refers to the manipulation, analysis, and transformation of text data to extract meaningful information or to prepare the text for further analysis.

Key Activities in Text Processing:

  • Parsing: Breaking down text into smaller components (e.g., words, sentences).
  • Text Cleaning: Removing unwanted characters, symbols, and formatting errors.
  • Word Extraction: Identifying and extracting individual words from the text.
  • Text Normalization: Converting text to a standard format (e.g., lowercasing, stemming).
  • Text Transformation: Converting text into a different structure, like summaries or structured data.
  • Natural Language Processing (NLP): Analyzing and understanding human language (e.g., sentiment analysis, translation).
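A few of these activities can be chained into a small pipeline. The Python sketch below covers only cleaning, normalization, and word extraction. The cleaning rule (replace anything that is not a letter, digit, underscore, or whitespace with a space) and the function names are assumptions chosen for illustration; stemming, transformation, and NLP are left out.

```python
import re


def clean(text: str) -> str:
    # Text cleaning (assumed rule): replace every character that is not
    # a word character (letter, digit, underscore) or whitespace with a space.
    return re.sub(r"[^\w\s]", " ", text)


def normalize(text: str) -> str:
    # Text normalization: lowercase only; no stemming in this sketch.
    return text.lower()


def extract_words(text: str) -> list[str]:
    # Parsing / word extraction: split on runs of whitespace.
    return text.split()


def preprocess(text: str) -> list[str]:
    return extract_words(normalize(clean(text)))


print(preprocess("Hello, World! Text   processing\tis fun."))
# ['hello', 'world', 'text', 'processing', 'is', 'fun']
```

The order matters: cleaning before splitting means punctuation such as "World!" does not survive as part of a word.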

Breaking a Text into Words

Key Definitions

  • CHAR: A character in the text (e.g., letters, punctuation), introduced as a basic type. [CHAR]
  • blank: The set of blank characters, such as spaces, line breaks, and tabs. blank : ℙ CHAR
  • TEXT: A sequence of characters. This can be empty. TEXT == seq CHAR
  • SPACE: A non-empty sequence of blank characters. SPACE == seq₁ blank
  • WORD: A non-empty sequence of characters that are not blanks. WORD == seq₁(CHAR \ blank)
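These definitions can be mirrored as Python predicates over `str`, which already behaves as a sequence of characters. This is a sketch under assumptions: the hypothetical `BLANK` set fixes blank to space, tab, newline, and carriage return, and both SPACE and WORD are treated as non-empty.

```python
# Hypothetical stand-in for the Z set `blank : ℙ CHAR`
# (assumed: space, tab, newline, carriage return).
BLANK = frozenset(" \t\n\r")


def is_space(t: str) -> bool:
    """SPACE: a non-empty sequence whose characters are all blank."""
    return len(t) > 0 and all(c in BLANK for c in t)


def is_word(t: str) -> bool:
    """WORD: a non-empty sequence containing no blank characters."""
    return len(t) > 0 and all(c not in BLANK for c in t)


# Any Python str, including "", is a valid TEXT (a sequence of CHAR).
print(is_space(" \t\n"), is_word("you?"), is_word("a b"), is_space(""))
# True True False False
```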

The words Function

The words function takes a TEXT as input and returns a sequence of WORDs: words : TEXT → seq WORD

Recursive Case Breakdown

In each case below, s is a SPACE, l and r are TEXTs, and ⁀ denotes concatenation of sequences.

Case 1: If the TEXT starts with a space:

words (s ⁀ r) = words r

  • Discard the leading space and recursively process the rest of the text.

Case 2: If the TEXT ends with a space:

words (l ⁀ s) = words l

  • Discard the trailing space and recursively process the beginning part.

Case 3: If a space separates two parts of the TEXT:

words (l ⁀ s ⁀ r) = (words l) ⁀ (words r)

  • Split the text at the space and recursively process both parts, concatenating the results.
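The three cases can be turned into a recursive Python sketch. The notes list only the recursive cases, so the base cases (an empty TEXT, a TEXT that is a single WORD, a TEXT that is a single SPACE) are assumptions added here, as is the hypothetical `BLANK` set standing in for blank.

```python
# Hypothetical stand-in for blank (assumed: space, tab, newline, CR).
BLANK = frozenset(" \t\n\r")


def words(text: str) -> list[str]:
    """Split a TEXT into its sequence of WORDs, following Cases 1-3."""
    # Assumed base cases: empty TEXT, a single WORD, or a single SPACE.
    if not text:
        return []
    if all(c not in BLANK for c in text):
        return [text]
    if all(c in BLANK for c in text):
        return []

    # Case 1: words(s ⁀ r) = words r  (drop the leading SPACE)
    if text[0] in BLANK:
        i = 0
        while text[i] in BLANK:
            i += 1
        return words(text[i:])

    # Case 2: words(l ⁀ s) = words l  (drop the trailing SPACE)
    if text[-1] in BLANK:
        j = len(text)
        while text[j - 1] in BLANK:
            j -= 1
        return words(text[:j])

    # Case 3: words(l ⁀ s ⁀ r) = words l ⁀ words r, split at the first SPACE
    start = next(i for i, c in enumerate(text) if c in BLANK)
    end = start
    while text[end] in BLANK:
        end += 1
    return words(text[:start]) + words(text[end:])


print(words("How are you?"))  # ['How', 'are', 'you?']
```

This version always splits at the first space in Case 3; any choice of split point gives the same words, so the choice only affects the shape of the recursion. Recursion depth grows with the number of words, so for long texts an iterative scan would be more practical.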

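Example

For the input words ⟨H, o, w, " ", a, r, e, " ", y, o, u, "?"⟩:

  • Split at the two spaces into: How, are, you?
  • Discard the spaces.
  • Output: ⟨How, are, you?⟩. The result is a sequence, so word order is preserved, and because "?" is not a blank character it stays part of the last word.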
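For this choice of blanks, Python's built-in `str.split()` with no argument behaves like words: it splits on runs of whitespace and drops leading and trailing whitespace. Python's notion of whitespace is broader than the spaces, tabs, and line breaks listed for blank, so this is a close approximation rather than an exact model of the specification.

```python
# The example input, written as a sequence of characters.
chars = ["H", "o", "w", " ", "a", "r", "e", " ", "y", "o", "u", "?"]
text = "".join(chars)

# str.split() with no argument splits on runs of whitespace and discards
# leading and trailing whitespace, matching Cases 1-3 of words.
result = text.split()
print(result)  # ['How', 'are', 'you?']

# '?' is not whitespace, so it stays attached to the last word.
assert result[-1] == "you?"
```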

Word Counting

Introduction: The wc utility can be described as a function that takes a text file as input and outputs a tuple of three numbers:

  • Number of lines in the file.
  • Number of words in the file.
  • Number of characters in the file.
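The WC function can be sketched in Python as follows; the names `wc` and `wc_file` are hypothetical. Lines are counted as newline characters and words as maximal runs of non-whitespace, roughly as the Unix wc utility does. Characters are counted as decoded Unicode code points, which corresponds to `wc -m` rather than the default byte count (`wc -c`). Opening a file in text mode also translates `\r\n` line endings to `\n`, so character counts can differ from wc on such files.

```python
def wc(contents: str) -> tuple[int, int, int]:
    """Return (lines, words, characters) for a piece of text."""
    lines = contents.count("\n")      # one line per newline character
    words = len(contents.split())     # maximal runs of non-whitespace
    chars = len(contents)             # Unicode code points, not bytes
    return lines, words, chars


def wc_file(path: str) -> tuple[int, int, int]:
    """Hypothetical helper: apply wc to the contents of a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return wc(f.read())


print(wc("How are you?\nFine, thanks.\n"))  # (2, 5, 27)
```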

Z-Schema

Applications of Word Counting:

  • Text Analysis and Processing
  • Programming and Development
  • Publishing and Media
  • File Management

Why It Matters: Word counting gives a simple, objective measure of text length. It helps writers meet length requirements, supports text analysis and optimization, and applies consistently across domains such as writing, technology, and data processing.