
(Text Analysis) The availability of computers with string-manipulation capabilities has resulted in some rather interesting approaches to analyzing the writings of great authors. Much attention has been focused on whether William Shakespeare ever lived. Some scholars believe there is substantial evidence indicating that Christopher Marlowe or other authors actually penned the masterpieces attributed to Shakespeare. Researchers have used computers to find similarities in the writings of these two authors. This exercise examines three methods for analyzing texts with a computer. Note that thousands of texts, including Shakespeare's works, are available online at www.gutenberg.org.

a. Write a program that reads several lines of text from the keyboard and prints a table indicating the number of occurrences of each letter of the alphabet in the text. For example, the phrase "To be, or not to be: that is the question:" contains one "a," two "b's," no "c's," etc.

b. Write a program that reads several lines of text and prints a table indicating the number of one-letter words, two-letter words, three-letter words, etc., appearing in the text. For example, the phrase "Whether 'tis nobler in the mind to suffer" contains the following word lengths and occurrences:

| Word length | Occurrences |
|---|---|
| 1 | 0 |
| 2 | 2 |
| 3 | 1 |
| 4 | 2 (including 'tis) |
| 5 | 0 |
| 6 | 2 |
| 7 | 1 |

c. Write a program that reads several lines of text and prints a table indicating the number of occurrences of each different word in the text. The first version of your program should include the words in the table in the same order in which they appear in the text. For example, the lines "To be, or not to be: that is the question:" and "Whether 'tis nobler in the mind to suffer" contain the word "to" three times, the word "be" two times, the word "or" once, etc. A more interesting (and useful) printout should then be attempted in which the words are sorted alphabetically.

Short Answer

Write separate programs that count letter occurrences, word lengths, and occurrences of each distinct word, then print the word counts in alphabetical order.

Step by step solution

01

Count Occurrences of Each Letter

Write a program that reads the input text and then examines it character by character. Use a dictionary whose keys are the 26 letters of the alphabet and whose values are their counts, all starting at zero. Convert each character to lowercase so that "T" and "t" are counted together; if the character is a letter, increment its count. Finally, print the count for every letter, including letters that never occur.
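A minimal Python sketch of Step 1, assuming only the 26 unaccented ASCII letters should be counted. The names `letter_counts` and `print_letter_table` are illustrative, not from the exercise; to read several typed lines instead of the sample phrase, pass `sys.stdin.read()` (which reads until end-of-file) to `letter_counts`.

```python
import string


def letter_counts(text: str) -> dict[str, int]:
    """Count each letter a-z in text, treating upper and lower case alike."""
    # Initialize every letter to zero so letters that never occur still print.
    counts = dict.fromkeys(string.ascii_lowercase, 0)
    for ch in text.lower():
        # Skip spaces, digits and punctuation; only the keys a-z are counted.
        if ch in counts:
            counts[ch] += 1
    return counts


def print_letter_table(counts: dict[str, int]) -> None:
    """Print one row per letter in alphabetical order."""
    print("Letter  Occurrences")
    for letter, n in counts.items():
        print(f"{letter:>6}  {n:>11}")


if __name__ == "__main__":
    print_letter_table(letter_counts("To be, or not to be: that is the question:"))
```

Because `dict.fromkeys` keeps the order of `string.ascii_lowercase`, the table comes out in alphabetical order without an explicit sort.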
02

Count Word Lengths

Create a program that reads the input text and splits it into words, ignoring punctuation such as commas and colons attached to a word. For each word, measure its length. Use a dictionary whose keys are word lengths (1, 2, 3, ...) and whose values are the number of words having that length, incrementing the entry for each word's length. Print the result as a table showing how many words there are of each length.
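One way to sketch Step 2 in Python, using `collections.Counter` (a dict subclass) as the length-to-count dictionary. The word definition is an assumption: a word is taken to be a run of letters and apostrophes, so "be," becomes "be" and "'tis" keeps its apostrophe and has length 4. Counting only letters would make "'tis" length 3 instead.

```python
import re
from collections import Counter

# Assumption: a word is a run of letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def word_length_counts(text: str) -> Counter:
    """Map each word length to the number of words having that length."""
    # w.strip("'") is empty for tokens made only of apostrophes; skip those.
    return Counter(len(w) for w in WORD_RE.findall(text) if w.strip("'"))


def print_length_table(counts: Counter) -> None:
    """Print every length from 1 to the longest word, including zero rows."""
    print("Word length  Occurrences")
    for length in range(1, max(counts, default=0) + 1):
        # Indexing a Counter with a missing key returns 0 instead of raising.
        print(f"{length:>11}  {counts[length]:>11}")


if __name__ == "__main__":
    print_length_table(word_length_counts("Whether 'tis nobler in the mind to suffer"))
```

Printing every length up to the maximum, rather than only the keys present, keeps rows such as "5  0" in the table.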
03

Count Unique Words

Develop a program that extracts each word from the input text. Use a dictionary whose keys are the distinct words and whose values are their occurrence counts. Split the text on spaces and punctuation, convert each word to lowercase so that "To" and "to" count as the same word, and update the dictionary accordingly. For the initial printout, list the words in the order in which they first appear, not alphabetically; since Python 3.7, dictionaries preserve insertion order, so iterating over the dictionary yields exactly this order.
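A minimal sketch of Step 3, assuming (as an illustration, not a rule from the exercise) that words are lowercase runs of letters and apostrophes. The function name `word_counts` is hypothetical.

```python
import re

# Assumption: after lowercasing, a word is a run of letters and apostrophes,
# so "To" and "to" match the same key and "'tis" stays whole.
WORD_RE = re.compile(r"[a-z']+")


def word_counts(text: str) -> dict[str, int]:
    """Count each distinct word, keeping the order of first appearance."""
    counts: dict[str, int] = {}
    for word in WORD_RE.findall(text.lower()):
        if not word.strip("'"):
            continue  # a stray apostrophe on its own is not a word
        # A new key is appended at the end, so first appearance fixes its position.
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    text = ("To be, or not to be: that is the question:\n"
            "Whether 'tis nobler in the mind to suffer")
    for word, n in word_counts(text).items():
        print(f"{word:<12}{n:>3}")
```

Words containing letters outside a-z (for example accented letters) would be split by this pattern; a Unicode-aware pattern would be needed for such texts.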
04

Sort Words Alphabetically

Take the dictionary from Step 3 and sort its keys with Python's built-in `sorted()` function, which returns a new, alphabetically ordered list and leaves the dictionary unchanged. Print each word with its occurrence count in that order, so readers can quickly look up any word's frequency in the table.
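A short sketch of Step 4, assuming the word counts are already in a dictionary like the one Step 3 describes. The sample dictionary below is hand-written for the first line of the example text, and the function names are illustrative.

```python
def sorted_word_counts(counts: dict[str, int]) -> list[tuple[str, int]]:
    """Return (word, count) pairs ordered alphabetically by word."""
    # sorted() on a dict sorts its keys and returns a new list;
    # the dictionary itself keeps its first-appearance order.
    return [(word, counts[word]) for word in sorted(counts)]


def print_word_table(pairs: list[tuple[str, int]]) -> None:
    """Print one aligned row per word."""
    print(f"{'Word':<12}{'Count':>5}")
    for word, n in pairs:
        print(f"{word:<12}{n:>5}")


if __name__ == "__main__":
    # Counts for "To be, or not to be: that is the question:" after lowercasing.
    counts = {"to": 2, "be": 2, "or": 1, "not": 1, "that": 1,
              "is": 1, "the": 1, "question": 1}
    print_word_table(sorted_word_counts(counts))
```

`sorted()` compares strings by code point, so uppercase letters would sort before lowercase ones and an apostrophe sorts before any letter ("'tis" comes first). Lowercasing in Step 3 avoids the case issue.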

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

String Manipulation
String manipulation is the process of transforming, analyzing, or inspecting strings (sequences of characters). In text analysis programming, we often convert strings to lowercase for uniformity, remove unwanted characters such as punctuation, or split strings into smaller parts such as words or letters. These operations are foundational for analyzing textual data because they prepare it for more complex processing.

Common string manipulation techniques include:
  • **Changing Cases**: Conversion of text to upper or lower case to ensure consistency during comparison.
  • **Trimming**: Removal of whitespace from the beginning and end of a string, which helps avoid errors in text analysis.
  • **Splitting**: Division of a string into parts using delimiters like spaces or commas, which is especially useful for counting words or evaluating text structure.
  • **Replacement**: Substituting certain text segments with others, such as removing punctuation or correcting spelling errors.
Applying these techniques consistently prepares text data for tasks like letter frequency analysis and word counting.
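The four techniques listed above can be chained in a few lines of Python. This is an illustrative sketch: the function name `normalize_words` is hypothetical, and deleting all of `string.punctuation` is an assumption that also removes apostrophes, so "'tis" becomes "tis".

```python
import string

# Translation table that deletes every ASCII punctuation character.
DROP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_words(text: str) -> list[str]:
    """Apply trimming, case change, replacement and splitting in turn."""
    trimmed = text.strip()                   # Trimming (split() alone would also ignore edge whitespace)
    lowered = trimmed.lower()                # Changing case for case-insensitive comparison
    cleaned = lowered.translate(DROP_PUNCT)  # Replacement: delete punctuation characters
    return cleaned.split()                   # Splitting on any run of whitespace


if __name__ == "__main__":
    print(normalize_words("  To be, or not to be: that is the question:  "))
```

For substituting one segment with another rather than deleting characters, `str.replace(old, new)` is the simpler tool.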
Letter Frequency Analysis
Letter Frequency Analysis involves counting how often each letter appears within a given text, which can provide critical insights into the text's characteristics. For example, certain authors might have unique linguistic signatures that can be identified through this type of analysis.

The process usually follows these steps:
  • **Initialization**: Create a dictionary with the letters of the alphabet as keys and every count set to zero.
  • **Normalization**: Convert the text to a single case (usually lowercase) so that the frequency count is case-insensitive.
  • **Iteration**: Loop through each character in the text. If the character is a letter, increment its count in the dictionary.
  • **Result Display**: Print the frequency of each letter; the table can then feed further statistical or pattern analysis.
This type of analysis is foundational in cryptography and historical research, aiding in tasks like deciphering codes or investigating authorship questions.
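Raw counts grow with the length of a text, so comparing two texts of different sizes is easier with relative frequencies. The sketch below is an added illustration, not part of the exercise: it follows the steps above and then divides each count by the total number of letters. The name `letter_percentages` is hypothetical.

```python
import string
from collections import Counter


def letter_percentages(text: str) -> dict[str, float]:
    """Return each letter's share of all letters in text, as a percentage."""
    # Normalization: lowercase first so "T" and "t" count as the same letter.
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    # Avoid dividing by zero when the text contains no letters at all.
    if total == 0:
        return dict.fromkeys(string.ascii_lowercase, 0.0)
    return {ch: 100.0 * counts[ch] / total for ch in string.ascii_lowercase}


if __name__ == "__main__":
    shares = letter_percentages("To be, or not to be: that is the question:")
    for letter, pct in shares.items():
        if pct:
            print(f"{letter}: {pct:5.1f}%")
```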
Word Count Programming
Word count programs automate counting the words in a text, a figure that many text analysis tasks need. Tallying how many words have each length goes a step further and gives a rough picture of how complex the language is.

The general process involves:
  • **Reading and Splitting**: Start by reading the input text and splitting it into a list of words.
  • **Length Counting**: For each word, calculate the number of letters it contains. Use a dictionary where keys represent the word lengths, and their corresponding values indicate how many times those lengths occur.
  • **Outputting Results**: Display the word length counts, which provides insight into the writing style, such as whether an author favors longer, more complex words or shorter, simpler ones.
Word count programming is used in numerous applications, from simple content length verification in writing software to complex linguistic research.
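A single summary number, such as the mean word length, is one simple way to express whether a passage leans toward longer or shorter words. This is a hedged sketch; the word definition (a run of letters and apostrophes, so "'tis" has length 4) and the name `mean_word_length` are assumptions for illustration.

```python
import re

# Assumption: a word is a run of letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def mean_word_length(text: str) -> float:
    """Average number of characters per word; 0.0 for text without words."""
    # Skip tokens made only of apostrophes.
    lengths = [len(w) for w in WORD_RE.findall(text) if w.strip("'")]
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    for line in ("To be, or not to be: that is the question:",
                 "Whether 'tis nobler in the mind to suffer"):
        print(f"{mean_word_length(line):.2f}  {line}")
```

A mean hides the shape of the distribution, so it is best read alongside the full word-length table.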
Unique Word Counting
Unique Word Counting focuses on identifying and tallying distinct words in text, which helps examine text complexity and diversity. This form of analysis considers both the variety of words and their frequency to offer insights into linguistic richness.

Here's a typical approach to counting unique words:
  • **Data Cleansing**: Initially, process the text to remove punctuation and convert each word to lowercase, ensuring uniformity.
  • **Extraction and Counting**: Split the text into words and, as you iterate through them, add each unique word to a dictionary with its occurrence count.
  • **Initial and Sorted Output**: First, print words in the order they appear to offer an unaltered overview of word usage. Additionally, sort these words alphabetically to provide an organized and easy-to-reference list of word frequencies.
Unique word counting is essential in both computational linguistics and text analytics, facilitating applications like keyword extraction, author profiling, and sentiment analysis.
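One simple way to put a number on the "variety of words" is to divide the number of distinct words by the total number of words, a measure usually called the type-token ratio. This is an added illustration, not part of the exercise; the word pattern and the name `vocabulary_stats` are assumptions. The ratio tends to fall as texts get longer, so it is most meaningful when comparing passages of similar length.

```python
import re

# Assumption: words are lowercase runs of letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def vocabulary_stats(text: str) -> tuple[int, int, float]:
    """Return (total words, distinct words, distinct/total ratio)."""
    words = [w for w in WORD_RE.findall(text.lower()) if w.strip("'")]
    distinct = len(set(words))
    # The ratio is 1.0 when no word repeats and approaches 0 with heavy repetition.
    ratio = distinct / len(words) if words else 0.0
    return len(words), distinct, ratio


if __name__ == "__main__":
    print(vocabulary_stats("To be, or not to be: that is the question:"))
```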
