
Suppose we have a compression function \(c\), which takes a bit string \(s\) to a compressed string \(c(s)\). (a) Show that for any integer \(N\) there must be a string \(s\) of length \(N\) for which length \((c(s)) \geq N\); that is, no effective compression is done. (b) Compress some already compressed files (try compressing with the same utility several times in sequence). What happens to the file size? (c) Given a compression function \(c\) as in (a), give a function \(c^{\prime}\) such that for all bit strings \(s\), length \(\left(c^{\prime}(s)\right) \leq \min (\) length \((c(s))\), length \((s))+1\); that is, in the worst case, compression with \(c^{\prime}\) expands the size by only 1 bit.

Short Answer

Expert verified
(a) By a pigeonhole argument, for any integer \(N\) there must be a string \(s\) of length \(N\) such that length\((c(s)) \geq N\). (b) Compressing an already compressed file yields little or no further reduction and may even increase the size. (c) \(c^{\prime}(s)\) can be defined with a one-bit flag so that length\((c^{\prime}(s)) \leq \min(\text{length}(c(s)), \text{length}(s)) + 1\).

Step by step solution

01

Part (a): Proof Setup

To show that for any integer \(N\) there exists a string \(s\) of length \(N\) such that length\((c(s)) \geq N\), assume for contradiction that \(c\) compresses every string of length \(N\), i.e., that length\((c(s)) < N\) for every string \(s\) of length \(N\).
02

Part (a): Pigeonhole Principle

By the pigeonhole principle, this assumption fails: there are \(2^N\) bit strings of length \(N\), but only \(2^0 + 2^1 + \cdots + 2^{N-1} = 2^N - 1\) strings of length less than \(N\). A function \(c\) mapping every length-\(N\) string to a strictly shorter string would therefore map two distinct strings to the same output, so \(c\) would not be injective (one-to-one) and the original data could not be recovered, a contradiction for lossless compression.
03

Part (a): Conclusion

Since not all strings can be compressed to a length smaller than \(N\), there must be at least one string of length \(N\) for which no compression is done, i.e., length \((c(s)) \geq N\).
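The counting at the heart of the argument is easy to check directly; a small Python sketch (with \(N = 8\) as an example):

```python
# There are 2**N bit strings of length N, but only 2**N - 1 strings of
# any shorter length (2**0 + 2**1 + ... + 2**(N-1), counting the empty
# string), so no injective map into the shorter strings can exist.
N = 8
length_N = 2 ** N                          # strings of length exactly N
shorter = sum(2 ** i for i in range(N))    # strings of length < N
print(length_N, shorter)  # 256 255
```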
04

Part (b): Experiment with Compression Algorithms

Take a file and compress it using a popular compression utility (e.g., ZIP). Compress the already compressed file again using the same utility. Observe the file size after each round of compression.
05

Part (b): Observation

After the first compression, the file size is typically reduced (assuming the original file contains redundancy). Subsequent compressions, however, generally yield little to no additional reduction, and in some cases may even increase the file size slightly.
06

Part (b): Explanation

This happens because the first compression has already removed most of the redundancy, leaving output that looks nearly random; further passes find little left to exploit, while each pass still adds its own fixed overhead (headers and other metadata), which can increase the file size rather than decrease it.
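The experiment can be sketched with Python's standard zlib module (DEFLATE, the same algorithm family used by ZIP); the repetitive sample data here is an arbitrary choice:

```python
import zlib

# Compress the same data several times in sequence and record the sizes.
data = b"the quick brown fox jumps over the lazy dog " * 200
sizes = [len(data)]
for _ in range(4):
    data = zlib.compress(data, 9)
    sizes.append(len(data))

# Typical pattern: a large drop after round 1, then stagnation or
# slight growth as each pass adds header overhead to near-random data.
print(sizes)
```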
07

Part (c): Define the Function \(c^{\prime}\)

Define the function \(c^{\prime}(s)\) using a one-bit flag: if length\((c(s)) <\) length\((s)\), let \(c^{\prime}(s)\) be the bit \(1\) followed by \(c(s)\); otherwise, let \(c^{\prime}(s)\) be the bit \(0\) followed by \(s\) verbatim. The flag tells the decompressor whether \(c\) was applied, so \(c^{\prime}\) remains invertible.
08

Part (c): Verification

Check that \(c^{\prime}\) satisfies the length constraint. If length\((c(s)) <\) length\((s)\), then length\((c^{\prime}(s)) =\) length\((c(s)) + 1 = \min(\text{length}(c(s)), \text{length}(s)) + 1\). Otherwise, length\((c^{\prime}(s)) =\) length\((s) + 1 = \min(\text{length}(c(s)), \text{length}(s)) + 1\). In both cases the only overhead is the single flag bit.
09

Part (c): Conclusion

Thus, the function \(c^{\prime}(s)\) ensures that length\((c^{\prime}(s)) \leq \min(\text{length}(c(s)), \text{length}(s)) + 1\), meeting the requirement of the exercise.
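A minimal sketch of this construction in Python, using zlib as the underlying compressor \(c\) (an illustrative choice; any lossless compressor works) and representing bit strings as Python strings of '0'/'1' characters:

```python
import zlib

def c(s: str) -> str:
    """Underlying compressor: deflate the bit string's text and
    re-expand the result to a string of '0'/'1' characters."""
    return "".join(f"{byte:08b}" for byte in zlib.compress(s.encode()))

def c_prime(s: str) -> str:
    """Expand by at most one bit: a flag bit records whether c was
    applied, keeping the encoding invertible."""
    cs = c(s)
    if len(cs) < len(s):
        return "1" + cs  # flag 1: payload is c(s)
    return "0" + s       # flag 0: payload is s, stored verbatim

# One highly compressible input, one tiny incompressible one:
for s in ("01" * 2000, "0110"):
    assert len(c_prime(s)) <= min(len(c(s)), len(s)) + 1
```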

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

pigeonhole principle
The pigeonhole principle is a fundamental concept in mathematics and computer science. It states that if you have more pigeons than pigeonholes and each pigeon needs a place, at least one pigeonhole must contain more than one pigeon. Let's relate this to compression.
When you try to compress every possible string of a given length to a shorter length, there aren't enough shorter strings to go around. For example, there are 256 different 8-bit strings (like '10101010'), but only 255 strings of length 7 or less, counting the empty string. If every 8-bit string had to compress to fewer than 8 bits, at least two different strings would share the same compressed output, making decompression ambiguous. This is why, according to the pigeonhole principle, there will always be at least one string that does not become shorter when compressed.
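A tiny illustration of the inevitable collisions: feed all 256 8-bit strings through a toy "compressor" (a hypothetical one that simply drops the last bit) and count how often outputs clash:

```python
from collections import Counter

def toy_compress(s: str) -> str:
    # Hypothetical compressor for illustration: truncate to 7 bits.
    return s[:7]

# Map every 8-bit string through it and tally the outputs.
outputs = Counter(toy_compress(f"{i:08b}") for i in range(256))
collisions = [out for out, n in outputs.items() if n > 1]
print(len(collisions))  # 128: every 7-bit output is produced twice
```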
compression algorithms
Compression algorithms are techniques and methods used to reduce the size of data. They work by identifying and eliminating redundancy within the data. There are two main types of compression: lossless and lossy.

Lossless compression: This method ensures that no data is lost during the compression process. Formats like ZIP and PNG rely on lossless algorithms such as DEFLATE.
Lossy compression: This method allows some loss of data, which usually goes unnoticed by users. Examples include JPEG for images and MP3 for audio.

When you compress a file, the algorithm identifies patterns and symbols that appear frequently and replaces them with shorter codes. For example, in the text 'aaaaaabb', 'a' occurs 6 times. Instead of storing it as 'aaaaaabb', the algorithm may store it as '6a2b'. This way, the data consumes less space. However, repeatedly compressing an already compressed file can result in no further gains or even lead to an increase in size. The informational overhead added during each compression cycle compounds, leaving the file size stable or slightly larger.
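The 'aaaaaabb' example above corresponds to simple run-length encoding, which can be sketched as:

```python
from itertools import groupby

def rle(text: str) -> str:
    # Replace each run of a repeated character with "<count><char>".
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(text))

print(rle("aaaaaabb"))  # 6a2b
```

Note that run-length encoding only pays off when runs are long; on text with no repeats (e.g. 'abc' becomes '1a1b1c') it expands the data, which is one concrete way a "compressor" can make its input bigger.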
file size optimization
File size optimization is essential for efficient data storage and transmission. When we optimize file size, we aim to reduce the amount of data required while maintaining the necessary information and quality. Here’s how it's done:

  • Compression: As discussed earlier, we use algorithms to reduce redundancy and lower file size.
  • Format Selection: Choosing the right format helps significantly. For instance, using vector graphics (SVG) for simple images instead of raster graphics (JPEG) can save space.
  • Data Pruning: This involves removing unnecessary parts of data. For example, in a programming codebase, this can mean deleting unused libraries.


Optimizing file size not only saves storage space but also improves data transfer speeds, making operations like downloading faster. However, it's crucial to note that perfect optimization balances minimal data size with the retention of data integrity and quality.
