
Suppose we have a compression function \(c\), which takes a bit string \(s\) to a compressed string \(c(s)\). (a) Show that for any integer \(N\) there must be a string \(s\) of length \(N\) for which length \((c(s)) \geq N\); that is, no effective compression is done. (b) Compress some already compressed files (try compressing with the same utility several times in sequence). What happens to the file size? (c) Given a compression function \(c\) as in (a), give a function \(c^{\prime}\) such that for all bit strings \(s\), length \(\left(c^{\prime}(s)\right) \leq \min (\) length \((c(s))\), length \((s))+1\); that is, in the worst case, compression with \(c^{\prime}\) expands the size by only 1 bit.

Short Answer

Expert verified
(a) By a pigeonhole argument, for any integer \(N\) there must be a string \(s\) of length \(N\) such that length\((c(s)) \geq N\). (b) Compressing an already compressed file yields little or no further reduction and may even increase the size. (c) \(c^{\prime}(s)\) can be defined with a one-bit flag so that length\((c^{\prime}(s)) \leq \min(\text{length}(c(s)), \text{length}(s)) + 1\).

Step by step solution

01

Part (a): Proof Setup

To show that for any integer \(N\) there exists a string \(s\) of length \(N\) such that length\((c(s)) \geq N\), assume for contradiction that \(c\) compresses every string of length \(N\), i.e., that length\((c(s)) < N\) for every string \(s\) of length \(N\).
02

Part (a): Pigeonhole Principle

By the pigeonhole principle, this assumption fails: there are \(2^N\) bit strings of length \(N\), but only \(2^0 + 2^1 + \cdots + 2^{N-1} = 2^N - 1\) strings of length less than \(N\). A function \(c\) mapping every length-\(N\) string to a strictly shorter string would therefore map two distinct strings to the same output, so \(c\) would not be injective (one-to-one) and the original data could not be recovered, a contradiction for lossless compression.
03

Part (a): Conclusion

Since not all strings can be compressed to a length smaller than \(N\), there must be at least one string of length \(N\) for which no compression is done, i.e., length \((c(s)) \geq N\).
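The counting at the heart of the argument is easy to check directly; a small Python sketch (with \(N = 8\) as an example):

```python
# There are 2**N bit strings of length N, but only 2**N - 1 strings of
# any shorter length (2**0 + 2**1 + ... + 2**(N-1), counting the empty
# string), so no injective map into the shorter strings can exist.
N = 8
length_N = 2 ** N                          # strings of length exactly N
shorter = sum(2 ** i for i in range(N))    # strings of length < N
print(length_N, shorter)  # 256 255
```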
04

Part (b): Experiment with Compression Algorithms

Take a file and compress it using a popular compression utility (e.g., ZIP). Compress the already compressed file again using the same utility. Observe the file size after each round of compression.
05

Part (b): Observation

After the first compression, the file size is typically reduced (assuming the original file contains redundancy). Subsequent compressions, however, generally yield little to no additional reduction, and in some cases may even increase the file size slightly.
06

Part (b): Explanation

This happens because the first compression has already removed most of the redundancy, leaving output that looks nearly random; further passes find little left to exploit, while each pass still adds its own fixed overhead (headers and other metadata), which can increase the file size rather than decrease it.
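The experiment can be sketched with Python's standard zlib module (DEFLATE, the same algorithm family used by ZIP); the repetitive sample data here is an arbitrary choice:

```python
import zlib

# Compress the same data several times in sequence and record the sizes.
data = b"the quick brown fox jumps over the lazy dog " * 200
sizes = [len(data)]
for _ in range(4):
    data = zlib.compress(data, 9)
    sizes.append(len(data))

# Typical pattern: a large drop after round 1, then stagnation or
# slight growth as each pass adds header overhead to near-random data.
print(sizes)
```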
07

Part (c): Define the Function \(c^{\prime}\)

Define the function \(c^{\prime}(s)\) using a one-bit flag: if length\((c(s)) <\) length\((s)\), let \(c^{\prime}(s)\) be the bit \(1\) followed by \(c(s)\); otherwise, let \(c^{\prime}(s)\) be the bit \(0\) followed by \(s\) verbatim. The flag tells the decompressor whether \(c\) was applied, so \(c^{\prime}\) remains invertible.
08

Part (c): Verification

Check that \(c^{\prime}\) satisfies the length constraint. If length\((c(s)) <\) length\((s)\), then length\((c^{\prime}(s)) =\) length\((c(s)) + 1 = \min(\text{length}(c(s)), \text{length}(s)) + 1\). Otherwise, length\((c^{\prime}(s)) =\) length\((s) + 1 = \min(\text{length}(c(s)), \text{length}(s)) + 1\). In both cases the only overhead is the single flag bit.
09

Part (c): Conclusion

Thus, the function \(c^{\prime}(s)\) ensures that length\((c^{\prime}(s)) \leq \min(\text{length}(c(s)), \text{length}(s)) + 1\), meeting the requirement of the exercise.
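A minimal sketch of this construction in Python, using zlib as the underlying compressor \(c\) (an illustrative choice; any lossless compressor works) and representing bit strings as Python strings of '0'/'1' characters:

```python
import zlib

def c(s: str) -> str:
    """Underlying compressor: deflate the bit string's text and
    re-expand the result to a string of '0'/'1' characters."""
    return "".join(f"{byte:08b}" for byte in zlib.compress(s.encode()))

def c_prime(s: str) -> str:
    """Expand by at most one bit: a flag bit records whether c was
    applied, keeping the encoding invertible."""
    cs = c(s)
    if len(cs) < len(s):
        return "1" + cs  # flag 1: payload is c(s)
    return "0" + s       # flag 0: payload is s, stored verbatim

# One highly compressible input, one tiny incompressible one:
for s in ("01" * 2000, "0110"):
    assert len(c_prime(s)) <= min(len(c(s)), len(s)) + 1
```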

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

pigeonhole principle
The pigeonhole principle is a fundamental concept in mathematics and computer science. It states that if you have more pigeons than pigeonholes and each pigeon needs a place, at least one pigeonhole must contain more than one pigeon. Let's relate this to compression.
When you try to compress every possible string of a given length to a shorter length, there aren't enough shorter strings to go around. For example, there are 256 different 8-bit strings (like '10101010'), but only 255 strings of length 7 or less, counting the empty string. If every 8-bit string had to compress to fewer than 8 bits, at least two different strings would share the same compressed output, making decompression ambiguous. This is why, according to the pigeonhole principle, there will always be at least one string that does not become shorter when compressed.
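A tiny illustration of the inevitable collisions: feed all 256 8-bit strings through a toy "compressor" (a hypothetical one that simply drops the last bit) and count how often outputs clash:

```python
from collections import Counter

def toy_compress(s: str) -> str:
    # Hypothetical compressor for illustration: truncate to 7 bits.
    return s[:7]

# Map every 8-bit string through it and tally the outputs.
outputs = Counter(toy_compress(f"{i:08b}") for i in range(256))
collisions = [out for out, n in outputs.items() if n > 1]
print(len(collisions))  # 128: every 7-bit output is produced twice
```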
compression algorithms
Compression algorithms are techniques and methods used to reduce the size of data. They work by identifying and eliminating redundancy within the data. There are two main types of compression: lossless and lossy.

Lossless compression: This method ensures that no data is lost during the compression process. Formats like ZIP and PNG rely on lossless algorithms such as DEFLATE.
Lossy compression: This method allows some loss of data, which usually goes unnoticed by users. Examples include JPEG for images and MP3 for audio.

When you compress a file, the algorithm identifies patterns and symbols that appear frequently and replaces them with shorter codes. For example, in the text 'aaaaaabb', 'a' occurs 6 times. Instead of storing it as 'aaaaaabb', the algorithm may store it as '6a2b'. This way, the data consumes less space. However, repeatedly compressing an already compressed file can result in no further gains or even lead to an increase in size. The informational overhead added during each compression cycle compounds, leaving the file size stable or slightly larger.
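The 'aaaaaabb' example above corresponds to simple run-length encoding, which can be sketched as:

```python
from itertools import groupby

def rle(text: str) -> str:
    # Replace each run of a repeated character with "<count><char>".
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(text))

print(rle("aaaaaabb"))  # 6a2b
```

Note that run-length encoding only pays off when runs are long; on text with no repeats (e.g. 'abc' becomes '1a1b1c') it expands the data, which is one concrete way a "compressor" can make its input bigger.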
file size optimization
File size optimization is essential for efficient data storage and transmission. When we optimize file size, we aim to reduce the amount of data required while maintaining the necessary information and quality. Here’s how it's done:

  • Compression: As discussed earlier, we use algorithms to reduce redundancy and lower file size.
  • Format Selection: Choosing the right format helps significantly. For instance, using vector graphics (SVG) for simple images instead of raster graphics (JPEG) can save space.
  • Data Pruning: This involves removing unnecessary parts of data. For example, in a programming codebase, this can mean deleting unused libraries.


Optimizing file size not only saves storage space but also improves data transfer speeds, making operations like downloading faster. However, it's crucial to note that perfect optimization balances minimal data size with the retention of data integrity and quality.
