Chapter 5: Q18E (page 163)

The following table gives the frequencies of the letters of the English language (including the blank for separating words) in a particular corpus.
blank	18.3%	r	4.8%	y	1.6%
e	10.2%	d	3.5%	p	1.6%
t	7.7%	l	3.4%	b	1.3%
a	6.8%	c	2.6%	v	0.9%
o	5.9%	u	2.4%	k	0.6%
i	5.8%	m	2.1%	j	0.2%
n	5.5%	w	1.9%	x	0.2%
s	5.1%	f	1.8%	q	0.1%
h	4.9%	g	1.7%	z	0.1%
a. What is the optimum Huffman encoding of this alphabet?
b. What is the expected number of bits per letter?
c. Suppose now that we calculate the entropy of these frequencies
$H = \sum_{t = 0}^{26} p_{t} \log \frac{1}{p_{t}}$
(see the box in page 143). Would you expect it to be larger or smaller than your answer above? Explain.
d. Do you think that this is the limit of how much English text can be compressed? What features of the English language, besides letters and their frequencies, should a better compression scheme take into account?

Short Answer

Build a Huffman tree from the letter frequencies to assign each letter a binary code, then take the frequency-weighted average of the code lengths to get the expected number of bits per letter.

Step by step solution

Compression Technique

Huffman encoding is a lossless data compression technique that gives shorter codes to more frequent symbols. Given the letter frequencies in the table above, determine the optimal Huffman encoding for the alphabet.

Follow the steps below to build the Huffman encoding:

• Arrange the letters in ascending order of frequency.

• Choose the two entries with the lowest frequency.

• Combine them under a parent node whose frequency is their sum, and insert the result back into the frequency list in sorted position.

• Repeat steps 1-3 until a single node, the root, remains.
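The steps above can be sketched with a priority queue in place of a manually re-sorted list. This is a minimal illustration, not the solution's own procedure: the name `huffman_code_lengths`, the `FREQ` dictionary, and the choice to store frequencies as integer tenths of a percent are assumptions made here. It tracks only code lengths, not the bit patterns themselves.

```python
import heapq
from itertools import count

# Letter frequencies from the table, in tenths of a percent (18.3% -> 183),
# kept as integers so the arithmetic is exact. "blank" is the word separator.
FREQ = {
    "blank": 183, "e": 102, "t": 77, "a": 68, "o": 59, "i": 58, "n": 55,
    "s": 51, "h": 49, "r": 48, "d": 35, "l": 34, "c": 26, "u": 24,
    "m": 21, "w": 19, "f": 18, "g": 17, "y": 16, "p": 16, "b": 13,
    "v": 9, "k": 6, "j": 2, "x": 2, "q": 1, "z": 1,
}

def huffman_code_lengths(freq):
    """Return {symbol: code length} by repeatedly merging the two lightest nodes."""
    tie = count()  # unique tie-breaker so the heap never compares symbol lists
    heap = [(weight, next(tie), [sym]) for sym, weight in freq.items()]
    heapq.heapify(heap)
    depth = dict.fromkeys(freq, 0)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Every symbol under the new parent moves one level deeper.
        for sym in left + right:
            depth[sym] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), left + right))
    return depth

lengths = huffman_code_lengths(FREQ)
weighted_total = sum(FREQ[s] * lengths[s] for s in FREQ)
print(lengths["blank"], lengths["e"], weighted_total)  # 3 3 4201
```

Ties between equal frequencies (for example j, x, and the merged z/q node, all at 0.2) may be broken in a different order than in the hand construction, so the lengths of the rarest letters can differ from a hand-built tree. The frequency-weighted total is the same for every optimal tree.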

Figure 1 depicts this procedure.

Figure 1

Explanation: merging the least frequent letters under a parent node

• In Step 1, the letters z and q are chosen since they are the least common. They are combined, and the result is assigned to a parent node. The combined frequency, 0.2, is no larger than any remaining frequency, so the new node is placed at the front of the list, ahead of x and j.

The new list is: result[z,q], x, j, k, v, ... and so on.

• In Step 2, the two least frequent entries are result[z,q] and x. Combine them and place the outcome under a new parent node. The node result[result[z,q],x] has frequency 0.4, which is larger than j's 0.2, so it is placed after j in the frequency list. The new list is:

j, result[result[z,q],x], k, v, ... and so on.

• In Step 3, j becomes the left child and result[result[z,q],x] the right child, since j's frequency is smaller than that of result[result[z,q],x].

Continue this procedure until a single node, the root, remains.

• Label each left branch 0 and each right branch 1. Figure 2 depicts the final tree.

Figure 2:

Start at the root and follow the branches down to each letter's leaf, recording the 0s and 1s of the branches traversed.

The resulting codes for all letters are:

blank: 101 (3 bits)
e: 010 (3 bits)
t: 1000 (4 bits)
a: 1110 (4 bits)
o: 1100 (4 bits)
i: 0111 (4 bits)
n: 0110 (4 bits)
s: 0011 (4 bits)
h: 0001 (4 bits)
r: 0000 (4 bits)
d: 11111 (5 bits)
l: 11110 (5 bits)
c: 00101 (5 bits)
u: 00100 (5 bits)
m: 100111 (6 bits)
w: 100101 (6 bits)
f: 100100 (6 bits)
g: 110111 (6 bits)
y: 110110 (6 bits)
p: 110101 (6 bits)
b: 110100 (6 bits)
v: 1001100 (7 bits)
k: 10011011 (8 bits)
j: 100110100 (9 bits)
x: 1001101011 (10 bits)
q: 10011010101 (11 bits)
z: 10011010100 (11 bits)
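As a sanity check on the listing above, the short sketch below tests two properties that any code read off a full binary tree should have: no codeword is a prefix of another, and the Kraft sum of $2^{-\text{length}}$ over all codewords equals 1. The dictionary `CODES` transcribes the listing (with the fifth entry read as the letter o), and the helper name `is_prefix_free` is an illustrative choice.

```python
from fractions import Fraction

# Codewords transcribed from the listing above.
CODES = {
    "blank": "101", "e": "010", "t": "1000", "a": "1110", "o": "1100",
    "i": "0111", "n": "0110", "s": "0011", "h": "0001", "r": "0000",
    "d": "11111", "l": "11110", "c": "00101", "u": "00100",
    "m": "100111", "w": "100101", "f": "100100", "g": "110111",
    "y": "110110", "p": "110101", "b": "110100", "v": "1001100",
    "k": "10011011", "j": "100110100", "x": "1001101011",
    "q": "10011010101", "z": "10011010100",
}

def is_prefix_free(codes):
    """True if no codeword is a prefix of (or equal to) another codeword."""
    words = sorted(codes.values())
    # After sorting, a word that prefixes any other also prefixes its successor.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

# Exact Kraft sum: 1 means the code tree is full (no unused leaves).
kraft_sum = sum(Fraction(1, 2 ** len(w)) for w in CODES.values())

print(is_prefix_free(CODES), kraft_sum)  # True 1
```

Passing both checks shows the listing is a decodable, complete prefix code. It does not by itself show optimality, which follows from the Huffman construction.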

Let $l_{a}$ be the number of bits in the Huffman code of letter $a$ and $f_{a}$ its frequency (in percent), for each letter $a$ in the alphabet $A$.

The frequencies in the table sum to 101, so the weighted sum is normalized by 101. Expected number of bits per letter:

$\text{Expected number of bits per letter} = \frac{\sum_{a \in A} f_{a} l_{a}}{\sum_{a \in A} f_{a}}$

$\begin{array}{l} = \frac{1}{101} (18.3 \times 3 + 10.2 \times 3 + 7.7 \times 4 + 6.8 \times 4 + 5.9 \times 4 + 5.8 \times 4 + 5.5 \times 4 + 5.1 \times 4 + 4.9 \times 4 \\ + 4.8 \times 4 + 3.5 \times 5 + 3.4 \times 5 + 2.6 \times 5 + 2.4 \times 5 + 2.1 \times 6 + 1.9 \times 6 + 1.8 \times 6 + 1.7 \times 6 + 1.6 \times 6 + 1.6 \times 6 \\ + 1.3 \times 6 + 0.9 \times 7 + 0.6 \times 8 + 0.2 \times 9 + 0.2 \times 10 + 0.1 \times 11 + 0.1 \times 11) \end{array}$

$= \frac{1}{101} (54.9 + 30.6 + 30.8 + 27.2 + 23.6 + 23.2 + 22 + 20.4 + 19.6 + 19.2 + 17.5 + 17 + 13 + 12 + 12.6 + 11.4 + 10.8 + 10.2 + 9.6 + 9.6 + 7.8 + 6.3 + 4.8 + 1.8 + 2.0 + 1.1 + 1.1) = \frac{420.1}{101}$

So the Huffman code uses $\frac{420.1}{101} \approx 4.16$ bits per letter on average.
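As a cross-check of the arithmetic, the weighted average can be evaluated directly from the listed frequencies and code lengths. The sketch below also computes the entropy from part (c), using base-2 logarithms and frequencies normalized to sum to 1. Both of those choices, along with the names `TABLE`, `avg_bits`, and `entropy`, are assumptions made for this illustration.

```python
import math

# (frequency in tenths of a percent, Huffman code length in bits) per symbol,
# taken from the frequency table and the code listing above.
TABLE = {
    "blank": (183, 3), "e": (102, 3), "t": (77, 4), "a": (68, 4),
    "o": (59, 4), "i": (58, 4), "n": (55, 4), "s": (51, 4),
    "h": (49, 4), "r": (48, 4), "d": (35, 5), "l": (34, 5),
    "c": (26, 5), "u": (24, 5), "m": (21, 6), "w": (19, 6),
    "f": (18, 6), "g": (17, 6), "y": (16, 6), "p": (16, 6),
    "b": (13, 6), "v": (9, 7), "k": (6, 8), "j": (2, 9),
    "x": (2, 10), "q": (1, 11), "z": (1, 11),
}

total_freq = sum(f for f, _ in TABLE.values())         # 1010, i.e. 101.0%
weighted_bits = sum(f * l for f, l in TABLE.values())  # 4201, i.e. 420.1
avg_bits = weighted_bits / total_freq

# Entropy of the normalized frequencies, in bits per letter.
entropy = sum((f / total_freq) * math.log2(total_freq / f)
              for f, _ in TABLE.values())

print(f"average code length: {avg_bits:.2f} bits per letter")  # 4.16
print(f"entropy:             {entropy:.2f} bits per letter")
```

For any prefix code the entropy is a lower bound on the expected code length, and a Huffman code stays within one bit of it, so the second figure printed should not exceed the first.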

Conclusion

The Huffman construction assigns short bit patterns to frequent letters and long ones to rare letters. The expected number of bits per letter is then the frequency-weighted average of the code lengths, computed with the formula above.
