
The following table gives the frequencies of the letters of the English language (including the blank for separating words) in a particular corpus.

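| Letter | Frequency | Letter | Frequency | Letter | Frequency |
|--------|-----------|--------|-----------|--------|-----------|
| blank  | 18.3%     | r      | 4.8%      | y      | 1.6%      |
| e      | 10.2%     | d      | 3.5%      | p      | 1.6%      |
| t      | 7.7%      | l      | 3.4%      | b      | 1.3%      |
| a      | 6.8%      | c      | 2.6%      | v      | 0.9%      |
| o      | 5.9%      | u      | 2.4%      | k      | 0.6%      |
| i      | 5.8%      | m      | 2.1%      | j      | 0.2%      |
| n      | 5.5%      | w      | 1.9%      | x      | 0.2%      |
| s      | 5.1%      | f      | 1.8%      | q      | 0.1%      |
| h      | 4.9%      | g      | 1.7%      | z      | 0.1%      |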

  a. What is the optimum Huffman encoding of this alphabet?
  b. What is the expected number of bits per letter?
  c. Suppose now that we calculate the entropy of these frequencies

     $$H = \sum_{i=0}^{26} p_i \log \frac{1}{p_i}$$

     (see the box on page 143). Would you expect it to be larger or smaller than your answer above? Explain.
  d. Do you think that this is the limit of how much English text can be compressed? What features of the English language, besides letters and their frequencies, should a better compression scheme take into account?

Short Answer


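Build a Huffman tree from the letter frequencies to obtain a variable-length binary codeword for each letter, then compute the expected number of bits per letter as the frequency-weighted average of the codeword lengths.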

Step by step solution

01

Compression Technique

a)

Huffman encoding is a lossless data compression technique that gives shorter codewords to more frequent symbols. Using the letter frequencies in the table above, determine the optimal Huffman encoding for the alphabet.

Follow the steps below to determine the optimal Huffman encoding:

• Arrange the letters in ascending order of frequency.

• Choose the two entries with the lowest frequency.

• Combine them into a parent node whose frequency is the sum of the two, and insert that node back into the frequency list in sorted position.

• Repeat the previous two steps until only a single node, the root of the tree, remains.

Figure 1 depicts this procedure.

Figure 1
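
The procedure above can be sketched in Python. This is a minimal sketch under stated assumptions, not the solution's own method: it uses a binary heap (`heapq`) in place of a re-sorted list, stores the frequencies as integer tenths of a percent, and breaks ties by insertion order, so individual codewords may differ from the ones in the figures even though every Huffman tree for these frequencies has the same expected code length. The names `FREQ` and `huffman_code` are illustrative.

```python
import heapq
from itertools import count

# Letter frequencies in tenths of a percent (integers avoid float rounding).
FREQ = {
    " ": 183, "e": 102, "t": 77, "a": 68, "o": 59, "i": 58, "n": 55,
    "s": 51, "h": 49, "r": 48, "d": 35, "l": 34, "c": 26, "u": 24,
    "m": 21, "w": 19, "f": 18, "g": 17, "y": 16, "p": 16, "b": 13,
    "v": 9, "k": 6, "j": 2, "x": 2, "q": 1, "z": 1,
}


def huffman_code(freq):
    """Return {symbol: bitstring} for a Huffman code over `freq`."""
    tie = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)

    # Repeatedly merge the two lowest-frequency nodes into a parent node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        # Internal nodes are (left, right) tuples; leaves are symbols.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left branch is labelled 0
            walk(node[1], prefix + "1")  # right branch is labelled 1
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


codes = huffman_code(FREQ)
for sym in sorted(codes, key=lambda s: (len(codes[s]), codes[s])):
    print(repr(sym), codes[sym])
```

A heap is chosen here only because it makes "take the two smallest" cheap; re-sorting a list after each merge, as the steps describe, builds a tree of the same total cost.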

02

Merging the least frequent letters into a parent node

• In step 1, the letters z and q are chosen because they are the least frequent (0.1% each). They are combined, and the result is assigned to a parent node with frequency 0.2%. Because this result is no larger than any remaining frequency, it is placed at the front of the list, before x and j.

The new list is: [z,q], x, j, k, v, and so on.

• In step 2, the two least frequent entries are [z,q] and x. Combine them and assign the result to a parent node. The node [[z,q],x] now has frequency 0.4%, which is larger than j's frequency of 0.2%, so it is placed after j in the frequency list. The new list is:

j, [[z,q],x], k, v, and so on.

• In step 3, j becomes the left child and [[z,q],x] the right child, because j (0.2%) is less than [[z,q],x] (0.4%).

Continue this procedure until only a single root node remains.
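
The first three merges described above can be traced with a short Python sketch. It assumes only the six rarest letters are in play (enough to reproduce these steps), stores frequencies in tenths of a percent, and places a merged node before any existing node of equal frequency, which is a tie-breaking choice made here to match the lists shown. The helper name `merge_once` is illustrative.

```python
import bisect

# The six rarest letters as (frequency in tenths of a percent, label),
# already in ascending order of frequency.
nodes = [(1, "z"), (1, "q"), (2, "x"), (2, "j"), (6, "k"), (9, "v")]


def merge_once(nodes):
    """Merge the two lowest-frequency nodes and reinsert the result in order."""
    (f1, a), (f2, b) = nodes[0], nodes[1]
    merged = (f1 + f2, f"[{a},{b}]")
    rest = nodes[2:]
    # Tie-breaking choice: insert before any node of equal frequency.
    pos = bisect.bisect_left([f for f, _ in rest], merged[0])
    return rest[:pos] + [merged] + rest[pos:]


for step in range(1, 4):
    nodes = merge_once(nodes)
    print(step, [label for _, label in nodes])
# 1 ['[z,q]', 'x', 'j', 'k', 'v']
# 2 ['j', '[[z,q],x]', 'k', 'v']
# 3 ['[j,[[z,q],x]]', 'k', 'v']
```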

• Label each left branch 0 and each right branch 1. Figure 2 shows the final tree.

Figure 2:

Start at the root and follow the branches down to a letter's leaf, recording the 0s and 1s along the path. That sequence of bits is the letter's codeword.

The resulting codewords for all letters are:

  • blank: 101 (3 bits)
  • e: 010 (3 bits)
  • t: 1000 (4 bits)
  • a: 1110 (4 bits)
  • o: 1100 (4 bits)
  • i: 0111 (4 bits)
  • n: 0110 (4 bits)
  • s: 0011 (4 bits)
  • h: 0001 (4 bits)
  • r: 0000 (4 bits)
  • d: 11111 (5 bits)
  • l: 11110 (5 bits)
  • c: 00101 (5 bits)
  • u: 00100 (5 bits)
  • m: 100111 (6 bits)
  • w: 100101 (6 bits)
  • f: 100100 (6 bits)
  • g: 110111 (6 bits)
  • y: 110110 (6 bits)
  • p: 110101 (6 bits)
  • b: 110100 (6 bits)
  • v: 1001100 (7 bits)
  • k: 10011011 (8 bits)
  • j: 100110100 (9 bits)
  • x: 1001101011 (10 bits)
  • q: 10011010101 (11 bits)
  • z: 10011010100 (11 bits)
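
As a sanity check, the codewords listed above can be verified to form a prefix-free code and used to encode and decode a short string. The Python sketch below is illustrative only: the helper names (`is_prefix_free`, `encode`, `decode`) and the sample text are assumptions, and the blank is represented by a space character.

```python
from fractions import Fraction

# Codewords copied from the list above (" " stands for the blank).
CODES = {
    " ": "101", "e": "010", "t": "1000", "a": "1110", "o": "1100",
    "i": "0111", "n": "0110", "s": "0011", "h": "0001", "r": "0000",
    "d": "11111", "l": "11110", "c": "00101", "u": "00100",
    "m": "100111", "w": "100101", "f": "100100", "g": "110111",
    "y": "110110", "p": "110101", "b": "110100", "v": "1001100",
    "k": "10011011", "j": "100110100", "x": "1001101011",
    "q": "10011010101", "z": "10011010100",
}


def is_prefix_free(codes):
    """True if no codeword is a prefix of another codeword."""
    words = sorted(codes.values())
    # In sorted order, a prefix is immediately followed by a word it prefixes.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def encode(text, codes):
    """Concatenate the codewords of the characters in `text`."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Read bits left to right, emitting a letter whenever a codeword matches."""
    inverse = {word: sym for sym, word in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


# Kraft sum: equals 1 exactly when the code tree has no unused leaves.
kraft = sum(Fraction(1, 2 ** len(w)) for w in CODES.values())

bits = encode("the quiz", CODES)
print(is_prefix_free(CODES), kraft, decode(bits, CODES))
```

Because the code is prefix-free, the decoder never needs to look ahead: the first codeword that matches is always the right one.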

b)

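Let $l_a$ be the length in bits of the Huffman codeword for letter $a$, and let $f_a$ be the frequency of that letter.

The frequencies sum to 101 rather than 100, so the weighted sum is divided by 101. The expected number of bits per letter is:

$$\text{Expected number of bits per letter} = \frac{\sum_{a \in A} f_a l_a}{\sum_{a \in A} f_a}$$

$$= \frac{1}{\sum_{a \in A} f_a}\,[\,18.3\times3 + 10.2\times3 + 7.7\times4 + 6.8\times4 + 5.9\times4 + 5.8\times4 + 5.5\times4 + 5.1\times4 + 4.9\times4 + 4.8\times4 + 3.5\times5 + 3.4\times5 + 2.6\times5 + 2.4\times5 + 2.1\times6 + 1.9\times6 + 1.8\times6 + 1.7\times6 + 1.6\times6 + 1.6\times6 + 1.3\times6 + 0.9\times7 + 0.6\times8 + 0.2\times9 + 0.2\times10 + 0.1\times11 + 0.1\times11\,]$$

$$= \frac{1}{\sum_{a \in A} f_a}\,[\,54.9 + 30.6 + 30.8 + 27.2 + 23.6 + 23.2 + 22 + 20.4 + 19.6 + 19.2 + 17.5 + 17 + 13 + 12 + 12.6 + 11.4 + 10.8 + 10.2 + 9.6 + 9.6 + 7.8 + 6.3 + 4.8 + 1.8 + 2.0 + 1.1 + 1.1\,] = \frac{420.1}{101}$$

The Huffman code therefore uses approximately 4.16 bits per letter.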
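
The weighted average can also be recomputed directly from the frequency table and the codeword lengths in part a), which is a useful check on the hand arithmetic. The Python sketch below is illustrative: the name `TABLE` is an assumption, and frequencies are stored as integer tenths of a percent so that the sums are exact.

```python
# letter -> (frequency in tenths of a percent, codeword length in bits)
TABLE = {
    " ": (183, 3), "e": (102, 3),
    "t": (77, 4), "a": (68, 4), "o": (59, 4), "i": (58, 4),
    "n": (55, 4), "s": (51, 4), "h": (49, 4), "r": (48, 4),
    "d": (35, 5), "l": (34, 5), "c": (26, 5), "u": (24, 5),
    "m": (21, 6), "w": (19, 6), "f": (18, 6), "g": (17, 6),
    "y": (16, 6), "p": (16, 6), "b": (13, 6),
    "v": (9, 7), "k": (6, 8), "j": (2, 9), "x": (2, 10),
    "q": (1, 11), "z": (1, 11),
}

# Denominator: sum of all frequencies (101.0%, stored as 1010 tenths).
total_freq = sum(f for f, _ in TABLE.values())

# Numerator: sum over letters of frequency times codeword length.
weighted_bits = sum(f * length for f, length in TABLE.values())

expected_bits = weighted_bits / total_freq
print(f"sum of f_a * l_a = {weighted_bits / 10}")
print(f"sum of f_a       = {total_freq / 10}")
print(f"expected bits per letter = {expected_bits:.2f}")
```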

03

Conclusion 

Part a) is answered by building the Huffman tree from the letter frequencies and reading each codeword off the path from the root to that letter's leaf. Part b) follows by taking the frequency-weighted average of the codeword lengths.
