Chapter 7: Problem 16
In order to read a web page (Special Topic 7.4), you need to know its character encoding (Special Topic 7.3). Write a program that has the URL of a web page as a command-line argument and that fetches the page contents in the proper encoding. Determine the encoding as follows:

1. After calling urlopen, call input.headers["content-type"]. You may get a string such as "text/html; charset=windows-1251". If so, use the value of the charset attribute as the encoding.
2. Read the first line using the "latin_1" encoding. If the first two bytes of the file are 254 255 or 255 254, the encoding is "utf-16". If the first three bytes of the file are 239 187 191, the encoding is "utf-8".
3. Continue reading the page using the "latin_1" encoding and look for a string of the form encoding=... or charset=... If you find a match, extract the character encoding (discarding any surrounding quotation marks) and re-read the document with that encoding.

If none of these applies, write an error message that the encoding could not be determined.
Short Answer
Step by step solution
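The three steps in the problem statement can be sketched roughly as follows. This is one possible approach, not the textbook's reference solution. The function names (charset_from_header, charset_from_bom, charset_from_content, detect_encoding, fetch_page, main) are hypothetical, and the regular expressions are just one assumed way to match charset=... and encoding=... Instead of re-reading the page over the network, the sketch downloads the bytes once and decodes them again with the detected encoding.

```python
import re
from urllib.request import urlopen

# Assumed pattern: an encoding name made of letters, digits, '_', '.', ':' or '-',
# optionally preceded by a single or double quotation mark.
NAME = r'\s*=\s*["\']?([\w.:-]+)'

def charset_from_header(content_type):
    """Step 1: return the charset named in a Content-Type header, or None."""
    if content_type is None:
        return None
    match = re.search("charset" + NAME, content_type, re.IGNORECASE)
    return match.group(1) if match else None

def charset_from_bom(raw):
    """Step 2: look for a byte order mark at the start of the page."""
    # latin_1 maps every byte to the character with the same code,
    # so ord() recovers the original byte values.
    text = raw[:3].decode("latin_1")
    codes = [ord(ch) for ch in text]
    if codes[:2] in ([254, 255], [255, 254]):
        return "utf-16"
    if codes == [239, 187, 191]:
        return "utf-8"
    return None

def charset_from_content(raw):
    """Step 3: search the page text for encoding=... or charset=..."""
    text = raw.decode("latin_1")
    match = re.search("(?:encoding|charset)" + NAME, text, re.IGNORECASE)
    return match.group(1) if match else None

def detect_encoding(content_type, raw):
    """Try the three steps in order; return None if none of them applies."""
    return (charset_from_header(content_type)
            or charset_from_bom(raw)
            or charset_from_content(raw))

def fetch_page(url):
    """Return the decoded page text, or None if no encoding was found."""
    with urlopen(url) as response:
        content_type = response.headers["content-type"]
        raw = response.read()
    encoding = detect_encoding(content_type, raw)
    if encoding is None:
        return None
    # Note: an encoding name that Python does not know raises LookupError here.
    return raw.decode(encoding)

def main(url):
    """Print the page, or an error message if the encoding is unknown."""
    page = fetch_page(url)
    if page is None:
        print("Error: the encoding could not be determined.")
    else:
        print(page)
```

To use this as a command-line program, add import sys at the top and call main(sys.argv[1]) at the bottom of the file. When a page starts with the bytes 239 187 191, decoding with "utf-8" keeps the byte order mark as the first character of the text; Python's "utf-8-sig" codec would strip it, if that is preferred.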
Key Concepts
These are the key concepts you need to understand to accurately answer the question.