Chapter 3: Problem 19
Explain how the lexicographic ordering of strings in Python differs from the ordering of words in a dictionary or telephone book. Hint: Consider strings such as 184, wiley, con, Century 21, and while-U-Wait.
Short Answer
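Python compares strings character by character by Unicode code point, so spaces and hyphens precede digits, digits precede uppercase letters, and uppercase letters precede lowercase letters. Dictionaries and telephone books ignore case, usually ignore spaces and punctuation, and often file numbers separately from words.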
Step by step solution
01
Understand Lexicographic Ordering in Python
In Python, strings are ordered lexicographically: they are compared character by character using each character's Unicode code point. This resembles the way words are arranged in an English dictionary, but it is not identical. Unicode assigns a number to every character, including digits, uppercase letters, lowercase letters, spaces, and punctuation. Because these groups occupy different numeric ranges, the resulting order can look unintuitive.
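Python's built-in `ord()` returns a character's code point, so a quick sketch like the one below makes the numeric values visible. The characters chosen are simply ones that occur in the hint strings.

```python
# Unicode code points of some characters that occur in the hint strings
code_points = {ch: ord(ch) for ch in ["1", "C", "c", "w", " ", "-"]}
print(code_points)
# {'1': 49, 'C': 67, 'c': 99, 'w': 119, ' ': 32, '-': 45}
```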
02
Compare Characters Based on Unicode
In Python's lexicographic ordering, digits (code points 48 to 57) come before uppercase letters (65 to 90), which come before lowercase letters (97 to 122). Spaces and punctuation fall wherever their code points put them; the space (32) and the hyphen (45), for instance, precede the digits. Thus `184` is less than `Century 21` because '1' (49) is less than 'C' (67), and `Century 21` is less than `con` because 'C' (67) is less than 'c' (99). The strings `while-U-Wait` and `wiley` both begin with a lowercase 'w', so the second character decides: 'h' (104) is less than 'i' (105), so `while-U-Wait` comes before `wiley`. The full Python order is therefore `184`, `Century 21`, `con`, `while-U-Wait`, `wiley`.
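Sorting the hint strings with Python's default comparison reproduces this order. The sketch below also prints the individual comparisons that produce it, with the deciding code points noted in comments.

```python
words = ["184", "wiley", "con", "Century 21", "while-U-Wait"]

# sorted() uses the default < comparison, i.e. code-point order
print(sorted(words))
# ['184', 'Century 21', 'con', 'while-U-Wait', 'wiley']

# Pairwise comparisons behind that order
print("184" < "Century 21")       # True: '1' (49) < 'C' (67)
print("Century 21" < "con")       # True: 'C' (67) < 'c' (99)
print("while-U-Wait" < "wiley")   # True: 'h' (104) < 'i' (105) at index 1
```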
03
Understand Dictionary Ordering
Dictionary or telephone book ordering typically ignores case and either ignores spaces and punctuation such as hyphens or treats them as word separators, so the order is determined by the letters of the words themselves. Numbers are often filed separately from words, frequently after them. Under such rules, `Century 21` is compared with `con` as if both were lowercase, and it still comes first because 'e' precedes 'o'. `184`, on the other hand, may move from the front of the list to a separate section for numbers.
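One way to approximate these conventions in Python is a custom sort key. The rules encoded below (fold case, drop spaces and punctuation, file entries that start with a digit after the words) are assumptions chosen to mirror the description above; real dictionaries and telephone books use varying rules, and `dictionary_key` is a hypothetical helper, not a standard function.

```python
def dictionary_key(s: str) -> tuple:
    """Hypothetical dictionary-style sort key (assumed rules, not a standard)."""
    # Fold case so that 'C' and 'c' compare equal
    folded = s.casefold()
    # Ignore spaces, hyphens, and other punctuation
    letters_and_digits = "".join(ch for ch in folded if ch.isalnum())
    # Assumed rule: entries that start with a digit are filed after the words
    starts_with_digit = letters_and_digits[:1].isdigit()
    return (starts_with_digit, letters_and_digits)


words = ["184", "wiley", "con", "Century 21", "while-U-Wait"]
print(sorted(words, key=dictionary_key))
# ['Century 21', 'con', 'while-U-Wait', 'wiley', '184']
```

The tuple key sorts first on the digit flag (`False` before `True`) and then on the normalized text, which keeps the two rules independent and easy to change.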
04
Analyze the Differences
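The example strings show that the differences arise because Python's lexicographic ordering uses exact Unicode values, while dictionary ordering follows linguistic conventions. In Python's ordering, `184` comes first because digits have lower code points than letters, whereas a dictionary or telephone book that files numbers separately may place it after all the words. Case matters in Python but not in a dictionary: `Century 21` precedes `con` under both orderings, but in Python only because uppercase 'C' has a lower code point than lowercase 'c', so a capitalized word such as `Wiley` would sort ahead of `con` in Python but after it in a dictionary. Likewise, Python compares the space in `Century 21` and the hyphens in `while-U-Wait` as ordinary characters, whereas a dictionary ignores them or treats them as word breaks.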
Key Concepts
These are the key concepts you need to understand to accurately answer the question.
Unicode comparison
When Python orders strings, it looks at the Unicode value of each character. Unicode is a standard that assigns a unique number (a code point) to every character, whether it is a letter, digit, or symbol, so characters from many languages and alphabets can be represented uniformly.
In Python's sorting, each string is compared character by character using these Unicode values. This can sometimes lead to unexpected results because:
- Digits such as '1', '8', and '4' have lower Unicode values than any letter, so a string that begins with a digit sorts before one that begins with a letter.
- Every uppercase letter, such as 'C' or 'W', has a lower Unicode value than every lowercase letter, such as 'c' or 'w'; as a result, 'Z' sorts before 'a'.
- Spaces and punctuation fall wherever their Unicode values put them: the space (32) and hyphen (45) come before the digits, while the underscore (95) sits between the uppercase and lowercase letters.
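The code-point ranges behind these points can be checked directly with `ord()`. In this sketch, "Zebra" and "apple" are arbitrary illustrative words, not strings from the problem.

```python
# Digits 0-9, uppercase A-Z, and lowercase a-z occupy separate ranges
digits = [ord(c) for c in "09"]
upper = [ord(c) for c in "AZ"]
lower = [ord(c) for c in "az"]
print(digits, upper, lower)   # [48, 57] [65, 90] [97, 122]

# Because the ranges do not overlap, every uppercase letter sorts first
print("Zebra" < "apple")      # True: 'Z' (90) < 'a' (97)
```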
Python string sorting
In Python, string sorting follows the lexicographic order derived from Unicode. Unlike human-intuitive sorting, which may follow linguistic conventions, Python's method depends strictly on code-point values.
- It compares the two strings one character at a time, starting from the first character.
- As long as the characters match, it moves on to the next position; the first pair of differing characters decides the order.
- If one string is a prefix of another, the shorter string will come first.
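The three rules above can be written out by hand and compared against Python's built-in `<`. The helper `lex_less` is a hypothetical name used only for this sketch, and "Cent" is an illustrative prefix of `Century 21`.

```python
def lex_less(a: str, b: str) -> bool:
    """Return True if a sorts before b, comparing code points left to right."""
    for x, y in zip(a, b):
        if x != y:
            # The first differing character decides the order
            return ord(x) < ord(y)
    # No difference within the shorter length: the prefix (shorter) comes first
    return len(a) < len(b)


pairs = [("con", "Century 21"), ("while-U-Wait", "wiley"),
         ("Cent", "Century 21"), ("con", "con")]
for a, b in pairs:
    # The hand-written version should agree with the built-in comparison
    print(repr(a), repr(b), lex_less(a, b), a < b)
```

`zip` stops at the shorter string, which is exactly the point where the prefix rule takes over.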
Dictionary ordering
Dictionary ordering, as found in a printed dictionary or telephone directory, has its own nuances. It typically takes more context into account than Python's straightforward Unicode comparison.
- Most commonly, dictionary sorting is case-insensitive, meaning that 'A' and 'a' are treated the same.
- Spaces and punctuation such as hyphens are often ignored or treated as word separators.
- Numbers and other non-alphabetic entries may be sorted separately, either before or after the alphabetical list.
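Case-insensitive ordering, the first of these conventions, is available in Python by passing `str.casefold` as the sort key. In this sketch, "Wiley" is a capitalized variant of the hint's "wiley", chosen so that the two orderings disagree.

```python
names = ["con", "Wiley"]   # "Wiley" capitalized for illustration

default_order = sorted(names)                    # code-point order
folded_order = sorted(names, key=str.casefold)   # case ignored

print(default_order)   # ['Wiley', 'con']  because 'W' (87) < 'c' (99)
print(folded_order)    # ['con', 'Wiley']  as in a dictionary
```

`str.casefold` is a more aggressive form of `str.lower` intended for caseless matching; for plain ASCII words like these, both give the same result.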
Character precedence in sorting
Character precedence refers to the priority order characters follow during sorting. In Python's lexicographic system:
- Digits sort before all letters because their code points (48 to 57) are lower.
- Uppercase letters (65 to 90) come next, before any lowercase letter.
- Lowercase letters (97 to 122) follow, given their higher code points.
- Spaces and punctuation can appear anywhere in this sequence, depending on their specific code points.
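Sorting one character from each group, plus a space and two punctuation marks, shows this precedence, including punctuation landing at different points in the sequence. The particular characters are an illustrative choice.

```python
chars = ["a", "A", "1", "-", " ", "_"]

ordered = sorted(chars)
for ch in ordered:
    # repr() makes the space visible in the output
    print(repr(ch), ord(ch))
# ' ' 32
# '-' 45
# '1' 49
# 'A' 65
# '_' 95
# 'a' 97
```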