Chapter 3: Problem 19

Explain how the lexicographic ordering of strings in Python differs from the ordering of words in a dictionary or telephone book. Hint: Consider strings such as 184 , wiley, con, Century 21 , and while-U-Wait.

Short Answer

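Python orders strings by the Unicode code points of their characters, so digits come before uppercase letters and uppercase letters before lowercase ones. Dictionaries and telephone books typically ignore case and punctuation and follow separate filing conventions for numbers.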

Step by step solution

Understand Lexicographic Ordering in Python

In Python, strings are ordered lexicographically: they are compared character by character, using each character's Unicode code point. This resembles the way words are arranged in an English dictionary, but it is not identical. Unicode assigns a numeric code point to every character, including digits, uppercase letters, lowercase letters, and special characters. The resulting order can look unintuitive because each group of characters occupies a different numeric range.
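As a quick illustration, the built-in `ord` function returns a character's code point. The sample characters below are taken from the hint strings, and the names `samples` and `code_points` are chosen only for this sketch.

```python
# Code points of a few characters that appear in the hint strings.
samples = ["1", "C", "c", "w"]
code_points = {ch: ord(ch) for ch in samples}

for ch, cp in code_points.items():
    print(repr(ch), cp)
# '1' 49
# 'C' 67
# 'c' 99
# 'w' 119
```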

Compare Characters Based on Unicode

In Python's lexicographic ordering, ASCII digits come before uppercase letters, which come before lowercase letters. Special characters fall before, between, or after these groups according to their code points. For example, `184` is less than `Century 21` because the digit '1' (code point 49) is lower than 'C' (67). Likewise, `Century 21` is less than `con` because 'C' (67) is lower than 'c' (99). `wiley` comes after `while-U-Wait`, but not because of case: both strings begin with a lowercase 'w', and the comparison is decided at the second character, where 'h' (104) is lower than 'i' (105).
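To check the ordering of the hint strings directly, one can simply sort them with the built-in `sorted`, which uses the default string comparison. The variable names here are illustrative.

```python
words = ["184", "wiley", "con", "Century 21", "while-U-Wait"]

# sorted() with no key compares strings code point by code point.
python_order = sorted(words)
print(python_order)
# ['184', 'Century 21', 'con', 'while-U-Wait', 'wiley']

# The last two differ at index 1: 'h' (104) is lower than 'i' (105).
print(ord("h"), ord("i"))
# 104 105
```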

Understand Dictionary Ordering

Dictionary or telephone book ordering typically ignores case and either ignores special characters (such as hyphens) or treats them like spaces. Entries are ordered by their letters first, and numbers are handled by a separate convention: they are often grouped after the letters, sometimes before them, or filed as if spelled out. Under such rules, `while-U-Wait` is filed as if it were written `while u wait` or `whileuwait`. `Century 21` still appears before `con`, but because 'e' precedes 'o', not because of the capital 'C'.
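One of these conventions can be sketched as a sort key. The function `as_spaces_key` below is a hypothetical helper, not a standard library function, and it assumes one particular rule set: fold case and treat every non-alphanumeric character as a space. Real dictionaries and directories vary.

```python
def as_spaces_key(entry):
    """Hypothetical filing key: ignore case, treat punctuation as spaces."""
    return "".join(ch if ch.isalnum() else " " for ch in entry.casefold())

words = ["184", "wiley", "con", "Century 21", "while-U-Wait"]
keys = {w: as_spaces_key(w) for w in words}

for original, key in keys.items():
    print(f"{original!r} is filed as {key!r}")
```

Under this assumed rule, `Century 21` is filed as `century 21` and `while-U-Wait` as `while u wait`. The key on its own does nothing special with `184`; moving numbers after letters would need an additional rule.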

Analyze the Differences

From the example strings, we can see that the differences arise because lexicographic ordering uses exact code points, while dictionary ordering follows linguistic conventions. For instance, in Python `184` always comes first because digits have low code points, whereas a telephone book may file it elsewhere. `Century 21` precedes `con` under both schemes, but for different reasons: Python compares 'C' with 'c', while a dictionary ignores case and compares 'e' with 'o'. The two schemes disagree on a pair such as `Con` and `century`: Python puts `Con` first, and a dictionary puts `century` first.
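The contrast shows up clearly on a list that mixes capitalized and lowercase words. The sketch below uses `str.casefold` as a rough stand-in for the "ignore case" rule only. The word list is made up for illustration and is a variation on the hint strings.

```python
# Illustrative list: capitalized and lowercase variants of the hint words.
words = ["Con", "century", "Wiley", "while"]

python_order = sorted(words)                        # raw code points
case_blind_order = sorted(words, key=str.casefold)  # case ignored

print(python_order)
# ['Con', 'Wiley', 'century', 'while']
print(case_blind_order)
# ['century', 'Con', 'while', 'Wiley']
```

In the default order every capitalized word precedes every lowercase one. With case ignored, the words interleave alphabetically.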

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Unicode comparison

When Python organizes strings, it pays attention to the Unicode value of each character. Unicode is a system that assigns a unique code to every character, whether it's a letter, digit, or symbol. This coding allows for a vast range of characters from different languages and alphabets to be represented uniformly.

In Python's sorting, each string is compared character by character using these Unicode values. This can sometimes lead to unexpected results because:

Digit characters like '1', '8', and '4' have lower code points than letters, so they sort earlier.
Uppercase letters such as 'C' or 'W' have lower code points than their lowercase equivalents 'c' or 'w'.
Special characters, depending on their code points, may sort before, between, or after the alphanumeric characters.

Thus, understanding Unicode is crucial for grasping why Python sorts strings the way it does.
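The same rule applies beyond ASCII, which can be surprising for accented letters. The example below is a small sketch with made-up names. It writes the precomposed character é as the escape `"\u00e9"` so that the code point is unambiguous.

```python
e_acute = "\u00e9"  # precomposed 'é', code point U+00E9 (233)

print(ord("z"), ord(e_acute))
# 122 233

names = ["zoe", e_acute + "mile", "emile"]
sorted_names = sorted(names)
print(sorted_names)
# ['emile', 'zoe', 'émile']
```

Because 233 is greater than 122, the accented name sorts after every name that starts with an unaccented lowercase letter, which is not where a printed dictionary would place it.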

Python string sorting

In Python, strings are sorted in lexicographic order based on Unicode code points. Unlike the ordering people expect from a dictionary, which follows linguistic conventions, Python's method compares code points strictly.

It compares each string starting from the first character onwards, stopping only when a difference is detected.
If two strings start with the same character, Python proceeds to compare the next character and so on, until a difference is noted.
If one string is a prefix of another, the shorter string will come first.

Because of these rules, strings are ordered strictly by their code points from beginning to end, which makes Python's sorting predictable once you know the relevant code point ranges.
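The three rules above can be sketched as a small function. `lexicographic_less` is a hypothetical name used only for this illustration. Python's own `<` operator already does this, and the loop at the end checks that the two agree on a few pairs.

```python
def lexicographic_less(a, b):
    """Return True if string a sorts before string b."""
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            # The first differing position decides the result.
            return ord(ch_a) < ord(ch_b)
    # No difference within the shared prefix: the shorter string comes first.
    return len(a) < len(b)

pairs = [
    ("while-U-Wait", "wiley"),  # decided at index 1: 'h' vs 'i'
    ("con", "cone"),            # "con" is a prefix of "cone"
    ("Century 21", "con"),      # decided at index 0: 'C' vs 'c'
    ("con", "con"),             # equal strings: neither sorts first
]
for a, b in pairs:
    assert lexicographic_less(a, b) == (a < b)
```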

Dictionary ordering

Dictionary ordering, as found in a printed dictionary or phone directory, follows filing conventions rather than raw character codes. It typically takes more context into account than Python's straightforward code point comparison.

Most commonly, dictionary sorting is case-insensitive, meaning that 'A' and 'a' are treated the same.
Special characters are often ignored or treated as spaces.
Non-alphabetic tokens such as numbers may sometimes be sorted separately either before or after the alphabetically ordered list.

Therefore, dictionary ordering emphasizes linguistic rules over mechanical Unicode values, making it appear more logical in everyday language usage.
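A directory-style rule set can be approximated in Python with a key function. The sketch below assumes one specific convention: ignore case, drop punctuation and spaces, and file entries that start with a digit after all letter-led entries. `directory_key` is a hypothetical helper, and real directories may choose differently.

```python
def directory_key(entry):
    """Hypothetical key: letter-led entries first, digit-led entries last."""
    folded = "".join(ch for ch in entry.casefold() if ch.isalnum())
    starts_with_digit = folded[:1].isdigit()
    # False sorts before True, so letter-led entries come first.
    return (starts_with_digit, folded)

words = ["184", "wiley", "con", "Century 21", "while-U-Wait"]
directory_order = sorted(words, key=directory_key)
print(directory_order)
# ['Century 21', 'con', 'while-U-Wait', 'wiley', '184']
```

Compared with plain `sorted(words)`, the only change for these five strings is that `184` moves from the front to the back.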

Character precedence in sorting

Character precedence refers to the priority order characters follow during sorting. In Python's lexicographic system:

Digits sort first among the alphanumeric characters because they have the lowest code points.
Uppercase letters come next, before any lowercase letters.
Lowercase letters follow, given their higher code points.
Special characters can fall anywhere, depending on their code points: the space and the hyphen precede the digits, while the underscore falls between the uppercase and lowercase letters.

This precedence directly determines how sorting functions behave and how strings appear once sorted. Knowing it helps when predicting the order of sorted strings, whether sorting data or arranging text with scripts.
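The precedence can be observed by sorting a handful of single characters. The selection below is arbitrary and only meant to include one representative from each group.

```python
chars = ["a", "~", "A", "1", "_", " ", "-"]
ordered = sorted(chars)

print([(ch, ord(ch)) for ch in ordered])
# [(' ', 32), ('-', 45), ('1', 49), ('A', 65), ('_', 95), ('a', 97), ('~', 126)]
```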
