The Python Journey – Chapter IX Python Sets

A Comprehensive Explanation On Python Sets

In Python, a set is an unordered collection of unique elements. Unlike lists or tuples, sets do not allow duplicate items, which makes them a good fit wherever uniqueness matters. Sets are mutable: you can add and remove elements, although you cannot change an element in place. The elements themselves must be hashable, which in practice means immutable built-in types such as integers, strings, and tuples whose contents are also hashable.
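The hashability rule is easiest to see by trying it. The following is a minimal sketch; the variable names are illustrative and not part of the chapter's examples.

```python
# Hashable values (ints, strings, tuples of hashables, frozensets) are accepted.
valid = {1, "two", (3, 4), frozenset({5, 6})}

# A list is mutable and unhashable, so adding one raises TypeError.
try:
    valid.add([7, 8])
    added_list = True
except TypeError:
    added_list = False

# A tuple is only hashable if everything inside it is hashable too.
try:
    valid.add((9, [10]))
    added_nested = True
except TypeError:
    added_nested = False

print(len(valid), added_list, added_nested)  # 4 False False
```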

As of Python 3.12, sets remain an efficient tool for storing and manipulating data, especially with large datasets where you need to eliminate duplicates or test membership quickly. The previous chapter covered Python tuples.


Creating a Set in Python

Sets are created by placing elements inside curly braces {} or by using the built-in set() function.

# Creating a set with curly braces

fruits = {"apple", "banana", "cherry", "apple"}  # Duplicate "apple" is automatically removed

print(fruits)  # Output (order may vary): {'apple', 'banana', 'cherry'}

# Creating an empty set (you must use set(); {} creates an empty dictionary)

empty_set = set()

In the example above, you’ll notice that duplicate elements are automatically removed from the set. This is one of the core features of sets: they only store unique elements.
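Two further ways of building a set are worth a quick look: passing any iterable to set(), and using a set comprehension. This is a small sketch with made-up values; sorted() is used only so the printed order is predictable.

```python
# {} is an empty dict; set() is an empty set.
print(type({}).__name__)     # dict
print(type(set()).__name__)  # set

# set() accepts any iterable; a string is split into its characters.
letters = set("banana")
print(sorted(letters))  # ['a', 'b', 'n']

# A set comprehension builds a set from an expression.
squares = {n * n for n in range(-3, 4)}
print(sorted(squares))  # [0, 1, 4, 9]
```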


Set Operations

Python sets support a variety of operations that make them incredibly useful in real-world applications, especially when you need to compare collections of data or eliminate redundancies.

1. Adding and Removing Elements

You can add new elements using the add() method and remove elements using remove() or discard().

# Adding an element to a set

fruits.add("orange")

print(fruits)  # Output (order may vary): {'apple', 'banana', 'cherry', 'orange'}

# Removing an element

fruits.remove("banana")

print(fruits)  # Output (order may vary): {'apple', 'cherry', 'orange'}

If you attempt to remove an element that does not exist with remove(), it will raise a KeyError. To avoid this, you can use discard(), which will not throw an error if the element is not present.
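The difference between the two methods can be sketched as follows, using a throwaway set named colors (an illustrative name, not from the examples above):

```python
colors = {"red", "green"}

# discard() on a missing element does nothing and raises no exception.
colors.discard("blue")

# remove() on a missing element raises KeyError.
try:
    colors.remove("blue")
    raised = False
except KeyError:
    raised = True

print(raised)          # True
print(sorted(colors))  # ['green', 'red']
```

As a rule of thumb, remove() suits cases where a missing element signals a bug, and discard() suits cases where absence is acceptable.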


2. Set Union, Intersection, and Difference

Sets are particularly useful for comparing and combining data using operations like union, intersection, and difference.

  • Union: Combines all unique elements from both sets.
  • Intersection: Returns only elements common to both sets.
  • Difference: Returns elements present in the first set but not in the second.

set1 = {1, 2, 3, 4}

set2 = {3, 4, 5, 6}

# Union

print(set1.union(set2))  # Output: {1, 2, 3, 4, 5, 6}

# Intersection

print(set1.intersection(set2))  # Output: {3, 4}

# Difference

print(set1.difference(set2))  # Output: {1, 2}

These operations are perfect when handling data from multiple sources and needing to identify overlaps, commonalities, or differences.
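Each of these methods also has an operator form, and there is a fourth operation, symmetric difference, for elements that appear in exactly one of the two sets. A short sketch reusing set1 and set2:

```python
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

# Operator equivalents of the three methods.
print(set1 | set2 == set1.union(set2))         # True
print(set1 & set2 == set1.intersection(set2))  # True
print(set1 - set2 == set1.difference(set2))    # True

# Symmetric difference: elements in exactly one of the two sets.
print(sorted(set1 ^ set2))  # [1, 2, 5, 6]

# The methods accept any iterable; the operators require sets on both sides.
print(sorted(set1.union([7, 8])))  # [1, 2, 3, 4, 7, 8]
try:
    set1 | [7, 8]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(operator_accepts_list)  # False
```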

3. Real-World Example: Managing User Permissions

Imagine you’re building a system where users have different roles and permissions, such as “admin”, “editor”, and “viewer”. Sets can help you manage these permissions, ensuring each user has unique access rights without redundancy.

# Permissions for admin and editor roles

admin_permissions = {"add_user", "delete_user", "modify_settings", "view_reports"}

editor_permissions = {"edit_content", "view_reports"}

# Combining permissions (Union)

all_permissions = admin_permissions.union(editor_permissions)

print(all_permissions)  # Output (order may vary): {'add_user', 'delete_user', 'modify_settings', 'edit_content', 'view_reports'}

# Common permissions (Intersection)

common_permissions = admin_permissions.intersection(editor_permissions)

print(common_permissions)  # Output: {'view_reports'}

In this case, using sets allows you to easily manage permissions and avoid duplicate entries, streamlining the process of assigning user rights.
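The same permission sets can also answer access questions directly. The sketch below assumes a hypothetical helper, can_perform, that is not part of any library; it simply applies a subset test.

```python
admin_permissions = {"add_user", "delete_user", "modify_settings", "view_reports"}
editor_permissions = {"edit_content", "view_reports"}

def can_perform(role_permissions, required):
    """Hypothetical helper: True if the role grants every required permission."""
    return required <= role_permissions  # <= is the subset test

# Permissions that admins have and editors lack (difference).
admin_only = admin_permissions - editor_permissions
print(sorted(admin_only))  # ['add_user', 'delete_user', 'modify_settings']

print(can_perform(editor_permissions, {"edit_content", "view_reports"}))  # True
print(can_perform(editor_permissions, {"delete_user"}))                   # False
```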


Other Useful Set Methods

  • issubset() and issuperset(): These methods check if one set is a subset or superset of another.
  • pop(): Removes and returns an arbitrary element from the set.
  • clear(): Removes all elements from the set.

# Checking if a set is a subset of another

permissions = {"edit_content", "view_reports"}

print(permissions.issubset(editor_permissions))  # Output: True

# Removing all elements

permissions.clear()

print(permissions)  # Output: set()
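The remaining methods from the list, issuperset() and pop(), can be sketched the same way. The viewer_permissions set and the queue set are made up for illustration.

```python
editor_permissions = {"edit_content", "view_reports"}
viewer_permissions = {"view_reports"}  # hypothetical role, for illustration only

print(editor_permissions.issuperset(viewer_permissions))  # True
print(viewer_permissions < editor_permissions)            # True (proper subset)

# pop() removes and returns an arbitrary element, so the order is not predictable.
# Calling pop() on an empty set raises KeyError, hence the loop condition.
queue = {"a", "b", "c"}
seen = []
while queue:
    seen.append(queue.pop())

print(sorted(seen))  # ['a', 'b', 'c']
print(queue)         # set()
```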

Sets and Performance in Python 3.12

In Python 3.12, sets remain one of the most optimized built-in structures for large, unordered collections. Because sets are hash tables, lookups (such as checking whether an item exists) run in O(1) time on average; heavy hash collisions can degrade this toward O(n), but that is rare in practice.

This makes sets invaluable when you need to eliminate duplicates from large datasets or perform fast membership tests. For instance, if you’re working on a data analytics project and need to filter out unique values from a large list, sets offer a quick and efficient way to do this.
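One way to see the difference for yourself is to time a membership test on a list and on a set holding the same values. This is a rough sketch using the standard timeit module; the sizes are arbitrary and the absolute numbers will depend on your machine, so no expected timings are shown.

```python
import timeit

n = 10_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # not present, so the list has to check every element

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

On typical hardware the set lookup should be far faster, and the gap should widen as n grows, because the list scan is O(n) while the set lookup is O(1) on average.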


Real-World Example: Deduplicating Emails in a Marketing Campaign

Imagine you’re running an email marketing campaign and need to ensure that no email address receives duplicate messages. You can use a set to eliminate any duplicate entries from a list of email addresses.

email_list = ["john@example.com", "mary@example.com", "john@example.com", "sara@example.com"]

# Use a set to remove duplicates

unique_emails = set(email_list)

print(unique_emails)  # Output (order may vary): {'john@example.com', 'sara@example.com', 'mary@example.com'}

In this scenario, using a set automatically filters out duplicate emails, ensuring each address only receives one message. This is a perfect real-world use of sets to maintain data integrity.
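Two practical wrinkles are worth a sketch: a plain set() loses the original order of the list, and it treats addresses that differ only in case or stray whitespace as distinct. The normalization used here (strip and lowercase) is an assumption for illustration; adjust it to your own rules.

```python
email_list = ["john@example.com", "Mary@Example.com ", "john@example.com", "mary@example.com"]

# Assumed normalization: trim whitespace and lowercase each address.
normalized = [email.strip().lower() for email in email_list]

# dict.fromkeys() keeps the first occurrence of each key in insertion order,
# which gives an order-preserving deduplication.
unique_in_order = list(dict.fromkeys(normalized))

print(unique_in_order)  # ['john@example.com', 'mary@example.com']
```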


Python sets are a powerful tool for managing unique, unordered collections of items. Whether you’re handling large datasets, managing user permissions, or deduplicating data, sets offer flexibility, speed, and efficiency. With their rich set of built-in methods and operations, they’re ideal for situations where you need to eliminate redundancy, compare collections, or perform fast membership checks.

Sets remain a well-optimized built-in in Python 3.12, which makes them a dependable choice whenever uniqueness or fast lookups matter.

Comparison of Python Sets to Equivalent Features in Java and C#

The concept of sets—unordered collections of unique elements—exists across various programming languages like Python, Java, and C#. However, there are important differences in syntax, functionality, and performance. Below is a detailed comparison of sets in Python, Java, and C#, highlighting their key features and performance considerations.

| Feature | Python (Sets) | Java (HashSet/TreeSet) | C# (HashSet) |
|---|---|---|---|
| Syntax | Defined using curly braces {} or set() | Implemented using HashSet or TreeSet | Implemented using HashSet<T> from System.Collections.Generic |
| Uniqueness | Automatically enforced | Automatically enforced | Automatically enforced |
| Order | Unordered (iteration order is not guaranteed; the insertion-order guarantee in Python 3.7+ applies to dict, not set) | HashSet is unordered, TreeSet is sorted by natural order | Unordered |
| Mutability | Mutable | Mutable | Mutable |
| Duplicate Handling | Duplicates are not allowed | Duplicates are not allowed | Duplicates are not allowed |
| Null Handling | Can contain None (only one) | HashSet can store null, TreeSet cannot store null | HashSet can store null |
| Methods for Set Operations | union(), intersection(), difference() | addAll(), retainAll(), removeAll() | UnionWith(), IntersectWith(), ExceptWith() |
| Adding Elements | add(), update() | add() | Add() |
| Removing Elements | remove(), discard(), pop() | remove(), clear() | Remove(), Clear() |
| Set Membership Testing | in operator (e.g., element in set) | contains() method | Contains() method |
| Equality Check | Supports equality comparison using == | Uses equals() for comparison | Uses SetEquals() to compare contents (Equals() checks reference equality) |
| Subset & Superset Check | issubset(), issuperset() | containsAll() for subset check | IsSubsetOf(), IsSupersetOf() |
| Performance – Insertion | O(1) for add() (average case, hash-based) | O(1) for HashSet (average case), O(log n) for TreeSet | O(1) for HashSet (average case) |
| Performance – Lookup | O(1) for in (average case, hash-based) | O(1) for HashSet (average case), O(log n) for TreeSet | O(1) for HashSet (average case) |
| Performance – Deletion | O(1) for remove() (average case) | O(1) for HashSet (average case), O(log n) for TreeSet | O(1) for HashSet (average case) |
| Iterating Over Elements | O(n), where n is the number of elements | O(n) for both HashSet and TreeSet | O(n), where n is the number of elements |
| Memory Efficiency | Hash table with spare capacity; typically uses more memory than a list of the same elements | HashSet (backed by HashMap) and TreeSet (backed by TreeMap) both carry per-entry object overhead | Similar to Java HashSet due to hashing |
| Common Use Cases | Eliminating duplicates, set operations, membership testing | Similar use cases, but TreeSet can be used for sorted sets | Eliminating duplicates, set operations, fast lookups |
| Thread Safety | Not thread-safe | Not thread-safe, must use Collections.synchronizedSet() for thread safety | Not thread-safe, must use locking for thread safety |
| Support for Frozen/Immutable Sets | Yes, via frozenset() | Yes, via Set.of() (Java 9+) or Collections.unmodifiableSet() | Yes, via ImmutableHashSet<T> in System.Collections.Immutable |
| Set Intersection Optimization | Fast due to hash-based lookup | Slower for TreeSet (O(log n) per lookup) but fast for HashSet | Fast due to hash-based lookup |
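On the Python side, the immutable variant listed in the table is frozenset. A brief sketch of what it allows and what it forbids, with illustrative names:

```python
admin = frozenset({"add_user", "delete_user"})

# A frozenset is hashable, so it can be a dict key or an element of another set.
role_names = {admin: "admin"}
groups = {frozenset({1, 2}), frozenset({2, 1})}
print(len(groups))  # 1, because both frozensets hold the same elements

# It has no mutating methods such as add().
try:
    admin.add("modify_settings")
    mutated = True
except AttributeError:
    mutated = False
print(mutated)  # False

# Non-mutating operations still work and return a new frozenset.
extended = admin | {"modify_settings"}
print(type(extended).__name__, len(extended))  # frozenset 3
```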

Performance Comparison

Performance characteristics of sets in Python, Java, and C# are primarily influenced by the underlying data structures used in each language’s implementation. Let’s compare their performance for common set operations:

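| Operation | Python (Sets) | Java (HashSet) | C# (HashSet) |
|---|---|---|---|
| Insertion | O(1) on average | O(1) on average | O(1) on average |
| Membership Testing | O(1) on average | O(1) on average | O(1) on average |
| Removal | O(1) on average | O(1) on average | O(1) on average |
| Union | O(n + m) | O(n + m) | O(n + m) |
| Intersection | O(min(n, m)) | O(n) with retainAll(), which visits every element of the receiving set | O(n) when the other collection is a HashSet<T> with the same comparer |
| Difference | O(n) | O(n) | O(n) |
| Iteration | O(n) | O(n) | O(n) |

Here n and m are the sizes of the first and second set. All figures for hash-based sets are average-case, and the exact cost of a difference depends on which collection the implementation iterates over.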
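The O(min(n, m)) figure for intersection follows from iterating over the smaller set and probing the larger one. The function below is an illustrative sketch of that idea only; it is not CPython's actual implementation, and the name intersect is made up.

```python
def intersect(a, b):
    """Illustrative intersection: loop over the smaller set, probe the larger."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    # One average-case O(1) lookup per element of the smaller set.
    return {x for x in small if x in large}

big = set(range(100_000))
tiny = {5, 99_999, -1}

print(sorted(intersect(big, tiny)))        # [5, 99999]
print(intersect(big, tiny) == big & tiny)  # True
```

The work done depends on the three elements of tiny rather than on the 100,000 elements of big, whichever order the arguments are passed in.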

Key Insights

  • Python and C# implement sets as hash tables (HashSet<T> in C#, a built-in hash table in Python), which gives average-case O(1) insertion, deletion, and membership testing.
  • Java offers both HashSet (which behaves like the Python and C# sets) and TreeSet, which is backed by a red-black tree. TreeSet keeps elements sorted but has O(log n) time complexity for basic operations, making it slower than HashSet and Python's set.
  • Memory Usage: Hash-based sets keep spare capacity in their tables, so they generally use more memory than a plain list or array of the same elements. Java's TreeSet avoids the table but stores several references per node, so it is not necessarily smaller; measure if memory matters.
  • Thread Safety: None of these implementations are thread-safe by default. In Java, thread safety can be achieved using Collections.synchronizedSet(). In C# and Python, developers need to guard shared sets with locks or use specialized concurrent collections.
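For Python, the locking approach mentioned in the last point might look like the sketch below. LockedSet and add_if_absent are hypothetical names invented for this example, not a standard-library class; the point is that the check and the insert happen under one lock.

```python
import threading

class LockedSet:
    """Hypothetical wrapper that guards a set with a lock."""

    def __init__(self):
        self._items = set()
        self._lock = threading.Lock()

    def add_if_absent(self, item):
        """Add item and return True, or return False if it was already present."""
        with self._lock:  # check and insert as one atomic step
            if item in self._items:
                return False
            self._items.add(item)
            return True

    def __len__(self):
        with self._lock:
            return len(self._items)

shared = LockedSet()
results = [0] * 4  # one slot per thread, so no two threads write the same index

def worker(slot):
    # Every thread tries to add the same 100 values.
    results[slot] = sum(shared.add_if_absent(n) for n in range(100))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))   # 100
print(sum(results))  # 100, since each value was added by exactly one thread
```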

While all three languages provide powerful set implementations with similar capabilities, the choice of language and set type depends on the specific requirements of the application:

  • Python Sets: Well suited to rapid prototyping and general development, with concise, beginner-friendly syntax and solid performance.
  • Java (HashSet/TreeSet): Provides more options depending on whether you need raw performance (HashSet) or sorted elements (TreeSet). TreeSet trades some speed for the guarantee of sorted order.
  • C# HashSet: Offers performance comparable to Python's set and Java's HashSet, making it a good fit for enterprise applications that need fast membership testing and uniqueness guarantees.

In most general-use cases, Python sets and C# HashSet offer the best balance of performance and simplicity when handling unordered collections of unique elements.

Dhakate Rahul
