How to Store Multiple Data Items
Scenario
As a programmer, you will sometimes need a single variable to hold multiple values. There is no single answer to this problem, because different solutions are designed for different scenarios.
Python has several built-in data types (list, set, and dictionary) that can help, and we can choose the one that best suits the given situation.
We will be looking at 3 basic practices.
Where is this data stored?
In this chapter, our multiple items are stored temporarily within our program. Once the program has finished executing, we can no longer access that data.
This is called non-persistent storage.
If we do not need to remember the previous state of our program, we store data (values held in variables and such) in the RAM of our computer. RAM allows very quick retrieval of data while a program is running.
If we want data from our program to be accessible outside the program, or we want to remember what happened in a previous execution, we need to write to an external file or use a database.
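As a rough sketch of the difference, assuming a hypothetical file name (scores.json) and hypothetical data, the following writes a list to an external JSON file and reads it back, the way a later run of the program could:

```python
import json
import os
import tempfile

# Hypothetical data; while the program runs, it lives only in RAM.
scores = [88, 92, 75]

# Hypothetical file location. A temporary folder is used for this sketch;
# a real program would save the file somewhere permanent.
path = os.path.join(tempfile.mkdtemp(), "scores.json")

# Write the data to an external file so it outlives the program.
with open(path, "w") as f:
    json.dump(scores, f)

# A later execution could load the saved data again.
with open(path) as f:
    restored = json.load(f)

print(restored)
```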
Option 1 - Using lists
A list in Python can contain any type of data.
Each item added will have a numeric index (starting at 0).
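A minimal sketch of such a list follows. Only the values at indexes 0 and 3 come from the description in this section; the values at indexes 1 and 2 are assumptions added for illustration.

```python
# The items at index 1 and index 2 are hypothetical placeholders.
example = ['Marshall', 27, True, ['Freya', 'Goji']]

print(example[0])     # the string stored at index 0
print(example[3])     # the nested list stored at index 3
print(example[3][1])  # an item inside the nested list
```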
example[0] will contain String data of 'Marshall'
example[3] will contain List data containing ['Freya', 'Goji']
Lists are a built-in data type in Python that offers a strong set of operations and methods for working with a collection of data.
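A few of those operations are sketched below on a hypothetical list of pet names; the names are illustrative only.

```python
# Hypothetical starting data.
pets = ['Freya', 'Goji']

pets.append('Mochi')       # add an item to the end of the list
pets.insert(0, 'Biscuit')  # insert an item at index 0
pets.remove('Goji')        # remove the first item equal to 'Goji'
last = pets.pop()          # remove and return the last item

count = len(pets)          # number of items left in the list
first_two = pets[:2]       # slice: a new list with the items at index 0 and 1
pets.sort()                # sort the list in place
```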
Option 2 - Using sets
Main reason: to check whether an item exists, checking against a set is faster than checking against a list
There should be no duplicates; each item is unique
Order of insertion does not matter
When you need an implementation of a mathematical set so that you can perform set operations (Python documentation link)
We can convert a list with duplicate items to a set to remove the duplicates
Items can still be added to and removed from a set
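The original listing for this section is not reproduced here. As a minimal sketch of the points above, assuming hypothetical allergy names, the following builds a list and a set from the same values, checks both for "Dust", and tries a few set operations:

```python
# Hypothetical allergy data; the names and values are illustrative assumptions.
allergies_list = ["Pollen", "Peanuts", "Shellfish", "Pollen"]  # has a duplicate

# Converting the list to a set removes the duplicate "Pollen".
allergies = set(allergies_list)

# Membership checks work on both collections.
in_list = "Dust" in allergies_list  # scans the list item by item
in_set = "Dust" in allergies        # looks the item up by its hash

# Items can still be added and removed.
allergies.add("Latex")
allergies.remove("Latex")

# Mathematical set operations against a second (hypothetical) set.
food_allergies = {"Peanuts", "Shellfish", "Eggs"}
shared = allergies & food_allergies     # intersection
combined = allergies | food_allergies   # union
non_food = allergies - food_allergies   # difference
```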
The code above currently has two collections of data: a set and a list.
It is trying to see whether the person is allergic to dust, given two collections with the same item values.
With respect to performance (speed of the program), the set-based collection will outperform the list.
Reasoning:
Python computes an integer for each item in a set and uses it as the item's address within the set.
This is the reason that sets do not have duplicates; an equal item would generate the same integer address, so Python recognizes it as already present. This concept of integer addressing is called hashing.
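Python exposes this integer through the built-in hash() function. The short sketch below, using hypothetical values, shows that equal items hash to the same integer and that a set therefore keeps only one copy. The actual integers for strings vary between runs of Python, so none are shown.

```python
first = "Dust"
second = "".join(["Du", "st"])  # an equal string, built separately

# Equal values always produce equal hashes.
same_hash = hash(first) == hash(second)

# The second "Dust" is recognized as a duplicate and stored only once.
items = {first, second, "Pollen"}

print(same_hash, len(items))
```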
When we check whether dust is an item in the set of allergies by executing "Dust" in allergies, Python hashes the value "Dust" and tries to access its location in allergies. Since that address is not part of the set, it can immediately return a value of False.
When dust is searched for in the list, Python starts from the beginning of the list and compares each item against "Dust" until it is found. If it is never found, Python reaches the end of the list and returns False.
Therefore, a set typically needs only a single hashing operation to see whether a value is in the set. A list, however, has to check its values one by one against the value being searched for. As the list grows, the number of comparisons Python needs to do becomes far larger than the work a set would have done.
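One way to observe this is to time both checks with the standard timeit module. This is only a sketch: the 10,000 allergen names are made up for illustration, and the measured times will differ from machine to machine.

```python
import timeit

# Hypothetical data: 10,000 made-up names, none of which is "Dust".
names = [f"Allergen{i}" for i in range(10_000)]
allergy_list = names
allergy_set = set(names)

found_in_list = "Dust" in allergy_list  # compares against every item
found_in_set = "Dust" in allergy_set    # a hash lookup

# Time 200 repetitions of each membership check.
list_time = timeit.timeit(lambda: "Dust" in allergy_list, number=200)
set_time = timeit.timeit(lambda: "Dust" in allergy_set, number=200)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

Because "Dust" is missing, the list check is a worst case: every item is compared before False is returned.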
Option 3 - Countable Unique Items
For this situation, let's look at an example first.
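The original listing is not reproduced here. As a stand-in, here is a minimal sketch, assuming a hypothetical list of pet types, that counts how many times each unique item appears:

```python
# Hypothetical data for illustration.
pets = ["dog", "cat", "dog", "bird", "cat", "dog"]

pet_counts = {}
for pet in pets:
    # get() returns the current count, or 0 if the key is not there yet.
    pet_counts[pet] = pet_counts.get(pet, 0) + 1

print(pet_counts)
```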
The code above is an example of using a dictionary.
A dictionary always stores two pieces of information for each item in its collection.
Address (commonly known as the "key") -> A dictionary lets us choose our own immutable, hashable value as the index for each item.
Value -> A dictionary lets us associate any type of data with our custom address.
In Python, dictionaries are a way to create key-value pairs of data.
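To illustrate the two parts, here is a small sketch with hypothetical keys and values. It shows that values can be of any type, that an immutable tuple works as a key, and that a mutable list is rejected as a key:

```python
# All keys and values here are hypothetical.
profile = {
    "name": "Marshall",         # a string value
    "pets": ["Freya", "Goji"],  # a list value
    (49.28, -123.12): "home",   # an immutable tuple used as a key
}

pet_names = profile["pets"]     # look up a value by its key

# A list is mutable, so Python cannot hash it and refuses it as a key.
try:
    profile[["a", "list"]] = "not allowed"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(pet_names, list_key_allowed)
```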