Collections
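Based on what I have learned so far, the following types come up most often:
defaultdict (a dict subclass that calls a factory function to supply missing values)
Counter (a dict subclass for counting hashable objects)
deque (a list-like container that supports operations at both ends)
namedtuple (a factory function for creating tuple subclasses with named fields)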
defaultdict
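Basic concepts
“defaultdict” is a container defined in the “collections” module. It takes a function (the default factory) as its first argument. When a missing key is looked up with d[key], the factory is called with no arguments, and its return value is stored under that key and returned; with “int” as the factory, for example, the default value is 0. If no factory is given, default_factory is None and a missing key raises KeyError, exactly as in a plain dict.
In short, it is a dict that does not raise an error when a key cannot be found.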
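To make the lookup rules above concrete, here is a minimal sketch (the variable names are only illustrative). It shows that a lookup with square brackets inserts the missing key, that .get() does not call the factory, and that a defaultdict without a factory still raises KeyError.

```python
from collections import defaultdict

counts = defaultdict(int)      # int() returns 0, so missing keys start at 0
counts["apple"] += 1           # no KeyError: the key is created on first access

# Looking a key up with [] inserts it; .get() does not call the factory.
probe = counts["pear"]         # creates counts["pear"] == 0
via_get = counts.get("kiwi")   # None, and "kiwi" is NOT inserted

# Without a factory, defaultdict behaves like a plain dict for missing keys.
no_factory = defaultdict()
try:
    no_factory["missing"]
    raised = False
except KeyError:
    raised = True

print(dict(counts))     # {'apple': 1, 'pear': 0}
print(via_get, raised)  # None True
```

One consequence worth keeping in mind: merely reading a missing key changes the size of the dict, so use .get() or the in operator when you only want to probe.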
Examples
We create a dict called person that stores the keys name and age. Accessing person['city'] at this point raises a KeyError, because there is no city key:
person = {'name':'xiaobai','age':18}
print ("The value of key 'name' is : ",person['name'])
print ("The value of key 'city' is : ",person['city'])
Out: The value of key 'name' is : xiaobai
Traceback (most recent call last):
File "C:\Users\E560\Desktop\test.py", line 17, in <module>
print ("The value of key 'city' is : ",person['city'])
KeyError: 'city'
Now try the same thing with defaultdict:
from collections import defaultdict
person = defaultdict(lambda : 'Key Not found') # every missing key defaults to 'Key Not found'
person['name'] = 'xiaobai'
person['age'] = 18
print ("The value of key 'name' is : ",person['name'])
print ("The value of key 'city' is : ",person['city'])
Out:The value of key 'name' is : xiaobai
The value of key 'city' is : Key Not found
Beyond that, because the factory passed to defaultdict supplies the default value for every key, we can build other behaviour on top of it, for example:
from collections import defaultdict
d = defaultdict(list)
d['person'].append("xiaobai")
d['city'].append("paris")
d['person'].append("student")
for i in d.items():
    print(i)
Out: ('person', ['xiaobai', 'student'])
('city', ['paris'])
Every key defaults to a list, so we can call list's append method directly, even on a key that does not exist yet. Another example:
from collections import defaultdict
food = (
    ('jack', 'milk'),
    ('Ann', 'fruits'),
    ('Arham', 'ham'),
    ('Ann', 'soda'),
    ('jack', 'dumplings'),
    ('Ahmed', 'fried chicken'),
)
favourite_food = defaultdict(list)
for n, f in food:
    favourite_food[n].append(f)
print(favourite_food)
Out:defaultdict(<class 'list'>, {'jack': ['milk', 'dumplings'], 'Ann': ['fruits', 'soda'], 'Arham': ['ham'], 'Ahmed': ['fried chicken']})
The same grouping pattern works for (pet, breed) pairs:
from collections import defaultdict
pets = [
    ("dog", "Affenpinscher"),
    ("dog", "Terrier"),
    ("dog", "Boxer"),
    ("cat", "Abyssinian"),
    ("cat", "Birman"),
]
group_pets = defaultdict(list)
for pet, breed in pets:
    group_pets[pet].append(breed)
for pet, breeds in group_pets.items():
    print(pet, "->", breeds)
Out: dog -> ['Affenpinscher', 'Terrier', 'Boxer']
cat -> ['Abyssinian', 'Birman']
Counter
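Basic concepts
Counter is a dict subclass that acts as a counter.
It returns a dict-like object whose keys are the elements seen and whose values are the number of times each element occurred.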
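As a small sketch of that key/count mapping (the sample string is arbitrary), the snippet below counts the characters of a string, asks for the most frequent elements with most_common(), and shows that an element that was never seen reports a count of 0 instead of raising KeyError.

```python
from collections import Counter

letters = Counter("mississippi")   # count each character of a string

top_two = letters.most_common(2)   # the two most frequent elements
missing = letters["z"]             # 0: unseen elements do not raise KeyError

print(top_two)   # [('i', 4), ('s', 4)]
print(missing)   # 0
```

Unlike defaultdict, looking up an unseen element in a Counter does not insert it.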
Examples
from collections import Counter
count_list = Counter(['B','B','A','B','C','A','B','B','A','C']) # count the elements of a list
print (count_list)
count_tuple = Counter((2,2,2,3,1,3,1,1,1)) # count the elements of a tuple
print(count_tuple)
Out:Counter({'B': 5, 'A': 3, 'C': 2})
Counter({1: 4, 2: 3, 3: 2})
It also works on a DataFrame column, although pandas already provides the value_counts() method:
import pandas as pd
df = pd.DataFrame({'name':['a','b','c','a','a','b'],'value':[1,2,3,4,5,6]})
counter_result = Counter(df['name'])
counter_result
Out:Counter({'a': 3, 'b': 2, 'c': 1})
# Equivalent: df['frequency'] = [counter_result[n] for n in df['name']]
df['frequency'] = df['name'].map(df['name'].value_counts())
Counter is generally not used to count the elements of a dict or a set, because dict keys are unique and a set cannot contain duplicates.
We can also count the food tuple from the defaultdict example directly:
from collections import Counter
food = (
    ('jack', 'milk'),
    ('Ann', 'fruits'),
    ('Arham', 'ham'),
    ('Ann', 'soda'),
    ('jack', 'dumplings'),
    ('Ahmed', 'fried chicken'),
)
favourite_food_count = Counter(n for n,f in food) # count how often each name appears
print(favourite_food_count)
Out: Counter({'jack': 2, 'Ann': 2, 'Arham': 1, 'Ahmed': 1})
subtract, update and +=:
from collections import Counter
inventory = Counter(dogs=23, cats=14, pythons=7)
adopted = Counter(dogs=2, cats=5, pythons=1)
inventory.subtract(adopted)
inventory
=>Counter({'dogs': 21, 'cats': 9, 'pythons': 6})
new_pets = {"dogs": 4, "cats": 1}
inventory.update(new_pets)
inventory
=>Counter({'dogs': 25, 'cats': 10, 'pythons': 6})
new_pets = {"dogs": 4, "pythons": 2}
inventory += new_pets
inventory
=>Counter({'dogs': 29, 'cats': 10, 'pythons': 8})
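One detail the steps above do not show is how the subtract() method differs from the - operator. The following sketch, using made-up stock numbers, suggests the difference: subtract() works in place and keeps zero or negative counts, while - returns a new Counter and drops them.

```python
from collections import Counter

stock = Counter(dogs=3, cats=1)   # hypothetical numbers for illustration
sold = Counter(dogs=1, cats=2)

in_place = Counter(stock)         # copy, so stock itself stays untouched
in_place.subtract(sold)           # keeps zero and negative counts
difference = stock - sold         # drops counts that are zero or negative

print(in_place)     # Counter({'dogs': 2, 'cats': -1})
print(difference)   # Counter({'dogs': 2})
```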
Deque
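Basic concepts
When we need fast appends and removals at both ends of a container, we can use deque. My own understanding is that a deque is a container that can be operated on from both ends: it is similar to a list, but adding or removing an element at the left end takes O(1) time instead of the O(n) that a list needs.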
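The speed difference is easiest to see at the left end. The sketch below is only a rough illustration, not a benchmark: the workload size N and the helper names are assumptions made up for this example, and the absolute timings will vary by machine.

```python
from collections import deque
from timeit import timeit

N = 2_000  # hypothetical workload size, chosen only for illustration

def fill_list_left(n=N):
    items = []
    for i in range(n):
        items.insert(0, i)      # O(n): shifts every existing element
    return items

def fill_deque_left(n=N):
    items = deque()
    for i in range(n):
        items.appendleft(i)     # O(1): no shifting needed
    return items

list_seconds = timeit(fill_list_left, number=5)
deque_seconds = timeit(fill_deque_left, number=5)
print(f"list.insert(0, x): {list_seconds:.4f}s, deque.appendleft(x): {deque_seconds:.4f}s")
```

Appending on the right is fast for both types, and indexing into the middle of a deque is slower than for a list, so deque is the better choice mainly for queue-like access patterns.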
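Examples
deque has many methods, and many of its operations mirror those of list. It supports indexing such as d[0] and d[-1], but unlike list it does not support slicing.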
from collections import deque
d = deque()
d.append(1)
d.append(2)
d.append(3)
print(len(d))
print(d[0])
print(d[-1])
Out: 3
1
3
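Indexing works as shown, but a slice such as d[1:3] raises TypeError on a deque. A minimal sketch of one common workaround, itertools.islice, follows; the sample values are arbitrary.

```python
from collections import deque
from itertools import islice

d = deque([10, 20, 30, 40, 50])

try:
    d[1:3]                       # slicing a deque is not supported
    slicing_failed = False
except TypeError:
    slicing_failed = True

middle = list(islice(d, 1, 3))   # elements at positions 1 and 2
print(slicing_failed, middle)    # True [20, 30]
```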
===============================================
print(deque([1, 2, 3, 4]))
print(deque(range(1, 5)))
print(deque("abcd"))
numbers = {"one": 1, "two": 2, "three": 3, "four": 4}
print(deque(numbers.keys()))
print(deque(numbers.values()))
print(deque(numbers.items()))
deque([1, 2, 3, 4])
deque([1, 2, 3, 4])
deque(['a', 'b', 'c', 'd'])
deque(['one', 'two', 'three', 'four'])
deque([1, 2, 3, 4])
deque([('one', 1), ('two', 2), ('three', 3), ('four', 4)])
======================================
numbers = deque([1, 2, 3, 4])
numbers.popleft() #1
numbers.popleft() #2
numbers = deque([1, 2, 3, 4])
numbers.pop()
numbers # deque([1, 2, 3])
letters = deque("abde")
letters.insert(2, "c")
letters #deque(['a', 'b', 'c', 'd', 'e'])
letters.remove("d")
letters # deque(['a', 'b', 'c', 'e'])
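The defining feature of deque is that it can be operated on from both ends:
d = deque([i for i in range(5)])
print(len(d))
# Output: 5
d.popleft() # remove and return the leftmost element
# Output: 0
d.pop() # remove and return the rightmost element
# Output: 4
print(d)
# Output: deque([1, 2, 3])
d.append(100) # add an element at the right end
d.appendleft(-100) # add an element at the left end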
print(d)
# Output: deque([-100, 1, 2, 3, 100])
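A few more common patterns: a deque can be given a maximum length when it is created, and like list it supports extend for concatenation, except that a deque can be extended from either end:
from collections import deque
d = deque([1,2,3,4,5], maxlen=9) # cap the length at 9
d.extendleft([0]) # extend on the left with a list
d.extend([6,7,8]) # extend on the right with a list
print(d)
Out:deque([0, 1, 2, 3, 4, 5, 6, 7, 8], maxlen=9)
d now holds 9 elements, which equals maxlen=9. From here on, adding an element at one end automatically discards an element from the opposite end: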
d.append(100)
print(d)
d.appendleft(-100)
print(d)
Out: deque([1, 2, 3, 4, 5, 6, 7, 8, 100], maxlen=9)
deque([-100, 1, 2, 3, 4, 5, 6, 7, 8], maxlen=9)
deque has many other uses, for example as the storage behind a custom queue class:
# custom_queue.py
from collections import deque
class Queue:
    def __init__(self):
        self._items = deque()
    def enqueue(self, item):
        self._items.append(item)
    def dequeue(self):
        try:
            return self._items.popleft()
        except IndexError:
            raise IndexError("dequeue from an empty queue") from None
    def __len__(self):
        return len(self._items)
    def __contains__(self, item):
        return item in self._items
    def __iter__(self):
        yield from self._items
    def __reversed__(self):
        yield from reversed(self._items)
    def __repr__(self):
        return f"Queue({list(self._items)})"
numbers = Queue()
# Enqueue items
for number in range(1, 5):
    numbers.enqueue(number)
numbers # Queue([1, 2, 3, 4])
Limiting the length:
four_numbers = deque([0, 1, 2, 3, 4], maxlen=4) # Discard 0
four_numbers # deque([1, 2, 3, 4])
four_numbers.append(5) # Automatically remove 1
four_numbers #deque([2, 3, 4, 5])
four_numbers.append(6) # Automatically remove 2
four_numbers # deque([3, 4, 5, 6])
four_numbers.appendleft(2) # Automatically remove 6
four_numbers # deque([2, 3, 4, 5])
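A bounded deque is a natural fit for "keep only the last N items" problems. As a sketch under that assumption, the hypothetical helper moving_averages below computes a sliding-window mean and lets maxlen discard old values automatically.

```python
from collections import deque

def moving_averages(values, window=3):
    """Yield the mean of the most recent `window` values (hypothetical helper)."""
    recent = deque(maxlen=window)   # old values fall off the left automatically
    for value in values:
        recent.append(value)
        if len(recent) == window:   # wait until the window is full
            yield sum(recent) / window

averages = list(moving_averages([1, 2, 3, 4, 5]))
print(averages)  # [2.0, 3.0, 4.0]
```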
Another example: a web browsing history
sites = (
    "google.com",
    "yahoo.com",
    "bing.com"
)
pages = deque(maxlen=3)
pages.maxlen #3
for site in sites:
    pages.appendleft(site)
pages #deque(['bing.com', 'yahoo.com', 'google.com'])
pages.appendleft("facebook.com")
pages.appendleft("twitter.com")
pages #deque(['twitter.com', 'facebook.com', 'bing.com'])
Namedtuple
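Basic concepts
Named tuple. namedtuple turns a tuple into a convenient container: with namedtuple we no longer have to use integer indexes to access the members of a tuple.
I like to think of a namedtuple as an immutable dict.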
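The "immutable" part can be checked directly. In this minimal sketch (Point is a made-up type used only for illustration), the instance still unpacks and indexes like a regular tuple, while assigning to a field raises AttributeError.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")   # hypothetical type for illustration
p = Point(x=1, y=2)

x, y = p                 # still unpacks like a regular tuple
first = p[0]             # integer indexes keep working too

try:
    p.x = 10             # fields cannot be reassigned
    mutated = True
except AttributeError:
    mutated = False

print(x, y, first, mutated)  # 1 2 1 False
```

To "change" a field, build a new instance with _replace(), which leaves the original untouched.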
Examples
from collections import namedtuple
Person = namedtuple('Person', 'name age city') # similar to defining a class
xiaobai = Person(name="xiaobai", age=18, city="paris") # similar to creating an object
print(xiaobai)
Out:Person(name='xiaobai', age=18, city='paris')
print(xiaobai.name)
print(xiaobai.age)
print(xiaobai.city)
Out:xiaobai
18
paris
Use case: read a CSV file, store each row in a namedtuple, then add a hash_id or switch to a DataFrame later
import hashlib
from collections import namedtuple
import pandas as pd
# create hash id for a namedtuple
cols = ['a','b','c']
fields = " ".join(cols) + " hash_id" # the leading space keeps 'c' and 'hash_id' separate
Data = namedtuple('Data', fields, defaults=[""] * (len(cols) + 1)) # every field defaults to ""
raw_result = []
with open('xx.csv',encoding='utf-8') as file:
    file_iter = iter(file)
    _ = next(file_iter) # Skip the header line
    for line in file_iter:
        each_line = line.strip('\n').split(';') + [''] # empty placeholder for hash_id
        raw_result.append(Data(*each_line))
def create_hash_id(each):
    # Join all field values (spaces removed) and hash the result
    text = "".join(x.replace(" ","") for x in each._asdict().values())
    return each._replace(hash_id=hashlib.sha256(text.encode('utf-8')).hexdigest())
# A list of tuples
a = [Data(...),Data(...)]
new_a = list(map(create_hash_id,a))
# Get values for a namedtuple
a[0]._asdict().values()
# Get fields for a namedtuple
a[0]._fields
# Turn the result to dataframe
df = pd.DataFrame(a,columns=Data._fields)
With class
class DataPoint(namedtuple('DataPoint', ['date', 'value'])):
    __slots__ = ()
    def __le__(self, other):
        return self.value <= other.value
    def __lt__(self, other):
        return self.value < other.value
    def __gt__(self, other):
        return self.value > other.value
City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
tokyo
=>City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
tokyo._fields
=>('name', 'country', 'population', 'coordinates')
LatLong = namedtuple('LatLong', 'lat long')
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
delhi = City._make(delhi_data)
delhi._asdict()
for key, value in delhi._asdict().items():
    print(key + ':', value)
name: Delhi NCR
country: IN
population: 21.935
coordinates: LatLong(lat=28.613889, long=77.208889)
delhi.coordinates.lat # 28.613889
Person = namedtuple("Person", "name age height")
jane = Person("Jane", 25, 1.75)
print(jane._asdict())
jane._asdict()['name']
{'name': 'Jane', 'age': 25, 'height': 1.75}  # Python 3.8+; earlier versions print an OrderedDict
'Jane'
Replacing Fields in Existing namedtuple Instances
from collections import namedtuple
Person = namedtuple("Person", "name age height")
jane = Person("Jane", 25, 1.75)
# After Jane's birthday
jane = jane._replace(age=26)
jane
=>Person(name='Jane', age=26, height=1.75)
Exploring Additional namedtuple Attributes
Person = namedtuple("Person", "name age height")
ExtendedPerson = namedtuple(
"ExtendedPerson",
[*Person._fields, "weight"]
)
jane = ExtendedPerson("Jane", 26, 1.75, 67)
jane
jane.weight
=>67
For loop namedtuple
Person = namedtuple("Person", "name age height weight")
jane = Person("Jane", 26, 1.75, 67)
for field, value in zip(jane._fields, jane):
    print(field, "->", value)
name -> Jane
age -> 26
height -> 1.75
weight -> 67
Default Values
Person = namedtuple(
    "Person",
    "name age height weight country",
    defaults=[185, 75, "Canada"]  # defaults are assigned to the rightmost fields, in order
)
print(Person._field_defaults)
{'height': 185, 'weight': 75, 'country': 'Canada'}
Mike = Person("Mike", 24)
Mike
Person(name='Mike', age=24, height=185, weight=75, country='Canada')
Returning Multiple Named Values From Functions
def custom_divmod(a, b):
    DivMod = namedtuple("DivMod", "quotient remainder")
    return DivMod(*divmod(a, b))
custom_divmod(8, 4)
=>DivMod(quotient=2, remainder=0)
Reducing the Number of Arguments to Functions
User = namedtuple("User", "username client_name plan")
user = User("john", "John Doe", "Premium")
def create_user(db, user):
    db.add_user(user.username)
    db.complete_user_profile(
        user.username,
        user.client_name,
        user.plan
    )
namedtuple vs Data Class
Data Classes can be thought of as “mutable namedtuples with defaults.” However, it’d be more accurate to say that data classes are like mutable named tuples with type hints. The “defaults” part isn’t a difference at all because named tuples can also have default values for their fields. So, at first glance, the main differences are mutability and type hints.
from dataclasses import dataclass
@dataclass
class Person:
    name: str
    age: int
    height: float
    weight: float
    country: str = "Canada"
jane = Person("Jane", 25, 1.75, 67)
print(jane.name)
jane.name = "Mike"
jane.name
'Jane'
'Mike'
Add frozen=True and the fields of an instance can no longer be modified:
@dataclass(frozen=True)
class Person:
    name: str
    ....
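As a runnable sketch of what frozen=True does in practice (FrozenPerson and its two fields are made up here, so that the example is complete on its own), assigning to a field raises dataclasses.FrozenInstanceError:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenPerson:          # hypothetical name, to avoid clashing with Person
    name: str
    age: int

jane = FrozenPerson("Jane", 25)

try:
    jane.age = 26            # assignment is rejected on a frozen dataclass
    changed = True
except FrozenInstanceError:
    changed = False

print(jane, changed)  # FrozenPerson(name='Jane', age=25) False
```

A frozen data class therefore behaves much like a namedtuple with respect to mutation, although it is still not a tuple and cannot be unpacked or indexed.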
Subclassing namedtuple Classes
from collections import namedtuple
from datetime import date
BasePerson = namedtuple(
    "BasePerson",
    "name birthdate country",
    defaults=["Canada"]
)
class Person(BasePerson):
    """A namedtuple subclass to hold a person's data."""
    __slots__ = ()
    def __repr__(self):
        return f"Name: {self.name}, age: {self.age} years old."
    @property
    def age(self):
        return (date.today() - self.birthdate).days // 365
print(Person.__doc__)
jane = Person("Jane", date(1996, 3, 5))
jane.age
A namedtuple subclass to hold a person's data.
25  (as printed when these notes were written; the value depends on the current date)
OrderedDict
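Basic concepts
“OrderedDict” is itself a dict, but what makes it special is that it remembers the order in which keys and values were inserted. Since Python 3.7 a plain dict also preserves insertion order, so OrderedDict is mainly useful for its extra order-related methods and its order-sensitive equality.
Examples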
from collections import OrderedDict
d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4
print(d)
Out:OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
Deleting a key does not change the order of the remaining keys, and re-inserting that key appends it at the end:
from collections import OrderedDict
print("Before deleting:\n")
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
od['d'] = 4
for key, value in od.items():
    print(key, value)
print("\nAfter deleting:\n")
od.pop('c')
for key, value in od.items():
    print(key, value)
print("\nAfter re-inserting:\n")
od['c'] = 3
for key, value in od.items():
    print(key, value)
Out:Before deleting:
a 1
b 2
c 3
d 4
After deleting:
a 1
b 2
d 4
After re-inserting:
a 1
b 2
d 4
c 3
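Beyond remembering insertion order, OrderedDict has a few order-aware operations that a plain dict lacks. The sketch below, with arbitrary sample keys, shows move_to_end(), popitem(last=False), and the fact that equality between two OrderedDict objects takes order into account.

```python
from collections import OrderedDict

od = OrderedDict(a=1, b=2, c=3)

od.move_to_end("a")                 # move an existing key to the right end
after_move = list(od)               # keys are now b, c, a

oldest = od.popitem(last=False)     # pop from the left end (FIFO order)

# Equality between two OrderedDicts is order-sensitive; plain dicts ignore order.
ordered_equal = OrderedDict(x=1, y=2) == OrderedDict(y=2, x=1)
plain_equal = dict(x=1, y=2) == dict(y=2, x=1)

print(after_move, oldest)           # ['b', 'c', 'a'] ('b', 2)
print(ordered_equal, plain_equal)   # False True
```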
ChainMap
from collections import ChainMap
cmd_proxy = {} # The user doesn't provide a proxy
local_proxy = {"proxy": "proxy.local.com"}
global_proxy = {"proxy": "proxy.global.com"}
config = ChainMap(cmd_proxy, local_proxy, global_proxy)
config.maps
=>[{}, {'proxy': 'proxy.local.com'}, {'proxy': 'proxy.global.com'}]
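The point of chaining these dicts is the lookup order. The sketch below reuses the same three mappings to show that a lookup returns the first match found from left to right, and that writes always go to the first mapping; the value proxy.cmd.com is made up for illustration.

```python
from collections import ChainMap

cmd_proxy = {}                                   # nothing given on the command line
local_proxy = {"proxy": "proxy.local.com"}
global_proxy = {"proxy": "proxy.global.com"}
config = ChainMap(cmd_proxy, local_proxy, global_proxy)

before = config["proxy"]            # first match wins: the local setting

config["proxy"] = "proxy.cmd.com"   # writes always go to the first mapping
after = config["proxy"]

print(before, after)   # proxy.local.com proxy.cmd.com
print(cmd_proxy)       # {'proxy': 'proxy.cmd.com'}
print(local_proxy)     # {'proxy': 'proxy.local.com'}
```

Because ChainMap holds references to the underlying dicts rather than copies, the update is visible in cmd_proxy itself.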
ChainMap provides a .new_child() method and a .parents property:
from collections import ChainMap
dad = {"name": "John", "age": 35}
mom = {"name": "Jane", "age": 31}
family = ChainMap(mom, dad)
family
=>ChainMap({'name': 'Jane', 'age': 31}, {'name': 'John', 'age': 35})
son = {"name": "Mike", "age": 0}
family = family.new_child(son)
for person in family.maps:
    print(person)
{'name': 'Mike', 'age': 0}
{'name': 'Jane', 'age': 31}
{'name': 'John', 'age': 35}
family.parents
=>ChainMap({'name': 'Jane', 'age': 31}, {'name': 'John', 'age': 35})