NSDictionary底层实现

简介

NSDictionary(字典)是使用 hash表来实现key和value之间的映射和存储的, hash函数设计的好坏影响着数据的查找访问效率。数据在hash表中分布的越均匀,其访问效率越高。而在Objective-C中,通常都是利用NSString 来作为键值,其内部使用的hash函数也是通过使用 NSString对象作为键值来保证数据的各个节点在hash表中均匀分布。

NSDictionary内部结构

源码地址

1
2
3
4
5
6
7
8
9
10
11
12
struct __CFDictionary {
CFRuntimeBase _base;
CFIndex _count; /* number of values */
CFIndex _capacity; /* maximum number of values */
CFIndex _bucketsNum; /* number of slots */
uintptr_t _marker;
void *_context; /* private */
CFIndex _deletes;
CFOptionFlags _xflags; /* bits for GC */
const void **_keys; /* can be NULL if not allocated yet */
const void **_values; /* can be NULL if not allocated yet */
};
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
void CFDictionarySetValue(CFMutableDictionaryRef dict, const void *key, const void *value) {
//通过match,nomatch来判断是否存在key
CFIndex match, nomatch;
const CFDictionaryKeyCallBacks *cb;
const CFDictionaryValueCallBacks *vcb;
const void *newKey, *newValue;
CFAllocatorRef allocator, keysAllocator, valuesAllocator;
CF_OBJC_FUNCDISPATCH2(__kCFDictionaryTypeID, void, dict, "setObject:forKey:", value, key);
__CFGenericValidateType(dict, __kCFDictionaryTypeID);
switch (__CFDictionaryGetType(dict)) {
case __kCFDictionaryMutable:
if (dict->_count == dict->_capacity || NULL == dict->_keys) {
__CFDictionaryGrow(dict, 1);
}
break;
case __kCFDictionaryFixedMutable:
break;
default:
CFAssert2(__CFDictionaryGetType(dict) != __kCFDictionaryImmutable, __kCFLogAssertion, "%s(): immutable dict %p passed to mutating operation", __PRETTY_FUNCTION__, dict);
break;
}
__CFDictionaryFindBuckets2(dict, key, &match, &nomatch);
vcb = __CFDictionaryGetValueCallBacks(dict);
allocator = __CFGetAllocator(dict);
keysAllocator = (dict->_xflags & __kCFDictionaryWeakKeys) ? kCFAllocatorNull : allocator;
valuesAllocator = (dict->_xflags & __kCFDictionaryWeakValues) ? kCFAllocatorNull : allocator;
if (vcb->retain) {
newValue = (void *)INVOKE_CALLBACK3(((const void *(*)(CFAllocatorRef, const void *, void *))vcb->retain), allocator, value, dict->_context);
} else {
newValue = value;
}
if (kCFNotFound != match) {
//key已存在,覆盖newValue
CF_OBJC_KVO_WILLCHANGE(dict, key);
if (vcb->release) {
INVOKE_CALLBACK3(((void (*)(CFAllocatorRef, const void *, void *))vcb->release), allocator, dict->_values[match], dict->_context);
}
CF_WRITE_BARRIER_ASSIGN(valuesAllocator, dict->_values[match], newValue);
CF_OBJC_KVO_DIDCHANGE(dict, key);
} else {
CFAssert3(__kCFDictionaryFixedMutable != __CFDictionaryGetType(dict) || dict->_count < dict->_capacity, __kCFLogAssertion, "%s(): capacity exceeded on fixed-capacity dict %p (capacity = %d)", __PRETTY_FUNCTION__, dict, dict->_capacity);
cb = __CFDictionaryGetKeyCallBacks(dict);
if (cb->retain) {
newKey = (void *)INVOKE_CALLBACK3(((const void *(*)(CFAllocatorRef, const void *, void *))cb->retain), allocator, key, dict->_context);
} else {
newKey = key;
}
if (dict->_marker == (uintptr_t)newKey || ~dict->_marker == (uintptr_t)newKey) {
__CFDictionaryFindNewMarker(dict);
}
// key不存在,新增value
CF_OBJC_KVO_WILLCHANGE(dict, key);
CF_WRITE_BARRIER_ASSIGN(keysAllocator, dict->_keys[nomatch], newKey);
CF_WRITE_BARRIER_ASSIGN(valuesAllocator, dict->_values[nomatch], newValue);
dict->_count++;
CF_OBJC_KVO_DIDCHANGE(dict, key);
}
}

查找key存储的位置

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
static void __CFDictionaryFindBuckets2(CFDictionaryRef dict, const void *key, CFIndex *match, CFIndex *nomatch) {
const CFDictionaryKeyCallBacks *cb = __CFDictionaryGetKeyCallBacks(dict);
//获取hash值
CFHashCode keyHash = cb->hash ? (CFHashCode)INVOKE_CALLBACK2(((CFHashCode (*)(const void *, void *))cb->hash), key, dict->_context) : (CFHashCode)key;
const void **keys = dict->_keys;
uintptr_t marker = dict->_marker;
CFIndex probe = keyHash % dict->_bucketsNum;
CFIndex probeskip = 1; // See RemoveValue() for notes before changing this value
CFIndex start = probe;
*match = kCFNotFound;
*nomatch = kCFNotFound;
for (;;) {
uintptr_t currKey = (uintptr_t)keys[probe];
// 空桶,返回nomatch,未匹配
if (marker == currKey) { /* empty */
if (nomatch) *nomatch = probe;
return;
} else if (~marker == currKey) { /* deleted */
if (nomatch) {
*nomatch = probe;
nomatch = NULL;
}
} else if (currKey == (uintptr_t)key || (cb->equal && INVOKE_CALLBACK3((Boolean (*)(const void *, const void *, void*))cb->equal, (void *)currKey, key, dict->_context))) {
// 匹配成功,返回match
*match = probe;
return;
}
// 未匹配,发生碰撞,将数组下标后移,直到找到空闲区域位置
probe = probe + probeskip;
//探针%存储桶的此替代方法假定probeskip始终为正且小于存储桶数。
if (dict->_bucketsNum <= probe) {
probe -= dict->_bucketsNum;
}
if (start == probe) {
return;
}
}
}

通过源码可以看到,当有重复的key插入到字典NSDictionary时,会覆盖旧值,而集合NSSet则什么都不做,保证了里面的元素不会重复。

大家都知道,字典里的键值对key-value是一一对应的关系,从数据结构可以看出,key和value是分别存储在两个不同的数组里,这里面是如何对key、value进行绑定的呢?

首先key利用hash函数算出hash值,然后对数组的长度取模,得到数组下标的位置,同样将这个地址对应到values数组的下标,就匹配到相应的value。 注意到上面的这句话,要保证一点,就是keys和values这两个数组的长度要一致。所以扩容的时候,需要对keys和values两个数组一起扩容。

// setValue时判断

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
void CFDictionarySetValue(CFMutableDictionaryRef dict, const void *key, const void *value) {
CFIndex match, nomatch;
const CFDictionaryKeyCallBacks *cb;
const CFDictionaryValueCallBacks *vcb;
const void *newKey, *newValue;
CFAllocatorRef allocator, keysAllocator, valuesAllocator;
CF_OBJC_FUNCDISPATCH2(__kCFDictionaryTypeID, void, dict, "setObject:forKey:", value, key);
__CFGenericValidateType(dict, __kCFDictionaryTypeID);
switch (__CFDictionaryGetType(dict)) {
case __kCFDictionaryMutable:
//判断
if (dict->_count == dict->_capacity || NULL == dict->_keys) {
__CFDictionaryGrow(dict, 1);
}
break;
case __kCFDictionaryFixedMutable:
break;
default:
CFAssert2(__CFDictionaryGetType(dict) != __kCFDictionaryImmutable, __kCFLogAssertion, "%s(): immutable dict %p passed to mutating operation", __PRETTY_FUNCTION__, dict);
break;
}
__CFDictionaryFindBuckets2(dict, key, &match, &nomatch);
vcb = __CFDictionaryGetValueCallBacks(dict);
allocator = __CFGetAllocator(dict);
keysAllocator = (dict->_xflags & __kCFDictionaryWeakKeys) ? kCFAllocatorNull : allocator;
valuesAllocator = (dict->_xflags & __kCFDictionaryWeakValues) ? kCFAllocatorNull : allocator;
if (vcb->retain) {
newValue = (void *)INVOKE_CALLBACK3(((const void *(*)(CFAllocatorRef, const void *, void *))vcb->retain), allocator, value, dict->_context);
} else {
newValue = value;
}
if (kCFNotFound != match) {
CF_OBJC_KVO_WILLCHANGE(dict, key);
if (vcb->release) {
INVOKE_CALLBACK3(((void (*)(CFAllocatorRef, const void *, void *))vcb->release), allocator, dict->_values[match], dict->_context);
}
CF_WRITE_BARRIER_ASSIGN(valuesAllocator, dict->_values[match], newValue);
CF_OBJC_KVO_DIDCHANGE(dict, key);
} else {
CFAssert3(__kCFDictionaryFixedMutable != __CFDictionaryGetType(dict) || dict->_count < dict->_capacity, __kCFLogAssertion, "%s(): capacity exceeded on fixed-capacity dict %p (capacity = %d)", __PRETTY_FUNCTION__, dict, dict->_capacity);
cb = __CFDictionaryGetKeyCallBacks(dict);
if (cb->retain) {
newKey = (void *)INVOKE_CALLBACK3(((const void *(*)(CFAllocatorRef, const void *, void *))cb->retain), allocator, key, dict->_context);
} else {
newKey = key;
}
if (dict->_marker == (uintptr_t)newKey || ~dict->_marker == (uintptr_t)newKey) {
__CFDictionaryFindNewMarker(dict);
}
CF_OBJC_KVO_WILLCHANGE(dict, key);
CF_WRITE_BARRIER_ASSIGN(keysAllocator, dict->_keys[nomatch], newKey);
CF_WRITE_BARRIER_ASSIGN(valuesAllocator, dict->_values[nomatch], newValue);
dict->_count++;
CF_OBJC_KVO_DIDCHANGE(dict, key);
}
}

//扩容

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
static void __CFDictionaryGrow(CFMutableDictionaryRef dict, CFIndex numNewValues) {
//保存旧的值
const void **oldkeys = dict->_keys;
const void **oldvalues = dict->_values;
CFIndex idx, oldnbuckets = dict->_bucketsNum;
CFIndex oldCount = dict->_count;
CFAllocatorRef allocator = __CFGetAllocator(dict), keysAllocator, valuesAllocator;
void *keysBase, *valuesBase;
dict->_capacity = __CFDictionaryRoundUpCapacity(oldCount + numNewValues);
dict->_bucketsNum = __CFDictionaryNumBucketsForCapacity(dict->_capacity);
dict->_deletes = 0;
if (_CFDictionaryIsSplit(dict)) { // iff GC, use split memory sometimes unscanned memory
unsigned weakOrStrong = (dict->_xflags & __kCFDictionaryWeakKeys) ? AUTO_MEMORY_UNSCANNED : AUTO_MEMORY_SCANNED;
void *mem = _CFAllocatorAllocateGC(allocator, dict->_bucketsNum * sizeof(const void *), weakOrStrong);
CF_WRITE_BARRIER_BASE_ASSIGN(allocator, dict, dict->_keys, mem);
keysAllocator = (dict->_xflags & __kCFDictionaryWeakKeys) ? kCFAllocatorNull : allocator; // GC: avoids write-barrier in weak case.
keysBase = mem;
weakOrStrong = (dict->_xflags & __kCFDictionaryWeakValues) ? AUTO_MEMORY_UNSCANNED : AUTO_MEMORY_SCANNED;
mem = _CFAllocatorAllocateGC(allocator, dict->_bucketsNum * sizeof(const void *), weakOrStrong);
CF_WRITE_BARRIER_BASE_ASSIGN(allocator, dict, dict->_values, mem);
valuesAllocator = (dict->_xflags & __kCFDictionaryWeakValues) ? kCFAllocatorNull : allocator; // GC: avoids write-barrier in weak case.
valuesBase = mem;
} else {
CF_WRITE_BARRIER_BASE_ASSIGN(allocator, dict, dict->_keys, _CFAllocatorAllocateGC(allocator, 2 * dict->_bucketsNum * sizeof(const void *), AUTO_MEMORY_SCANNED));
dict->_values = (const void **)(dict->_keys + dict->_bucketsNum);
keysAllocator = valuesAllocator = allocator;
keysBase = valuesBase = dict->_keys;
}
if (NULL == dict->_keys || NULL == dict->_values) HALT;
if (__CFOASafe) __CFSetLastAllocationEventName(dict->_keys, "CFDictionary (store)");
// 重新计算keys数据的hash值,存放到新的数组中
for (idx = dict->_bucketsNum; idx--;) {
dict->_keys[idx] = (const void *)dict->_marker;
dict->_values[idx] = 0;
}
if (NULL == oldkeys) return;
for (idx = 0; idx < oldnbuckets; idx++) {
if (dict->_marker != (uintptr_t)oldkeys[idx] && ~dict->_marker != (uintptr_t)oldkeys[idx]) {
CFIndex match, nomatch;
__CFDictionaryFindBuckets2(dict, oldkeys[idx], &match, &nomatch);
CFAssert3(kCFNotFound == match, __kCFLogAssertion, "%s(): two values (%p, %p) now hash to the same slot; mutable value changed while in table or hash value is not immutable", __PRETTY_FUNCTION__, oldkeys[idx], dict->_keys[match]);
if (kCFNotFound != nomatch) {
CF_WRITE_BARRIER_BASE_ASSIGN(keysAllocator, keysBase, dict->_keys[nomatch], oldkeys[idx]);
CF_WRITE_BARRIER_BASE_ASSIGN(valuesAllocator, valuesBase, dict->_values[nomatch], oldvalues[idx]);
}
}
}
CFAssert1(dict->_count == oldCount, __kCFLogAssertion, "%s(): dict count differs after rehashing; error", __PRETTY_FUNCTION__);
_CFAllocatorDeallocateGC(allocator, oldkeys);
if (_CFDictionaryIsSplit(dict)) _CFAllocatorDeallocateGC(allocator, oldvalues);
}

通过上面可以看出,字典把无序和庞大的数据进行了空间hash表对应,下次查找的复杂度接近于O(1),但是不断扩容的空间就是其弊端,因此开放地址法最好存储的是临时需要,尽快的释放资源。

对于字典NSDictionary设置的key和value,key值会根据特定的hash函数算出hash值,keys和values同样多,利用hash值对数组长度取模,得到其对应的下标index,如果下标已有数据,开放定址法后移插入,如果数组达到阈值,就扩容,然后重新hash插入。这样的机制就把一些不连续的key-value值插入到能建立起关系的hash表中。
查找的时候,key根据hash函数以及数组长度,得到下标,然后根据下标直接访问hash表的keys和values,这样查询速度就可以和连续线性存储的数据一样接近O(1)了。

使用

1
- (void)setObject:(id)anObject forKey:(id <NSCopying>)aKey;

要作为Key值,必须遵循NSCopying协议。也就是说在NSDictionary内部,会对aKey对象Copy一份新的。而anObject 对象在其内部是作为强引用(retain或strong)。所以在MRC下,向该方法发送消息之后,我们会向anObject发送release消息进行释放。

既然知道了作为key值,必须遵循NSCopying协议,说明除了NSString对象之外,我们还可以使用其他类型对象来作为NSDictionary的 key值。不过这还不够,作为key值,该类型还必须继承于NSObject并且要重载一下两个方法:

1
2
- (NSUInteger)hash;
- (BOOL)isEqual:(id)object;

其中,hash 方法是用来计算该对象的 hash 值,最终的 hash 值决定了该对象在 hash 表中存储的位置。所以同样,如果想重写该方法,我们尽量设计一个能让数据分布均匀的 hash 函数。

所以如果对象key的hash值相同,那在hash表里面的对应的value值是相同的(value值被更新了)

isEqual方法是为了通过hash值来找到对象在hash表中的位置。

  • NSDictionary的KVC实现
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
@implementation NSDictionary (NSKeyValueCoding)
- (id)valueForKey:(NSString*)key {
if([key hasPrefix:@"@"])
return [super valueForKey:[key substringFromIndex:1]];
return [self objectForKey:key];
}
- (void)setValue:(id)value forKey:(NSString*)key {
[NSException raise:NSInvalidArgumentException format:@"%@ called on immutable dictionary %@", NSStringFromSelector(_cmd), self];
}
@end
@implementation NSMutableDictionary (NSKeyValueCoding)
- (void)setValue:(id)value forKey:(NSString*)key {
if(value)
[self setObject:value forKey:key];
else
[self removeObjectForKey:key];
}
@end
  • setValue和setObject的区别
1
2
- (void)setObject:(ObjectType)anObject forKey:(KeyType <NSCopying>)aKey;
- (void)setValue:(nullable ObjectType)value forKey:(NSString *)key;

setObject: ForKey:是NSMutableDictionary特有的;setValue: ForKey:是KVC的主要方法。

(1) setValue: ForKey:的value是可以为nil的(但是当value为nil的时候,会自动调用removeObject:forKey方法);
setObject: ForKey:的value则不可以为nil。

(2) setValue: ForKey:的key必须是不为nil的字符串类型;

setObject: ForKey:的key可以是不为nil的所有继承NSCopying的类型。