NSDictionary底层实现

简介

NSDictionary（字典）是使用 hash表来实现key和value之间的映射和存储的， hash函数设计的好坏影响着数据的查找访问效率。数据在hash表中分布的越均匀，其访问效率越高。而在Objective-C中，通常都是利用NSString 来作为键值，其内部使用的hash函数也是通过使用 NSString对象作为键值来保证数据的各个节点在hash表中均匀分布。

NSDictionary内部结构

源码地址

struct __CFDictionary {
    CFRuntimeBase _base;
    CFIndex _count;		/* number of values */
    CFIndex _capacity;		/* maximum number of values */
    CFIndex _bucketsNum;	/* number of slots */
    uintptr_t _marker;
    void *_context;		/* private */
    CFIndex _deletes;
    CFOptionFlags _xflags;      /* bits for GC */
    const void **_keys;		/* can be NULL if not allocated yet */
    const void **_values;	/* can be NULL if not allocated yet */
};

void CFDictionarySetValue(CFMutableDictionaryRef dict, const void *key, const void *value) {
    //通过match，nomatch来判断是否存在key
    CFIndex match, nomatch;
    const CFDictionaryKeyCallBacks *cb;
    const CFDictionaryValueCallBacks *vcb;
    const void *newKey, *newValue;
    CFAllocatorRef allocator, keysAllocator, valuesAllocator;
    CF_OBJC_FUNCDISPATCH2(__kCFDictionaryTypeID, void, dict, "setObject:forKey:", value, key);
    __CFGenericValidateType(dict, __kCFDictionaryTypeID);
    switch (__CFDictionaryGetType(dict)) {
    case __kCFDictionaryMutable:
	if (dict->_count == dict->_capacity || NULL == dict->_keys) {
	    __CFDictionaryGrow(dict, 1);
	}
	break;
    case __kCFDictionaryFixedMutable:
	break;
    default:
	CFAssert2(__CFDictionaryGetType(dict) != __kCFDictionaryImmutable, __kCFLogAssertion, "%s(): immutable dict %p passed to mutating operation", __PRETTY_FUNCTION__, dict);
	break;
    }
    __CFDictionaryFindBuckets2(dict, key, &match, &nomatch);
    vcb = __CFDictionaryGetValueCallBacks(dict);
    allocator = __CFGetAllocator(dict);
    keysAllocator = (dict->_xflags & __kCFDictionaryWeakKeys) ? kCFAllocatorNull : allocator;
    valuesAllocator = (dict->_xflags & __kCFDictionaryWeakValues) ? kCFAllocatorNull : allocator;
    if (vcb->retain) {
	newValue = (void *)INVOKE_CALLBACK3(((const void *(*)(CFAllocatorRef, const void *, void *))vcb->retain), allocator, value, dict->_context);
    } else {
	newValue = value;
    }
    if (kCFNotFound != match) {
        //key已存在，覆盖newValue
	CF_OBJC_KVO_WILLCHANGE(dict, key);
	if (vcb->release) {
	    INVOKE_CALLBACK3(((void (*)(CFAllocatorRef, const void *, void *))vcb->release), allocator, dict->_values[match], dict->_context);
	}
	CF_WRITE_BARRIER_ASSIGN(valuesAllocator, dict->_values[match], newValue);
	CF_OBJC_KVO_DIDCHANGE(dict, key);
    } else {
	CFAssert3(__kCFDictionaryFixedMutable != __CFDictionaryGetType(dict) || dict->_count < dict->_capacity, __kCFLogAssertion, "%s(): capacity exceeded on fixed-capacity dict %p (capacity = %d)", __PRETTY_FUNCTION__, dict, dict->_capacity);
	cb = __CFDictionaryGetKeyCallBacks(dict);
	if (cb->retain) {
	    newKey = (void *)INVOKE_CALLBACK3(((const void *(*)(CFAllocatorRef, const void *, void *))cb->retain), allocator, key, dict->_context);
	} else {
	    newKey = key;
	}
	if (dict->_marker == (uintptr_t)newKey || ~dict->_marker == (uintptr_t)newKey) {
	    __CFDictionaryFindNewMarker(dict);
	}
      // key不存在，新增value   
	CF_OBJC_KVO_WILLCHANGE(dict, key);
	CF_WRITE_BARRIER_ASSIGN(keysAllocator, dict->_keys[nomatch], newKey);
	CF_WRITE_BARRIER_ASSIGN(valuesAllocator, dict->_values[nomatch], newValue);
	dict->_count++;
	CF_OBJC_KVO_DIDCHANGE(dict, key);
    }
}

查找key存储的位置

static void __CFDictionaryFindBuckets2(CFDictionaryRef dict, const void *key, CFIndex *match, CFIndex *nomatch) {
    const CFDictionaryKeyCallBacks *cb = __CFDictionaryGetKeyCallBacks(dict);
    //获取hash值
    CFHashCode keyHash = cb->hash ? (CFHashCode)INVOKE_CALLBACK2(((CFHashCode (*)(const void *, void *))cb->hash), key, dict->_context) : (CFHashCode)key;
    const void **keys = dict->_keys;
    uintptr_t marker = dict->_marker;
    CFIndex probe = keyHash % dict->_bucketsNum;
    CFIndex probeskip = 1;	// See RemoveValue() for notes before changing this value
    CFIndex start = probe;
    *match = kCFNotFound;
    *nomatch = kCFNotFound;
    for (;;) {
	uintptr_t currKey = (uintptr_t)keys[probe];
       //  空桶，返回nomatch，未匹配
	if (marker == currKey) {		/* empty */
	    if (nomatch) *nomatch = probe;
	    return;
	} else if (~marker == currKey) {	/* deleted */
	    if (nomatch) {
		*nomatch = probe;
		nomatch = NULL;
	    }
	} else if (currKey == (uintptr_t)key || (cb->equal && INVOKE_CALLBACK3((Boolean (*)(const void *, const void *, void*))cb->equal, (void *)currKey, key, dict->_context))) {
        // 匹配成功，返回match
	    *match = probe;
	    return;
	}
     // 未匹配，发生碰撞，将数组下标后移，直到找到空闲区域位置
	probe = probe + probeskip;
    //探针％存储桶的此替代方法假定probeskip始终为正且小于存储桶数。
	if (dict->_bucketsNum <= probe) {
	    probe -= dict->_bucketsNum;
	}
	if (start == probe) {
	    return;
	}
    }
}

通过源码可以看到，当有重复的key插入到字典NSDictionary时，会覆盖旧值，而集合NSSet则什么都不做，保证了里面的元素不会重复。

大家都知道，字典里的键值对key-value是一一对应的关系，从数据结构可以看出，key和value是分别存储在两个不同的数组里，这里面是如何对key、value进行绑定的呢？

首先key利用hash函数算出hash值，然后对数组的长度取模，得到数组下标的位置，同样将这个地址对应到values数组的下标，就匹配到相应的value。注意到上面的这句话，要保证一点，就是keys和values这两个数组的长度要一致。所以扩容的时候，需要对keys和values两个数组一起扩容。

// setValue时判断

void CFDictionarySetValue(CFMutableDictionaryRef dict, const void *key, const void *value) {
    CFIndex match, nomatch;
    const CFDictionaryKeyCallBacks *cb;
    const CFDictionaryValueCallBacks *vcb;
    const void *newKey, *newValue;
    CFAllocatorRef allocator, keysAllocator, valuesAllocator;
    CF_OBJC_FUNCDISPATCH2(__kCFDictionaryTypeID, void, dict, "setObject:forKey:", value, key);
    __CFGenericValidateType(dict, __kCFDictionaryTypeID);
    switch (__CFDictionaryGetType(dict)) {
    case __kCFDictionaryMutable:
      //判断
	if (dict->_count == dict->_capacity || NULL == dict->_keys) {
	    __CFDictionaryGrow(dict, 1);
	}
	break;
    case __kCFDictionaryFixedMutable:
	break;
    default:
	CFAssert2(__CFDictionaryGetType(dict) != __kCFDictionaryImmutable, __kCFLogAssertion, "%s(): immutable dict %p passed to mutating operation", __PRETTY_FUNCTION__, dict);
	break;
    }
    __CFDictionaryFindBuckets2(dict, key, &match, &nomatch);
    vcb = __CFDictionaryGetValueCallBacks(dict);
    allocator = __CFGetAllocator(dict);
    keysAllocator = (dict->_xflags & __kCFDictionaryWeakKeys) ? kCFAllocatorNull : allocator;
    valuesAllocator = (dict->_xflags & __kCFDictionaryWeakValues) ? kCFAllocatorNull : allocator;
    if (vcb->retain) {
	newValue = (void *)INVOKE_CALLBACK3(((const void *(*)(CFAllocatorRef, const void *, void *))vcb->retain), allocator, value, dict->_context);
    } else {
	newValue = value;
    }
    if (kCFNotFound != match) {
	CF_OBJC_KVO_WILLCHANGE(dict, key);
	if (vcb->release) {
	    INVOKE_CALLBACK3(((void (*)(CFAllocatorRef, const void *, void *))vcb->release), allocator, dict->_values[match], dict->_context);
	}
	CF_WRITE_BARRIER_ASSIGN(valuesAllocator, dict->_values[match], newValue);
	CF_OBJC_KVO_DIDCHANGE(dict, key);
    } else {
	CFAssert3(__kCFDictionaryFixedMutable != __CFDictionaryGetType(dict) || dict->_count < dict->_capacity, __kCFLogAssertion, "%s(): capacity exceeded on fixed-capacity dict %p (capacity = %d)", __PRETTY_FUNCTION__, dict, dict->_capacity);
	cb = __CFDictionaryGetKeyCallBacks(dict);
	if (cb->retain) {
	    newKey = (void *)INVOKE_CALLBACK3(((const void *(*)(CFAllocatorRef, const void *, void *))cb->retain), allocator, key, dict->_context);
	} else {
	    newKey = key;
	}
	if (dict->_marker == (uintptr_t)newKey || ~dict->_marker == (uintptr_t)newKey) {
	    __CFDictionaryFindNewMarker(dict);
	}
	CF_OBJC_KVO_WILLCHANGE(dict, key);
	CF_WRITE_BARRIER_ASSIGN(keysAllocator, dict->_keys[nomatch], newKey);
	CF_WRITE_BARRIER_ASSIGN(valuesAllocator, dict->_values[nomatch], newValue);
	dict->_count++;
	CF_OBJC_KVO_DIDCHANGE(dict, key);
    }
}

//扩容

static void __CFDictionaryGrow(CFMutableDictionaryRef dict, CFIndex numNewValues) {
    //保存旧的值
    const void **oldkeys = dict->_keys;
    const void **oldvalues = dict->_values;
    CFIndex idx, oldnbuckets = dict->_bucketsNum;
    CFIndex oldCount = dict->_count;
    CFAllocatorRef allocator = __CFGetAllocator(dict), keysAllocator, valuesAllocator;
    void *keysBase, *valuesBase;
    dict->_capacity = __CFDictionaryRoundUpCapacity(oldCount + numNewValues);
    dict->_bucketsNum = __CFDictionaryNumBucketsForCapacity(dict->_capacity);
    dict->_deletes = 0;
    if (_CFDictionaryIsSplit(dict)) {   // iff GC, use split memory sometimes unscanned memory
	unsigned weakOrStrong = (dict->_xflags & __kCFDictionaryWeakKeys) ? AUTO_MEMORY_UNSCANNED : AUTO_MEMORY_SCANNED;
	void *mem = _CFAllocatorAllocateGC(allocator, dict->_bucketsNum * sizeof(const void *), weakOrStrong);
        CF_WRITE_BARRIER_BASE_ASSIGN(allocator, dict, dict->_keys, mem);
        keysAllocator = (dict->_xflags & __kCFDictionaryWeakKeys) ? kCFAllocatorNull : allocator;  // GC: avoids write-barrier in weak case.
        keysBase = mem;
	
        weakOrStrong = (dict->_xflags & __kCFDictionaryWeakValues) ? AUTO_MEMORY_UNSCANNED : AUTO_MEMORY_SCANNED;
	mem = _CFAllocatorAllocateGC(allocator, dict->_bucketsNum * sizeof(const void *), weakOrStrong);
        CF_WRITE_BARRIER_BASE_ASSIGN(allocator, dict, dict->_values, mem);
        valuesAllocator = (dict->_xflags & __kCFDictionaryWeakValues) ? kCFAllocatorNull : allocator; // GC: avoids write-barrier in weak case.
        valuesBase = mem;
    } else {
        CF_WRITE_BARRIER_BASE_ASSIGN(allocator, dict, dict->_keys, _CFAllocatorAllocateGC(allocator, 2 * dict->_bucketsNum * sizeof(const void *), AUTO_MEMORY_SCANNED));
        dict->_values = (const void **)(dict->_keys + dict->_bucketsNum);
        keysAllocator = valuesAllocator = allocator;
        keysBase = valuesBase = dict->_keys;
    }
    if (NULL == dict->_keys || NULL == dict->_values) HALT;
    if (__CFOASafe) __CFSetLastAllocationEventName(dict->_keys, "CFDictionary (store)");
    // 重新计算keys数据的hash值，存放到新的数组中
    for (idx = dict->_bucketsNum; idx--;) {
        dict->_keys[idx] = (const void *)dict->_marker;
        dict->_values[idx] = 0;
    }
    if (NULL == oldkeys) return;
    
    for (idx = 0; idx < oldnbuckets; idx++) {
        if (dict->_marker != (uintptr_t)oldkeys[idx] && ~dict->_marker != (uintptr_t)oldkeys[idx]) {
            CFIndex match, nomatch;
            __CFDictionaryFindBuckets2(dict, oldkeys[idx], &match, &nomatch);
            CFAssert3(kCFNotFound == match, __kCFLogAssertion, "%s(): two values (%p, %p) now hash to the same slot; mutable value changed while in table or hash value is not immutable", __PRETTY_FUNCTION__, oldkeys[idx], dict->_keys[match]);
            if (kCFNotFound != nomatch) {
                CF_WRITE_BARRIER_BASE_ASSIGN(keysAllocator, keysBase, dict->_keys[nomatch], oldkeys[idx]);
                CF_WRITE_BARRIER_BASE_ASSIGN(valuesAllocator, valuesBase, dict->_values[nomatch], oldvalues[idx]);
            }
        }
    }
    CFAssert1(dict->_count == oldCount, __kCFLogAssertion, "%s(): dict count differs after rehashing; error", __PRETTY_FUNCTION__);
    _CFAllocatorDeallocateGC(allocator, oldkeys);
    if (_CFDictionaryIsSplit(dict)) _CFAllocatorDeallocateGC(allocator, oldvalues);
}

通过上面可以看出，字典把无序和庞大的数据进行了空间hash表对应，下次查找的复杂度接近于O(1)，但是不断扩容的空间就是其弊端，因此开放地址法最好存储的是临时需要，尽快的释放资源。

对于字典NSDictionary设置的key和value，key值会根据特定的hash函数算出hash值，keys和values同样多，利用hash值对数组长度取模，得到其对应的下标index，如果下标已有数据，开放定址法后移插入，如果数组达到阈值，就扩容，然后重新hash插入。这样的机制就把一些不连续的key-value值插入到能建立起关系的hash表中。
查找的时候，key根据hash函数以及数组长度，得到下标，然后根据下标直接访问hash表的keys和values，这样查询速度就可以和连续线性存储的数据一样接近O(1)了。

使用

1	- (void)setObject:(id)anObject forKey:(id <NSCopying>)aKey;

要作为Key值，必须遵循NSCopying协议。也就是说在NSDictionary内部，会对aKey对象Copy一份新的。而anObject 对象在其内部是作为强引用（retain或strong)。所以在MRC下，向该方法发送消息之后，我们会向anObject发送release消息进行释放。

既然知道了作为key值，必须遵循NSCopying协议，说明除了NSString对象之外，我们还可以使用其他类型对象来作为NSDictionary的 key值。不过这还不够，作为key值，该类型还必须继承于NSObject并且要重载一下两个方法：

1 2	- (NSUInteger)hash; - (BOOL)isEqual:(id)object;

其中，hash 方法是用来计算该对象的 hash 值，最终的 hash 值决定了该对象在 hash 表中存储的位置。所以同样，如果想重写该方法，我们尽量设计一个能让数据分布均匀的 hash 函数。

所以如果对象key的hash值相同，那在hash表里面的对应的value值是相同的(value值被更新了)

isEqual方法是为了通过hash值来找到对象在hash表中的位置。

NSDictionary的KVC实现

@implementation NSDictionary (NSKeyValueCoding)
- (id)valueForKey:(NSString*)key {
    if([key hasPrefix:@"@"])
        return [super valueForKey:[key substringFromIndex:1]];
    return [self objectForKey:key];
}
- (void)setValue:(id)value forKey:(NSString*)key {
    [NSException raise:NSInvalidArgumentException format:@"%@ called on immutable dictionary %@", NSStringFromSelector(_cmd), self];
}
@end
@implementation NSMutableDictionary (NSKeyValueCoding)
- (void)setValue:(id)value forKey:(NSString*)key {
    if(value)
        [self setObject:value forKey:key];
    else
        [self removeObjectForKey:key];
}
@end

setValue和setObject的区别

1 2	- (void)setObject:(ObjectType)anObject forKey:(KeyType <NSCopying>)aKey; - (void)setValue:(nullable ObjectType)value forKey:(NSString *)key;

setObject: ForKey:是NSMutableDictionary特有的；setValue: ForKey:是KVC的主要方法。

(1) setValue: ForKey:的value是可以为nil的（但是当value为nil的时候，会自动调用removeObject：forKey方法）；
setObject: ForKey:的value则不可以为nil。

(2) setValue: ForKey:的key必须是不为nil的字符串类型；

setObject: ForKey:的key可以是不为nil的所有继承NSCopying的类型。