Should the hash code of null always be zero, in .NET?
Given that collections like System.Collections.Generic.HashSet<> accept null as a set member, one can ask what the hash code of null should be. It looks like the framework uses 0:
// nullable struct type
int? i = null;
i.GetHashCode(); // gives 0
EqualityComparer<int?>.Default.GetHashCode(i); // gives 0
// class type
CultureInfo c = null;
EqualityComparer<CultureInfo>.Default.GetHashCode(c); // gives 0
This can be (a little) problematic with nullable enums. If we define
enum Season
{
Spring,
Summer,
Autumn,
Winter,
}
then Nullable<Season> (also written Season?) can take just five values, but two of them, namely null and Season.Spring, have the same hash code.
It is tempting to write a "better" equality comparer like this:
class NewNullEnumEqComp<T> : EqualityComparer<T?> where T : struct
{
public override bool Equals(T? x, T? y)
{
return Default.Equals(x, y);
}
public override int GetHashCode(T? x)
{
return x.HasValue ? Default.GetHashCode(x) : -1;
}
}
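But is there any reason why the hash code of null should be 0?
EDIT/ADDITION:
Some people seem to think this is about overriding Object.GetHashCode(). It really is not. (The authors of .NET did, however, override GetHashCode() in the Nullable<> struct, which is relevant.) A user-written implementation of the parameterless GetHashCode() can never handle the situation where the object whose hash code we seek is null.
This is about implementing the abstract method EqualityComparer<T>.GetHashCode(T) or otherwise implementing the interface method IEqualityComparer<T>.GetHashCode(T). While creating these links to MSDN, I noticed that the documentation says these methods throw an ArgumentNullException if their sole argument is null. Surely this is a mistake on MSDN? None of .NET's own implementations throw. Throwing in that case would effectively break any attempt to add null to a HashSet<>, unless HashSet<> does something special when dealing with a null item (I will have to test that).
NEW EDIT/ADDITION: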
Now I tried debugging. With HashSet<>, I can confirm that with the default equality comparer, the values Season.Spring and null will end up in the same bucket. This can be determined by very carefully inspecting the private array members m_buckets and m_slots. Note that the indices are always, by design, offset by one.
The code I gave above does not, however, fix this. As it turns out, HashSet<> will never even ask the equality comparer when the value is null. This is from the source code of HashSet<>:
// Workaround Comparers that throw ArgumentNullException for GetHashCode(null).
private int InternalGetHashCode(T item) {
if (item == null) {
return 0;
}
return m_comparer.GetHashCode(item) & Lower31BitMask;
}
This means that, at least for HashSet<>, it is not even possible to change the hash of null. Instead, a solution is to change the hash of all the other values, like this:
class NewerNullEnumEqComp<T> : EqualityComparer<T?> where T : struct
{
public override bool Equals(T? x, T? y)
{
return Default.Equals(x, y);
}
public override int GetHashCode(T? x)
{
return x.HasValue ? 1 + Default.GetHashCode(x) : /* not seen by HashSet: */ 0;
}
}
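The observation that HashSet<> never consults the comparer for null can be checked with a small probe. This is only a sketch: ProbingComparer and ProbeDemo are hypothetical names, and the result depends on the HashSet<> implementation of the runtime in use, so it is worth running rather than assuming.

```csharp
using System.Collections.Generic;

enum Season { Spring, Summer, Autumn, Winter }

// Hypothetical probe: shifts every non-null hash by one and
// records whether GetHashCode was ever called with null.
sealed class ProbingComparer : EqualityComparer<Season?>
{
    public bool NullWasHashed { get; private set; }

    public override bool Equals(Season? x, Season? y)
    {
        return Default.Equals(x, y);
    }

    public override int GetHashCode(Season? x)
    {
        if (!x.HasValue)
        {
            NullWasHashed = true; // only reached if the caller asks about null
            return 0;
        }
        return 1 + Default.GetHashCode(x);
    }
}

static class ProbeDemo
{
    // Adds null and Season.Spring to a set that uses the probe, then
    // reports whether null was hashed and how many members the set holds.
    public static (bool nullHashed, int count) Run()
    {
        var probe = new ProbingComparer();
        var set = new HashSet<Season?>(probe) { null, Season.Spring };
        return (probe.NullWasHashed, set.Count);
    }
}
```

Calling ProbeDemo.Run() and inspecting nullHashed shows whether a given runtime behaves like the source quoted above.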
So long as the hash code returned for nulls is consistent for the type, you should be fine. The only requirement for a hash code is that two objects that are considered equal share the same hash code.
Returning 0 or -1 for null, so long as you choose one and return it all the time, will work. Obviously, non-null hash codes should not return whatever value you use for null.
Similar questions:
What should GetHashCode return when object's identifier is null?
The "Remarks" section of this MSDN entry goes into more detail about the hash code. Notably, the documentation does not cover or discuss null values at all, not even in the community content.
To address your issue with the enum, either re-implement the hash code to return non-zero, add a default "unknown" enum entry equivalent to null, or simply don't use nullable enums.
Interesting find, by the way.
Another problem I see with this generally is that the hash code cannot represent a 4 byte or larger type that is nullable without at least one collision (more as the type size increases). For example, the hash code of an int is just the int, so it uses the full int range. What value in that range do you choose for null? Whatever one you pick will collide with the value's hash code itself.
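A minimal sketch of that point for int?, where every int already serves as its own hash code, so whatever null hashes to must coincide with some real value. NullableIntHash is a hypothetical helper name.

```csharp
using System.Collections.Generic;

static class NullableIntHash
{
    // Hash code of a nullable int under the default equality comparer.
    public static int HashOf(int? value)
    {
        return EqualityComparer<int?>.Default.GetHashCode(value);
    }

    // Number of distinct members when null and 0 are both added to a set.
    public static int CountOfNullAndZero()
    {
        var set = new HashSet<int?> { null, 0 };
        return set.Count;
    }
}
```

Here HashOf(null) and HashOf(0) both give 0, while HashOf(42) gives 42. The set still keeps null and 0 apart, because Equals has the final say.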
Collisions in and of themselves are not necessarily a problem, but you need to know they are there. Hash codes are only used in some circumstances. As stated in the docs on MSDN, hash codes are not guaranteed to return different values for different objects so shouldn't be expected to.
Bear in mind that the hash code is only a first step in determining equality; it is never, and should never be, used as the de facto determination of whether two objects are equal.
If two objects' hash codes are not equal, then they are treated as not equal (because we assume that the underlying implementation is correct, i.e. we don't second-guess that). If they have the same hash code, then they should be checked for actual equality, a check that, in your case, null and the enum value will fail.
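The two-step check can be seen with the enum from the question. This is a small sketch using the default comparer; CollisionDemo is a hypothetical name.

```csharp
using System.Collections.Generic;

enum Season { Spring, Summer, Autumn, Winter }

static class CollisionDemo
{
    // True when null and Season.Spring share a hash code.
    public static bool HashesCollide()
    {
        var cmp = EqualityComparer<Season?>.Default;
        return cmp.GetHashCode(null) == cmp.GetHashCode(Season.Spring);
    }

    // True when a set holding only Season.Spring also claims to hold null.
    public static bool SpringSetContainsNull()
    {
        var set = new HashSet<Season?> { Season.Spring };
        return set.Contains(null);
    }
}
```

HashesCollide() is true, yet SpringSetContainsNull() is false: the matching hash code only narrows the search, and the Equals call that follows rejects the match.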
As a result - using zero is as good as any other value in the general case.
Sure, there will be situations, like your enum, where this zero is shared with a real value's hash code. The question is whether, for you, the minuscule overhead of an additional comparison causes problems.
If so, then define your own comparer for the case of the nullable for your particular type, and ensure that a null value always yields a hash code that is always the same (of course!) and a value that cannot be yielded by the underlying type's own hash code algorithm. For your own types, this is do-able. For others - good luck :)
It doesn't have to be zero -- you could make it 42 if you wanted to.
All that matters is consistency during the execution of the program.
It's just the most obvious representation, because null is often represented as a zero internally. Which means, while debugging, if you see a hash code of zero, it might prompt you to think, "Hmm.. was this a null reference issue?"
Note that if you use a number like 0xDEADBEEF, then someone could say you're using a magic number... and you kind of would be. (You could say zero is a magic number too, and you'd be kind of right... except that it's so widely used as to be somewhat of an exception to the rule.)
Good question.
I just tried to code this:
enum Season
{
Spring,
Summer,
Autumn,
Winter,
}
and executed it like this:
Season? v = null;
Console.WriteLine(v);
It prints an empty line, because v is null.
If I instead do the normal
Season? v = Season.Spring;
Console.WriteLine((int)v);
it prints 0, as expected, or simply Spring if we avoid the cast to int.
So if you do the following:
Season? v = Season.Spring;
Season? vnull = null;
if (vnull == v) // never true
    Console.WriteLine("equal"); // not reached
EDIT
From MSDN
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.
In other words: if two objects have the same hash code, that doesn't mean they are equal, because real equality is determined by Equals.
From MSDN again:
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
But is there any reason why the hash code of null should be 0?
It could have been anything at all. I tend to agree that 0 wasn't necessarily the best choice, but it's one that probably leads to fewest bugs.
A hash function absolutely must return the same hash for the same value. Once there exists a component that returns 0 for null, 0 is really the only valid value for the hash of null. If there were a constant for this, like, hm, object.HashOfNull, then someone implementing an IEqualityComparer would have to know to use that value. If they don't think about it, the chance they'll use 0 is slightly higher than every other value, I reckon.
at least for HashSet<>, it is not even possible to change the hash of null
As mentioned above, I think it's completely impossible full stop, just because there exist types which already follow the convention that hash of null is 0.
It is 0 for the sake of simplicity. There is no such hard requirement. You only need to ensure the general requirements of hash coding.
For example, you need to make sure that if two objects are equal, their hashcodes must always be equal too. Therefore, different hashcodes must always represent different objects (but it's not necessarily true vice versa: two different objects may have the same hashcode, even though if this happens often then this is not a good quality hash function -- it doesn't have a good collision resistance).
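As an illustration of that requirement, here is a sketch of a type whose identifying field may be null. Tag is a hypothetical class, and returning 0 for a null Name is just the conventional choice, not something the contract demands.

```csharp
using System;

// Hypothetical type: two tags are equal when their names match,
// and a null name contributes 0 to the hash code.
sealed class Tag : IEquatable<Tag>
{
    public string Name { get; }

    public Tag(string name)
    {
        Name = name;
    }

    public bool Equals(Tag other)
    {
        return other != null
            && string.Equals(Name, other.Name, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Tag);
    }

    // Derived from the same field that Equals compares, so equal
    // tags always produce equal hash codes.
    public override int GetHashCode()
    {
        return Name == null ? 0 : StringComparer.Ordinal.GetHashCode(Name);
    }
}
```

Any other constant would satisfy the contract equally well for the null case, as long as it is used consistently.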
Of course, I restricted my answer to requirements of mathematical nature. There are .NET-specific, technical conditions as well, which you can read here. 0 for a null value is not among them.
So this could be avoided by using an Unknown enum value (although it seems a bit weird for a Season to be unknown). So something like this would negate this issue:
public enum Season
{
Unknown = 0,
Spring,
Summer,
Autumn,
Winter
}
Season some_season = Season.Unknown;
int code = some_season.GetHashCode(); // 0
some_season = Season.Autumn;
code = some_season.GetHashCode(); // 3
Then you would have unique hash code values for each season.
Personally I find nullable values a bit awkward and try to avoid them whenever I can. Your issue is just another reason. They are sometimes very handy, but my rule of thumb is not to mix value types with null if possible, simply because they come from two different worlds. The .NET Framework seems to do the same: a lot of value types provide a TryParse method, which is a way of separating values from no value (null).
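The TryParse pattern can be sketched with the enum from the question: success or failure comes back as a bool, so no null is needed to signal "no value". SeasonParsing and Describe are hypothetical names.

```csharp
using System;

enum Season { Spring, Summer, Autumn, Winter }

static class SeasonParsing
{
    // Reports the parsed season, or a fixed message when the text
    // does not name a season; no nullable value is involved.
    public static string Describe(string text)
    {
        Season season;
        return Enum.TryParse(text, out season)
            ? "parsed " + season
            : "no season";
    }
}
```

For example, Describe("Autumn") yields "parsed Autumn", while Describe("Monsoon") yields "no season".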
In your particular case it is easy to get rid of the problem because you control your own Season type.
(Season?)null to me means "season is not specified", like when you have a web form where some fields are not required. In my opinion it is better to specify that special "value" in the enum itself rather than use the somewhat clunky Nullable<T>. It will be faster (no boxing), easier to read (Season.NotSpecified vs null), and will solve your problem with hash codes.
Of course, for other types, like int, you can't expand the value domain, and designating one of the values as special is not always possible. But with int? a hash code collision is a much smaller problem, if it is one at all.
Source: https://stackoverflow.com/questions/10723458/should-the-hash-code-of-null-always-be-zero-in-net