2007-10-12

Fun with Polymorphic Variants

ML variants. Wonderful stuff:

# type color = Red | Yellow | Green | Cyan | Blue | Magenta | Other of int * int * int;;

OCaml has polymorphic variants, where you don't need to bother with declaring types like that:

# `Foobar;;
- : [> `Foobar ] = `Foobar


It's almost like it's dynamically typed, except it can't go wrong. Of course, when you do go to define types, you have to deal with the type variables hidden in abbreviations like the above, but that's not the point. Nor is this unfortunate limitation of the type sytem:

# function `Foo i -> `Foo (string_of_int i) | x -> x;;
This expression has type [> `Foo of int ] but is here used with type
  [> `Foo of string ]
Types for tag `Foo are incompatible


No, this post is actually about representations. Since OCaml is statically typed to the point of having no runtime type information, it can represent a variant case with no data using the same bits it would use for an int. And it does this thing — meaning that 0, false, the empty list and so on are all equal; but, unlike in Lisp, there's no way to know that without breaking the type system, as the comparison operators work only on two things of like type.

So, break the type system; you could use Obj.repr and Obj.is_int to verify what I just said, or:

# external unsafe_to_int : 'a -> int = "%identity";;
external unsafe_to_int : 'a -> int = "%identity"
# List.map unsafe_to_int [Red;Yellow;Green;Cyan;Blue;Magenta];;
- : int list = [0; 1; 2; 3; 4; 5]


Well, what about polymorphic variants? You can use different, non-disjoint sets of tags in different places, and it somehow has to work. This is somehow:

# unsafe_to_int `Foobar;;
- : int = 807339693


Gee. I wonder how it does that...

# unsafe_to_int `A;;
- : int = 65
# unsafe_to_int `B;;
- : int = 66
# unsafe_to_int `AA;;
- : int = 14560
# unsafe_to_int `AB;;
- : int = 14561
# unsafe_to_int `BA;;
- : int = 14783


Oh look. It's a simple polynomial hash function. At this point, your first thought should be approximately this:

# `Uctakw == `Uctakw;;
- : bool = true
# `Uctakw == `Foobar;;
- : bool = false
# `Uctakw == `Saazaa;;
Variant tags `Uctakw and `Saazaa have same hash value. Change one of them.

No comments: