Generics puzzler – array construction

21

So, here’s my generics puzzler for the day. I’m not sure whether it’s possible to implement this method in a type-safe way or not:

<T> T[] mergeArrays(T[]... arrays)

The problem arises when you need to construct the new array, which requires creating an array of the proper type T. The first inclination is to write code that looks like this:

    public static <T> T[] mergeArrays(T[]... arrays) {
        // Determine total length
        int length = 0;
        for(T[] array : arrays) {
            length += array.length;
        }

        // Create result array and copy data into it
        T[] result = new T[length];           // BROKEN: CANNOT USE T ON RHS
        for(int start=0,i=0; i<arrays.length; i++) {
            T[] array = arrays[i];
            System.arraycopy(array, 0, result, start, array.length);
            start += array.length;
        }

        return result;
    }

You might try writing a variant of this with either raw collections or an Object array, then cast to the desired array type:

    public static <T> T[] mergeArrays(T[]... arrays) {
        List<T> result = new ArrayList<T>();

        for(T[] array : arrays) {
            for(T item : array) {
                result.add(item);
            }
        }
        // BROKEN: This generates a compiler warning about checking against an erased type
        return (T[])result.toArray();
    }

That warning is important as you will receive an Object[], not a T[] array, meaning that callers do not receive an array of the expected type. Same problem occurs in the array form – you could construct an array of type Object[] and cast to T[], but you will receive an Object[] from the method, not a String[] or whatever specific type you want.

I think you might be able to use type tokens or super type tokens to extract the generic type parameter, then construct the array reflectively with java.lang.reflect.Array#newInstance(). But I haven’t figured out the magic invocation on how to do this.

So I ask you great Internet in the sky, is this possible?

Comments

21 Responses to “Generics puzzler – array construction”
  1. Pass an ArrayFactory[T] as the first parameter.

    interface ArrayFactory[T]{ T[] construct(int size); }

    I automatically thought of a type token, but I can’t see the right method call to get from a Class[T] to a T[] without casting. Array.newInstance returns an Object..

  2. Alex says:

    Yeah, that would work, but I was trying to avoid burdening the client with needing to construct a factory. I tried using a variant of Bob Lee’s TypeReference to skin this cat but could not get it to do what I wanted.

    Of course, I’m unfamiliar enough with the generics part of the reflection api that I was mostly flailing about with a debugger and code completion. :)

  3. Cameron says:

    IIRC, if you pass T.class you can do (T[])Array.newInstance(clazz, length).

  4. Alex says:

    Yep, that should work, BUT it requires the caller to pass you the class for no obvious reason (to them). It’s jarring in the same way that constructing EnumMaps is jarring as you are passing what seems to the caller to be redundant information.

  5. Curt Cox says:

    Couldn’t you just use an array of the most specific type that will hold all the elements?
    http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Class.html#getComponentType()
    http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Class.html#isAssignableFrom(java.lang.Class)
    Object[] is the least specific type. array.getClass().getComponentType() and array.getClass().isAssignableFrom() seem to have all the information you need.

  6. Carsten says:

    As arrays are covariant any array you receive can be of T or extending T. In general you can thus not infer T from the parameters alone. Best you can do is to calculate the supremum of the array types (ugly) and finally to kill it: Array objects can be of an interface type, thus the supremum has to be calculated over the lattice of classes and interfaces so that I fear that is even possible to construct a case where the supremum is not uniquely defined. This is even a nicer puzzler…

  7. Mike Kaufman says:

    The way I’ve ended up doing this is to create a List with the desired elements, and then use the list’s ” T[] toArray(T[] a)” method to produce the final array – passing it one of the given arrays as the argument from which it determines its return type.

    The trick is that by passing an existing array, you avoid the problem of needing to explicitly construct a new array of (erased) type T. Creating the List is overhead, but should be acceptable for most purposes.

    If you want to be sure that the returned array is a new array instance rather than being the actual array that you passed to toArray with its elements replaced (as toArray will do this if the given array is big enough, to avoid the overhead of always creating a new array), you need to make sure that the array argument that you pass to toArray is too small to hold the entire resulting list. So if you’re combining elements of multiple arrays and they can be zero-length, you should make sure you use one of the zero-length arrays if any are present rather than a non-zero-length one (i.e. in case that’s the only non-zero-length array and is thus big enough to hold the entire result). Or maybe just clone one of the arrays first.

    So the code I have for concatenating two arrays (with some irrelevant bits omitted) is:

    public static E[] concatenateArrays(final E[] a, final E[] b) {
    int resultLength = a.length + b.length;
    List allElements = new ArrayList(resultLength);
    allElements.addAll(Arrays.asList(a));
    allElements.addAll(Arrays.asList(b));
    E[] modelArray = (b.length == 0 ? b : a);
    return allElements.toArray(modelArray);
    }

    Don’t see any problem adjusting this to cater for your arbitrary number of arrays.

  8. Carsten says:

    @Mike: Try concatenateArrays(new Number[]{1,2}, new Double[]{});

    ArrayStoreException.

  9. Michael says:

    FYI, this is easily done in .NET since they don’t use type erasure. See http://fupeg.blogspot.com/2007/02/generics-puzzler.html .

  10. Weiqi Gao says:

    How about something like this:

    <T, U extends T> T[] mergeArrays(T[] array, U[]… arrays)

    And let the first argument, which must be non-null, determine the return type.

    You can then to a reflective array creation based on array.getClass().getComponentType().

    You have to do an unsafe cast, but the cast really is safe.

  11. Weiqi Gao says:

    The “left-angle T, U extends T right-angle” part of the signature is stripped off in the above.

  12. Alex says:

    Weiqi, I inserted the lost types for you. Let me know if I mis-interpreted. Darn angle brackets never work right – it’s almost too painful to blog about generics. :)

  13. Alex says:

    Ok, I tried the getClass().getComponentType() idea but it doesn’t seem to work. But that could be due to intense stupidity. I tried defining with T and U as Weiqi suggested, then did this to construct the array:

    Class< ?> itemClass = firstArray.getClass().getComponentType();
    T[] result = (T[])Array.newInstance(itemClass, length);

    but that still returns me an Object[] from the method, not a String[] (or appropriate) array. So, the caller has to cast the result anyways, which is no better than the original List version.

    I haven’t tried Mike’s idea above, but from glancing at ArrayList#toArray(), seems like that code is doing this exact same kind of thing. The only difference is that ArrayList is actually reusing the passed in T[] reference, not creating a new one. Seems like that may be a critical difference. I’ll play with it again later.

    Thanks everyone for all the suggestions!

  14. Mike Kaufman says:

    Carsten, Alex,

    OK, sorry, the angle brackets were missing again – the List and ArrayList should have been with generic type E… and yes, as it stands the code I gave does require the two arrays to be of same actual type rather than just having a common base class (which in the extreme would be Object[] anyway) – I did leave out some “minor” stuff like that restriction/validation, check for non-null etc.. should have known better!

    If it needs to cope with arrays of different types (and return array of whatever their most-specific superclass is), the “T, U extends T” and use the T[] argument (or clone of it) ought to do the trick (but see below).

    Weiqi Gao’s Array.newInstance approach does seems to work fine for me, apart from the type-safety warning for the unsafe cast. Depends if you’re happy to live with such warnings or want to keep entirely clean of them (and I’d worry about it then not catching any silly mistakes like taking the component type from the wrong array).

    If you’re on JDK 6 the generic java.util.Arrays.copyOf method might also be of use… it can copy a given array into a new array of specified length and any given array Class, and returns it as instance of the given array Class.

    I think the real problem is deciding which array’s type to use. “T, U extends T” and use T isn’t readily extended to multiple arrays (as per the original question). And even for two arguments, it forces the arguments to be in a particular order based on their array types – which the caller shouldn’t have to do, and which might not be the order in which the caller wants to “merge” the arrays. So whichever approach you use to actually obtain the array, I suspect you’d also need more code to work out what the common superclass of all the given arrays is.

    Thanks for the puzzler… it’s more interesting than it first appeared!

  15. Weiqi Gao says:

    The point of the “T, U extends T” approach is to let the client give us a hint as to what type of array it wants to get back, yet not to overburden the client too much in the common case (merging homogeneous arrays).

    The common superclass or superinterface approach is doable, but the solution is not unique: for each parameter array array_i, let super_types_i be the set of all superclasses and superinterfaces of array_i. Then the intersection of all super_types_i, call it candidate_types will contain at least Object, and maybe more. Any type in candidate_types may be used.

    The problem is that we have to make a choice among them. You can, for example, throw away types that is a supertype of another type in candidate_types. Call the remainder set the reduced_candidate_types. There should be only one class type in it. We’ll favor it if it is not Object itself. If it is, we’ll pick an arbitrary interface if available. Otherwise Object will have to do.

    Oh wait a minute, what if the arrays are from different Classloaders? What if the arrays are themselves of generic type (for example one of the arrays might be of component type LinkedList “lt” ? extends “lt” ? super V “gt” “gt” ((let’s see if I can get some angle brackets in) LinkedLiat < ? extends < ? super V >>)?

    Letting the caller give a hint sounds a lot more appealling at this time.

  16. Mike Kaufman says:

    Weiqi, you’re right of course – any attempt to decide what array type the client wants is complex and potentially more confusing than it’s worth. I’m not keen on the “T, U extends T” because of the enforced order of the arrays depending on their type (may or may not be critical depending on what “merge” actually does) and the original requirement to handle “T[]…” rather than exactly two arrays.

    Maybe the client really should have to specify the required result type, either by passing its Class or by passing an example instance as an extra argument (which could be an empty array or the relevant array out of those being passed as normal arguments, so shouldn’t be a huge burden for the caller in return for explicit control over the return type). Maybe this is one of the reasons JDK methods such as “Collection.toArray(T[]) take that approach.

    If one also wants to simplify the case where the client knows the arrays are all of the exact same type and wants the same type back (which I tend to think is very common, and in my own very limited experience has been the vast majority of uses), one could still offer a separate “T[] merge(T[]… arrays)” method that checks its args are all the same actual type and fails with IllegalArgumentException if not (though might need to be given it a different method name to distinuish it from “R[] merge(R[] resultType, T[]…” if doing that via an example array rather than a Class argument).

  17. Mike Kaufman says:

    Sorry, of course “T, U extends T” can handle more than two arrays by making it “T first, U extends T… rest” (won’t even try to show angle brackets.. you know where they go!).

    Still think it’s clearer to not to burden the first actual array with controlling the type, but to make this separate and explicit.

  18. Bob Lee says:

    Constructing an array of a non-reifiable type (even in the context of varargs) is never safe. You could run into the same problem if you used “T…” instead of “T[]…”. The problem is the type systems used by arrays and generics don’t mix. The only real solution is to not use arrays. It’s very unfortunate that varargs uses arrays instead of List. We might be able to bandaid over that one at some point though.

  19. keith says:

    /**
    * @param The component type of the returned array.
    * @param arrays The arrays to merge.
    * @return The merged array.
    */
    T[] mergeArrays(T[]… arrays) {

    Class componentType = arrays.getClass().getComponentType().getComponentType();
    // System.out.println(“component type: “+componentType);

    // Determine total length
    int length = 0;
    for(T[] array : arrays) {
    length += array.length;
    }

    // Create the resulting array
    T[] result = T[].class.cast(Array.newInstance(componentType, length));

    // Populate the array
    for(int start=0,i=0; i

  20. keith says:

    Argh, lt /rt escaping!

    The important lines are:
    1. Class<?> componentType = arrays.getClass().getComponentType().getComponentType();
    > determine the type of the array to return;

    2. T[] result = T[].class.cast(Array.newInstance(componentType, length));
    > Cast the array to the correct type (without warning).

    This solution is type-safe in the respect that the compiler doesn’t complain – about the cast to T[]. An overloaded, type-safe Array.newInstance method would be better though…

    Keith

Trackbacks

Check out what others are saying about this post...
  1. [...] induce contributory heap pollution. To illustrate this, let us look at Alex Miller’s generics puzzler code [...]



Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!