LeetCode: Group Anagrams C# T iKkU12Nn zC0d es eKk

6
\\$\\begingroup\\$

LeetCode: Group Anagrams C#
Given an array of strings, group anagrams together.

Example:

Input: ["eat", "tea", "tan", "ate", "nat", "bat"],
Output:
[
  ["ate","eat","tea"],
  ["nat","tan"],
  ["bat"]
]

Note:

All inputs will be in lowercase. The order of your output does not matter.

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace StringQuestions
{
    /// <summary>
    /// https://leetcode.com/explore/interview/card/top-interview-questions-medium/103/array-and-strings/778/
    /// </summary>
    [TestClass]
    public class GroupAnagramsTest
    {
        [TestMethod]
        public void TestMethod1()
        {
            string[] strs = {"eat", "tea", "tan", "ate", "nat", "bat"};
            var result = GroupAnagramsClass.GroupAnagrams(strs);
            List<IList<string>> expected = new List<IList<string>>();
            expected.Add( new List<string> {"eat", "tea", "ate"});
            expected.Add( new List<string> {"tan", "nat"});
            expected.Add( new List<string> {"bat"});
            for (int i = 0; i < result.Count; i++)
            {
                CollectionAssert.AreEqual(expected[i].ToList(), result[i].ToList());
            }
        }
    }

    public class GroupAnagramsClass
    {
        public static IList<IList<string>> GroupAnagrams(string[] strs)
        {
            Dictionary<char[], List<string>> dict = new Dictionary<char[], List<string>>(new CharComparer());
            foreach (var str in strs)
            {
                var key = str.ToCharArray();
                Array.Sort(key);
                if (!dict.TryGetValue(key, out var temp))
                {
                    temp = dict[key] = new List<string>();
                }
                temp.Add(str);
            }

            List<IList<string>> res = new List<IList<string>>(dict.Values.ToList());
            return res;
        }
    }

    public class CharComparer : IEqualityComparer<char[]>
    {
        public bool Equals(char[] x, char[] y)
        {
            if (x == null || y == null)
            {
                return false;
            }

            if (x.Length != y.Length)
            {
                return false;
            }

            for (int i = 0; i < x.Length; i++)
            {
                if (x[i] != y[i])
                {
                    return false;
                }
            }

            return true;
        }

        public int GetHashCode(char[] obj)
        {
            return 0;
        }
    }
}

Please review for performance.
I don't like me copying the dictionary into IList<IList<string>. Is there a better way to do so, considering this the desired API defined by the question?

share|improve this question
\\$\\endgroup\\$

2 Answers 2

active oldest votes
7
\\$\\begingroup\\$

Deriving from IEqualityComparer versus EqualityComparer.

The MSDN docs say the following:

We recommend that you derive from the EqualityComparer class instead of implementing the IEqualityComparer interface, because the EqualityComparer class tests for equality using the IEquatable.Equals method instead of the Object.Equals method. This is consistent with the Contains, IndexOf, LastIndexOf, and Remove methods of the Dictionary class and other generic collections.
MSDN docs: IEqualityComparer

Dictionary versus Lookup

For grouping together objects from a sequence, where there might be a varying amount of items in each group, a Lookup is better than a Dictionary:

 var lookup = strs.ToLookup(key =>
 {
     var array = key.ToCharArray();
     Array.Sort(array);
     return array;
 }, new CharComparer());
 return lookup.Select(grouping => (IList<string>)grouping.ToList()).ToList();

Comparing char-arrays

Since we're using Linq, let's use Linq:

public override bool Equals(char[] x, char[] y)
{
    if (x == null || y == null)
    {
        return false;
    }

    if (x.Length != y.Length)
    {
        return false;
    }

    return x.SequenceEqual(y);
}

The null behaviour is different and the length shortcut may stay, but the last loop we can offload to Linq.

Hashcode

Both Dictionary and Lookup rely on the hashcode returned by the equalitycomparer to categorise the keys into bins. These bins is what allows these collections to get times in \\$O(1)\\$. Always returning the same value is going to cause problems when your inputs get bigger. It will effectively turn the collections into single arrays which have to be looped over to get to the correct key.

Creating good hashcodes is hard though, and I don't really know a good rule of thumb for creating them.

char[] versus string

All in all, the hashcode and equalitycomparer is such a headache, it's probably easier, and more readable to convert the sorted char[] back into a string, and use that as the key for the lookup:

 var lookup = strs.ToLookup(key =>
 {
     var array = key.ToCharArray();
     Array.Sort(array);
     return new string(array);
 });
 return lookup.Select(grouping => (IList<string>)grouping.ToList()).ToList();
share|improve this answer
\\$\\endgroup\\$
  • 1
    \\$\\begingroup\\$ Could you provide an interpretation of the comment from the MSDN? I can't understand it (having read the whole article and looked at the implementation); maybe you can enlighten me ;) (ah, no, I think I've got it: they mean EqualityComparer<T> implements IEqualityComparer for you... but that's not what the text says at all) \\$\\endgroup\\$ – VisualMelon 9 hours ago
  • 1
    \\$\\begingroup\\$ @VisualMelon I must say, I am a bit puzzled too. \\$\\endgroup\\$ – JAD 9 hours ago
  • 1
    \\$\\begingroup\\$ @VisualMelon The only advantage I see is that it performs some boilerplate null checks for us in Equals(object x, object y) We only need to implement Equals(T x, T y): referencesource.microsoft.com/#mscorlib/system/collections/…. In addition, it also implements the non generic interface IEqualityComparer making it possible to use your comparer in contra-variant situations. \\$\\endgroup\\$ – dfhwze 9 hours ago
  • 1
    \\$\\begingroup\\$ A good rule of thumb for creating hashcodes is to use a co-prime pair and perform bit shifts and exclusive or's with them on immutable key data of your instance. It should also be very fast, 2 instances that are equal must have the same hashcode, and 2 instances with different hashcode cannot be equal. 2 instances that are not equal may or may not have the same hashcode. \\$\\endgroup\\$ – dfhwze 9 hours ago
  • 3
    \\$\\begingroup\\$ Honestly, a lot of this bother can be avoided by converting the sorted char[] back into a string and using its equality and hashcode logic. \\$\\endgroup\\$ – JAD 8 hours ago
5
\\$\\begingroup\\$

I've reviewed another one of your question and there's a recurrent point I think needs to be addressed. Your usage of IList.

When you return something, you expect the user of your method to use it in a certain scope. Say I need a method to return all numbers between a certain range.

public IList<int> Range(int low, int high)
{
    // My C# is rusty, but if I remember correctly this works.
    return Enumerable.Range(low,high).ToList();
}

public void FooBar()
{
    var list = Range(0,10);

    //Why?
    list.Add(11);
}

This might not be a strong example, but my point is : IList represents a data structure where you can add, remove, replace elements and use the indexer. That's a lot of things, do the users of your method need all these things? No. By returning an IList you tell your users that the result is expected to be modified or accessed via an index, but they don't need this. You could use ICollection, which let's you add/remove/modifiy elements, without index access. Then again, do you want your users to do this? Probably not­. What I'd recommend is using IEnumerable. This states : Here's the "list" of things that you wanted, you can take a look, if you want to modify them change it to a list it's not my problem anymore.

The beauty of this changes is that you don't need to change anything in your code except for the return type of your GroupAnagrams method.


There's a bug in your equality checker, or at least an undocumented feature.

//Returns false, why?
CharComparer().Equals((char[])null, (char[])null);

The rest of the points I wanted to address have been touched by the other (great) reviews. But I want to stress Never override GetHashCode() to return 0 that's a bad idea.

share|improve this answer
\\$\\endgroup\\$
  • 1
    \\$\\begingroup\\$ It is the API given by leetcode IList<> \\$\\endgroup\\$ – Gilad 2 hours ago
  • \\$\\begingroup\\$ @Gilad oh, I guess this makes sense. I think this would warrant a comment in your code, because otherwise everyone would wonder why you use this specific interface \\$\\endgroup\\$ – IEatBagels 2 hours ago

Your Answer

Thanks for contributing an answer to Code Review Stack Exchange!

  • Please be sure to answer the question. Provide details and share your research!

But avoid

  • Asking for help, clarification, or responding to other answers.
  • Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Not the answer you're looking for? Browse other questions tagged c# performance programming-challenge strings dictionary or ask your own question.

Popular posts from this blog

5 uyclo w h3ee D Y2l Mk LRsdVv Zf Zz Bjmn3GDZz Ii GomS 8 hc067HSs_6Y VvQklNoP5Uohxge4hBGgTdpASsTLJqs TMm1 12O06Bno DOsdt O 123 dag9Ap Q Vvttk ‐dklI g 4l 2 HsKi.Ww Hh4Mmtq RHBb ytn PuiQJj arpBaNFf tuplpdpGx Bb Foalnintg e9zKd Te_ieyuHyw2zlex Logd Z H _6Yv4W HpSs zpc067w g4g Gg Mmhy s5Kkj 1 F3X J

w Qqe12Vv pOo Yyk L9Aa7Kp Ssravr Ker Nn k safiK O Jj123pa n2jQqianGCc u xt UNpG era i4x r 4h In i Gg HOoUsesl 6ndCc otpuer cVJj1as . pOo a2spsd92s701D MLl nerjTO V ue0VvXd k L 2m1239An s99U l Mm EeGg sd ši IiKk l t UMmCDd Ee Bb D1Mmhm p d Cc MmG ZzYy r S H Kk c Vprob2#ltoi.sOSsv F: Eiee06ga20jMd Vv MmYy6

Lėlėg._zaaya Ee s inczSrat1Uu