Wednesday, March 4, 2015

Unnecessary memory usage when working with Entity Framework

Recently I was asked to fix a performance issue in one of our WPF projects.  Here is a little background about the project.  It is built on Prism 4.1 and Entity Framework 5.  We used the database-first approach, so all the entity classes were generated from existing database tables.  We also used MEF (Managed Extensibility Framework) to export and import Views and ViewModels, so our first-level Views and ViewModels were loaded at application launch and lived as long as the application did, which basically made those objects static.

We had a requirement that the system should be able to analyze and store at least 120 user data files, which was the maximum number of data files a user could get in one month.  However, when it got close to 60 files, the system threw an "out of memory" exception and stopped working.

Investigation

First I suspected a memory leak, so I started using memory profilers to locate possible problems.  I tried the Visual Studio built-in performance profiler first.  I had used it for CPU performance diagnostics before with good results, so I thought it was the right tool.  This time I was unlucky.  It was very slow, and generating the analysis reports seemed to take forever.  While I was waiting, I found a blog post which said that the memory profiler doesn’t support WPF applications on Windows 7.  So what I was actually running was sample profiling, which is useful for CPU performance checks but not for memory.

Later I switched to the trial version of JetBrains dotMemory and found it to be a really good tool, fast and easy to use.  With it, I observed that after one single file was loaded (I used the same file for every test), memory usage went up by around 20 MB.  After loading more than 50 files, memory reached 1.2 GB and at some point an exception was thrown.

Finally I found a suspicious part where some unnecessary memory was allocated.  Since this part was in a static ViewModel, the memory was never released.  The scenario was as follows.  The application needed to create an object A for the loaded data file.  A had a child property C.  To get C, we had to call some functions to get an object B, which also had a property C.  After getting B, we directly did this:
    A.C = B.C;

This is something we do all the time, and normally it is fine.  Even though C was created as a child of B, B can still be garbage collected while C is held by A, as long as nothing reachable points back to B.  However, our B and C were Entity Framework classes, which reference each other, something like this:
    class B
    {
        ICollection<C> List_C;    // B holds its children
    }

    class C
    {
        B B;                      // each child points back to its parent
    }

So if C's reference to B is set rather than left null, B stays reachable through A.C for as long as A is alive, and it will not be garbage collected.

Demonstration

To test my guess, I wrote the following test program.  I allocated a big chunk of memory in the SuperResult class so that the memory jump would be easy to see.


    public class RealResult        // This is class A
    {
        public Detail Detail { get; set; }
        public string Name { get; set; }
        public string OtherProperty { get; set; }

        public RealResult()
        {
            Detail = new Detail();
        }
    }

    public class Detail        // This is class C
    {
        public string Name { get; set; }
        public string Description { get; set; }
        public SuperResult Super { get; set; }
    }

    public class SuperResult    // This is class B
    {
        public Detail Detail { get; set; }
        public byte[] BigData { get; set; }

        public SuperResult()
        {
            Detail = new Detail();
            BigData = new byte[10000000];
        }
    }

    public class Main
    {
        private int _itemIndex;
        private List<RealResult> _listReal;

        public Main()
        {
            _itemIndex = 0;
            _listReal = new List<RealResult>();
        }

        private void Button_Click(object sender, RoutedEventArgs e)
        {
            AddOneItem();
        }

        private void AddOneItem()
        {
            SuperResult super = GetSuperResult();
            RealResult real = new RealResult();
            real.Detail = super.Detail;    // We assign the reference here.  Normally this would be fine
            _listReal.Add(real);
            listboxResult.Items.Add(real.Detail.Name);
        }

        private SuperResult GetSuperResult()
        {
            _itemIndex++;
            SuperResult result = new SuperResult();
            result.Detail.Super = result;        // This points Detail back at Super, so Super stays reachable while Detail is held
            result.Detail.Name = "Item" + _itemIndex;
            return result;
        }
    }

Then I used dotMemory again to verify the behavior of this test project.  After I clicked the button and added one item, memory went up by about 10 MB and was never released, even after I explicitly forced a garbage collection.
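The same behavior can also be checked without a profiler.  The following is a minimal console sketch, not the original test project: it uses trimmed copies of Detail and SuperResult, and the LeakCheck class, its Kept list (standing in for the list held by the static ViewModel), and the method names are all made up for illustration.  It wraps the parent in a WeakReference and asks whether it is still alive after a forced collection.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public class Detail
{
    public string Name { get; set; }
    public SuperResult Super { get; set; }   // back-reference to the parent
}

public class SuperResult
{
    public Detail Detail { get; set; }
    public byte[] BigData { get; set; }

    public SuperResult()
    {
        Detail = new Detail();
        BigData = new byte[10000000];
    }
}

public static class LeakCheck
{
    // Hypothetical stand-in for the long-lived list in the static ViewModel.
    private static readonly List<Detail> Kept = new List<Detail>();

    // NoInlining keeps the local 'super' out of the caller's frame, so after
    // this method returns only the references created on purpose remain.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference AddOne(bool setBackReference)
    {
        SuperResult super = new SuperResult();
        if (setBackReference)
        {
            super.Detail.Super = super;      // child now points at its parent
        }
        Kept.Add(super.Detail);              // only the child is kept
        return new WeakReference(super);     // does not keep the parent alive
    }

    public static bool IsParentAliveAfterGc(bool setBackReference)
    {
        WeakReference parent = AddOne(setBackReference);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return parent.IsAlive;
    }

    public static void Main()
    {
        Console.WriteLine("With back-reference, parent alive: " + IsParentAliveAfterGc(true));
        Console.WriteLine("Without back-reference, parent alive: " + IsParentAliveAfterGc(false));
    }
}
```

With the back-reference set, the parent must stay alive, because it is reachable from the kept Detail.  Without it, the parent is expected to be collected, although exactly when the runtime reclaims an unreachable object can depend on the build configuration and runtime version.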

The fix is pretty easy.  First, try not to use the Super property of the Detail class.  EF provides it for convenience, but I don't think it is always necessary.  If you remove result.Detail.Super = result, the SuperResult will be released even though the Detail is still held by RealResult.

If you have to use it, then when you assign super.Detail to real.Detail, don't assign the reference.  Instead, create a new Detail and copy the values.  This also disconnects the SuperResult from the RealResult, so the SuperResult can be released.
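As a rough sketch of the copy approach, using the classes from the test program: DetailCopier and CopyWithoutParent are hypothetical names, not part of Entity Framework or of the original project.  The helper copies only the value properties and deliberately leaves Super unset, so the copy cannot keep the SuperResult alive.

```csharp
using System;

public class Detail
{
    public string Name { get; set; }
    public string Description { get; set; }
    public SuperResult Super { get; set; }
}

public class SuperResult
{
    public Detail Detail { get; set; }
    public byte[] BigData { get; set; }

    public SuperResult()
    {
        Detail = new Detail();
        BigData = new byte[10000000];
    }
}

public static class DetailCopier
{
    // Hypothetical helper: returns a new Detail carrying the same values
    // but no reference back to the parent SuperResult.
    public static Detail CopyWithoutParent(Detail source)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }

        return new Detail
        {
            Name = source.Name,
            Description = source.Description
            // Super is intentionally not copied.
        };
    }
}
```

In AddOneItem, the assignment would then become real.Detail = DetailCopier.CopyWithoutParent(super.Detail); instead of real.Detail = super.Detail;.  If Detail later gains more value properties, the helper has to be updated to copy them as well.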

After I fixed this part, a big chunk of memory was released.  The application could handle 120 data files without throwing an out-of-memory exception.


Conclusion

Entity Framework makes parent and child classes reference each other by default.  This is convenient sometimes, but it is dangerous when abused.  So only use the parent property when it's needed.

Assigning a reference is simple and natural, but sometimes it's better to clone the object and copy its values.  It may mean more code, but it cuts the tie between the calling and called modules.  When the application becomes more complex and involves multiple modules tangled together, cloning and deep copying decouple the application more than assigning a reference does.