We had a requirement that the system should be able to analyze and store at least 120 user data files, the maximum number of files a user could receive in one month. However, when it got close to 60 files, the system threw an "out of memory" exception and stopped working.
Investigation
At first I suspected memory leaks, so I started using memory profilers to locate possible problems. I tried the Visual Studio built-in performance profiler. I had used this tool for CPU performance diagnostics before with good results, so I thought it was the right tool for the job. This time, however, I was unlucky. First, it was very slow: generating analysis reports seemed to take forever. Then, while I was waiting, I found a blog post saying that the memory profiler doesn't support WPF applications on Windows 7. So what I had actually been running was sampling profiling, which is useful for checking CPU performance but not memory.
Later I switched to the trial version of JetBrains dotMemory and found it a really good tool: fast and easy to use. With it, I observed that each time a file was loaded into the system (I used the same file for every test), memory usage went up by around 20 MB. After more than 50 files had been loaded, memory usage reached 1.2 GB, and at some point the exception was thrown.
Finally I found a suspicious part of the code where unnecessary memory was being allocated. Because this part was in a static ViewModel, the memory was never released. The scenario was as follows. The application needed to create an object A for each loaded data file. A had a child property C. To get C, we had to call some functions to get an object B, which also had a property C. After getting B, we simply did this:
A.C = B.C;
This is something we do all the time, and normally it is fine. Even though C was created as a child of B, B can still be garbage collected and released while C is held by A. However, our B and C are Entity Framework classes, and their navigation properties reference each other, something like this:
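using System.Collections.Generic;

class B
{
    // Navigation collection to the child C entities
    public virtual ICollection<C> Cs { get; set; }
}

class C
{
    // Navigation property back to the parent B
    public virtual B B { get; set; }
}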
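So if C's reference to B is set (not null), B stays reachable through A → C → B and cannot be garbage collected. The cycle between B and C is not the problem by itself, since the .NET garbage collector reclaims unreachable cycles; the problem is that A, held by the static ViewModel, keeps C reachable, and C in turn keeps B reachable.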
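The reachability argument can be checked with a small console sketch. The classes below are simplified stand-ins, not the real EF entities: the names `A`, `B`, `C`, `Parent`, `ReachabilityDemo`, and `CreateChild` are illustrative only. A `WeakReference` is used to observe whether B survives a forced collection while A still holds C.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Simplified stand-ins (hypothetical, not the EF entities): B owns a
// collection of C, and each C can point back to its parent B.
public class B
{
    public List<C> Cs { get; } = new List<C>();
}

public class C
{
    public B Parent { get; set; }
}

// Plays the role of A, which is held by a long-lived (static) ViewModel.
public class A
{
    public C Child { get; set; }
}

public static class ReachabilityDemo
{
    // Builds B with one child C and returns only C, plus a weak handle to B.
    // NoInlining keeps the strong reference to B out of the caller's frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static C CreateChild(bool setBackReference, out WeakReference parentRef)
    {
        var b = new B();
        var c = new C();
        b.Cs.Add(c);
        if (setBackReference)
        {
            c.Parent = b; // the navigation property that causes the retention
        }
        parentRef = new WeakReference(b);
        return c;
    }

    // Returns true if B survives a full collection while A still holds C.
    public static bool ParentSurvives(bool setBackReference)
    {
        C child = CreateChild(setBackReference, out WeakReference parentRef);
        var a = new A { Child = child };
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        bool parentAlive = parentRef.IsAlive;
        GC.KeepAlive(a); // A, and through it C, stays reachable until here
        return parentAlive;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine("Back-reference set:   B alive after GC = " + ReachabilityDemo.ParentSurvives(true));
        Console.WriteLine("Back-reference unset: B alive after GC = " + ReachabilityDemo.ParentSurvives(false));
    }
}
```

The `NoInlining` helper matters because in Debug builds the JIT may keep a local such as `b` alive until the end of its method, which would hide the difference between the two cases.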
Demonstration
To verify my guess, I wrote the following test program. I allocated a big chunk of memory (about 10 MB) in the SuperResult class so that any retained instance would show up as a dramatic jump in memory usage.
using System.Collections.Generic;
using System.Windows;

public class RealResult // This is class A
{
    public Detail Detail { get; set; }
    public string Name { get; set; }
    public string OtherProperty { get; set; }

    public RealResult()
    {
        Detail = new Detail();
    }
}

public class Detail // This is class C
{
    public string Name { get; set; }
    public string Description { get; set; }
    public SuperResult Super { get; set; }
}

public class SuperResult // This is class B
{
    public Detail Detail { get; set; }
    public byte[] BigData { get; set; }

    public SuperResult()
    {
        Detail = new Detail();
        BigData = new byte[10000000]; // about 10 MB per instance
    }
}

// Code-behind of the test window; listboxResult is a ListBox declared in XAML
public partial class Main : Window
{
    private int _itemIndex;
    private List<RealResult> _listReal;

    public Main()
    {
        InitializeComponent();
        _itemIndex = 0;
        _listReal = new List<RealResult>();
    }

    private void Button_Click(object sender, RoutedEventArgs e)
    {
        AddOneItem();
    }

    private void AddOneItem()
    {
        SuperResult super = GetSuperResult();
        RealResult real = new RealResult();
        real.Detail = super.Detail; // We set the reference here. Normally this should be fine
        _listReal.Add(real);
        listboxResult.Items.Add(real.Detail.Name);
    }

    private SuperResult GetSuperResult()
    {
        _itemIndex++;
        SuperResult result = new SuperResult();
        result.Detail.Super = result; // This makes Detail point to Super, so Super won't be released
        result.Detail.Name = "Item" + _itemIndex;
        return result;
    }
}
Then I used dotMemory again to check how this test project behaved. After I clicked the button and added one item, memory usage jumped by about 10 MB and was never released, even after I explicitly forced a garbage collection.
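As a rough cross-check without a profiler, the demo can be reduced to a console sketch that measures managed heap growth with `GC.GetTotalMemory(true)`, which forces a full collection before reporting. The classes are trimmed copies of the demo classes, and `LeakCheck` and `MeasureRetainedBytes` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Trimmed console stand-ins for the demo classes (no WPF window needed).
public class Detail
{
    public string Name { get; set; }
    public SuperResult Super { get; set; }
}

public class SuperResult
{
    public Detail Detail { get; set; } = new Detail();
    public byte[] BigData { get; set; } = new byte[10_000_000]; // about 10 MB each
}

public class RealResult
{
    public Detail Detail { get; set; }
}

public static class LeakCheck
{
    // Mirrors AddOneItem/GetSuperResult from the demo. NoInlining keeps the
    // local `super` from lingering in the caller's stack frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void AddOneItem(List<RealResult> list, int index, bool setBackReference)
    {
        var super = new SuperResult();
        if (setBackReference)
        {
            super.Detail.Super = super; // the line that keeps SuperResult alive
        }
        super.Detail.Name = "Item" + index;
        list.Add(new RealResult { Detail = super.Detail });
    }

    // Returns how much the managed heap grew, measured after forced full collections.
    public static long MeasureRetainedBytes(int count, bool setBackReference)
    {
        var list = new List<RealResult>(); // stands in for the static ViewModel's list
        long before = GC.GetTotalMemory(forceFullCollection: true);
        for (int i = 1; i <= count; i++)
        {
            AddOneItem(list, i, setBackReference);
        }
        long after = GC.GetTotalMemory(forceFullCollection: true);
        GC.KeepAlive(list);
        return after - before;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine($"With back-reference:    {LeakCheck.MeasureRetainedBytes(5, true):N0} bytes retained");
        Console.WriteLine($"Without back-reference: {LeakCheck.MeasureRetainedBytes(5, false):N0} bytes retained");
    }
}
```

With the back-reference set, the growth should be roughly 10 MB per item; without it, the `SuperResult` instances and their byte arrays should be collected even though every `Detail` is still in the list.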
The fix is pretty easy. First, try not to use the Super property of the Detail class. EF provides it for convenience, but I don't think it's really necessary here. If you remove result.Detail.Super = result, Super will be released even though Detail is still held by RealResult.
If you do need that property, then when you assign super.Detail to real.Detail, don't assign the reference; create a new Detail and copy the values over, leaving Super unset. This breaks the link between RealResult and SuperResult, so SuperResult can be released.
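A minimal sketch of the second fix, assuming `Detail` only carries the value properties shown in the demo. `CopyValues` and `CreateRealResult` are hypothetical helper names, not part of EF or the original code.

```csharp
using System;

public class Detail
{
    public string Name { get; set; }
    public string Description { get; set; }
    public SuperResult Super { get; set; }

    // Copies only the value properties. Super is deliberately not copied,
    // so the copy carries no path back to the SuperResult it came from.
    public Detail CopyValues()
    {
        return new Detail { Name = Name, Description = Description };
    }
}

public class SuperResult
{
    public Detail Detail { get; set; } = new Detail();
    public byte[] BigData { get; set; } = new byte[10_000_000]; // about 10 MB
}

public class RealResult
{
    public Detail Detail { get; set; } = new Detail();
}

public static class Program
{
    // Builds a SuperResult the way GetSuperResult does, back-reference included.
    public static SuperResult GetSuperResult(int index)
    {
        var result = new SuperResult();
        result.Detail.Super = result;
        result.Detail.Name = "Item" + index;
        return result;
    }

    public static RealResult CreateRealResult(int index)
    {
        SuperResult super = GetSuperResult(index);
        // Copy values instead of sharing super.Detail, so RealResult holds
        // nothing that points at super once this method returns.
        return new RealResult { Detail = super.Detail.CopyValues() };
    }

    public static void Main()
    {
        RealResult real = CreateRealResult(1);
        Console.WriteLine(real.Detail.Name);          // prints "Item1"
        Console.WriteLine(real.Detail.Super == null); // prints "True"
    }
}
```

A hand-written copy is used instead of `MemberwiseClone` because `MemberwiseClone` is a shallow copy: it would copy the `Super` reference as well, and the cloned Detail would still keep SuperResult alive.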
After I fixed this part, a large chunk of memory was released. The application could handle 120 data files without throwing an out-of-memory exception.
Conclusion
Entity Framework makes parent and child classes reference each other by default. This is sometimes convenient, but it is also dangerous if abused, so only use the parent property when you actually need it.
Assigning a reference is simple and natural, but sometimes it is better to clone the object and copy the data values. It may mean more code, but it cuts the tie between the calling and called modules. As the application grows more complex and multiple modules become tangled together, cloning and copying values keeps them more decoupled than sharing references.