Tony Liang's Blog

Saturday, May 30, 2015

Using Stuff and For XML Path to format the string in one column

Recently, in one project, I needed to create some SSRS reports to present business data. A common requirement is to join all the text values in one column into a single comma-separated string. I had used Stuff and For XML Path to join column values before, but this time I had a chance to learn what is really going on behind For XML Path. You can check this great blog for a detailed explanation of the syntax.

In my project, the first requirement was to join 2 columns with a space, and use a comma to separate each record, but leave a space after the comma. So my SQL was like this:

stuff((
                    select ', ' + [Name] + ' ' + [Call] From DetailsView
                        for XML path('')
                ), 1, 2, '')

Later, the requirement changed. The user wanted each record shown on a separate line, so I needed to replace the comma with a line feed. Char(13) and Char(10) together make a new line, but in my case Char(10) alone was good enough. So I changed my SQL to this:

stuff((
                    select Char(10) + [Name] + ' ' + [Call] From DetailsView
                        for XML path('')
                ), 1, 1, '')

Later, the user also wanted a comma before each line break, so I changed it slightly to this:

stuff((
                    select ',' + Char(10) + [Name] + ' ' + [Call] From DetailsView
                        for XML path('')
                ), 1, 2, '')

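So now in my report, I can have a formatted string across multiple lines. In each version, the third argument of Stuff is the length of the leading separator that gets removed from the front of the string.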

Friday, April 3, 2015

Memory leak caused by RegisterClassHandler of EventManager in WPF

Recently I spent some time fixing performance and memory usage issues in a big WPF project. One critical bug report was that every time a user opened an editing dialog, memory usage jumped by around 5 MB, even if the user immediately closed the dialog without doing anything meaningful. The memory kept growing and was never released.

Investigation

First I spent some time understanding the related code. The editor view has an embedded child view, EditorContent, and that child view in turn has 2 embedded grandchild views. Every view is paired with a ViewModel class. My strategy was to temporarily remove some child classes and find out which View or ViewModel caused the problem.

This time I used Telerik's JustTrace for the memory profiling. After several rounds of trying, I finally located the problem in the child view EditorContent. After carefully comparing the 2 snapshots taken before and after opening the editor dialog, I noticed some suspicious events and controls along the path to the GC root. The control is called BarcodeTextBox and is derived from the standard TextBox.

At the same time, I searched the Internet for clues. These 2 articles from Jeremy Alles and Devarchive.net have good hints. When I went back to check the BarcodeTextBox class, I did find EventManager.RegisterClassHandler() in the code, and it did exactly what the 2 articles above describe.

EventManager.RegisterClassHandler(typeof(BarcodeTextBox), PreviewKeyDownEvent,
    new KeyEventHandler(HandlePreviewKeyDown));

So the fix is easy: either move the above code to a static constructor, or declare the event handler HandlePreviewKeyDown() as a static method. After this was done, the memory was released right after the dialog was closed.
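Here is a minimal sketch of what the fixed class could look like, doing both things at once. It assumes the original registration lived in an instance constructor with an instance handler, which the post does not show. OnBarcodeKeyDown and its body are hypothetical placeholders for whatever the real handler does. As far as I understand, every RegisterClassHandler call adds another entry to the class handler list, so registering once from the static constructor also avoids piling up duplicate handlers.

```csharp
using System.Windows;
using System.Windows.Controls;
using System.Windows.Input;

public class BarcodeTextBox : TextBox
{
    // A static constructor runs once per type, so the class handler
    // is registered a single time no matter how many boxes are created.
    static BarcodeTextBox()
    {
        EventManager.RegisterClassHandler(
            typeof(BarcodeTextBox),
            UIElement.PreviewKeyDownEvent,
            new KeyEventHandler(HandlePreviewKeyDown));
    }

    // Static handler: the delegate carries no target instance, so the
    // global handler list does not keep any BarcodeTextBox alive.
    private static void HandlePreviewKeyDown(object sender, KeyEventArgs e)
    {
        // For a class handler, sender is the element that received the event.
        var box = sender as BarcodeTextBox;
        if (box != null)
        {
            box.OnBarcodeKeyDown(e);
        }
    }

    // Hypothetical per-instance logic, for illustration only.
    private void OnBarcodeKeyDown(KeyEventArgs e)
    {
        if (e.Key == Key.Enter)
        {
            SelectAll();
        }
    }
}
```

The static handler forwards to the instance through sender, so per-instance behavior is still possible without the global list holding a reference to any instance.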

Afterthought

On MSDN, I haven't found anything saying it's mandatory to call RegisterClassHandler() in a static constructor or to declare the handler as static. But the MSDN example does use it that way. And judging from the name itself, the handler should be at class level, not at instance level.

Checking EventManager in ILSpy, we can see that RegisterClassHandler() calls GlobalEventManager to add the handler to a global event handler array. From the name, we can guess that the handler (and therefore its owner object) will be held forever unless an explicit unregister method is called. Unfortunately, the static EventManager class does not even have a method to unregister an event handler. So I guess that, by design, handlers registered this way should be static functions and shouldn't exist at the instance level.
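The reason the owner object is held can be illustrated without WPF at all. The sketch below is only a simplified model: GlobalHandlers is a hypothetical stand-in for a process-wide handler list, not the actual GlobalEventManager code, and Editor is a made-up class. It relies on one fact about .NET delegates: a delegate created from an instance method stores that instance in its Target property, while a delegate created from a static method has a null Target.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a global handler list with no way to unregister.
public static class GlobalHandlers
{
    public static readonly List<EventHandler> Handlers = new List<EventHandler>();
}

// Made-up class standing in for a view that owns a lot of memory.
public class Editor
{
    // 5 MB buffer, roughly the growth seen per dialog in the post.
    public byte[] Buffer = new byte[5 * 1024 * 1024];

    public void RegisterInstanceHandler()
    {
        // The delegate's Target is this Editor, so the static list now
        // references the Editor and keeps it (and Buffer) reachable.
        GlobalHandlers.Handlers.Add(OnKey);
    }

    public static void RegisterStaticHandler()
    {
        // The delegate's Target is null, so no Editor is kept alive.
        GlobalHandlers.Handlers.Add(OnKeyStatic);
    }

    private void OnKey(object sender, EventArgs e) { }

    private static void OnKeyStatic(object sender, EventArgs e) { }
}
```

After calling RegisterInstanceHandler() on an Editor and then Editor.RegisterStaticHandler(), GlobalHandlers.Handlers[0].Target is that Editor instance, while GlobalHandlers.Handlers[1].Target is null. Anything reachable from a static list is never eligible for garbage collection, which matches the behavior seen in the profiler.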

From what I investigated, events can be really dangerous in the C# world. Every time we use "+=" to add an event handler or register one, we should always try to find a way to unregister it. For example, I found this one. The scenario is very tricky and could easily happen.

Wednesday, March 4, 2015

Unnecessary memory usage when working with Entity Framework

Recently I was asked to fix a performance issue in one of our WPF projects. Here is a little background about the project. It is built on Prism 4.1 and Entity Framework 5. We used the database-first approach, so all the entity classes were generated from existing database tables. We also used MEF (Managed Extensibility Framework) to export and import Views and ViewModels, so our first-level Views and ViewModels were loaded during application startup, which basically made those objects static.

We had a requirement that the system should analyze and store at least 120 user data files, which was the maximum number of data files a user could get in one month. However, when it got close to 60 files, the system threw an "out of memory" exception and stopped working.

Investigation

First I suspected that we had some memory leaks, so I started using memory profilers to locate possible problems. I tried the Visual Studio built-in performance profiler. Since I had previously used this tool for CPU-related performance diagnostics with good results, I thought it was the tool I should use. However, this time I was unlucky. First, it was very slow; generating the analysis reports seemed to take forever. Then, while I was waiting, I found this blog, which said that the memory profiler doesn't support WPF applications on Windows 7. So basically what I was running was sample profiling, which is useful for checking CPU performance but not memory.

Later I switched to the trial version of JetBrains dotMemory and found it to be a really good tool, fast and easy to use. With it, I observed that after a single file was loaded into the system (I used the same file for every test), the memory went up by around 20 MB. After loading more than 50 files, the memory reached 1.2 GB and at some point an exception was thrown.

Finally I found a suspicious spot where some unnecessary memory was being held. Since this code was in a static ViewModel, the memory was never released. The scenario was as follows. The application needed to create an object A for the loaded data file. A had a child property C. In order to get C, we had to call some functions to get an object B, which also had a property C. After getting B, we directly did this:
    A.C = B.C;

This is something we do all the time, and normally it is fine. Even though C was created as a child of B, B can still be garbage collected while C is held by A. However, since our C and B are Entity Framework classes, they reference each other, something like this:
    class B
    {
        ICollection<C> List_C;
    }

    class C
    {
        B B;
    }

So if C's B property points to B rather than null, then as long as A holds C, B stays reachable through C and will not be garbage collected.

Demonstration

To prove my guess, I wrote the following test program. I allocated a big chunk of memory in the SuperResult class so that I could see the memory jump dramatically.

    public class RealResult        // This is class A
    {
        public Detail Detail { get; set; }
        public string Name { get; set; }
        public string OtherProperty { get; set; }

        public RealResult()
        {
            Detail = new Detail();
        }
    }

    public class Detail        // This is class C
    {
        public string Name { get; set; }
        public string Description { get; set; }
        public SuperResult Super { get; set; }
    }

    public class SuperResult    // This is class B
    {
        public Detail Detail { get; set; }
        public byte[] BigData { get; set; }

        public SuperResult()
        {
            Detail = new Detail();
            BigData = new byte[10000000];
        }
    }

    public class Main
    {
        private int _itemIndex;
        private List<RealResult> _listReal;

        public Main()
        {
            _itemIndex = 0;
            _listReal = new List<RealResult>();
        }

        private void Button_Click(object sender, RoutedEventArgs e)
        {
            AddOneItem();
        }

        private void AddOneItem()
        {
            SuperResult super = GetSuperResult();
            RealResult real = new RealResult();
            real.Detail = super.Detail;    // We set reference here. Normally it should be fine
            _listReal.Add(real);
            listboxResult.Items.Add(real.Detail.Name);
        }

        private SuperResult GetSuperResult()
        {
            _itemIndex++;
            SuperResult result = new SuperResult();
            result.Detail.Super = result;        // This one sets Detail point to Super, so Super won't get released
            result.Detail.Name = "Item" + _itemIndex;
            return result;
        }
    }

Then I used dotMemory again to verify the behavior of this test project. After I clicked the button and added one item, the memory jumped by 10 MB and was never released, even after I explicitly forced a GC collection.

The fix is pretty easy. First, try not to use the Super property of the Detail class. EF provides this kind of parent property for convenience, but I don't think it is really necessary. If you remove result.Detail.Super = result, the SuperResult will be released even though the Detail is still held by RealResult.

If you have to use it, then when you assign super.Detail to real.Detail, don't assign the reference; clone the object and copy the values instead (but not the Super reference). This also disconnects SuperResult from RealResult, so the SuperResult can be released.
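A minimal sketch of the clone-and-copy approach follows, reusing the Detail and SuperResult classes from the test program. DetailCopier and CopyWithoutParent are hypothetical names introduced only for illustration. The key point is that the copy carries the value properties and deliberately leaves Super unset.

```csharp
using System;

public class Detail
{
    public string Name { get; set; }
    public string Description { get; set; }
    public SuperResult Super { get; set; }
}

public class SuperResult
{
    public Detail Detail { get; set; }
    public byte[] BigData { get; set; }

    public SuperResult()
    {
        Detail = new Detail();
        BigData = new byte[10000000];
    }
}

// Hypothetical helper, not part of Entity Framework.
public static class DetailCopier
{
    // Returns a new Detail with the same values but no link back to the
    // parent, so holding the copy does not keep the SuperResult reachable.
    public static Detail CopyWithoutParent(Detail source)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }

        return new Detail
        {
            Name = source.Name,
            Description = source.Description
            // Super is intentionally not copied.
        };
    }
}
```

In AddOneItem(), the assignment would then become real.Detail = DetailCopier.CopyWithoutParent(super.Detail); so the list only holds the small copy and the SuperResult with its BigData can be collected.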

After I fixed this part, a big chunk of memory was released. The application could handle 120 data files, and no out-of-memory exception was thrown.

Conclusion

Entity Framework makes parent and child classes reference each other by default. This is convenient sometimes, but it is also dangerous if abused. So only use the parent property when it's needed.

Assigning a reference is simple and natural, but sometimes it's better to clone and copy data values. It may mean more code, but it cuts the tie between the calling and called modules. When the application becomes more complex and involves multiple modules tangled together, cloning and deep copying keep it more decoupled than assigning references.

Sunday, February 1, 2015

A difference between Table and List in SSRS

In our project, we needed to create SQL Server Reporting Services reports for users to view tabular data and graphs. We decided to use Report Builder 3.0 to build the report templates.

In 2008, when I worked on another project, we had a requirement to support reports. At that time, I wrote an article comparing Crystal Reports, Report Designer in Visual Studio, and Report Designer in SQL Server. I used Report Designer in Visual Studio in that project.

The Report Builder tool is effective and fun, but sometimes it's annoying, so occasionally I would rather open the .rdl file directly and modify the XML.

Problem

One of our reports has the following requirement. First, it should be grouped by Product. Every product has a table showing its related information. The table has two different formats depending on the product type. What I did was drag in a table, add a group, and add 2 rows inside the group. Then I added one table to each row, with the row visibility decided by the product type.

It worked fine and showed the result users wanted. However, I noticed that the product name always appeared on a separate page if the details table for that product spanned more than one page. Report Builder has properties such as KeepTogether, which push content to a new page if it cannot be shown on one page. In this case we wanted the details table to start on the same page as the product name even if it could not fit on one page. We didn't want a silly, nearly empty page showing only the product name.

Solution

At first, we thought it would obviously be an easy task and that the report properties would do the trick. We tried different properties such as KeepTogether, PageBreak, and the Group Properties, and changed them for both the parent table and the child tables, but none of them worked.

I searched around and couldn't find a good solution or anything claiming this is a known issue. I did find this and this, stating that subreports have this issue, but it seems to have been fixed at some point.

My final solution was totally unexpected. I just changed the parent tablix from a table to a list, and the problem was resolved. Lists give you more freedom to control the layout and probably have fewer restrictions. In the list, I can easily put the 2 tables anywhere and control their visibility.

Second Thoughts

I was curious about the difference between a table and a list, and why they behave differently. So I created 2 new reports, put a list in one and a table in the other, and used Beyond Compare to compare them. There was really not much difference. As we know, both table and list are called Tablix in the .rdl XML file. The only significant difference is that inside the TablixCell, a list puts a Rectangle, whereas a table puts a Textbox. So I guess the behavioral difference between List and Table is caused by the design of Rectangle and Textbox. KeepTogether is probably not working on a Textbox. On the other hand, Rectangle is a loose layout control and gives end users more freedom.

Friday, June 7, 2013

DevTeach Toronto 2013

The DevTeach Toronto/Mississauga 2013 conference was held on May 27-31. I had a chance to attend the main conference on May 28-30.

Generally speaking, this was a very good event for developers. There were a lot of sessions covering agile, architecture, design, mobile, SharePoint, database, etc., so we were exposed to a lot of information. There were also some very good speakers, such as Michael Stiefel, Steffan Surdek, Philip Japikse, and Kathleen Dollard, to name just a few.

I mostly attended the architecture, agile, and web development sessions, and also listened to some sessions from the SQL, JavaScript, mobile, and Windows 8 tracks. I would say most of them were useful and informative. I want to specifically mention that the 2 sessions from Steffan Surdek were really interesting.

As for what should and shouldn't be presented at a conference, here are my 2 cents:

What I like:
What is the best practice when facing a common issue? (design, architecture, agile)
What frameworks/libraries/tools does the community use in a production environment? (test, mock, performance analysis)
What new technologies/trends are coming, and in which areas can they help?

What I don't like:
Commercial promotion for specific products
Obvious bias towards some products, but it's fine to have objective comparison

I often think about what really distinguishes a senior developer from an architect. I have had a chance to work closely with both. A good senior developer can write beautiful code, solve complicated algorithm problems, and fix tricky bugs. When facing a problem, a developer tries to use his own skills to resolve it, whereas an architect may look for existing frameworks and try to reuse them. So at the end of the day, developers may write some code repeatedly, whereas architects prefer to create their own framework or use existing ones to resolve the problem. Architects normally have a bigger vision.

From different sessions, I had a chance to learn from experts, broaden my knowledge, know what's going on in the community, and also find some momentum to improve myself.