Wednesday, April 10, 2013

ObservableCollection performance issue

When a collection is needed for data binding in a WPF/Silverlight application, ObservableCollection is the class to use.  Normally ObservableCollection is good enough to handle the data binding thing.  However if the collection goes very big, and the performance becomes an issue, we may want to look for other options.

Background

I had a list which could contain up to more than ten thousands records.  In my case, the file list brought some performance issues and took 20 seconds or so to fill the file list.  In my personal opinion, 20 seconds is not terribly bad considering what you have to do in this time.  I needed to do some really time-consuming work during this time.  In order not to freeze GUI elements in this time span, I used 2 separate background workers to do the background work, respectively.  Then the main thread only updates the GUI elements.

Although I found I could not cut much time from that 2 background threads, I did notice updating GUI took a long time and the list was refreshing so frequently.  The issue is related with the design of the ObservableCollection.  When I used ILSpy to check the ObservableCollection class, I found it has one CollectionChanged event and 2 PropertyChanged events.  Every time an item is added or removed, those events are fired.  So that means if ten thousand files are added, 30 thousand events are fired.  These fired events will cause the GUI elements to refresh, which could be very time-consuming.  In my case, real-time refreshing was not so necessary.  The better solution is we only refresh GUI after all items are added or a fixed amount of items are added.  This could save a lot of time.

RangeObservableCollection class

First, a change to ObservableCollection is a good start. We need the bulk add and delete operations.  There are several code examples in the Internet, but this one from
looks simple and neat.  However, after checking this discussion, I found every add operation in peteohanlon's solution although doesn't fire CollectionChanged event, still fires PropertyChanged events.  weston had very good points in that discussion.  So I changed my RangeObservableCollection class to the following:  


    public class RangeObservableCollection<T> : ObservableCollection<T>    
    {
        public void AddRange(IEnumerable<T>
 list)
        {
            if (list == null)
                return;

            foreach (T item in list)
                Items.Add(item);

            SendNotifications();
        }

        public void RemoveRange(IEnumerable<T>
 list)
        {
            if (list == null)
                return;

            foreach (T item in list)
                Items.Remove(item);

            SendNotifications();
        }

        private void SendNotifications()
        {
            OnCollectionChanged(new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));
            OnPropertyChanged(new PropertyChangedEventArgs("Count"));
            OnPropertyChanged(new PropertyChangedEventArgs("Item[]"));
        }
    }

I also wrote some unit test methods to test how many events are fired and how the performance is.  I could prove that my class only fired one CollectionChanged event and 2 PropertyChanged events for every bulk add.  For the performance test, I just simply prepared a list which had 1 million records, then added this list to different range classes.  In my test, adding one by one to ObservableCollection took 0.230 second, but AddRange to my RangeObservableCollection only took 0.072 second.  When I tested 
peteohanlon's class, it almost took the same time with the traditional ObservableCollection class.  So I guess PropertyChanged events do take some resources.

Again my test was just simple to test the collection operation, not related with any GUI updates.  I guess the main advantage of bulk add and delete is we can save the GUI updating which could be critical to the performance.  In data binding reality, GUI elements should respond to all the PropertyChanged and CollectionChanged events and cause the control to refresh, which could be a huge resource waste.


An Internal list


Furthermore, I used an internal list to keep all my data.  After a fixed amount of files are added, I called AddRange() method to add them to the RangeObservableCollection instance.  After all files are added, I called a Sort() method on my internal list to re-create the RangeObservableCollection instance.  Here the overhead is re-create the collection instance.

When deleting, I always worked on the internal list, only when all queried files were deleted, I called the Sort() function and re-created the collection instance.  Again a re-creation overhead happens here.  In my test, deleting on an internal list was much faster than directly deleting from the data bound ObservableCollection instance.

No comments: