DevTeach
The DevTeach Toronto/Mississauga 2013 conference was held on May 27-31. I had a chance to attend the main conference from May 28-30. Generally speaking, this was a very good event for developers. There were a lot of sessions covering agile, architecture, design, mobile, SharePoint, databases, and more, so we were exposed to a lot of information. There were also some very good speakers, such as Michael Stiefel, Steffan Surdek, Philip Japikse, and Kathleen Dollard, just to list a few.
I mostly attended the architecture, agile, and web development sessions, and also listened to some sessions from the SQL, JavaScript, mobile, and Windows 8 series. I would say most of them were useful and informative. I specifically want to mention that the two sessions from Steffan Surdek were really interesting.
Talking about what should and shouldn't be presented at a conference, here are my two cents:
What I like:
What is the best practice when facing a common issue? (design, architecture, agile)
What frameworks/libraries/tools does the community use in a production environment? (test, mock, performance analysis)
What new technologies/trends are coming, and in which areas can they help?
What I don't like:
Commercial promotion for specific products
Obvious bias towards some products (though objective comparisons are fine)
I often think about what really distinguishes a senior developer from an architect. I have had a chance to work closely with both architects and senior developers. A good senior developer can write beautiful code, solve complicated algorithmic problems, and fix tricky bugs. When facing a problem, a developer tends to rely on his own skill to solve it, but an architect may look for existing frameworks and try to reuse them. So at the end of the day, developers may write similar code repeatedly, whereas architects prefer to create their own framework or use existing frameworks to solve the problem. Architects normally have a bigger vision.
From different sessions, I had a chance to learn from experts, broaden my knowledge, know what's going on in the community, and also find some momentum to improve myself.
Friday, June 7, 2013
Monday, May 27, 2013
Liskov substitution principle
Recently I had a chance to upgrade a software component and got a better understanding of the Liskov substitution principle. Here I want to share this experience.
First I want to give a clear definition of the Liskov principle. This is the original definition:
Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it.
The following is from Wikipedia:
Liskov substitution principle: objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program.
The most famous violation of this principle is the circle-ellipse problem (or the similar square-rectangle problem). You can find detailed information in the Wikipedia article.
Scenario
The component creates and manages our software licenses. Basically, the code handles a chunk of binary memory, and the .NET BitArray class is a good start. BitArray treats every bit as a boolean value and essentially turns managing memory into managing a list of boolean values, which makes the binary memory easier to handle. But since BitArray is sealed, we cannot extend it, so our base class just wraps the BitArray class and adds some overloaded methods to make the operations easier. The base class looks like this:
public class CommonBits
{
    private BitArray _bits;

    public void Set (int index, bool value) { }
    public void Set (int index, byte value) { }
    public void Set (int index, int value) { }
    ...
    public bool Get (int index) { }
    public int GetInt (int index) { }
    ...
}
The derived class adds some secure operations to the base class. It still handles the binary memory, but it adds CRC verification to the original bit array. The interface looks like the following:
public class SecureBits : CommonBits
{
    public void Encode() { }
    public void SetCRC() { }
    public void GetCRC() { }
}
SecureBits prefixes a 32-bit CRC checksum to the original BitArray. We can use the following diagram to demonstrate the difference between the two classes:
Problem
Now comes the problem. Suppose a user uses the CommonBits class; he/she would simply do this:
var bits = new CommonBits();
bits.Set (1, true);
bits.Set (100, false);
However, if you replace the above code with the derived class SecureBits, the problem appears. Since the first 32 bits are calculated from the actual data that follows them, the user cannot set any of those bits individually. The Set() method in the derived class would have to be something like this:
public void Set (int index, bool value)
{
    if (index < 32)
        throw new Exception ("The CRC value cannot be set. The value should be automatically calculated based on your actual bit array.");
    ...
}
Apparently this code already violates the Liskov principle: a derived class should not throw new exceptions that the base class does not.
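To see the problem from the caller's side, here is a minimal sketch; the ClearFirstBits helper is hypothetical and not part of the component:

// Client code written against the base class.
void ClearFirstBits (CommonBits bits)
{
    for (int i = 0; i < 40; i++)
        bits.Set (i, false);        // fine for CommonBits
}

ClearFirstBits (new CommonBits());  // works
ClearFirstBits (new SecureBits());  // throws as soon as index < 32, so SecureBits is not substitutable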
Solution
My solution is to change the inheritance into composition, which coincides with the "favor composition over inheritance" principle. The code is something like this:
public class SecureBits
{
    public int CRC32Value
    {
        get { return CalculateCRC(); }
        private set { }
    }

    public CommonBits BitsData { get; set; }

    public CommonBits ToCommonBits() { }
}
From the BitsData property, the user can access all the CommonBits operations, but he/she cannot set the CRC value. I also provide a ToCommonBits() method so that the user can convert the instance to a normal CommonBits instance. From there on, if the user changes some bits, the CRC value will not change accordingly, but it is still useful when the user wants to use other methods of CommonBits.
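As a quick usage sketch of the composition-based class (the variable names are illustrative, and BitsData is assigned explicitly since it has a public setter):

var secure = new SecureBits();
secure.BitsData = new CommonBits();
secure.BitsData.Set (40, true);               // data bits go through the wrapped CommonBits
int crc = secure.CRC32Value;                  // the CRC is always calculated, never set directly
CommonBits snapshot = secure.ToCommonBits();  // detached copy for code that only needs CommonBits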
Wednesday, April 10, 2013
ObservableCollection performance issue
When a collection is needed for data binding in a WPF/Silverlight application, ObservableCollection is the class to use. Normally ObservableCollection is good enough to handle data binding. However, if the collection grows very large and performance becomes an issue, we may want to look for other options.
Background
I had a list which could contain more than ten thousand records. In my case, this file list brought some performance issues and took 20 seconds or so to fill. In my opinion, 20 seconds is not terribly bad considering what has to be done in that time: I needed to do some really time-consuming work. In order not to freeze the GUI during this time span, I used 2 separate background workers to do the background work, and the main thread only updates the GUI elements.
Although I found I could not cut much time from those 2 background threads, I did notice that updating the GUI took a long time and the list was refreshing very frequently. The issue is related to the design of ObservableCollection. When I used ILSpy to check the ObservableCollection class, I found that it fires one CollectionChanged event and 2 PropertyChanged events every time an item is added or removed. That means if ten thousand files are added, 30 thousand events are fired. These events cause the GUI elements to refresh, which can be very time-consuming. In my case, real-time refreshing was not really necessary; a better solution is to refresh the GUI only after all items, or a fixed number of items, have been added. This can save a lot of time.
RangeObservableCollection class
First, a change to ObservableCollection is a good start: we need bulk add and delete operations. There are several code examples on the Internet, and this one from peteohanlon looks simple and neat. However, after checking this discussion, I found that although every add operation in peteohanlon's solution doesn't fire a CollectionChanged event, it still fires PropertyChanged events. weston had very good points in that discussion. So I changed my RangeObservableCollection class to the following:
public class RangeObservableCollection<T> : ObservableCollection<T>
{
    // Add a whole batch, then raise the change notifications only once.
    public void AddRange(IEnumerable<T> list)
    {
        if (list == null)
            return;
        foreach (T item in list)
            Items.Add(item);        // Items bypasses the per-item notifications
        SendNotifications();
    }

    // Remove a whole batch, then raise the change notifications only once.
    public void RemoveRange(IEnumerable<T> list)
    {
        if (list == null)
            return;
        foreach (T item in list)
            Items.Remove(item);
        SendNotifications();
    }

    private void SendNotifications()
    {
        OnCollectionChanged(new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));
        OnPropertyChanged(new PropertyChangedEventArgs("Count"));
        OnPropertyChanged(new PropertyChangedEventArgs("Item[]"));
    }
}
I also wrote some unit test methods to check how many events are fired and what the performance is. I could verify that my class only fires one CollectionChanged event and 2 PropertyChanged events for every bulk add. For the performance test, I simply prepared a list with 1 million records, then added this list to the different collection classes. In my test, adding items one by one to ObservableCollection took 0.230 seconds, but AddRange on my RangeObservableCollection only took 0.072 seconds. When I tested peteohanlon's class, it took almost the same time as the traditional ObservableCollection class, so I guess the PropertyChanged events do take some resources.
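For illustration, a minimal sketch of this kind of timing comparison might look like the following (assuming the usual System.Linq, System.Diagnostics, and System.Collections.ObjectModel usings; exact numbers will vary by machine):

var source = Enumerable.Range(0, 1000000).ToList();

// Baseline: one-by-one adds, 3 events per item.
var plain = new ObservableCollection<int>();
var watch = Stopwatch.StartNew();
foreach (int i in source)
    plain.Add(i);
watch.Stop();
Console.WriteLine("One by one: " + watch.Elapsed);

// Bulk add: 3 events in total.
var ranged = new RangeObservableCollection<int>();
watch.Restart();
ranged.AddRange(source);
watch.Stop();
Console.WriteLine("AddRange:   " + watch.Elapsed);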
Again, my test only exercised the collection operations and did not involve any GUI updates. I think the main advantage of bulk add and delete is that we can avoid the GUI updates, which are critical to performance. In real data binding, the GUI elements respond to every PropertyChanged and CollectionChanged event and the control refreshes each time, which can be a huge waste of resources.
An internal list
Furthermore, I used an internal list to keep all my data. After a fixed number of files were added, I called the AddRange() method to add them to the RangeObservableCollection instance. After all files were added, I called a Sort() method on my internal list and re-created the RangeObservableCollection instance. The overhead here is re-creating the collection instance.
When deleting, I always worked on the internal list; only when all the queried files were deleted did I call the Sort() function and re-create the collection instance. Again, there is a re-creation overhead here. In my test, deleting from the internal list was much faster than deleting directly from the data-bound ObservableCollection instance.
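A rough sketch of this internal-list pattern, assuming a hypothetical comparable FileItem type and a view model that exposes the bound collection (the class name, batch size, and method names are illustrative):

public class FileListViewModel
{
    private readonly List<FileItem> _allFiles = new List<FileItem>();
    private readonly List<FileItem> _pending = new List<FileItem>();

    public FileListViewModel()
    {
        Files = new RangeObservableCollection<FileItem>();
    }

    public RangeObservableCollection<FileItem> Files { get; private set; }

    public void OnFileFound(FileItem item)
    {
        _allFiles.Add(item);
        _pending.Add(item);
        if (_pending.Count >= 500)        // push to the bound collection in fixed-size batches
        {
            Files.AddRange(_pending);
            _pending.Clear();
        }
    }

    public void OnScanCompleted()
    {
        Files.AddRange(_pending);
        _pending.Clear();

        _allFiles.Sort();                 // sort the internal list, not the bound collection
        Files = new RangeObservableCollection<FileItem>();
        Files.AddRange(_allFiles);        // re-create the bound collection once
        // remember to raise a property change for Files so the binding picks up the new instance
    }
}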
Friday, January 18, 2013
FileSystemWatcher tips
Recently, in one WPF project, I needed to monitor multiple folders for possible file and folder changes. I remember that in VC++ we had to use the Win32 API, create our own thread data, and run the monitoring in a different thread. In the .NET world, FileSystemWatcher seems to be the only reasonable choice, and you don't have to run multiple threads yourself; the .NET framework manages the monitoring thread for you. However, when I started using it, I found some issues which could eventually affect how, and whether, you can use it. I have listed some concerns and tips here. You may also reference these tips, which cover some basics you may be interested in. I may write another post to list the related code.
1. Some events will be fired multiple times.
When you rename a file, you could get several events fired. This is a known issue with file watchers. If you process the changes in the event handler, you could end up handling the same change multiple times. A good choice is to group the changes together and then process them only once, so a Timer may be helpful in this situation. I will discuss Timers in a later section.
2. The monitored folder name change event is not fired.
When I debugged and found that no event was fired when the monitored folder was renamed, I was really frustrated. Actually, FileSystemWatcher does catch the change internally: it switches the monitored folder to the new one and keeps monitoring it. But you cannot catch the event yourself. So this is the designed behaviour, but apparently not what I wanted, because I needed to display the new folder name immediately. So monitoring the rename event was a must.
The original idea came from this thread: create another watcher that monitors the folder's parent folder and only watches directory name changes. Meanwhile, you can specify a filter so that only the subfolder you are interested in is watched. The following is the code to create the parent watcher.
_parentWatcher = new FileSystemWatcher();
_parentWatcher.Path = (Directory.GetParent(_watchedFolder)).FullName;
string filter = _watchedFolder.Substring(_watchedFolder.LastIndexOf('\\') + 1);
_parentWatcher.Filter = filter;
_parentWatcher.IncludeSubdirectories = false;
_parentWatcher.Error += Watcher_Error;
_parentWatcher.NotifyFilter = NotifyFilters.DirectoryName;
_parentWatcher.EnableRaisingEvents = true;
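For completeness, a hypothetical Renamed handler for this parent watcher could look like the following; here _watcher is assumed to be the main FileSystemWatcher on the monitored folder, and how you surface the new name in the UI is up to you:

_parentWatcher.Renamed += (sender, e) =>
{
    _watchedFolder = e.FullPath;      // remember the new folder path
    _watcher.Path = e.FullPath;       // point the main watcher at the renamed folder
    // update the displayed folder name here, e.g. via your messenger or view model
};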
3. You cannot rename or delete the monitored folder's parent folders.
Let's say you are watching folder A. You are free to change any files under A, rename folder A, and maybe even delete A. But you cannot rename A's parent B, B's parent C, and so on. Check this thread. When you try to rename such a parent from Windows Explorer, you will get the following infamous message:
The action can't be completed because the folder or a file in it is open in another program
This is ridiculous, since no file or folder is actually open; the folder is only being monitored. Again, this is the designed behaviour.
A workaround could be to watch the entire drive, say C:\ or D:\. As long as the watched root doesn't have a parent, you don't have to worry about a parent folder being renamed. But this probably brings performance issues, because you end up watching many unnecessary changes, especially on network drives.
4. Use a Timer to group multiple events
To be honest, I am not a fan of Timers. I always feel Timers are second-class citizens in the system and not reliable. Maybe I am wrong, but I just have that feeling. Still, in cases where missing an event is not critical, Timers can do the work. We should note there are at least 3 timers: System.Timers.Timer, System.Threading.Timer, and System.Windows.Threading.DispatcherTimer. This thread discussing Timers may be useful. I used System.Timers.Timer in this case. Somebody mentioned we should use DispatcherTimer instead, but it turns out DispatcherTimer behaves differently from Timer.
When you respond to every event, you stop and restart the timer so that it waits for the full period again. This way, a burst of changes is grouped together and only one final change request is fired.
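A minimal sketch of this idea with System.Timers.Timer (the handler name and the 500 ms interval are just illustrative):

private readonly System.Timers.Timer _refreshTimer =
    new System.Timers.Timer(500) { AutoReset = false };

private void Watcher_Changed(object sender, FileSystemEventArgs e)
{
    // Each change pushes the deadline back, so a burst of events
    // produces a single Elapsed callback after things quiet down.
    _refreshTimer.Stop();
    _refreshTimer.Start();
}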
Another thing to notice is that in the Timer elapsed event handler, we should call
_uiDispatcher.BeginInvoke(new Action(() => {
    Messenger.Default.Send("StartRefreshing", "StartRefreshing");
}));
rather than just simply sending the message:
Messenger.Default.Send("StartRefreshing", "StartRefreshing");
The difference is that the latter sends the message on a background thread, while the former sends it on the main UI thread. When you try to change UI elements in the responding function, in the latter case you will get:
The calling thread cannot access this object because a different thread owns it
This is because in WPF only the main GUI thread can change the GUI elements. We can use Dispatcher.Invoke or BeginInvoke to execute a delegate on the dispatcher thread. In the middle of this work, I wanted to use DispatcherTimer, but it turned out that timer was never fired. Check this thread and this thread to see the possible reason: the dispatcher timer is created in one thread, will only fire events in that thread, and only the dispatcher of that thread can access those events.
5. Do we need explicit multiple threads?
Here comes another concern users normally have: do we need to explicitly put the file monitor on another thread? Check this discussion. The answer is no, because the .NET framework will handle it; the class will create a thread if it needs one. So unless necessary, you don't have to create a thread to put the file monitor in. This is different from the old Win32 way and of course a nice improvement.