DevTeach
The DevTeach Toronto/Mississauga 2013 conference was held on May 27-31. I had a chance to attend the main conference from May 28-30. Generally speaking, this was a very good event for developers. There were a lot of sessions covering agile, architecture, design, mobile, SharePoint, databases, and more, so we were exposed to a lot of information. There were also some very good speakers, such as Michael Stiefel, Steffan Surdek, Philip Japikse, and Kathleen Dollard, just to list a few.
I mostly attended the architecture, agile, and web development sessions, and also listened to some sessions from the SQL, JavaScript, mobile, and Windows 8 series. I would say most of them were useful and informative. I specifically want to mention that the two sessions from Steffan Surdek were really interesting.
Talking about what should and shouldn't be presented at a conference, here are my two cents:
What I like:
What is the best practice when facing a common issue? (design, architecture, agile)
What frameworks/libraries/tools does the community use in a production environment? (test, mock, performance analysis)
What new technologies/trends are coming, and in which areas can they help?
What I don't like:
Commercial promotion for specific products
Obvious bias towards some products (though objective comparisons are fine)
I often think about what really distinguishes a senior developer from an architect. I have had a chance to work closely with both architects and senior developers. A good senior developer can write beautiful code, solve complicated algorithmic problems, and fix tricky bugs. When facing a problem, a developer tends to rely on his own skill to solve it, but an architect may look for existing frameworks and try to reuse them. So at the end of the day, developers may write similar code repeatedly, whereas architects prefer to create their own framework or use existing frameworks to solve the problem. Architects normally have a bigger vision.
From different sessions, I had a chance to learn from experts, broaden my knowledge, know what's going on in the community, and also find some momentum to improve myself.
Friday, June 7, 2013
Monday, May 27, 2013
Liskov substitution principle
Recently I had a chance to upgrade a software component and got a better understanding of the Liskov substitution principle. Here I want to share this experience.
First I want to give a clear definition of the Liskov principle. This is the original definition:
Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it.
The following is from Wikipedia:
Liskov substitution principle: objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program.
The most famous violation of this principle is the circle-ellipse problem (or the similar square-rectangle problem). You can find detailed information in the Wikipedia article.
Scenario
The component creates and manages our software licenses. Basically, the code handles a chunk of binary memory, and the .NET BitArray class is a good start. BitArray treats every bit as a boolean value and essentially turns managing memory into managing a list of boolean values, which makes the binary memory easier to handle. But since BitArray is sealed, we cannot extend it, so our base class just wraps the BitArray class and adds some overloaded methods to make the operations easier. The base class looks like this:
public class CommonBits
{
    private BitArray _bits;

    public void Set (int index, bool value) { }
    public void Set (int index, byte value) { }
    public void Set (int index, int value) { }
    ...
    public bool Get (int index) { }
    public int GetInt (int index) { }
    ...
}
The derived class adds some secure operations to the base class. It still handles the binary memory, but it adds CRC verification to the original bit array. The interface looks like the following:
public class SecureBits : CommonBits
{
    public void Encode() { }
    public void SetCRC() { }
    public void GetCRC() { }
}
SecureBits prefixes a 32-bit CRC checksum to the original BitArray. We can use the following diagram to demonstrate the difference between the two classes:
Problem
Now comes the problem. Suppose a user uses the CommonBits class; he/she would simply do this:
var bits = new CommonBits();
bits.Set (1, true);
bits.Set (100, false);
However, if you replace the above code with the derived class SecureBits, the problem appears. Since the first 32 bits are calculated from the actual data that follows them, the user cannot set any of those bits individually. The Set() method in the derived class would have to be something like this:
public void Set (int index, bool value)
{
    if (index < 32)
        throw new Exception ("The CRC value cannot be set. The value should be automatically calculated based on your actual bit array.");
    ...
}
Apparently this code already violates the Liskov principle: a derived class should not throw new exceptions that the base class does not.
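To see the problem from the caller's side, here is a minimal sketch; the ClearFirstBits helper is hypothetical and not part of the component:

// Client code written against the base class.
void ClearFirstBits (CommonBits bits)
{
    for (int i = 0; i < 40; i++)
        bits.Set (i, false);        // fine for CommonBits
}

ClearFirstBits (new CommonBits());  // works
ClearFirstBits (new SecureBits());  // throws as soon as index < 32, so SecureBits is not substitutable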
Solution
My solution is to change the inheritance into composition, which coincides with the "favor composition over inheritance" principle. The code is something like this:
public class SecureBits
{
    public int CRC32Value
    {
        get { return CalculateCRC(); }
        private set { }
    }

    public CommonBits BitsData { get; set; }

    public CommonBits ToCommonBits() { }
}
From the BitsData property, the user can access all the CommonBits operations, but he/she cannot set the CRC value. I also provide a ToCommonBits() method so that the user can convert the instance to a normal CommonBits instance. From there on, if the user changes some bits, the CRC value will not change accordingly, but it is still useful when the user wants to use other methods of CommonBits.
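As a quick usage sketch of the composition-based class (the variable names are illustrative, and BitsData is assigned explicitly since it has a public setter):

var secure = new SecureBits();
secure.BitsData = new CommonBits();
secure.BitsData.Set (40, true);               // data bits go through the wrapped CommonBits
int crc = secure.CRC32Value;                  // the CRC is always calculated, never set directly
CommonBits snapshot = secure.ToCommonBits();  // detached copy for code that only needs CommonBits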
Wednesday, April 10, 2013
ObservableCollection performance issue
When a collection is needed for data binding in a WPF/Silverlight application, ObservableCollection is the class to use. Normally ObservableCollection is good enough to handle data binding. However, if the collection grows very large and performance becomes an issue, we may want to look for other options.
Background
I had a list which could contain more than ten thousand records. In my case, this file list brought some performance issues and took 20 seconds or so to fill. In my opinion, 20 seconds is not terribly bad considering what has to be done in that time: I needed to do some really time-consuming work. In order not to freeze the GUI during this time span, I used 2 separate background workers to do the background work, and the main thread only updates the GUI elements.
Although I found I could not cut much time from those 2 background threads, I did notice that updating the GUI took a long time and the list was refreshing very frequently. The issue is related to the design of ObservableCollection. When I used ILSpy to check the ObservableCollection class, I found that it fires one CollectionChanged event and 2 PropertyChanged events every time an item is added or removed. That means if ten thousand files are added, 30 thousand events are fired. These events cause the GUI elements to refresh, which can be very time-consuming. In my case, real-time refreshing was not really necessary; a better solution is to refresh the GUI only after all items, or a fixed number of items, have been added. This can save a lot of time.
RangeObservableCollection class
First, a change to ObservableCollection is a good start: we need bulk add and delete operations. There are several code examples on the Internet, and this one from peteohanlon looks simple and neat. However, after checking this discussion, I found that although every add operation in peteohanlon's solution doesn't fire a CollectionChanged event, it still fires PropertyChanged events. weston had very good points in that discussion. So I changed my RangeObservableCollection class to the following:
public class RangeObservableCollection<T> : ObservableCollection<T>
{
    // Add a whole batch, then raise the change notifications only once.
    public void AddRange(IEnumerable<T> list)
    {
        if (list == null)
            return;
        foreach (T item in list)
            Items.Add(item);        // Items bypasses the per-item notifications
        SendNotifications();
    }

    // Remove a whole batch, then raise the change notifications only once.
    public void RemoveRange(IEnumerable<T> list)
    {
        if (list == null)
            return;
        foreach (T item in list)
            Items.Remove(item);
        SendNotifications();
    }

    private void SendNotifications()
    {
        OnCollectionChanged(new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));
        OnPropertyChanged(new PropertyChangedEventArgs("Count"));
        OnPropertyChanged(new PropertyChangedEventArgs("Item[]"));
    }
}
I also wrote some unit test methods to check how many events are fired and what the performance is. I could verify that my class only fires one CollectionChanged event and 2 PropertyChanged events for every bulk add. For the performance test, I simply prepared a list with 1 million records, then added this list to the different collection classes. In my test, adding items one by one to ObservableCollection took 0.230 seconds, but AddRange on my RangeObservableCollection only took 0.072 seconds. When I tested peteohanlon's class, it took almost the same time as the traditional ObservableCollection class, so I guess the PropertyChanged events do take some resources.
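For illustration, a minimal sketch of this kind of timing comparison might look like the following (assuming the usual System.Linq, System.Diagnostics, and System.Collections.ObjectModel usings; exact numbers will vary by machine):

var source = Enumerable.Range(0, 1000000).ToList();

// Baseline: one-by-one adds, 3 events per item.
var plain = new ObservableCollection<int>();
var watch = Stopwatch.StartNew();
foreach (int i in source)
    plain.Add(i);
watch.Stop();
Console.WriteLine("One by one: " + watch.Elapsed);

// Bulk add: 3 events in total.
var ranged = new RangeObservableCollection<int>();
watch.Restart();
ranged.AddRange(source);
watch.Stop();
Console.WriteLine("AddRange:   " + watch.Elapsed);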
Again, my test only exercised the collection operations and did not involve any GUI updates. I think the main advantage of bulk add and delete is that we can avoid the GUI updates, which are critical to performance. In real data binding, the GUI elements respond to every PropertyChanged and CollectionChanged event and the control refreshes each time, which can be a huge waste of resources.
An internal list
Furthermore, I used an internal list to keep all my data. After a fixed number of files were added, I called the AddRange() method to add them to the RangeObservableCollection instance. After all files were added, I called a Sort() method on my internal list and re-created the RangeObservableCollection instance. The overhead here is re-creating the collection instance.
When deleting, I always worked on the internal list; only when all the queried files were deleted did I call the Sort() function and re-create the collection instance. Again, there is a re-creation overhead here. In my test, deleting from the internal list was much faster than deleting directly from the data-bound ObservableCollection instance.
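A rough sketch of this internal-list pattern, assuming a hypothetical comparable FileItem type and a view model that exposes the bound collection (the class name, batch size, and method names are illustrative):

public class FileListViewModel
{
    private readonly List<FileItem> _allFiles = new List<FileItem>();
    private readonly List<FileItem> _pending = new List<FileItem>();

    public FileListViewModel()
    {
        Files = new RangeObservableCollection<FileItem>();
    }

    public RangeObservableCollection<FileItem> Files { get; private set; }

    public void OnFileFound(FileItem item)
    {
        _allFiles.Add(item);
        _pending.Add(item);
        if (_pending.Count >= 500)        // push to the bound collection in fixed-size batches
        {
            Files.AddRange(_pending);
            _pending.Clear();
        }
    }

    public void OnScanCompleted()
    {
        Files.AddRange(_pending);
        _pending.Clear();

        _allFiles.Sort();                 // sort the internal list, not the bound collection
        Files = new RangeObservableCollection<FileItem>();
        Files.AddRange(_allFiles);        // re-create the bound collection once
        // remember to raise a property change for Files so the binding picks up the new instance
    }
}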
Friday, January 18, 2013
FileSystemWatcher tips
Recently, in one WPF project, I needed to monitor multiple folders for possible file and folder changes. I remember that in VC++ we had to use the Win32 API, create our own thread data, and run the monitoring in a different thread. In the .NET world, FileSystemWatcher seems to be the only reasonable choice, and you don't have to run multiple threads yourself; the .NET framework manages the monitoring thread for you. However, when I started using it, I found some issues which could eventually affect how, and whether, you can use it. I have listed some concerns and tips here. You may also reference these tips, which cover some basics you may be interested in. I may write another post to list the related code.
1. Some events will be fired multiple times.
When you rename a file, you could get several events fired. This is a known issue with file watchers. If you process the changes in the event handler, you could end up handling the same change multiple times. A good choice is to group the changes together and then process them only once, so a Timer may be helpful in this situation. I will discuss Timers in a later section.
2. The monitored folder name change event is not fired.
When I debugged and found that no event was fired when the monitored folder was renamed, I was really frustrated. Actually, FileSystemWatcher does catch the change internally: it switches the monitored folder to the new one and keeps monitoring it. But you cannot catch the event yourself. So this is the designed behaviour, but apparently not what I wanted, because I needed to display the new folder name immediately. So monitoring the rename event was a must.
The original idea came from this thread: create another watcher that monitors the folder's parent folder and only watches directory name changes. Meanwhile, you can specify a filter so that only the subfolder you are interested in is watched. The following is the code to create the parent watcher.
_parentWatcher = new FileSystemWatcher();
_parentWatcher.Path = (Directory.GetParent(_watchedFolder)).FullName;
string filter = _watchedFolder.Substring(_watchedFolder.LastIndexOf('\\') + 1);
_parentWatcher.Filter = filter;
_parentWatcher.IncludeSubdirectories = false;
_parentWatcher.Error += Watcher_Error;
_parentWatcher.NotifyFilter = NotifyFilters.DirectoryName;
_parentWatcher.EnableRaisingEvents = true;
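For completeness, a hypothetical Renamed handler for this parent watcher could look like the following; here _watcher is assumed to be the main FileSystemWatcher on the monitored folder, and how you surface the new name in the UI is up to you:

_parentWatcher.Renamed += (sender, e) =>
{
    _watchedFolder = e.FullPath;      // remember the new folder path
    _watcher.Path = e.FullPath;       // point the main watcher at the renamed folder
    // update the displayed folder name here, e.g. via your messenger or view model
};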
3. You cannot rename or delete the monitored folder's parent folders.
Let's say you are watching folder A. You are free to change any files under A, rename folder A, and maybe even delete A. But you cannot rename A's parent B, B's parent C, and so on. Check this thread. When you try to rename such a parent from Windows Explorer, you will get the following infamous message:
The action can't be completed because the folder or a file in it is open in another program
This is ridiculous, since no file or folder is actually open; the folder is only being monitored. Again, this is the designed behaviour.
A workaround could be to watch the entire drive, say C:\ or D:\. As long as the watched root doesn't have a parent, you don't have to worry about a parent folder being renamed. But this probably brings performance issues, because you end up watching many unnecessary changes, especially on network drives.
4. Use a Timer to group multiple events
To be honest, I am not a fan of Timers. I always feel Timers are second-class citizens in the system and not reliable. Maybe I am wrong, but I just have that feeling. Still, in cases where missing an event is not critical, Timers can do the work. We should note there are at least 3 timers: System.Timers.Timer, System.Threading.Timer, and System.Windows.Threading.DispatcherTimer. This thread discussing Timers may be useful. I used System.Timers.Timer in this case. Somebody mentioned we should use DispatcherTimer instead, but it turns out DispatcherTimer behaves differently from Timer.
When you respond to every event, you stop and restart the timer so that it waits for the full period again. This way, a burst of changes is grouped together and only one final change request is fired.
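A minimal sketch of this idea with System.Timers.Timer (the handler name and the 500 ms interval are just illustrative):

private readonly System.Timers.Timer _refreshTimer =
    new System.Timers.Timer(500) { AutoReset = false };

private void Watcher_Changed(object sender, FileSystemEventArgs e)
{
    // Each change pushes the deadline back, so a burst of events
    // produces a single Elapsed callback after things quiet down.
    _refreshTimer.Stop();
    _refreshTimer.Start();
}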
Another thing to notice is that in the Timer elapsed event handler, we should call
_uiDispatcher.BeginInvoke(new Action(() => {
    Messenger.Default.Send("StartRefreshing", "StartRefreshing");
}));
rather than just simply sending the message:
Messenger.Default.Send("StartRefreshing", "StartRefreshing");
The difference is that the latter sends the message on a background thread, while the former sends it on the main UI thread. When you try to change UI elements in the responding function, in the latter case you will get:
The calling thread cannot access this object because a different thread owns it
This is because in WPF only the main GUI thread can change the GUI elements. We can use Dispatcher.Invoke or BeginInvoke to execute a delegate on the dispatcher thread. In the middle of this work, I wanted to use DispatcherTimer, but it turned out that timer was never fired. Check this thread and this thread to see the possible reason: the dispatcher timer is created in one thread, will only fire events in that thread, and only the dispatcher of that thread can access those events.
5. Do we need explicit multiple threads?
Here comes another concern users normally have: do we need to explicitly put the file monitor on another thread? Check this discussion. The answer is no, because the .NET framework will handle it; the class will create a thread if it needs one. So unless necessary, you don't have to create a thread to put the file monitor in. This is different from the old Win32 way and of course a nice improvement.