I remember when we first heard that the default Office 12 file format would be XML. As we talked about it, the conversation always seemed to come back to two things:
1. It’s a great idea.
2. What’s the performance going to be like?
A lot of time has passed since that first conversation, but performance continues to be critical. If the XML format is not fast enough, it will not get used, and if it’s not used, much of the benefit will be lost. I recently read a very interesting post about some of the work Win Excel is doing to make sure things are fast. Things you might not consider at first, like the size of the tags used, become very important at scale.
Remember, we’re not talking about creating a format for hobbyists. This format is supposed to be used by everyone, and most of those folks aren’t going to be happy with feature loss and performance degradation just so they can save out as XML (the average user doesn’t care about XML). The original SpreadsheetML from Office XP was actually more like a hobbyist format. As a result, it was really easy to develop against, but it was bloated and slow. I wish that we didn’t have to worry so much about performance, but if you really expect these formats to be used by everyone, then you have to take the training wheels off. That’s why the standardization in Ecma is so important, though: it ensures that everything is fully documented and that all the information you need to develop against these formats is there.
The full post talks more about the XML parser, and I found it fascinating. As always, great performance is about making the right tradeoffs; in this case, some human readability was sacrificed for a significant performance gain.
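To get a rough feel for why tag size matters at scale, here is a minimal sketch in Python. It is not how Excel measures anything. The `build_sheet` helper and the tag names are illustrative assumptions, not the actual schema, and the sheet dimensions are arbitrary. It writes the same grid of numbers twice, once with long descriptive tags and once with short ones, and compares the byte counts.

```python
def build_sheet(rows, cols, row_tag, cell_tag, value_tag):
    """Serialize a rows x cols grid of integers using the given tag names.

    Hypothetical helper for illustration only; real spreadsheet markup
    carries attributes, namespaces, and much more.
    """
    parts = []
    for r in range(rows):
        parts.append(f"<{row_tag}>")
        for c in range(cols):
            value = r * cols + c
            parts.append(
                f"<{cell_tag}><{value_tag}>{value}</{value_tag}></{cell_tag}>"
            )
        parts.append(f"</{row_tag}>")
    return "".join(parts)


# Same data, two naming styles (both sets of names are assumptions).
verbose = build_sheet(1000, 20, "Row", "Cell", "Data")
terse = build_sheet(1000, 20, "r", "c", "v")

verbose_bytes = len(verbose.encode("utf-8"))
terse_bytes = len(terse.encode("utf-8"))
saved = 1 - terse_bytes / verbose_bytes

print(f"verbose tags: {verbose_bytes} bytes")
print(f"terse tags:   {terse_bytes} bytes")
print(f"reduction:    {saved:.0%}")
```

The data is identical in both cases, so every byte saved comes from the markup alone. A parser has fewer characters to scan and compare for each cell, and that is where the human readability gets traded away.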