Saturday, 26 May 2012

A Very Easy to Use Excel XML Import-Export Library


A Very Easy to Use Excel XML Import-Export Library

Latest versions [External links]

 

USING THE LIBRARY

Using the code is very easy. This was the primary concern when I was building this library. The primary or top level class isExcelXmlWorkbook which contains multiple Worksheets. The library resides in Yogesh.Extensions.ExcelXml. The following example shows the various ways of adding cells in a Workbook, right from creating a instance. 
// Create the instance
ExcelXmlWorkbook book = new ExcelXmlWorkbook();

// Many such properties exist. Details can be found in the documentation
book.Properties.Author = "Yogesh Jagota"; // The author of the document

// This returns the first worksheet.
// Note that we have not declared a instance of a new worksheet
// All the dirty work is done by the library.
Worksheet sheet = book[0];

// Name is the name of the sheet. If not set, the default name
// style is "sheet" + sheet number, like sheet1, sheet2
sheet.Name = "AgewiseOutstanding";

// More on this in documentation
sheet.FreezeTopRows = 3;

// and this too...
sheet.PrintOptions.Orientation = PageOrientation.Landscape;
sheet.PrintOptions.SetMargins(0.5, 0.4, 0.5, 0.4);

// This is the actual code which sets out the cell values
// Note again, that we don't declare any instance at all.
// All the work is done by the library.
// Index operator takes first value as column and second as row.
sheet[0, 0].Value = "Outstanding as on " + DateTime.Now;

sheet[0, 1].Value = "Name of Party";
sheet[1, 1].Value = "RSM";
sheet[2, 1].Value = "ASM";
sheet[3, 1].Value = "0-30";
sheet[4, 1].Value = "31-60";
sheet[5, 1].Value = "61-90";
sheet[6, 1].Value = "91+";

sheet[0, 2].Value = "M/s Stupid Paymaster";
sheet[1, 2].Value = "Mr. Nonsense";
sheet[2, 2].Value = "Mr. More Nonsense";
sheet[3, 2].Value = 0;
sheet[4, 2].Value = 5000;
sheet[5, 2].Value = 45000;
sheet[6, 2].Value = 0;

sheet[0, 3].Value = "M/s Good Paymaster";
sheet[1, 3].Value = "Mr. Good RSM";
sheet[2, 3].Value = "Mr. Good ASM";
sheet[3, 3].Value = 32000;
sheet[4, 3].Value = 0;
sheet[5, 3].Value = 0;
sheet[6, 3].Value = 0;
sheet[7, 3].Value = sheet[6, 3];

string outputFile = "Outstanding File.xml";
// no extension is added if not present

book.Export(outputFile);

Importing a file

To import a file, you can either supply a file name or a Stream object to the static ExcelXmlWorkbook.Import method which returns a ExcelXmlWorkbook instance loaded with the file. If any error occurs, the function simply returns null, and there is no way to find out what error occurred. So, the preferred way to import is to pass a Stream because it gives you more control with error management.

Exporting the file

All the code is written to disk only when the Export function is called. Export must be supplied with either a Stream or a file name. If any error occurs, the function simply returns false, and there is no way to find out what error occurred. So, the preferred way to export is to pass a Stream because it gives you more control with error management.

Assigning values to cells

Notice the last assignment in the previous example: 
sheet[7, 3].Value = sheet[6, 2];
Here, we are actually assigning a cell to a cell. What will be the value of the cell, you might wonder? The cell will not have a value at all. It will have a reference to the assigned cell, something like this when you will open the file in Excel=G3. It won't be an absolute reference, more on that later. We can assign these values to a cell:
  1. string
  2. bool
  3. All integar types, i.e., bytesbyteintuintlongulongfloatdoubledecimal.
  4. DateTime
  5. Cell
  6. Formula [more on this below]

Knowing the type of content a cell contains

Every Cell contains a ContentType readonly field which can be used to check what value type the cell contains. The available values are StringNumberBooleanDateTimeFormulaUnresolvedValue.

Retrieving values from cells

A readonly property GetValue<T> returns the cell value converted to the type supplied. You can use ContentType withGetValue<T> to retrieve the exact value of a cell. GetValue<T> enables strict type checking when retrieving a cell value. Further, if the type supplied with GetValue<T> does not match the type of the cell type, default(T) is returned instead. For example, if a cell's ContentType == ContentType.Numeric, the only way to retrieve the value of the cell is to supply abytesbyteintuintlongulongfloat, or double. If a cell's ContentType == ContentType.String, the only way to retrieve the value of the cell is to supply a string etc.

Various ways of accessing the cells

There is no hard coded way of accessing a particular cell. There are numerous ways of doing so. For example, the fourth column of the second row in the last example can be set to a value of 1 by... 
  1. Directly using the ExcelXmlWorkbook class.
  2. book[0][3, 1].Value = 1
  3. Using the Worksheet class.
  4. Worksheet sheet = book[0];
    sheet[3, 1].Value = 1
  5. Using the Row class.
  6. Worksheet sheet = book[0];
    Row row = sheet[1];
    row[3].Value = 1
  7. Using the Cell class.
  8. Worksheet sheet = book[0];
    Row row = sheet[1];
    Cell cell = row[3];
    cell.Value = 1
Note that we do not need to declare a instance of a new worksheet, row, or cell. All the dirty work is done by the library. This style of coding opens many ways of accessing cells and rows. 

STYLES

All cells, rows, and worksheets have styles which can be set individually. These are FontAlignmentInteriorBorder, and DisplayFormat. More information can be found in the documentation about members of the style classes. Changing a worksheet style setting affects all cells in the worksheet. A row setting affects all child cells in the row, and a single cell setting affects, well, that very cell. Example: 
sheet[1, 3].Font.Bold = true;
All the functionality of a style is implemented in a class XmlStyle. You can create an instance to XmlStyle in your code and assign it to the Style property which is present in all the cells, rows, and worksheets. Example: 
XmlStyle style = new XmlStyle();
style.Font.Bold = true;
sheet[1, 3].Style = style;

RANGES

The main reason for writing my own implementation was ranges, which I found missing, or not having the powers which ranges should have. In this library, ranges are very powerful and extendible. Ranges have all the style elements found in cells, rows, and worksheets. Example:
// This sets the text of cells 1-8 of row 3 to bold
Range range = new Range(sheet[0, 2], sheet[7, 2]);
range.Font.Bold = true;
Even this is valid code, although many might recommend doing it the first way... 
new Range(sheet[0, 2], sheet[7, 2]).Font.Bold = true;
Please note that ranges can not be assigned to a cell value. Assigning it will generate an empty cell. A range can contain a single cell or a range of cells. In the above example, we are providing the constructor with the first cell and the last cell. Ranges always contain rectangular ranges, just like in Excel

Applying auto filter to ranges

To apply auto filter to a range, you only need to call the range's AutoFilter method and you are done. Example:
new Range(sheet[0, 1], sheet[6, 3]).AutoFilter();

Absolute and non-absolute ranges

By default, all ranges output a non-absolute reference. To set up an absolute reference, just set the Absolute property of the range to true
Range range = new Range(sheet[0 ,2], sheet[7, 2]);
range.Font.Bold = true;
range.Absolute = true;

FUNCTIONS

Now, we come to the real use of ranges and their Absolute property: adding functions. I think a function in my library can be easily understood by this example which uses the first example in this article. 
sheet[7, 3].Value = FormulaHelper.Formula("sum", new Range(sheet[3, 3], sheet[6, 3]));
or:
sheet[7, 3].Value = new Formula().Add("sum").StartGroup().Add(
            new Range(sheet[3, 3], sheet[6, 3])).EndGroup();
When you will open this book in Excel, the value of the cell will be =SUM(D4:G4)

Function parameters

Here, we have added a single parameter in the formula constructor. You can add as many parameters as you want using theAdd function of the Formula class. Only two types of parameters are allowed though, string or Range. The stringparameter type can be used to add any value and named ranges also (read more about named ranges in the documentation). Example:
Formula formula = new Formula().Add("sum").StartGroup();

formula.Add("D4").Operator(',');

// Here I am using the object initializers just to fit the code in one line
// The library is compatible with both VS2005 and VS2008
formula.Add(new Range(sheet[4, 3]) { Absolute = true } ).Operator(',');

formula.Add(new Range(sheet[5, 3], Range(sheet[6, 3])).EndGroup();

sheet[7, 3].Value = formula;
When you will open this book in Excel, the value of the cell will be =SUM(D4, $E$4, F4:G4).

Filtering cells as parameters by checking the cell value or style

You can filter all cells and auto add them to the parameter list of a formula by passing a parameter, i.e., a delegate which accepts Cell as its value and returns bool to both the Formula constructor or Add. All the value accessors (i.e., Value,IntValue, etc.) and cell styles can be checked. Examples: 
  • Example 1:
  • // Lets assume column 1,2,3,6 and 7 are bold...
    XmlStyle style = new XmlStyle();
    style.Font.Bold = true;
    
    // VS2008 style
    sheet[7, 3].Value = FormulaHelper.Formula("sum", 
                        new Range(sheet[0, 3], sheet[6, 3]), 
                        cell => cell.Style == style);
    
    // or VS2005 style
    sheet[7, 3].Value = FormulaHelper.Formula("sum", 
                        new Range(sheet[0, 3], sheet[6, 3]), 
                        delegate (Cell cell) { return cell.Style == style; } );
  • Example 2:
  • sheet[7, 3].Value = FormulaHelper.Formula("sum", 
                        new Range(sheet[0, 3], sheet[6, 3]), 
                        cell => cell.GetValue<int>() > 10000 && 
                        cell.GetValue<int>() <= 50000);
In the first example of style, the value of the cell will be =SUM(A4:C4, F4:G4). Continuous range matching to true will be joined as one parameter, i.e., A4:C4, and not three parameters, i.e., A4,B4,C4.

MODIFYING IMPORTED EXCEL XML FILES

Imported Excel XML files can be modified directly via direct assignment just like new files. Further to this, there are many functions which allow insertion, deletion, and addition of...
  • One or multiple worksheets in books, e.g., InsertSheetBeforeInsertSheetAfter.
  • One or multiple rows and columns in sheets, e.g., InsertColumnAfterInsertColumnsAfter,InsertRowBeforeInsertRowsBefore.
  • One or multiple cells in rows e.g., InsertCellBeforeInsertCellsBefore.
See the documentation for more on these functions.

EXPORTING A DATASET TO AN EXCELXMLWORKSHEET

A static member in ExcelXmlWorksheetDataSetToWorkbook, is provided which converts a DataSet and returns a Worksheet reference. All the tables are converted into different sheets of the Workbook.

Usage

ExcelXmlWorksheet sheet = ExcelXmlWorksheet.DataSetToWorkbook(sourceDataSet)

CELL COLLECTION

Cell collection is a strongly typed List collection with full support for LINQ. You can use the Add method to add aWorksheetRangeRow, or Cell. You can add all cells, or you can filter the cells using a predicate.

MEMORY

Looking at all this code might make you think that all the cells, rows, Worksheets, ranges must be using too much memory. They must also be having their own separate copy of styles which will cause extra overhead. The answer is no. 
I have optimized the library to use as little memory as possible. As far as the styles go, if you have a 100,000 cell Workbook written programmatically, which contains only 10 individual styles, the number of styles in memory will only be 11, i.e., 10 separate styles + 1 default style. Although the styles are added on a book level, if you have 10 books with the 10 same styles present in all of them, the number of style instances active in the program will be 110. 

CONCLUSION

I will love to hear your comments and suggestions. Any bugs can be reported here.

UPDATES

Code breaking changes in v3.29

The SetHeaderFooterMargin method in PrintOptions has been removed. Use the HeaderMargin andFooterMargin properties instead.

Code breaking changes in v3.06

The formula system does not work the way it used to, so the previous code might break. For backwards compatibility, I have included a static class FormulaHelper which can be used with the previous code. Just replace the previous code in the following way:
cell.Value = new Formula("Sum", new Range(sheet[3, 3], sheet[6, 3]));
with:
cell.Value = FormulaHelper.Formula("Sum", new Range(sheet[3, 3], sheet[6, 3]));

No comments:

Post a Comment

Note: only a member of this blog may post a comment.