OracleClient vs ODP.NET Performance Benchmark

In my previous post I mentioned that I was doing some comparisons of the performance of System.Data.OracleClient (provided by Microsoft) versus ODP.NET (provided by Oracle). Since we still use a lot of DataSets in our project, I decided that the simplest way to measure the performance would be to fill a DataTable using a DataAdapter.

For the experiment I took a table that fills the DataTable with about 100,000 records. I then created two classes, one for each provider. The classes access the same database using the same SQL. Here is the code for the class using the OracleClient provider:

class MicrosoftProvider
{
    public static void RunTest()
    {
        // Fill opens and closes the connection by itself, but the
        // connection, command and adapter should still be disposed.
        using (var conn = new System.Data.OracleClient.OracleConnection(Params.ConnectionString))
        using (var cmd = new System.Data.OracleClient.OracleCommand(Params.CommandText, conn))
        using (var adapter = new System.Data.OracleClient.OracleDataAdapter(cmd))
        {
            var dataTable = new DataTable();
            adapter.Fill(dataTable);
        }
    }
}

And here is the class using the ODP.NET provider:

class OracleProvider
{
    public static void RunTest()
    {
        // Same test, same SQL, only the provider changes.
        using (var conn = new Oracle.DataAccess.Client.OracleConnection(Params.ConnectionString))
        using (var cmd = new Oracle.DataAccess.Client.OracleCommand(Params.CommandText, conn))
        using (var adapter = new Oracle.DataAccess.Client.OracleDataAdapter(cmd))
        {
            var dataTable = new DataTable();
            adapter.Fill(dataTable);
        }
    }
}

I ran the tests one after the other:

class Program
{
    static void Main(string[] args)
    {
        OracleProvider.RunTest();
        MicrosoftProvider.RunTest();
    }
}

To compare the execution of each provider I ran this code under the VS 2010 Profiler, which by the way is very cool; the visualization of the results has improved a lot since VS 2008. Here is the summary of the results:

 Profiler Summary

As you can see there is a huge difference! Less than 10% of the samples were taken while the ODP.NET provider was running, while more than 90% were taken while running the method that uses the OracleClient provider. The report will also show you the information in more detail:

 Main method details

And the coolest view of them all will even show the percentages overlaid on the code:

Cool, right? I think so! OK, now back to our tests... Before wrapping up the post there is still one more test to do: measuring the execution time of each routine in seconds. To do this I used the Stopwatch class.

class Program
{
    static void Main(string[] args)
    {
        Time(() => OracleProvider.RunTest());
        Time(() => MicrosoftProvider.RunTest());
    }

    static void Time(Action action)
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        action.Invoke();
        stopwatch.Stop();
        Console.WriteLine(stopwatch.Elapsed);
    }
}
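One caveat: the first routine to run pays one-time costs (JIT compilation, connection pool creation), so the run order can skew a single measurement. A fairer version of the Time helper above could discard a warm-up run and average several iterations; here is a sketch (the TimeAverage name and the iteration scheme are mine, not from the original code):

```csharp
using System;
using System.Diagnostics;

static class Benchmark
{
    // Runs the action once as a warm-up (JIT, connection pooling),
    // then times `iterations` runs and returns the average duration.
    public static TimeSpan TimeAverage(Action action, int iterations)
    {
        action(); // warm-up run, not measured

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        stopwatch.Stop();

        return TimeSpan.FromTicks(stopwatch.Elapsed.Ticks / iterations);
    }
}
```

Swapping the order of the two RunTest calls is another quick sanity check that the difference isn't just warm-up cost.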

When I ran the code I got the following results:

ODP.NET: 20.6756 seconds

OracleClient: 41.7716 seconds

The conclusion of the article is simple: the ODP.NET provider is much faster than the OracleClient provider offered by Microsoft. On top of that, as I mentioned in my previous post, many of the classes in the System.Data.OracleClient namespace are marked as obsolete, and Microsoft is not going to keep working on them in the future.

ODP.NET OracleParameter

Over the past weeks I've been doing a lot of work to improve the performance of our application at work. One of the things I've done is to replace Microsoft's OracleClient with the ODP.NET provider. In my tests I've been getting data access times up to 3 times faster than with the OracleClient. I'll post more details on this in a later article; for now I want to focus on the following error I got when switching providers:

Oracle.DataAccess.Client.OracleException: ORA-01008: not all variables bound

I found this strange, since we have a data layer that abstracts the provider, we already have implementations for Oracle and SQLServer, and everything was working great. It took me a while, but I found out that this was only happening in queries where I used the same parameter more than once. Here is an example:

SELECT NAME, AGE FROM EMPLOYEE WHERE :AGE > 18 AND :AGE < 40

What I found out is that the default behavior of the ODP.NET provider is different from the OracleClient provider: ODP.NET binds the parameters (OracleParameter) on the OracleCommand by position instead of by name. With positional binding you would have to declare the parameter twice with the same value. This behavior can be changed using the BindByName property on the OracleCommand:

var cmd = new OracleCommand();
cmd.BindByName = true;

The good part for us is that since we had an abstraction layer on top of our data layer all we needed to do was make a small change to the Factory class that creates the commands for us. Everything else remains untouched.
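For reference, the change in our case amounted to something like this (CreateCommand is a simplified stand-in for our actual Factory method, not the real code):

```csharp
using Oracle.DataAccess.Client;

static class CommandFactory
{
    // Simplified sketch of the factory change: every ODP.NET command
    // we hand out binds its parameters by name, like OracleClient did.
    public static OracleCommand CreateCommand(string sql, OracleConnection conn)
    {
        var cmd = new OracleCommand(sql, conn);
        cmd.BindByName = true; // the ODP.NET default is false (bind by position)
        return cmd;
    }
}
```

With BindByName set to true, the query from the example above needs the :AGE parameter declared only once.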

Hope this helps.

Oh, and by the way... in .NET 4 many types under the System.Data.OracleClient namespace are marked as obsolete. So you'd better watch out.

MVC Storefront: Migrating to the Entity Framework

The title of the post is actually misleading. A better title (although too long) would be: Migrating the Repository concept used in the MVC Storefront to the Entity Framework.

Lately I’ve been playing more with the Entity Framework. The other day I was trying to replicate the way Rob Conery used the Repository Pattern in the MVC Storefront sample application and I was surprised to learn that the Entity Framework has some differences from LINQ to SQL that I did not expect.

My first test was very simple. I wanted to get an IQueryable of Posts with their respective Comments attached. The Storefront did this in a really clever and simple manner. The following code was supposed to work:

class BlogRepository
{
    SOEntities ctx = new SOEntities();

    public IQueryable<Comment> GetComments()
    {
        return from comment in ctx.Comments.Include("Post")
               select new Comment()
               {
                   CommentId = comment.CommentId,
                   Author = comment.Author,
                   Body = comment.Body,
                   PostId = comment.Post.PostId
               };
    }

    public IQueryable<Post> GetPosts()
    {
        var posts = from post in ctx.Posts
                    let comments = GetComments()
                    select new Post()
                           {
                               PostId = post.PostId,
                               Title = post.Title,
                               Body = post.Body,
                               Comments = comments.Where(x => x.PostId == post.PostId)
                           };
        return posts;
    }
}

class Post
{
    public Guid PostId { get; set; }

    public string Title { get; set; }

    public string Body { get; set; }

    public IList<Comment> Comments { get; set; }
}

class Comment
{
    public Guid CommentId { get; set; }

    public Guid PostId { get; set; }

    public string Author { get; set; }

    public string Body { get; set; }

}

class Program
{
    static void Main(string[] args)
    {
        BlogRepository rep = new BlogRepository();
        var posts = rep.GetPosts();
        posts = posts.Where(x => x.Title.Contains("MVC"));
        List<Post> postList = posts.ToList();
    }
}

When I ran it I got an error saying that LINQ to Entities does not recognize the method GetComments().

It’s not a bug, it’s just the way the EF works: the GetComments method can’t be translated to anything by the provider. But why does it work with LINQ to SQL, you ask? Because it converts the LINQ expression in a different way; sorry, but this post would be too long if I got into that subject.

The approach recommended by the EF team is to use the AsEnumerable() extension method, forcing the initial query to be executed so that from then on you are working with objects. Here is how I needed to change my code for this to work:

public IQueryable<Post> GetPosts()
{
    var posts = from post in ctx.Posts.AsEnumerable()
                let comments = GetComments()
                select new Post()
                       {
                           PostId = post.PostId,
                           Title = post.Title,
                           Body = post.Body,
                           Comments = comments.Where(x => x.PostId == post.PostId)
                       };
    return posts.AsQueryable();
}

This does work, but when the ToList() method is called on the posts object, instead of the posts being filtered in the SQL, all of them are brought from the database and then filtered as objects. It’s a workaround, but it’s not ideal.
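The split is easy to see even without a database: everything after AsEnumerable() is LINQ to Objects, so operators added later run over in-memory objects instead of being composed into the provider's query. A small illustration with no EF involved:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class AsEnumerableDemo
{
    static void Main()
    {
        var titles = new[] { "MVC Storefront", "EF tips", "MVC routing" }.AsQueryable();

        // Still IQueryable<string>: a query provider could translate this Where.
        IQueryable<string> composed = titles.Where(t => t.Contains("MVC"));

        // After AsEnumerable() the same Where is LINQ to Objects and runs
        // client-side over whatever the source already produced.
        IEnumerable<string> inMemory = titles.AsEnumerable().Where(t => t.Contains("MVC"));

        Console.WriteLine(composed.Count());  // 2
        Console.WriteLine(inMemory.Count());  // 2
    }
}
```

The results are the same here, but against a database the first form filters on the server while the second pulls everything first, which is exactly what happens to GetPosts above.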

I have another workaround; it isn’t ideal either, but it does apply the filter in the database.

public IQueryable<Post> GetPosts()
{
    var comments = GetComments();
    var posts = from post in ctx.Posts
                select new Post()
                       {
                           PostId = post.PostId,
                           Title = post.Title,
                           Body = post.Body,
                           CommentsQry = comments.Where(x => x.PostId == post.PostId)
                       };
    return posts;
}

class Post
{
    public Guid PostId { get; set; }

    public string Title { get; set; }

    public string Body { get; set; }

    public IList<Comment> Comments { get; set; }

    public IQueryable<Comment> CommentsQry
    {
        set { Comments = new LazyList<Comment>(value); }
    }
}

The trick is that I introduced an IQueryable property in the Post class. This property receives an IQueryable and, under the hood, converts it to a LazyList and sets the Comments property. This gets around the limitation that the EF only allows parameterless constructors in a select projection; you can’t call constructors that take parameters.
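For completeness, LazyList is the helper class from the MVC Storefront: an IList&lt;T&gt; that only executes the wrapped IQueryable the first time the list is actually used. Here is a minimal sketch of the idea (my reconstruction, not Rob Conery's exact code):

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Minimal sketch of the Storefront's LazyList: wraps an IQueryable<T>
// and defers executing it until the list is first accessed.
public class LazyList<T> : IList<T>
{
    private readonly IQueryable<T> query;
    private IList<T> inner;

    public LazyList(IQueryable<T> query)
    {
        this.query = query;
    }

    // The query runs here, on first access only.
    private IList<T> Inner
    {
        get
        {
            if (inner == null)
                inner = query.ToList();
            return inner;
        }
    }

    public T this[int index]
    {
        get { return Inner[index]; }
        set { Inner[index] = value; }
    }

    public int Count { get { return Inner.Count; } }
    public bool IsReadOnly { get { return Inner.IsReadOnly; } }
    public void Add(T item) { Inner.Add(item); }
    public void Clear() { Inner.Clear(); }
    public bool Contains(T item) { return Inner.Contains(item); }
    public void CopyTo(T[] array, int arrayIndex) { Inner.CopyTo(array, arrayIndex); }
    public int IndexOf(T item) { return Inner.IndexOf(item); }
    public void Insert(int index, T item) { Inner.Insert(index, item); }
    public bool Remove(T item) { return Inner.Remove(item); }
    public void RemoveAt(int index) { Inner.RemoveAt(index); }
    public IEnumerator<T> GetEnumerator() { return Inner.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```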

If you run this code you will see that the records are filtered in the database. The reason I still don’t consider this solution ideal is that I had to change my model class and create a property that is not related to the business, but rather to a limitation imposed by the framework. I still don’t see a way around this. If anyone has any suggestions, please do leave them as comments.

A very interesting post was written by Muhammad Mosa a while ago. It’s worth visiting his blog to check out his idea.

When to use GUID over Autoincrement

The autoincrement feature in SQLServer tables is really handy. I’ve always liked Oracle databases, but its sequences are much more complicated to manage than the autoincrement columns in SQLServer. If you are doing a project that is only going to run on Oracle or SQLServer, either approach is fine and you won’t have any trouble.

The project I’m working on, however, has a requirement to work just the same on Oracle or SQLServer, with the possibility that other databases may be added in the future. From day one my first worry was whether we were going to use Autoincrement and Sequences or whether we had to find another way out. The way out in this case would be to use GUIDs (Globally Unique Identifiers). In our case we chose to use Autoincrement for SQLServer and Sequences in Oracle. Mostly this approach was chosen because the guys that hired me thought it would be much easier to search the database for a record with a primary key like 1 or 15604 than something like 3F2504E0-4F89-11D3-9A0C-0305E82C3301. And I agreed with them on this: it really is simpler to ask the guy next to you to check if the person with id 2350 exists in the database than to tell him a GUID.

Of course we had a price to pay for this decision. Our DAL (Data Access Layer) became much more complicated because we had to deal with the Autoincrement vs Sequences problem. Imagine that you have a simple table called PERSON with only two columns: ID and NAME. Look at the difference in the insert statements for each database:

Oracle
INSERT INTO PERSON (ID, NAME) VALUES (SEQ_PERSON.NEXTVAL, 'GABRIEL');
SQLServer
INSERT INTO PERSON (NAME) VALUES ('GABRIEL');

Observe that with Oracle I have to specify the PK column while in SQLServer I don’t, and worse, I have to call the sequence SEQ_PERSON to get the next value for the insert. If I want to return the value of the ID of the record we just inserted, I’d have to do this:

Oracle
INSERT INTO PERSON (ID, NAME) VALUES (SEQ_PERSON.NEXTVAL, 'GABRIEL') RETURNING ID INTO :ID;
SQLServer
INSERT INTO PERSON (NAME) VALUES ('GABRIEL');
SELECT CAST(SCOPE_IDENTITY() AS INT);

I think you get the point, right? There are a lot of differences between the two databases! If we had chosen to use GUID IDs, our life in the DAL would have been much simpler: the GUID would be generated in the DAL, which would make the insert statements very similar, if not identical, most of the time. Returning the generated GUID would also be a piece of cake. It’s true that numeric PKs create better indexes and consume less space, but I find that these improvements would have been outweighed by the ease of development.
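To make the contrast concrete, here is what the GUID approach could have looked like. With the key generated in the DAL, the same statement shape works on both databases; a sketch using the PERSON table from the example above:

```csharp
using System;

class GuidKeyDemo
{
    static void Main()
    {
        // The DAL generates the key itself: no sequence, no identity column.
        Guid id = Guid.NewGuid();

        // Same statement for Oracle and SQLServer; only the parameter
        // marker differs (':' for Oracle, '@' for SQLServer).
        string sql = "INSERT INTO PERSON (ID, NAME) VALUES (:ID, :NAME)";

        Console.WriteLine("{0} -> {1}", id, sql);
    }
}
```

Returning the generated key to the caller is then trivial, since the DAL already has it in hand before the insert even runs.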

So, to sum up: when developing a system that should be database independent, I would recommend using GUIDs as the tables’ primary keys in order to avoid Autoincrement and Sequences. Your DAL would be more uniform and you could use ANSI SQL more often.

Just in case you are wondering what we did when we couldn’t use ANSI SQL: we created a SQL translator that converts all the commands created for SQLServer to Oracle. This translator is part of our DAL generator, so when I create a DAO for SQLServer the generator creates its structure and Oracle’s all at once.