Real-World Refactoring: Complex LINQ-ing

LINQ is so powerful, you can create a one-liner LINQ statement to create a report. In today's post, we put LINQ to the test by refactoring a user's piece of code

Written by Jonathan "JD" Danylko • Last Updated: • Develop •

Man under a car fixing it

In a recent post, I mentioned a number of ways to speed up an Entity Framework LINQ query using different techniques.

These techniques provide us with a way to refactor an existing LINQ query to make it faster.

But what if you don't even have a LINQ query yet? Or you want to take an existing routine and replace it with a LINQ statement?

"Real-World" Example

I did find an example on Reddit a while ago and wanted to refactor this using LINQ.

While this isn't considered a "real-world" example (based on the Exercise21 project name), I have seen code like this before in the wild.

I thought this would be a good code-kata so we could demonstrate a refactoring.

After looking it over, I observed a couple of things:

  • They started at the Customer-level for their query.
  • There's only a Orders table with no details of the order
  • They do calculate an order total at the end for reporting purposes

Keeping these things in mind, I wanted to create a table structure similar to what was given in the example.

This reminded me of the Northwind database from my Collection: SQL Server Sample Databases post

It contains a similar Customer/Orders structure so I downloaded and ran the DB script to create the database.

Again, we'll use LINQPad for our walkthrough like before.

One Caveat

Before we begin, I wanted to keep myself in check with this exercise.

I've met a number of developers in the past who wrote whole programs (> 30 lines) inside of one LINQ statement. While you can create very complex LINQ statements, there should be a question as to whether it's necessary or not.

As we walk through the Reddit example, the result is for demonstration purposes as to how flexible LINQ is when it comes to using arrays and database queries.

So let's dig into this.

Finding The Right Entry Point

First off, we need to find the right table.

In their example, they started with the Customer table. I only saw one more table used (Orders) so they have it easy.

Would it make more sense if they started with the Orders table and then join to a Customer table? When it's just two tables, personally, it doesn't matter. A join is a join regardless.

In our schema, things are a little more complicated.

If you remember, one of the techniques for optimizing Entity Framework was to start from a different position in the entity hierarchy.

If we are starting with the Customer table, we have multiple "hops" (3 joins) to other tables.

Customer/Orders/OrderDetails Schema

Since we're calculating totals, it makes sense to start with the OrderDetails table instead. So if we move around the tables, you'll see it a bit differently.

OrderDetails/Orders/Customers Schema

This cuts down on an "inner join" making it 2 joins instead of 3 giving us quicker performance.

For our initial query, let's start with our OrderDetails table.

Initial Query in LINQPad 6

You'll notice I have a simple filter so we can test and see the results of our query immediately.

If you look a little closer at the filter, we're accessing the CustomerID in the Order table. 

As a general guideline, you want to make sure you aren't seeing too many spots (periods in your filter). Too many periods translates to more inner joins.

Grouping By Year and Month

Looking at the example, they are grouping by Customer, Month, and Year.

This is achieved by using the LINQ GroupBy() method. Since this is a custom grouping with a number of properties, we create an anonymous object.

Grouping of fields

Simple enough!

But Where's The Totals?

With our grouped results, let's display what we have so far.

Pre-Totals

It seems we forgot our totals.

If you remember, we started in the OrderDetails entity and we need to get the Order and OrderDetails. By Customer.

Yikes!

Let's keep moving forward and .Sum() the Total field.

Oh wait! Since we're working off of a grouped result, we don't even have a total field.

How do we get the totals?

We can use the Customer entity to get the Orders entities to pull the order totals and discounts.

Ok, so how do we get the Customer entity? We can add it to the Grouping object and then Sum the totals.

Adding a customer to the grouping

After adding our Total field, let's run the query and see our totals.

Calculating the Totals

Ta-dah!

Oh wait...That's not good. Quite a coincidence that all totals are the same.

Hmm.

Ah! Need a filter for all of the totals plus we need to calculate the discount which, by the way, is on the Order entity.

Finally, we sort the results using a multiple sort order.

Finishing the totals

While we're at it, let's add an OrderBy as well.

Ordering the results

At first glance of the data, I thought I wrote the LINQ query wrong.

If you're wondering why the month goes 8, then 10 then 1, 3, and 4, it's because I sorted by Company, then Year, then Month.

Should We Stop Here?

We have everything we need for a report, right?

We could remove the company name in the filter and see if we get all of our companies.

Screenshot of LINQPad results

After running the query with all companies, it finishes at 1.5s. That's fast enough for our purposes.

At this point, I feel we need to snazzy it up with some formatting and additional grouping based on the person's expected output.

What we'll do is create formatted properties, group them by customer, and then display them in a grouped format.

Here's the results.

Screenshot of the Final Result

This is the output grouping requested in the Reddit post.

Was this really worth it?

Here comes the consulting answer: It Depends™

While this was a fun exercise as to how flexible (and fluent!) LINQ truly is, I always question the before and after of code refactorings.

  • Was it worth it to write this long of a LINQ statement to achieve the same results? (pushing up the glasses on my nose) Technically, the LINQ statement is only one line.
  • If a junior developer came in and saw the two examples, which one do you think they would want to debug/work with?
  • While I didn't benchmark their solution, would the original code be considered slower or faster than the LINQ statement? Even though I JUST wrote this, I would throw it out in a heartbeat if the original code was faster. Remember, it's all about performance. 

Everyone has their reasons for choosing whether to refactor code or not. Sometimes it makes sense to revert back to the original if you're not satisfied with the "after" results.

The Final LINQPad Code

Here is our final refactored LINQPad code.  

var queryResult = OrderDetails
    .AsEnumerable()
    // Filter out a single company for testing purposes.	
    .Where(ord => ord.Order.CustomerID == "ALFKI")
    // Group requires custom grouping
    .GroupBy(e => new
    {
       Customer = e.Order.Customer,
       e.Order.Customer.CompanyName,
       OrderYear = e.Order.OrderDate.Value.Year,
       OrderMonth = e.Order.OrderDate.Value.Month
    })
    .Select(group =>
    {
       return new
       {
          Company = group.Key.CompanyName,
          Year = group.Key.OrderYear,
          Month = group.Key.OrderMonth,
          Total = group.Key.Customer.Orders
                 .Where(order =>
                     order.OrderDate.Value.Year.Equals(group.Key.OrderYear)
                     && order.OrderDate.Value.Month.Equals(group.Key.OrderMonth))
                 .Sum(order =>
                     order.OrderDetails.Sum(detail =>
                     {
                        var price = (detail.UnitPrice * detail.Quantity);
                        if (detail.Discount > 0)
                        {
                           var discount = price * (decimal)detail.Discount;
                           return price - discount;
                        }
                        return price;
                     })
                 )
       };
    })
    .OrderBy(y => y.Company)
      .ThenBy(o => o.Year)
          .ThenBy(u => u.Month)
    // Strictly for output purposes from this point on
    .Select(output => new
    {
       Customer = output.Company,
       Year = output.Year,
       MonthAndYear = output.Month.ToString() + "/" + output.Year.ToString(),
       Total = output.Total.ToString("C")
    })
    .GroupBy(grp => grp.Customer)
    .Select(output => new
    {
       output.Key,
       Details = output.GroupBy(group => group.Year)
         .Select(item => new 
         { 
            Year = item.Key, 
            item 
         })
    });

queryResult.Dump();

Conclusion

In this post, we took a simple piece of code which reported on totals for the month and year and we refactored it into a single LINQ statement.

Every approach in LINQ is chainable (or fluent) and passes it into the next method until we project the results into a variable or loop.

If you want to examine more LINQ syntax, I would recommend the following posts:

Did you find this post helpful? Was it unnecessary to refactor? How would you refactor the code? Would you? Post your comments below and let's discuss.

Did you like this content? Show your support by buying me a coffee.

Buy me a coffee  Buy me a coffee
Picture of Jonathan "JD" Danylko

Jonathan Danylko is a web architect and entrepreneur who's been programming for over 25 years. He's developed websites for small, medium, and Fortune 500 companies since 1996.

He currently works at Insight Enterprises as an Principal Software Engineer Architect.

When asked what he likes to do in his spare time, he replies, "I like to write and I like to code. I also like to write about code."

comments powered by Disqus