The Most Important Excel Functions For Analytics – Part 2/3

The second installment of ‘The Most Important Excel Functions For Analytics’ covers adding logic to customize your data. We will walk through Excel functions that include logical statements (IF/AND/OR), SUM/SUMIF and COUNTA. For example, you may need to add columns with flags to distinguish ordinary vs premium products using an IF statement, count whether entries have any data with COUNTA or find the sum of only ‘medium’ sized products with SUMIF.

PART 2 – CUSTOMIZE YOUR DATA

The Excel functions most useful for customizing your data are:

  • Logical Statements (IF/AND/OR)
  • SUM/SUMIF/SUMIFS
  • COUNTA

4. Logical Statements (IF/AND/OR)

Formula:

IF(logical_statement, return this if logical statement is true, return this if logical statement is false)

Why do I care?:

The IF statement helps when you have 2 possible outcomes in a situation, like a decision fork. I.e. if the condition is true, do A otherwise default to B. For Business Analytics, the IF statement helps to add new fields and flags in the data to enrich the information.

The more complex version of an IF statement is the nested IF. This means you layer in multiple IF statements into an expression. For example it may look like IF(A3<Condition1,”Output1″,IF(A3<Condition2,”Output2″,otherwise “Output3). This will make more sense with an example below and is very common in real world scenarios.

What is an Example?:

If we have retail prices for our SKUs and we want to add a flag for products that are under $50 add a flag ‘Ordinary’ and those that are over $50 add a flag ‘Premium’. We would add a new field for a premium flag in column E and add the below code in cell E3 and drag the formula down.

IF statement for premium flag

If we want to layer more logic into an IF statement we use a nested IF. For example if we want anything <50 to be ‘Ordinary’, between 50 and 100 to be ‘Premium’ and > 100 to be ‘Unbelievable’, we add the following code into cell E3.

Finally, you can add in AND/OR statements for further logic. For example if you wanted a flag to see if sizes were either S or M and are also Blue, then you could use the below code.

AND / OR statements were very similar. They both have the same structure where the logical statement is inside the parentheses. If you wanted to say Jake and Jim in excel code, you would write AND(A1=”Jake”,B1=”Jim”). If you want to learn more on how AND statements work, see the below link.

AND Statement

Insights:

  • Excel 2003 supports 7 nested IF statements while after Excel 2007 you can have up to 64 nested IF statements. However, once you reach anything close to 7 layers, following the logic can get super difficult and makes handing off the Excel model to someone else very difficult
  • An alternative to a complex IF statement is to create a separate lookup table and use a VLOOKUP function. You create a lookup key in the primary table using a concatenation of fields (we will see the concatenation function shortly) and then use a VLOOKUP on your new table
  • Excel needs the correct number of parentheses, one of the common errors of a nested IF function failing is not enough parentheses on the end of the function. I usually add an excess number if i am not sure and Excel will prompt for the correct number.

5. SUM/SUMIF/SUMIFS

Formulas:

SUM(number1,[number2],…)

SUMIF(range, criteria, sum_range)

SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2, criteria2], …)

Why do I care?:

The SUM function is one of the most common summary functions you will use. You can sum an entire column, a range of cells or specific cells. The SUMIF function takes the basic SUM formula and layers in conditional logic. The most common application of SUMIF that I have used is as a live Pivot table. In a pivot table you have to refresh the table if you make changes to the underlying data. However, if you set up a series of SUMIF statements, the data is automatically updated.

Note that the only difference between a SUMIF and SUMIFS is that the latter can have multiple conditions, whereas the former can only take one condition.

What is an Example?:

If we take our SKU example we will see the difference ways to use SUM / SUMIF / SUMIFS.

Starting with the SUM function, we can see there are a few different ways to add all the Qty values in column C. You can use a range, select the entire column or mix specific and range values.

The SUM function

Next we want to SUM based on some criteria. We first use the SUMIF function to add only quantities >20. Note that when you use logical operators in a SUMIF, you need to enclose these in quotation marks. Following this we SUMIF size is equal to “M”. Notice how the first range is the criteria and the last range is the sum range. Lastly, we use the SUMIFS to use 2 criteria, where size =”M” and category =”Women”. Take note that for SUMIFS the sum range comes first, followed by the criteria range  and the criteria.

The SUMIF function

Insights:

  • Always make sure that your row ranges for criteria and sums match. For instance if you use B2 and D3 as the start of your ranges e.g. SUMIF(B2:B11,”M”,D3:D11), you will get an error
  • SUMIFS are useful for model building where you want an automatically updated SUM that acts like a live pivot table
  • COUNTIF and AVERAGEIF are two other useful functions that have similar syntax to SUMIF

6. COUNTA

Formula:

COUNTA(value1, [value2], …)

Why do I care?:

Sometimes you need to count anything that is not blank, this includes text fields or those that return an error such as #DIV/0!.

What is an example?:

Suppose I have a list of 3 numbers, a text entry and an error in a calculation cell, so 5 entries all up. If I use COUNT, I will only get the 3 numbers. If I use COUNTA, I will get all 5 entries.

The difference between COUNT and COUNTA

Insights:

  • A common use of COUNTA is when you have data that is not all of a uniform type, i.e a mix with of integers and strings. This is often the case when you do data blending and you want to check how many entries you have

Well done, you know know the fundamentals of customizing your data in Excel. The last blog of this series, The Most Important Excel Functions for Analytics – Part 3/3 , will cover some cool text manipulation techniques.

 

Share on LinkedInEmail this to someoneTweet about this on TwitterShare on FacebookPrint this page

Leave a Reply

Your email address will not be published. Required fields are marked *

*