Hi all, I would like to start here with an article about classification methods in ABC-analysis. I tried to find such a compilation on Google but could not; I could not even find information about some of the methods described below.

# Introduction

ABC-analysis is one of the simplest methods of classifying objects by a defined parameter. Although methods that rely on a single parameter are limited, ABC-analysis is one of the most popular analytical tools today. Because of this popularity, plenty of algorithms for grouping items in ABC-analysis have been developed. On the one hand, this gives the analyst a rich choice of methods for different goals and technical resources; on the other hand, it carries hidden risks when a method is applied in the wrong situation.

The process of ABC-analysis can be divided into the following five steps:

1. Definition of goals. The goals of the analysis are defined first; they influence all further steps.
2. Selection of classification parameter. Depending on the goals, the parameter by which the objects are classified is chosen.
3. Data collection and preparation. Raw data is collected and prepared for analysis.
4. ABC-classification. The classification itself is performed using the chosen parameter and one of the algorithms.
5. Interpretation of results. The results are reviewed and the necessary decisions are taken.

This article considers the seven most commonly used methods of grouping in ABC-analysis, shows their pros and cons, and compares them.

# Classification methods in ABC-analysis

ABC-analysis is based on the Pareto rule 20:80, which says that a small share of objects makes the main contribution to the result. The goal of ABC-analysis is therefore to divide a set of objects into three groups by a predefined parameter, so that group A contains objects with high values, B with medium values and C with low values.

To separate groups A, B and C, the objects are sorted in descending order of the parameter, and a running-total diagram is built from the sorted values. Both axes of the diagram are normalized to percentages. The resulting graph is called a Pareto chart. For a Pareto chart we can define the Pareto point as the point where the sum of the coordinates equals 100%:

Definition. The Pareto point is the point on the Pareto chart with coordinates (Xp, Yp) for which Xp + Yp = 100%.
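The construction of the chart and the definition above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names `pareto_curve` and `pareto_point` are hypothetical, the values are assumed to be non-negative, and the Pareto point is found by linear interpolation between neighbouring chart points.

```python
def pareto_curve(values):
    """Cumulative shares in percent, from (0, 0) to (100, 100)."""
    ordered = sorted(values, reverse=True)  # descending order of the parameter
    total = sum(ordered)
    if not ordered or total <= 0:
        raise ValueError("need a non-empty list with a positive total")
    n = len(ordered)
    xs, ys = [0.0], [0.0]
    running = 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        xs.append(100.0 * i / n)            # share of objects, %
        ys.append(100.0 * running / total)  # share of result, %
    return xs, ys


def pareto_point(xs, ys):
    """Point on the chart where x + y = 100 (linear interpolation)."""
    for i in range(1, len(xs)):
        s0 = xs[i - 1] + ys[i - 1]
        s1 = xs[i] + ys[i]
        if s0 <= 100.0 <= s1:
            t = (100.0 - s0) / (s1 - s0)  # s1 > s0 because x strictly grows
            return (xs[i - 1] + t * (xs[i] - xs[i - 1]),
                    ys[i - 1] + t * (ys[i] - ys[i - 1]))
    raise ValueError("curve does not cross the line x + y = 100")


# Illustrative data: 10 objects, 2 of them give 80% of the result.
xs, ys = pareto_curve([60, 20, 8, 4, 3, 1, 1, 1, 1, 1])
print(pareto_point(xs, ys))  # -> (20.0, 80.0)
```

The sum x + y grows strictly along the chart from 0 to 200, so the crossing with 100 always exists and is unique.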

The Pareto chart visualizes the structure of the objects by the parameter. Together with the convexity of the curve, this allows different methods of ABC-grouping to be applied.

The following methods are described most often in the literature:

1. Classical method
2. Method of sum
3. Differential method
4. Method of tangents
5. Method of polygon
6. Method of loop
7. Method of triangle

Some of these methods can be called “advanced”, some are outdated (but still popular), and some can be used only in particular cases and cannot be considered an effective tool for daily practice. Let’s consider each of these methods closely.

# 1.  Classical method

The classical method (in some sources, the empirical method) assumes that the borders of groups A, B and C on the Pareto chart are fixed regardless of the structure of the distribution. This approach comes from the past, when many economists and sociologists (Pareto, Lorenz, Gini etc.) discovered that most processes are close to the 20:80 distribution: 20% of objects give 80% of the result. Therefore, the Pareto chart is usually segmented so that group A takes 10% of positions, B 20% and C 70%.

Other values for A, B and C (15-20-65) also appear in the literature. Some sources segment the groups by contribution to the result instead, e.g. A: 70%, B: 20%, C: 10%. There is no major difference in methodology.

Fixing the group borders in the classical method essentially restricts the structure of the distributions for which it works. For example, for objects with an even distribution (Pareto point = (50%, 50%)), ranking objects into three groups makes no sense, because all objects have the same value of the result. For distributions whose Pareto point lies far from (20%, 80%), the classical method gives poor results. The figures below show two Pareto charts for two sets with Pareto point coordinates (10, 90) and (30, 70). In these cases, the classical method makes large classification errors.

The pros of this method are its obviousness and the simplicity of automation. Despite these advantages, in the general case the classical method is too simplified and can give a very high error.
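To show how little is needed to automate the classical method, here is a minimal Python sketch. The function name `classical_abc` and its parameters are my own, and the 10-20-70 split by number of positions is simply the default mentioned above; rounding the group sizes to whole objects is an assumption of this sketch.

```python
def classical_abc(values, a_share=0.10, b_share=0.20):
    """Label each object A, B or C using fixed shares of positions."""
    n = len(values)
    # Indices of the objects in descending order of the parameter.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    n_a = round(n * a_share)               # objects in group A
    n_ab = round(n * (a_share + b_share))  # objects in groups A and B
    labels = [""] * n
    for rank, idx in enumerate(order):
        labels[idx] = "A" if rank < n_a else "B" if rank < n_ab else "C"
    return labels


# Illustrative data only.
print("".join(classical_abc([60, 20, 8, 4, 3, 1, 1, 1, 1, 1])))  # -> ABBCCCCCCC
```

The borders depend only on the number of objects, never on the values themselves, which is exactly the weakness discussed above.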

# 2.  Method of sum

The method of sum is a modification of the classical method. Instead of fixing the boundaries on one axis of the Pareto chart, the sum of the coordinates of each chart point is taken, and the group boundaries are defined by fixed values of that sum. For example, with the values of the classical method, the boundary between A and B goes through the point where the sum of coordinates equals 80% (~10% + 70%), and the border between B and C goes through the point where the sum equals 120% (~30% + 90%). Note that, because the Pareto chart is convex, such a point exists and is unique. The method of sum has low accuracy for the same reason as the classical method: the grouping boundaries are fixed.

The simplicity of automation can be considered an advantage of this method.
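A possible Python sketch of the method of sum is given below. The name `sum_method_abc` is hypothetical, the thresholds 80 and 120 are the example values from the text, and assigning an object by the chart point it ends at (rather than interpolating between points) is my own simplification.

```python
def sum_method_abc(values, ab_sum=80.0, bc_sum=120.0):
    """Label objects by the sum of the chart coordinates x + y (in percent)."""
    n = len(values)
    total = sum(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    labels = [""] * n
    running = 0.0
    for rank, idx in enumerate(order, start=1):
        running += values[idx]
        # x = share of objects so far, y = share of result so far.
        s = 100.0 * rank / n + 100.0 * running / total
        labels[idx] = "A" if s <= ab_sum else "B" if s <= bc_sum else "C"
    return labels


# Illustrative data: coordinate sums are 70, 100, 118, 132, ...
print("".join(sum_method_abc([60, 20, 8, 4, 3, 1, 1, 1, 1, 1])))  # -> ABBCCCCCCC
```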

# 3.  Differential method

The differential method has its roots in mathematical statistics. The crux of the method is to calculate the average result and determine the quantiles for groups A, B and C. Instead of the Pareto chart, this method uses the statistical Pareto distribution (a frequency histogram). For this purpose, all objects are divided into categories with an equal step size of the result value (quantiles), and a frequency distribution is built, with the average values of the quantiles on the X axis and the number of objects in each quantile (the frequency) on the Y axis. The number of quantiles depends on the number of objects and varies from 10 to 30.

Group A then takes the quantiles with values at least 3.5 times greater than the average, and group C takes the quantiles with values of 0.8 of the average and below (1.25 times less than the average). The remaining objects go to group B.

Note that with the multipliers 3.5 and 0.8 the results are close to the classical method for distributions with Pareto point (20, 80). Some sources describe this method with multipliers 6 and 0.5 (A: 6 times greater than the average, C: 2 times less), which is typical for distributions with Pareto point (5, 95).

The drawback of the differential method is again the fixed multipliers used for grouping. They lower the accuracy of the method dramatically and outweigh the advantages of simple automation.
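The core rule of the differential method can be illustrated with a short Python sketch. Note the simplification: it compares each object directly with the average and skips the histogram step, so it is not the full procedure described above. The name `differential_abc` and the parameter names are hypothetical; the defaults 3.5 and 0.8 are the multipliers from the text.

```python
def differential_abc(values, a_mult=3.5, c_mult=0.8):
    """Label objects by comparing each value with the average value.

    Simplified: no grouping into equal-width categories is done.
    """
    mean = sum(values) / len(values)
    labels = []
    for v in values:
        if v >= a_mult * mean:
            labels.append("A")   # far above the average
        elif v <= c_mult * mean:
            labels.append("C")   # at or below 0.8 of the average
        else:
            labels.append("B")
    return labels


# Illustrative data: the average is 10, so A starts at 35 and C ends at 8.
print("".join(differential_abc([60, 20, 8, 4, 3, 1, 1, 1, 1, 1])))  # -> ABCCCCCCCC
```

Passing `a_mult=6, c_mult=0.5` gives the second variant of the multipliers mentioned above.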

The figures below show examples of two sets for which the differential method does not work.

Note on the figures: the colours correspond to the optimal grouping (red: group of low values, yellow: group of medium values, green: group of high values).

# 4.  Method of tangents

The method of tangents is one of the methods with non-fixed boundaries. The idea is to draw two tangents to the Pareto chart, which define the coordinates of the boundaries between A, B and C.

The first tangent is drawn parallel to the diagonal OO’ (figure below). Such a tangent is unique because the Pareto chart is convex. The resulting tangent point K defines the border between groups A and B.

Connect K and O’ and build a second tangent parallel to KO’ to find the boundary between B and C. The coordinates of the new tangent point L define the desired boundary.

Note that K is the point that minimizes the area between the Pareto chart and the segments OK and KO’. It is therefore safe to say that the first tangent divides the set into two groups optimally: each of the two resulting parts of the diagram lies as close as possible to a straight segment (the area is minimal), which means that the parameter varies very weakly within each group, or that the classification error is minimal. The same holds for point L: the minimal area between the chart and the segments KL and LO’ means that the error of the second segmentation is also minimal. However, partitioning the whole set into three groups by points K and L is not optimal (the minimal error of the whole does not equal the sum of the two minimal errors).
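On a discrete chart, the tangent parallel to a chord touches the curve at the point farthest from that chord, so the two steps can be sketched in Python as follows. The names `tangent_abc` and `_farthest_above` are hypothetical, and taking the first point in case of a tie is my own assumption.

```python
def _farthest_above(xs, ys, i0, i1):
    """Index of the chart point farthest above the chord from i0 to i1."""
    dx, dy = xs[i1] - xs[i0], ys[i1] - ys[i0]
    best, best_d = i0, 0.0
    for i in range(i0 + 1, i1):
        # Cross product: proportional to the distance from the chord.
        d = dx * (ys[i] - ys[i0]) - dy * (xs[i] - xs[i0])
        if d > best_d:
            best, best_d = i, d
    return best


def tangent_abc(values):
    """Label objects using two tangents: parallel to OO' and to KO'."""
    n = len(values)
    total = sum(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    xs, ys, running = [0.0], [0.0], 0.0
    for rank, idx in enumerate(order, start=1):
        running += values[idx]
        xs.append(100.0 * rank / n)
        ys.append(100.0 * running / total)
    k = _farthest_above(xs, ys, 0, n)   # point K: tangent parallel to OO'
    l = _farthest_above(xs, ys, k, n)   # point L: tangent parallel to KO'
    labels = [""] * n
    for rank, idx in enumerate(order, start=1):
        labels[idx] = "A" if rank <= k else "B" if rank <= l else "C"
    return labels


# Illustrative data: K is the point (20, 80), L is the point (50, 95).
print("".join(tangent_abc([60, 20, 8, 4, 3, 1, 1, 1, 1, 1])))  # -> AABBBCCCCC
```

Each tangent needs one pass over the chart points, so the work after sorting grows linearly with the number of objects.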

One of the main drawbacks of this minimal-area approach is non-uniqueness for some forms of convex curves (Pareto charts). Assume the Pareto chart contains a straight segment DE parallel to the diagonal OO’. Then the first tangent passes along the whole segment DE, so every point of this segment forms a triangle of equal area with the base OO’. This means that any point of DE is optimal, and the classification is not unique.

Another drawback of this method is its asymmetry relative to the Pareto point, which increases the chance of error. In addition, the method works badly for distributions close to the boundary cases, with the following Pareto points:

• (50%, 50%): a large group A and medium-sized groups B and C. For such a distribution ABC-classification is useless; it is better to put everything in one group.
• (~0%, 100%): the method of tangents defines group A accurately, but groups B and C irrationally.

The main advantage of this method is the scalability of the algorithm (complexity is O(cN)) and, as a result, simple automation.

# 5.  Method of polygon

The method of polygon, also known as the method of double tangent, solves the task of optimal segmentation of the set into three groups. It can therefore be seen as an enhancement of the method of tangents. The idea is to find points M and N on the Pareto chart such that the area of the polygon OMNO’ is maximal or, equivalently, the area between the broken line OMNO’ and the Pareto chart is minimal.

Minimizing this area divides the Pareto chart into three groups so that each part lies as close as possible to a straight segment (OM, MN, NO’). This means that the variation of the parameter within the groups is minimal and, as a result, the classification error is minimal as well. Graphically, the area minimization problem can be solved by building two tangents to the Pareto chart, parallel to ON and MO’. Points M and N define the boundaries of A, B and C.

As in the method of tangents, there are singular Pareto charts for the method of polygon that contain straight segments parallel to ON and MO’. In that case the classification is non-unique.

A significant drawback of this method is the complexity of automation: the complexity of the algorithm is O(N²), which means it does not scale.
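The quadratic cost is easy to see in a brute-force Python sketch that tries every pair of chart points (M, N) and keeps the pair with the largest area under the broken line OMNO’ (which is the same as the largest polygon area, because the base OO’ is fixed). The name `polygon_abc` is hypothetical, and the exhaustive search is my own straightforward reading of the method, not necessarily how it is implemented in practice.

```python
def polygon_abc(values):
    """Label objects by the pair (M, N) that maximizes the polygon area."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three objects")
    total = sum(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    xs, ys, running = [0.0], [0.0], 0.0
    for rank, idx in enumerate(order, start=1):
        running += values[idx]
        xs.append(100.0 * rank / n)
        ys.append(100.0 * running / total)
    best, best_area = (1, 2), -1.0
    for m in range(1, n - 1):          # candidate point M
        for k in range(m + 1, n):      # candidate point N, to the right of M
            # Doubled area under O-M-N-O' as a sum of three trapezoids.
            area = (xs[m] * ys[m]
                    + (xs[k] - xs[m]) * (ys[k] + ys[m])
                    + (100.0 - xs[k]) * (100.0 + ys[k]))
            if area > best_area:
                best, best_area = (m, k), area
    m, k = best
    labels = [""] * n
    for rank, idx in enumerate(order, start=1):
        labels[idx] = "A" if rank <= m else "B" if rank <= k else "C"
    return labels


# Illustrative data: the best pair is M = (10, 60), N = (30, 88).
print("".join(polygon_abc([60, 20, 8, 4, 3, 1, 1, 1, 1, 1])))  # -> ABBCCCCCCC
```

The two nested loops visit about N²/2 pairs, which is where the O(N²) estimate comes from.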

The major advantage of the method of polygon is its classification accuracy. Of the methods considered in this article, the method of polygon has the lowest error. However, very high accuracy does not always bring correct results, because of the specifics of ABC-analysis. This is shown in the example below, where all objects differ insignificantly from each other (and, following the logic of ABC-analysis, should be classified into one group).

# 6.  Method of loop

The method of loop solves the problem of optimal segmentation into two groups, low and high values. The method is based on the definition of the curvature of a curve, which is determined by its radius of curvature. The goal is to locate the segment of the Pareto chart with maximal curvature (the smallest radius of curvature), that is, the segment of transition from the steeper part of the chart to the flatter part. This formulation corresponds fully to the goals of ABC-analysis, because objects with high values of the result lie on the steep part of the diagram and objects with low values on the flat part.

The algorithm consists in tracing the path of the end of a segment of length L that is perpendicular to the tangent to the Pareto chart (a normal) as it moves along the chart. If the length L is chosen properly and the chart has only one piece with maximal curvature, the path of the end of the normal draws a loop.

This is because the radius of curvature on the steep and flat parts of the diagram is larger than on the knee. Hence the requirement for the loop to exist: the length L must be less than the radius of curvature of the steep and flat parts but greater than the radius of curvature of the knee. The points on the Pareto chart that correspond to the knee are the boundaries of A, B and C. These points are the ends of the perpendiculars drawn from the angular points of the loop.

The main drawbacks of this method are that a solution may not exist and may not be unique for specific Pareto charts. For example, for a Pareto chart that is an arc of a circle no solution exists, because all parts of the chart have the same curvature and no loop can be obtained for any length of the perpendicular. Some forms of diagrams have several parts with identical maximal curvature. In such cases there are as many loops as there are such segments.

When a large step is used in the iterative search for the length of the normal, or when the length of the normal is insufficient (distribution close to Pareto point (50%, 50%)), the loop cannot be drawn.

The advantages of this method are high classification accuracy and scalability of the algorithm (complexity O(cN)).

# 7.  Method of triangle

The method of triangle solves the problem of optimal segmentation of the Pareto chart into two groups, high and low values. It is based on a mathematical property: the set of all Pareto charts that pass through a given Pareto point lies in an area bounded by two limiting Pareto charts, each of which has the form of a two-section broken line. This property allows us to define boundaries on the chart that determine the zones of low and high values. For this purpose, two lines are built through the break points of the limiting diagrams, parallel to the diagonal from (0%, 100%) to (100%, 0%). The points where these lines cross the Pareto chart define the boundaries of groups A, B and C. The equations of these lines are expressed through the coordinates of the Pareto point (Xp, Yp).

To simplify the calculation of the boundaries, we can take the midpoints of the segments that the drawn lines cut between the limiting broken lines. The abscissas of these points can then be found from the corresponding equations.

The main advantage of this method is its flexibility. The method shows high accuracy both in the area of average values of the Pareto point and in the area of boundary values.

It should be noted that the method of triangle fits all goals of ABC-analysis: for any set of objects it gives a unique classification, and it satisfies all boundary conditions, including the uniform distribution and distributions close to it, where almost all objects go to the single group B.

The method of triangle can be easily automated. Using formulas (3) and (4), we can build a table of correspondence between the coordinates of the Pareto point and the sizes of groups A, B and C. The task is thus reduced to finding the coordinates of the Pareto point.

# Comparative analysis of methods

The comparative analysis summarizes all the pros and cons described above within six criteria:

1. Existence of classification for any type of distribution
2. Uniqueness of classification for any type of distribution
3. Accuracy of classification in area of average values
4. Accuracy of classification in area of Pareto point (50%, 50%)
5. Accuracy of classification in area of Pareto point (0%, 100%)
6. Complexity of automation

In the table below, the sign ‘-’ means that the method does not meet the criterion and the sign ‘+’ means that it does. The number of ‘+’ signs corresponds to the rating of the method for that criterion (the better the method meets the criterion, the more ‘+’ signs).

| Method | Existence of classification for any type of distribution | Uniqueness of classification for any type of distribution | Accuracy of classification in area of average values | Accuracy of classification in area of Pareto point (50%, 50%) | Accuracy of classification in area of Pareto point (0%, 100%) | Complexity of automation |
|---|---|---|---|---|---|---|
| Classical method | + | + | + | - | - | ++++ |
| Method of sum | + | + | + | - | - | ++++ |
| Differential method | + | + | ++ | + | ++ | ++++ |
| Method of tangents | + | - | +++ | - | + | +++ |
| Method of polygon | + | - | +++++ | - | ++++ | ++ |
| Method of loop | - | - | ++++ | - | +++ | + |
| Method of triangle | + | + | ++++ | ++ | +++ | ++ |

The table makes clear that the classical method and the method of sum give inconsistent classification in the areas of the Pareto points (50%, 50%) and (0%, 100%). The method of tangents is easier to automate, but it loses to the method of triangle in accuracy and gives a non-unique classification in some cases. The method of polygon has the highest accuracy but cannot guarantee uniqueness of the solution.

The most flexible method, which meets all the criteria, is the method of triangle.

In the next post, I’ll show how to build a model for ABC-analysis for SAP Business ByDesign, based on Net Sales from the Invoice Volume report and Power Pivot, using the method of triangle.