Theil index

 

The Theil index is a statistic primarily used to measure economic inequality[1] and other economic phenomena, though it has also been used to measure racial segregation.[2][3]

The Theil index T_{T} is the same as redundancy in information theory, which is the maximum possible entropy of the data minus the observed entropy. It is a special case of the generalized entropy index. It can be viewed as a measure of redundancy, lack of diversity, isolation, segregation, inequality, non-randomness, and compressibility. It was proposed by econometrician Henri Theil at the Erasmus University Rotterdam.[3]

Formula

For a population of N "agents" each with characteristic x, the situation may be represented by the list x_{i} (i = 1,...,N) where x_{i} is the characteristic of agent i. For example, if the characteristic is income, then x_{i} is the income of agent i.

The Theil T index is defined as[4]

{\displaystyle T_{T}=T_{\alpha =1}={\frac {1}{N}}\sum _{i=1}^{N}{\frac {x_{i}}{\mu }}\ln \left({\frac {x_{i}}{\mu }}\right)}

and the Theil L index is defined as[4]

{\displaystyle T_{L}=T_{\alpha =0}={\frac {1}{N}}\sum _{i=1}^{N}\ln \left({\frac {\mu }{x_{i}}}\right)}


where \mu  is the mean income:

{\displaystyle \mu ={\frac {1}{N}}\sum _{i=1}^{N}x_{i}}
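The two definitions above can be sketched directly in code. This is a minimal illustration, not a reference implementation: the function names `theil_t` and `theil_l` and the sample incomes are assumptions made for the example. A zero income is treated as contributing 0 to Theil T (the limit of r ln r as r approaches 0), while Theil L is only defined for strictly positive incomes.

```python
import math

def theil_t(incomes):
    """Theil T: mean of (x/mu) * ln(x/mu) over all agents."""
    n = len(incomes)
    mu = sum(incomes) / n
    total = 0.0
    for x in incomes:
        r = x / mu
        if r > 0:  # r * ln(r) -> 0 as r -> 0, so zero incomes add nothing
            total += r * math.log(r)
    return total / n

def theil_l(incomes):
    """Theil L: mean of ln(mu/x); requires strictly positive incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum(math.log(mu / x) for x in incomes) / n

incomes = [10.0, 20.0, 30.0, 40.0]  # hypothetical sample
print("Theil T:", theil_t(incomes))
print("Theil L:", theil_l(incomes))
```

Both functions return 0 when every income is equal and grow as the incomes spread out.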

The Theil L formula is the logarithm of the geometric mean of the ratio (mean income)/(income i), taken over all the incomes included in the summation. This is directly meaningful for any range of incomes that lie on the same side of the mean income.

This form of the Theil index therefore has an intuitive and natural justification of its own, rather than being justified only in terms of entropy.

Because a transfer from a larger income to a smaller one changes the smaller income's ratio more than it changes the larger income's ratio, this index satisfies the transfer principle.
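The transfer principle can be checked numerically. The sketch below uses hypothetical incomes and helper names (`theil_l`, `theil_t`) chosen for this example; it moves 5 units from the richest agent to the poorest, without changing their ranking, and compares the indices before and after.

```python
import math

def theil_l(incomes):
    """Theil L: mean of ln(mu/x) for strictly positive incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum(math.log(mu / x) for x in incomes) / n

def theil_t(incomes):
    """Theil T: mean of (x/mu) * ln(x/mu) for strictly positive incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum((x / mu) * math.log(x / mu) for x in incomes) / n

before = [10.0, 20.0, 30.0, 40.0]
after = [15.0, 20.0, 30.0, 35.0]  # 5 units moved from the richest to the poorest

print("Theil L before/after:", theil_l(before), theil_l(after))
print("Theil T before/after:", theil_t(before), theil_t(after))
```

Total income is unchanged by the transfer, and both indices fall for this sample.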

If desired, a weighting factor such as (mean income)/(income i) could be included in the terms of the summation (as in the Theil T formula above, but with the income ratios inverted), so that the index counts more strongly the income ratios in which income i differs from the mean income by a larger factor.

In Theil T, each income ratio's logarithm is weighted by a factor equal to that income ratio's own value. So, if the income ratio is 2, the index's value is affected as if there were two of that person. This is a reasonable weighting if each income ratio's importance is judged to be proportional to its own value, that is, to the factor by which a particular income differs from the mean income.


Equivalently, if the situation is characterized by a discrete distribution function f_{k} (k = 0,...,W) where f_{k} is the fraction of the population with income k and W = N\mu is the total income, then {\displaystyle \sum _{k=0}^{W}f_{k}=1} and the Theil index is:

{\displaystyle T_{T}=\sum _{k=0}^{W}\,f_{k}\,{\frac {k}{\mu }}\ln \left({\frac {k}{\mu }}\right)}

where \mu  is again the mean income:

{\displaystyle \mu =\sum _{k=0}^{W}kf_{k}}

Note that in this case income k is an integer and k=1 represents the smallest increment of income possible (e.g., cents).
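As a rough check that the distribution form agrees with the per-agent form, the following sketch builds f_k from a hypothetical list of integer incomes (in cents) and evaluates both expressions. The function name `theil_t_from_distribution` and the sample data are assumptions made for illustration.

```python
import math
from collections import Counter

def theil_t_from_distribution(f):
    """f maps an integer income k to the fraction f_k of the population earning k."""
    mu = sum(k * fk for k, fk in f.items())
    # Terms with k = 0 contribute nothing (k * ln k -> 0 as k -> 0).
    return sum(fk * (k / mu) * math.log(k / mu)
               for k, fk in f.items() if k > 0 and fk > 0)

incomes_cents = [1000, 1000, 2500, 4000, 4000, 4000]  # hypothetical sample
n = len(incomes_cents)
f = {k: count / n for k, count in Counter(incomes_cents).items()}

t_dist = theil_t_from_distribution(f)

# Per-agent form of Theil T for comparison.
mu = sum(incomes_cents) / n
t_agents = sum((x / mu) * math.log(x / mu) for x in incomes_cents) / n

print(t_dist, t_agents)
```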

If the situation is characterized by a continuous distribution function f(k) (supported from 0 to infinity) where f(k) dk is the fraction of the population with income k to k + dk, then the Theil index is:

{\displaystyle T_{T}=\int _{0}^{\infty }f(k){\frac {k}{\mu }}\ln \left({\frac {k}{\mu }}\right)dk}

where the mean is:

{\displaystyle \mu =\int _{0}^{\infty }kf(k)\,dk}

Theil indices for some common continuous probability distributions are given in the table below:

| Income distribution function | PDF(x) (x ≥ 0) | Theil coefficient (nats) |
| Dirac delta function | \delta (x-x_{0}),\,x_{0}>0 | 0 |
| Uniform distribution | {\begin{cases}{\frac {1}{b-a}}&a\leq x\leq b\\0&{\text{otherwise}}\end{cases}} | \ln \left({\frac {2a}{(a+b){\sqrt {e}}}}\right)+{\frac {b^{2}}{b^{2}-a^{2}}}\ln(b/a) |
| Exponential distribution | \lambda e^{-x\lambda },\,x>0 | 1-\gamma |
| Log-normal distribution | {\frac {1}{x\sigma {\sqrt {2\pi }}}}e^{-(\ln(x)-\mu )^{2}/(2\sigma ^{2})} | {\frac {\sigma ^{2}}{2}} |
| Pareto distribution | {\begin{cases}{\frac {\alpha k^{\alpha }}{x^{\alpha +1}}}&x\geq k\\0&x<k\end{cases}} | \ln(1-1/\alpha )+{\frac {1}{\alpha -1}} (α > 1) |
| Chi-squared distribution | {\frac {2^{-k/2}e^{-x/2}x^{k/2-1}}{\Gamma (k/2)}} | \ln(2/k)+\psi ^{(0)}(1+k/2) |
| Gamma distribution | {\frac {e^{-x/\theta }x^{k-1}\theta ^{-k}}{\Gamma (k)}} | \psi ^{(0)}(1+k)-\ln(k) |
| Weibull distribution | {\frac {k}{\lambda }}\left({\frac {x}{\lambda }}\right)^{k-1}e^{-(x/\lambda )^{k}} | {\frac {1}{k}}\psi ^{(0)}(1+1/k)-\ln \left(\Gamma (1+1/k)\right) |
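Two of the closed forms in the table can be sanity-checked by evaluating the integral definition numerically. The sketch below uses a plain midpoint rule; the function name `theil_t_continuous`, the step count, the parameter values, and the truncation of the exponential's support at 50 are all assumptions made for this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def theil_t_continuous(pdf, lower, upper, steps=100_000):
    """Midpoint-rule estimate of the integral of f(k) (k/mu) ln(k/mu) over [lower, upper]."""
    h = (upper - lower) / steps
    mids = [lower + (i + 0.5) * h for i in range(steps)]
    mu = sum(k * pdf(k) for k in mids) * h
    return sum(pdf(k) * (k / mu) * math.log(k / mu) for k in mids) * h

# Exponential distribution with rate 1; the tail beyond 50 is negligible.
t_exp = theil_t_continuous(lambda x: math.exp(-x), 0.0, 50.0)

# Uniform distribution on [a, b].
a, b = 1.0, 3.0
t_uniform = theil_t_continuous(lambda x: 1.0 / (b - a), a, b)
t_uniform_closed = (math.log(2 * a / ((a + b) * math.sqrt(math.e)))
                    + b**2 / (b**2 - a**2) * math.log(b / a))

print("exponential:", t_exp, "vs 1 - gamma =", 1 - EULER_GAMMA)
print("uniform:", t_uniform, "vs closed form", t_uniform_closed)
```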

If everyone has the same income, then T_{T} equals 0. If one person has all the income, then T_{T} equals \ln N, which is maximum inequality. Dividing T_{T} by \ln N normalizes the index to range from 0 to 1, but the normalized index violates the independence axiom, {\displaystyle T[x\cup x]\neq T[x]}, and so does not qualify as a measure of inequality.
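Both statements can be illustrated with a short sketch. The helper name `theil_t` and the sample populations are hypothetical; zero incomes are treated as contributing 0, the limit of r ln r as r approaches 0.

```python
import math

def theil_t(incomes):
    """Theil T with the convention that zero incomes contribute 0."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum((x / mu) * math.log(x / mu) for x in incomes if x > 0) / n

# One person holds all the income: T_T equals ln N.
n = 5
one_has_all = [0.0] * (n - 1) + [100.0]
print(theil_t(one_has_all), math.log(n))

# Replicating the population leaves T_T unchanged ...
x = [10.0, 20.0, 30.0, 40.0]
doubled = x + x
print(theil_t(x), theil_t(doubled))

# ... but the normalized index T_T / ln N changes, because N doubles.
normalized_x = theil_t(x) / math.log(len(x))
normalized_doubled = theil_t(doubled) / math.log(len(doubled))
print(normalized_x, normalized_doubled)
```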

The Theil index measures an entropic "distance" of the population from the egalitarian state in which everyone has the same income. The numerical result is expressed as negative entropy, so a higher number indicates more order, that is, a state further from complete equality. Formulating the index to represent negative entropy instead of entropy allows it to be a measure of inequality rather than equality.

Relation to Atkinson index

The Theil index can be transformed into an Atkinson index, which has a range between 0 and 1 (0% and 100%), where 0 indicates perfect equality and 1 (100%) indicates maximum inequality. (See Generalized entropy index for the transformation.)
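As one concrete instance of such a transformation, the sketch below assumes the Atkinson index with inequality aversion parameter 1, which equals one minus the ratio of the geometric mean to the arithmetic mean and can therefore be written as 1 - exp(-T_L). This particular relation is not spelled out in the text above, and the function names and sample incomes are hypothetical.

```python
import math

def theil_l(incomes):
    """Theil L: mean of ln(mu/x) for strictly positive incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum(math.log(mu / x) for x in incomes) / n

def atkinson_eps1(incomes):
    """Atkinson index with aversion parameter 1: 1 - geometric mean / arithmetic mean."""
    n = len(incomes)
    mu = sum(incomes) / n
    geometric_mean = math.exp(sum(math.log(x) for x in incomes) / n)
    return 1 - geometric_mean / mu

incomes = [10.0, 20.0, 30.0, 40.0]  # hypothetical sample
from_theil = 1 - math.exp(-theil_l(incomes))
print(atkinson_eps1(incomes), from_theil)
```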

Derivation from entropy

The Theil index is derived from Shannon's measure of information entropy S, where entropy is a measure of randomness in a given set of information. In information theory, physics, and the Theil index, the general form of entropy is

{\displaystyle S=k\sum _{i=1}^{N}\left(p_{i}\log _{a}\left({\frac {1}{p_{i}}}\right)\right)=-k\sum _{i=1}^{N}\left(p_{i}\log _{a}\left({p_{i}}\right)\right)}
where
  • i is an individual item from the set (such as an individual member from a population, or an individual byte from a computer file).
  • p_{i} is the probability of finding i from a random sample from the set.
  • k is a constant.[note 1]
  • {\displaystyle \log _{a}\left({x}\right)} is a logarithm with a base equal to a.[note 2]

When looking at the distribution of income in a population, p_{i} is equal to the ratio of a particular individual's income to the total income of the entire population. This gives the observed entropy S_{\text{Theil}} of a population to be:

{\displaystyle S_{\text{Theil}}=\sum _{i=1}^{N}\left({\frac {x_{i}}{N{\bar {x}}}}\ln \left({\frac {N{\bar {x}}}{x_{i}}}\right)\right)}
where
  • x_{i} is the income of a particular individual.
  • {\displaystyle \left(N{\bar {x}}\right)} is the total income of the entire population, with
  • N being the number of individuals in the population.
  • {\bar {x}} ("x bar") being the average income of the population.
  • {\displaystyle \ln \left(x\right)} is the natural logarithm of x{\displaystyle \left(\log _{e}\left(x\right)\right)}.

The Theil index T_{T} measures how far the observed entropy (S_{\text{Theil}}, which represents how randomly income is distributed) is from the highest possible entropy ({\displaystyle S_{\text{max}}=\ln \left({N}\right)},[note 3] which represents income being maximally distributed amongst individuals in the population, a distribution analogous to the most likely outcome of an infinite number of random coin tosses: an equal distribution of heads and tails). Therefore, the Theil index is the theoretical maximum entropy (which would be reached if the incomes of every individual were equal) minus the observed entropy:

{\displaystyle T_{T}=S_{\text{max}}-S_{\text{Theil}}=\ln \left({N}\right)-S_{\text{Theil}}}
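The identity above can be verified numerically against the direct definition of Theil T. In this sketch the function names and sample incomes are assumptions made for illustration.

```python
import math

def observed_entropy(incomes):
    """S_Theil: entropy of the income shares p_i = x_i / (N * mean)."""
    total = sum(incomes)
    return sum((x / total) * math.log(total / x) for x in incomes if x > 0)

def theil_t_from_entropy(incomes):
    """T_T = S_max - S_Theil, with S_max = ln N."""
    return math.log(len(incomes)) - observed_entropy(incomes)

def theil_t_direct(incomes):
    """T_T = (1/N) * sum of (x/mu) * ln(x/mu)."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum((x / mu) * math.log(x / mu) for x in incomes if x > 0) / n

incomes = [10.0, 20.0, 30.0, 40.0]  # hypothetical sample
print(theil_t_from_entropy(incomes), theil_t_direct(incomes))
```

With equal incomes the observed entropy reaches ln N and the index is 0.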


When x is in units of population/species, S_{\text{Theil}} is a measure of biodiversity and is called the Shannon index. If the Theil index is used with x=population/species, it is a measure of inequality of population among a set of species, or "bio-isolation" as opposed to "wealth isolation".

The Theil index measures what is called redundancy in information theory.[4] It is the leftover "information space" that was not utilized to convey information, which reduces the effectiveness of the price signal.[original research?] The Theil index is a measure of the redundancy of income (or other measure of wealth) in some individuals. Redundancy in some individuals implies scarcity in others. A high Theil index indicates that the total income is not distributed evenly among individuals, in the same way that an uncompressed text file does not assign a similar number of byte locations to each of the available unique byte characters.

| Notation | Information theory | Theil index T_{T} |
| N | number of unique characters | number of individuals |
| i | a particular character | a particular individual |
| x_{i} | count of ith character | income of ith individual |
| N{\bar {x}} | total characters in document | total income in population |
| T_{T} | unused information space | unused potential in price mechanism[original research?] |
|  | data compression | progressive tax[original research?] |
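The information-theory column of the table can be made concrete with a small sketch that treats character counts as "incomes". The function name `character_redundancy` and the sample strings are hypothetical.

```python
import math
from collections import Counter

def character_redundancy(text):
    """ln(number of unique characters) minus the entropy of the character frequencies."""
    counts = Counter(text)           # x_i: count of the ith character
    total = len(text)                # N * mean: total characters in the document
    entropy = sum((c / total) * math.log(total / c) for c in counts.values())
    return math.log(len(counts)) - entropy

print(character_redundancy("abcd"))  # every character used equally often
print(character_redundancy("aaab"))  # one character dominates
```

A string that uses each of its characters equally often has zero redundancy, mirroring a population in which every income is equal.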


Decomposability

According to the World Bank,

"The best-known entropy measures are Theil’s T (T_{T}) and Theil’s L (T_{L}), both of which allow one to decompose inequality into the part that is due to inequality within areas (e.g. urban, rural) and the part that is due to differences between areas (e.g. the rural-urban income gap). Typically at least three-quarters of inequality in a country is due to within-group inequality, and the remaining quarter to between-group differences."[5]

If the population is divided into m subgroups and

  • s_{i} is the income share of group i,
  • N is the total population and N_{i} is the population of group i,
  • T_{i} is the Theil index for that subgroup,
  • {\overline {x}}_{i} is the average income in group i, and
  • \mu  is the average income of the population,

then Theil's T index is

{\displaystyle T_{T}=\sum _{i=1}^{m}s_{i}T_{i}+\sum _{i=1}^{m}s_{i}\ln {\frac {{\overline {x}}_{i}}{\mu }}} for {\displaystyle s_{i}={\frac {N_{i}}{N}}{\frac {{\overline {x}}_{i}}{\mu }}}
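The decomposition can be checked on a small example. The sketch below splits a hypothetical population into two groups and confirms that the within-group and between-group terms add up to the overall Theil T; the names `decompose_theil_t`, `urban`, and `rural` are assumptions made for illustration.

```python
import math

def theil_t(incomes):
    """Theil T for strictly positive incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum((x / mu) * math.log(x / mu) for x in incomes) / n

def decompose_theil_t(groups):
    """Return (within, between) for a list of income lists, one list per subgroup."""
    everyone = [x for group in groups for x in group]
    n = len(everyone)
    mu = sum(everyone) / n
    within = 0.0
    between = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        share = (len(group) / n) * (group_mean / mu)  # s_i: income share of the group
        within += share * theil_t(group)
        between += share * math.log(group_mean / mu)
    return within, between

urban = [30.0, 50.0, 80.0]        # hypothetical subgroup incomes
rural = [10.0, 15.0, 20.0, 25.0]
within, between = decompose_theil_t([urban, rural])
total = theil_t(urban + rural)
print(within, between, within + between, total)
```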

For example, inequality within the United States is the average inequality within each state, weighted by state income, plus the inequality between states.

Map of economic inequality in the United States using the Theil index. A high positive contribution indicates more income than population, while a negative value shows more population than income. A value of zero shows equality between population and income.
Note: This image is not the Theil Index in each area of the United States, but of contributions to the Theil Index for the U.S. by each area. The Theil Index is always positive, although individual contributions to the Theil Index may be negative or positive.

The decomposition of the Theil index which identifies the share attributable to the between-region component becomes a helpful tool for the positive analysis of regional inequality as it suggests the relative importance of spatial dimension of inequality.[6]

Theil's T versus Theil's L

Both Theil's T and Theil's L are decomposable. The difference between them is based on the part of the outcomes distribution that each is used for. Indexes of inequality in the generalized entropy (GE) family are more sensitive to differences in income shares among the poor or among the rich depending on a parameter that defines the GE index. The smaller the parameter value for GE, the more sensitive it is to differences at the bottom of the distribution.[7]

GE(0) = Theil's L and is more sensitive to differences at the lower end of the distribution. It is also referred to as the mean log deviation measure.
GE(1) = Theil's T and is more sensitive to differences at the top of the distribution.
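The correspondence with the GE family can be illustrated numerically. The sketch below assumes the standard generalized entropy formula GE(α) = (1/(N α(α-1))) Σ[(x_i/μ)^α - 1] for α other than 0 and 1, which is not written out in this article, and shows that it approaches Theil's L and Theil's T as α approaches 0 and 1. The function name and sample incomes are hypothetical.

```python
import math

def generalized_entropy(incomes, alpha):
    """GE(alpha) for strictly positive incomes; alpha = 0 and 1 use the limiting forms."""
    n = len(incomes)
    mu = sum(incomes) / n
    if alpha == 0:   # Theil's L (mean log deviation)
        return sum(math.log(mu / x) for x in incomes) / n
    if alpha == 1:   # Theil's T
        return sum((x / mu) * math.log(x / mu) for x in incomes) / n
    return sum((x / mu) ** alpha - 1 for x in incomes) / (n * alpha * (alpha - 1))

incomes = [10.0, 20.0, 30.0, 40.0]  # hypothetical sample
print(generalized_entropy(incomes, 1e-6), generalized_entropy(incomes, 0))
print(generalized_entropy(incomes, 1 + 1e-6), generalized_entropy(incomes, 1))
```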

The decomposability is a property of the Theil index which the more popular Gini coefficient does not offer. The Gini coefficient is more intuitive to many people since it is based on the Lorenz curve. However, it is not easily decomposable like the Theil.

Applications

In addition to a multitude of economic applications, the Theil index has been applied to assess the performance of irrigation systems[8] and the distribution of software metrics.[9]

This article uses material from the Wikipedia article Metasyntactic variable, which is released under the Creative Commons Attribution-ShareAlike 3.0 Unported License.