2. Methodological issues
The methodological issues in this paper concern the concept of
informality and the identification of dynamism among the IUPs through
discriminant and multivariate analyses.
The concept of informality
The debate on a universal definition of informality is still
open. The term «informal» was first used by Hart in 1971 and was taken up
by the ILO in its 1972 report on Kenya. That report put forward seven
criteria to identify IUPs, including exclusive use of local resources,
family ownership of the unit, small scale of activity, labour-intensive
techniques, skills acquired outside formal training institutions, and
highly competitive, unregulated markets. These characteristics were too
numerous for a single unit to meet them all. Subsequent definitions were
therefore restricted to the scale and the legality of the unit. The scale
criterion is the easiest to apply because it only requires a unit to have
fewer than a threshold number of employees (usually 10). It is not
appropriate for international comparisons, however, and it wrongly captures
small businesses such as law firms, notary offices and accounting practices,
which are modern and usually very profitable. To overcome this shortcoming,
the criterion of legality was introduced. According to this criterion, an
IUP is a unit that does not comply with the law; the open question is which
of the many existing laws should apply. This led the ILO to combine the
criteria of small size in terms of employment and non-registration of the
unit or of its regular workers. The 1-2-3 survey used in this paper
considers as informal any activity that has no taxpayer identification
number and/or does not keep written accounts in the format required by law.
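The survey's working definition can be written as a simple rule. The sketch below is only an illustration: the record type and its field names are hypothetical, not the survey's actual variables, and the «and/or» is read as an inclusive «or».

```python
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical fields, chosen for illustration only.
    has_tax_id: bool             # holds a taxpayer identification number
    keeps_formal_accounts: bool  # keeps written accounts in the legal format


def is_informal(unit: Unit) -> bool:
    """Informal if either condition of the working definition fails.

    Assumption: "and/or" is interpreted as an inclusive "or".
    """
    return (not unit.has_tax_id) or (not unit.keeps_formal_accounts)


# A unit with a tax ID but no formal accounts counts as informal under this reading.
example = Unit(has_tax_id=True, keeps_formal_accounts=False)
example_is_informal = is_informal(example)
```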
Measuring the economic dynamism of IUPs
Among the possible variables, such as sales or number of employees,
profit was chosen as the variable to discriminate between the less
effective and the more effective IUPs. The less effective group consists
of IUPs whose monthly profit is below the nationwide median; the more
effective group consists of those whose monthly profit is above it. Profit
is defined as the difference between sales and costs (mainly salaries and
taxes). Once the discrimination criterion was chosen, the next concern was
to extract from the large database the variables most likely to explain
membership of one group or the other. Principal component analysis (PCA)
was used to select these variables. PCA, like factor analysis, is a
statistical tool that summarizes the variability among a large set of
variables. It describes the variation of a given set of variables through
linear combinations of the original variables, in which each linear
combination explains as much as possible of the remaining variation while
being uncorrelated with the other linear combinations. Most of the time,
analysts focus on the first two linear combinations, which by construction
capture more of the variability than any other pair. It is therefore
possible to plot the IUPs as a scatter plot on the two axes obtained from
the first two linear combinations and to represent the variables in the
correlation circle defined by these axes.
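The two steps described above, the median split on profit and the extraction of the first two principal components, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run on the survey data: the figures are invented, the PCA is taken to be a normalized one (on the correlation matrix), units exactly at the median are placed in the less effective group, and the components are found by power iteration with deflation rather than with a statistical package.

```python
import math
from statistics import mean, median


def median_split(profits):
    """Label a unit 1 (more effective) if its profit exceeds the median, else 0."""
    m = median(profits)
    # Assumption: a unit exactly at the median goes to the less effective group.
    return [1 if p > m else 0 for p in profits]


def standardize(rows):
    """Centre and scale each column (population standard deviation, assumed non-zero)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [math.sqrt(mean([(v - mu) ** 2 for v in c])) for c, mu in zip(cols, mus)]
    return [[(v - mu) / sd for v, mu, sd in zip(r, mus, sds)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, k = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / n for b in range(k)] for a in range(k)]


def leading_eigenpair(mat, iters=1000):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    k = len(mat)
    start = [1.0 + j for j in range(k)]  # arbitrary non-zero start vector
    s = math.sqrt(sum(x * x for x in start))
    v = [x / s for x in start]
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left in any direction
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[a] * mat[a][b] * v[b] for a in range(k) for b in range(k))
    return lam, v


def first_two_components(rows):
    """Return [(eigenvalue, loadings), ...] and the unit scores on the first two axes."""
    z = standardize(rows)
    c = correlation_matrix(z)
    k = len(c)
    comps = []
    for _ in range(2):
        lam, v = leading_eigenpair(c)
        comps.append((lam, v))
        # Deflation: remove the component just found before searching for the next.
        c = [[c[a][b] - lam * v[a] * v[b] for b in range(k)] for a in range(k)]
    scores = [[sum(zi * vi for zi, vi in zip(r, v)) for _, v in comps] for r in z]
    return comps, scores


# Hypothetical data for six units (invented for illustration).
sales = [120, 90, 300, 150, 400, 210]
costs = [100, 60, 180, 140, 250, 130]
profits = [s - c for s, c in zip(sales, costs)]  # profit = sales - costs
groups = median_split(profits)

features = [[1, 10, 2], [2, 22, 1], [3, 28, 5], [4, 41, 3], [5, 52, 6], [6, 58, 4]]
components, scores = first_two_components(features)
# Share of total variability carried by the first two axes.
explained_share = (components[0][0] + components[1][0]) / len(features[0])
```

Each row of `scores` gives the coordinates of one unit on the two axes, which is what the scatter plot displays; the coordinates of a variable in the correlation circle are its loading multiplied by the square root of the corresponding eigenvalue.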
The next step was to apply multivariate discriminant analysis
techniques to differentiate the two groups of IUPs, so that an unlabelled
IUP could be assigned to the appropriate group knowing only some core
characteristics. For this purpose we used both the so-called credit scoring
technique and logistic regression. Credit scoring is used in several areas
such as medicine, meteorology and finance, the latter using it to identify
solvent clients. It consists of performing comparison tests, using Wilks'
Lambda (Λ) as the test statistic, on the core variables identified through
the PCA. Its applicability rests on two hypotheses: equality of the
covariance matrices of the two groups and normality of the distribution
within each group. If Λ for a variable tends towards 1, that variable's
influence on the differentiation is negligible; conversely, the further Λ
falls below 1, the more the variable influences the differentiation. Because
the credit scoring technique rests on these strong hypotheses, it is easier
to bypass them by applying a logistic regression. The Logit function is
defined as

$$\operatorname{Logit} P = \beta_0 + \sum_{i=1}^{p} \beta_i X_i$$

where $\beta_i$ designates the coefficients, $i$ the index of the variable,
$X_i$ the variable, $p$ the number of variables and $P$ the probability of
being classified in the effective group. The above equality corresponds to
the expression

$$P(Y = 1 \mid X = x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)}}.$$
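For the credit scoring step, the behaviour of Wilks' Lambda can be illustrated on a single variable. In the univariate case it is the ratio of the within-group sum of squares to the total sum of squares. The sketch below assumes this univariate form, which may differ from the multivariate statistic used in the actual analysis, and the numbers are invented.

```python
from statistics import mean


def wilks_lambda(values, groups):
    """Univariate Wilks' Lambda: within-group sum of squares / total sum of squares.

    Close to 1: the variable barely separates the groups.
    Close to 0: the variable separates them strongly.
    """
    grand_mean = mean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    within_ss = 0.0
    for g in set(groups):
        members = [v for v, label in zip(values, groups) if label == g]
        group_mean = mean(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)
    return within_ss / total_ss


# Hypothetical variable observed on four units, two per group.
labels = [0, 0, 1, 1]
weak = wilks_lambda([1, 3, 1, 3], labels)    # identical group means
strong = wilks_lambda([1, 2, 4, 5], labels)  # clearly separated group means
```

A variable whose group means coincide gives a value of 1, while a variable whose groups are well separated gives a value close to 0.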
The coefficients of the logistic regression are estimated by maximum
likelihood. Normality of the distributions of the variables is not
required. An IUP was classified in the effective group if its estimated
probability exceeded 0.5. From the above process we could derive the
effectiveness score, defined as $S(x) = \beta_1 x_1 + \dots + \beta_p x_p$,
and then rank the IUPs according to their scores.
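The logistic step can be sketched as follows: estimate the coefficients by maximizing the likelihood, convert the score into a probability, classify at the 0.5 threshold and rank the units by score. This is a minimal sketch under stated assumptions: the model includes an intercept `beta0`, the likelihood is maximized by plain gradient ascent rather than by the routine of a statistical package, and the single explanatory variable and its values are invented.

```python
import math


def score(beta, x):
    """Effectiveness score S(x) = beta_1 * x_1 + ... + beta_p * x_p."""
    return sum(b * xi for b, xi in zip(beta, x))


def probability(beta0, beta, x):
    """P(Y = 1 | X = x) under the logistic model (beta0 is an assumed intercept)."""
    return 1.0 / (1.0 + math.exp(-(beta0 + score(beta, x))))


def fit(X, y, lr=0.1, steps=5000):
    """Maximum likelihood by gradient ascent on the mean log-likelihood."""
    p = len(X[0])
    beta0, beta = 0.0, [0.0] * p
    n = len(X)
    for _ in range(steps):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = yi - probability(beta0, beta, xi)  # gradient contribution
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        beta0 += lr * g0 / n
        beta = [b + lr * gj / n for b, gj in zip(beta, g)]
    return beta0, beta


def classify(beta0, beta, x, threshold=0.5):
    """1 = effective group if the estimated probability exceeds the threshold."""
    return 1 if probability(beta0, beta, x) > threshold else 0


# Hypothetical sample: one characteristic per unit, groups overlap slightly.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 1, 0, 1, 1]
beta0, beta = fit(X, y)
predicted = [classify(beta0, beta, x) for x in X]
# Units ranked from highest to lowest effectiveness score.
ranking = sorted(range(len(X)), key=lambda i: score(beta, X[i]), reverse=True)
```

With an intercept, a probability above 0.5 is equivalent to `beta0 + S(x) > 0`, so ranking units by score and ranking them by estimated probability give the same order.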
We have deliberately chosen to display only the results obtained
from the logistic regression because they were found to be more reliable
than those of the credit scoring method: the confusion matrix of the
logistic regression showed a better classification than that of credit
scoring.
