Modelling and Methods in Research


Published on December 26, 2008

Author: bhanumurthykv


Slide 1:
Presented at the International Congress on Pervasive Computing and Management
Prof. K.V. Bhanu Murthy, Department of Commerce, Delhi School of Economics
Date: 13.12.2008
Concurrent Session 3: Doctoral Workshop
Research Methodology – Modeling and Methods

Basics

A model is a fundamental tool in the building, modification, and acceptance of a scientific theory. It also serves the pedagogical purpose of explaining and elucidating complex ideas in simple, schematic form. The characteristics and operational uses of every model are determined by the principles on which it is built and the objectives it pursues.

Economy of thought

Corrado Gini systematized a series of characteristics of these models, especially those referring to economics, in two lectures delivered in Milan at the Luigi Bocconi University in 1952. He pointed out that the main object of a model, like that of science itself, is to obtain an economy of thought. The models analyzed in his lectures referred mainly to economic phenomena.

Model defined

In economics, a model is a theoretical construct that represents economic processes by a set of variables and a set of logical and/or quantitative relationships between them. The economic model is a simplified framework designed to illustrate complex processes, often but not always using mathematical techniques. Frequently, economic models use structural parameters. Structural parameters are the underlying parameters in a model or class of models. A model may have various parameters, and those parameters are pre-defined. Their meaning is fixed, but they can take various values to create various properties.

Two Types of Models

There are two types of models for doing analysis:
- Behavioral
- Empirical
The former is used for behavioral analysis; the latter is used for empirical (statistical or quantitative) estimation.
In behavioral analysis we study the various economic agents or entities, their functions[1], responses and objectives. We also study the nature of the interaction between these behavioral entities. The following is an example of a behavioral model at the firm level.
[1] It is important to distinguish between functions and functionality. The former refers to the role of an entity or agent. The latter refers to the dependence relationship, as we shall see under parametric estimation.

Figure 1: Market model for profit
[Figure: flow diagram linking Price / Output Markets, Demand, Supply, Market Investment, Physical Investment, Current Cost, Gross Receipts, Profits, Dividend, Consumer and Owner]

Model is just a structured basis for analysis

The advantage of a behavioral model is that it is basic and yet can be very complex. It has greater richness and variety. But very often it involves descriptive details and behavioral patterns that cannot be measured in their entirety. For precisely this reason it cannot be used in its fullness for measurement, estimation and prediction. It may have to be substituted or supplemented by an empirical analysis, which often has to be simplified. This leads to certain problems in measurement and prediction.

Empirical Analysis

In empirical analysis we concretize these aspects of behavior into specific measurable variables. A variable is a quantity that can take different values and hence can be used for storing a series of data. A collection of such variables is a ‘data set’. A series could hold data for the different individuals[1] (firms or countries) under study, or the values for one particular individual at different points of time. Each ‘data point’ is known as an observation.
[1] Individuals are usually indexed by ‘n’ and points in time by ‘t’.
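The vocabulary above (variable, series, observation, individual ‘n’, time ‘t’) can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the `Observation` class, the helper names `values_at` and `values_for`, the firm names and the figures are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One data point of a variable (hypothetical structure)."""
    individual: str  # the 'n' index: a firm, country or household
    period: int      # the 't' index: here a year
    value: float


# One variable ("income") stored as a series of observations.
# The figures are invented for the example.
income = [
    Observation("Firm_A", 1991, 55.0),
    Observation("Firm_B", 1991, 70.0),
    Observation("Firm_A", 1992, 58.0),
    Observation("Firm_B", 1992, 74.0),
]


def values_at(series, period):
    """Values of every individual at one point in time."""
    return {o.individual: o.value for o in series if o.period == period}


def values_for(series, individual):
    """Values of one individual at successive points in time."""
    return {o.period: o.value for o in series if o.individual == individual}


print(values_at(income, 1991))
print(values_for(income, "Firm_A"))
```

The same series can thus be read across individuals at a fixed ‘t’ or along time for a fixed ‘n’.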
Data series

Accordingly, we can have either cross-sectional data or time series data. The interval at which repeated observations are taken is known as the frequency or periodicity of the data (for example weekly, quarterly or annual)[1]. We could also combine the two and have panel data, that is, a pooling of cross-sectional and time series data. Panel data is also known as longitudinal data. For instance, the same set of families may be observed over a period of time (say, ten years) to study their consumption behavior.
[1] In general, one should avoid using high-frequency data unless the aim is to study something like intra-day stock prices.

Data Collection and Storage

Variables are distinguished from each other by their ‘identifiers’ or variable names. Since identifiers refer to particular aspects of the data, they must be self-explanatory. For instance, a variable that represents export demand for tea could be named Xdtea, and domestic consumption could be named Domcons. It is not advisable to use variable names such as X1, X2, etc., which are too general. Nor should variables be named after oneself (Gupta1, Gupta2). Since the data set is stored in files, the same holds for filenames.

Data Base Format

The data needs to be stored in an orderly manner so that it can be processed efficiently. For this purpose it is necessary to store it in a structured format that is universally recognized. Such a format is known as a ‘Data Base Format’. The structure of a database is as follows:

Time Series

The first row contains the variable names or field names. The second row contains units. The third row contains identifiers. The data is stored vertically in columns.
Variable     Consumption   Income       Price
(Units)      PPP$(‘000)    PPP$(‘000)   PPP$
Identifier   ConX          IncX         PX
1991         32            55           201
1992         45            70           222
1993         66            101          251

Cross Sectional

Variable     Consumption   Income       Price
(Units)      PPP$(‘000)    PPP$(‘000)   PPP$
Identifier   ConX          IncX         PX
Market_1     32            55           201
Market_2     45            70           222
Market_3     66            101          251

Panel Data

When ‘n’ individuals (say 40 families or companies) are observed over ‘t’ periods (say 10 weeks, months or years), the data is called panel data. The total number of data points is n*t (here 40 * 10 = 400). Database format for panel data: the data is stored in blocks, one block per individual, each covering the full run of years.

Panel database format

Variable              Consumption   Income       Price
(Units)               PPP$(‘000)    PPP$(‘000)   PPP$
Identifier            ConX          IncX         PX
Household_1   1991    32            55           201
              1992    45            70           222
              1993    66            101          251
              -       -             -            -
Household_2   1991    32            55           201
              1992    45            70           222
              1993    66            101          251
              -       -             -            -

Base Year

Initial conditions: the ‘butterfly effect’ is a phrase that encapsulates the more technical notion of sensitive dependence on initial conditions. Small variations in the initial condition of a dynamical system may produce large variations in the long-term behavior of the system. Therefore, adequate care needs to be taken while selecting the base year or the initial period of the study.

Raw Data

Secondly, data could be collected from primary sources or from secondary sources. Primary data is collected by means of a primary survey, or is first-hand data obtained by an opinion poll. Secondary data is data that has been published. In both cases, however, one has to distinguish between raw data and ‘pre-processed’ data. Very often data may have gaps and may need some preparation or ‘pre-processing’ before the final stage of statistical estimation. The choice is between losing data points by ignoring incomplete series (of certain individuals) and losing some originality by ‘pre-processing’ the data set.
This involves the use of data filling techniques.

Pre-processing of raw data

Data filling must be done scientifically. There is nothing unethical about small-scale data filling so long as proper procedures are used and are published along with the citation of data sources. Given below are some techniques for data filling (we are considering weekly price data):
- If one (or more) data point is missing and the same figure is repeated on either side, it must be assumed that the price has not changed. In such cases, repeat the figure in the missing week.
- If one data point is missing and different figures appear on either side, simply use the mean of the neighboring data points.
- If many data points are missing (say three to four) and the trend is clear in the preceding period, just continue the trend (either rising or falling)[1].
[1] In Excel it is very easy to do this with AutoFill.

Data filling

- If many data points are missing (say three to four) and the trend is cyclical, continue the trend from the preceding period and from the succeeding period so that the two meet at some mid-point. The mid-point would represent a trough or a peak.
- If data is missing (especially) at the end of a series and there is a geometric trend, fill in a growing trend of data.
- If no other pattern is visible, use the average of the series in place of the missing value.
Ordinarily, textbooks only provide the last solution of using the average. Certain econometric packages contain scientific methods of data filling. Even they basically use averages.

Empirical Methods

There are two types of methods for doing empirical work:
- Parametric
- Non-parametric
In the case of parametric methods a functional relationship is set up between different variables. For instance, we could state that consumption is functionally related to national income.
This is just as we would say, in the case of a mathematical function, that Y is a function of ‘x’:
Y = f (x)
In this case we would say that:
C = f (Y)
In this functional relationship the left-hand variable is the dependent variable (explained variable or regressand) and the right-hand variable is the independent variable (explanatory variable or regressor).

Parametric

In many textbooks parametric methods are defined differently. We are referring to parameters in a different sense. Firstly, ‘parameters’ are things whose definition does not change but which can take different values. Secondly, statistically speaking, parameters refer to the underlying characteristics of the population, such as the population mean. In the context of regression, which is the primary method of estimation and prediction or forecasting, parameters mean both.

Deterministic model

Parametric models could be of two types:
- Deterministic
- Stochastic
A deterministic model contains no element of error. For instance:
TC = f + a * q
i.e., Total Cost = Total Fixed Cost + Average Variable Cost * Quantity
An accounting model or equation, for instance, is a deterministic model.

Basic Stochastic Model

The consumption function given below
Cn = a + b * Y + e
(where Y = National Income, Cn = Consumption Expenditure, ‘a’ and ‘b’ are parameters and ‘e’ is the error term) is a stochastic model, since the exact level of consumption is not entirely determined or explained by income (the lone explanatory variable). Hence, it contains an element of error. Of course, consumption is explained to the extent that income can explain it with the help of the coefficients or parameters of the model or equation.

How are parameters estimated?

What is the role of the parameters ‘a’ and ‘b’?
[Figure: consumption plotted against income, showing the intercept ‘a’, the slope ‘b’, the actual values Cn and the fitted values Ĉn]

Method of Least Squares

The parameters are ‘unknown’ to begin with.
We must use some procedure for estimating them. The commonly used procedure is the method of least squares. In other words, we use an OLS (Ordinary Least Squares) procedure[1]. The OLS procedure minimizes the sum of squared deviations between Cn and Ĉn (Cn hat); each such deviation is an estimate of the error term. Hence, once Ĉn is ‘determined’ or ‘estimated’, the fitted equation no longer contains an error term and becomes deterministic:
Ĉn = a + b * Y
Thus, Cn = Ĉn + e
[1] There are other methods that can be used for estimating parameters, such as Maximum Likelihood Estimation (MLE).

From unknown to estimated

The parameter ‘a’ tells us that consumption has a minimum fixed level that is unrelated to income. Hence, it is a fixed parameter, also called the constant or the intercept. The other parameter ‘b’ tells us the marginal contribution of income in determining consumption. Hence, it gives us the addition to consumption due to an additional unit of income. That is, in the first instance we need data on both income and consumption that enable a ‘good’ estimate of the parameters ‘a’ and ‘b’[1]. Still, they are estimates. The underlying relationship in the population is not known. It is only estimated.
[1] There are methods for checking whether the estimates are good when they are based on a limited sample.

Forecasting

Once the parameters are estimated they give us a more or less permanent basis for estimating consumption once income is known. The parameters can now be applied to a new set of income data to generate an estimate of consumption. This is known as the predicted value or estimated value of the dependent variable, or simply Y hat (Ŷ). There are two uses of such estimates. Firstly, we may like to have an estimate that is free of an error term. Secondly, we can use the parameters for estimating consumption in the future. The former is known as within-sample estimation and the latter as outside-sample estimation.
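The estimation and forecasting steps described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the presenter's own procedure: the function name `ols_fit` is hypothetical, and the income and consumption figures are invented so that the arithmetic is easy to follow. It uses the standard closed-form least-squares solution for one regressor, b = Sxy / Sxx and a = mean(Cn) - b * mean(Y).

```python
# Hypothetical figures, invented for illustration only.
income = [100.0, 200.0, 300.0, 400.0]       # Y
consumption = [90.0, 160.0, 250.0, 320.0]   # Cn


def ols_fit(x, y):
    """Return (a, b) minimizing the sum of squared residuals of y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx               # slope: marginal effect of income
    a = mean_y - b * mean_x     # intercept: level unrelated to income
    return a, b


a, b = ols_fit(income, consumption)

# Within-sample estimation: fitted values and the residuals they leave.
fitted = [a + b * y for y in income]
residuals = [c - f for c, f in zip(consumption, fitted)]

# Outside-sample estimation: apply the parameters to a new income level.
forecast = a + b * 500.0

print("a =", a, "b =", b)
print("fitted:", fitted)
print("residuals:", residuals)
print("forecast at income 500:", forecast)
```

With these made-up figures the fit gives a = 10 and b = 0.78 (up to floating-point rounding). Here `fitted` holds the within-sample estimates and `forecast` is an outside-sample estimate.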
These procedures are also known as ‘prediction’ or ‘forecasting’.

Multivariate model

The basis of any parametric empirical model is an analytical specification of the model. So far we have considered only a model with a single explanatory variable. Now we shall consider a multivariate model. For instance, if we wish to study the demand for exports of tea from India, we could conceive of the following analytical specification of the model:
Dx = f (Price of tea, Price of coffee, World incomes, Exchange rate)

Specification

Such a specification is based on a functional relationship between the dependent variable Dx and the independent variables on the right-hand side. Since the purpose is to estimate the demand for tea, this analytical specification needs to be translated into an empirical model. The model could be:
Dx = a + b1 * Pt + b2 * Pc + b3 * Wy + b4 * X + e

Three questions

Here three questions arise: one, how well the above model is spelt out (specified); two, what specific form the equation must take; and three, how to interpret the parameters b1, b2, b3 and b4. The first is a problem of mis-specification. The second is one of ‘functional form’. The third will be addressed last of all.

Over or Under specification

If too many variables have been included there could be a problem of over-specification, because the list of explanatory variables would contain extra or irrelevant variables. On the other hand, if too few variables are included there could be a problem of under-specification or omitted variables, as could have happened in the case of our consumption function, which depended only on income. The interesting thing is that the explanatory power of the equation goes down in both cases. Adjusted R2 (R2 Adj.), a measure of goodness of fit that penalizes additional explanatory variables, will tend to go down.
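To see why an irrelevant variable can lower the adjusted measure even though the ordinary R2 never falls when a regressor is added, it may help to look at the usual textbook formula, adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1), where n is the number of observations and k the number of explanatory variables excluding the constant. The sketch below is an illustration under stated assumptions: the function name `adjusted_r2` is hypothetical and the R2 values are invented, not taken from any estimated model.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared for a model with an intercept.

    n_regressors counts explanatory variables, excluding the constant.
    """
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Invented numbers: 20 observations, one relevant regressor.
base = adjusted_r2(r2=0.800, n_obs=20, n_regressors=1)

# Add an irrelevant regressor that raises R-squared only marginally.
over_specified = adjusted_r2(r2=0.801, n_obs=20, n_regressors=2)

print("one regressor :", round(base, 4))
print("two regressors:", round(over_specified, 4))
```

In this made-up case the tiny gain in R2 does not compensate for the lost degree of freedom, so the adjusted figure falls.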
Model Specification

The best approach is to specify a model in terms of an existing theoretical model, such as our consumption function. The second-best approach is to use an existing, tested empirical model. The third-best approach is to develop your own model.

Functional Form

The problem of ‘functional form’ relates to the precise nature of the relationship between the dependent and independent variables. Is it linear? Log-linear? Non-linear? The more precise the understanding of the underlying functional form in a particular case, the better the estimates of ‘a’ and ‘b’, and so the better the estimate of Y hat (Ŷ). Some of the commonly used functional forms are:

Types of functional forms

Linear:
  Y = a + b * X + e
Log-linear:
  Semi-log: Log Y = a + b * T + e
  Lin-log: Y = a + b * Log X + e
  Double log: Log Q = a + b1 * Log X1 + b2 * Log X2 + u
Non-linear:
  Quadratic: Q = a - b1 * T + b2 * T^2 + u
  Cubic: Q = a - b1 * T + b2 * T^2 - b3 * T^3 + u
Each functional form has its advantages and disadvantages.

Interpretation

The linear form is the easiest to estimate, but its parameters (regression coefficients) are of limited use. How do we interpret the individual parameters b1 and b2 in an equation with two explanatory variables X1 and X2? The first shows the marginal change in Q with respect to X1 when X2 is held constant, and b2 shows the marginal change in Q with respect to X2 when X1 is held constant. It will be seen that in different functional forms the β coefficients or parameters yield different measures. If any β coefficient has a t-statistic below 2 in absolute value, or a p-value above 0.05, it is treated as not significantly different from zero: the variable has no statistically detectable effect on the Y variable.

Annual Compound Growth Rate

A semi-log function is the best for determining growth rates.
Log Yt = a + b * t              (1)
Log Y(t-1) = a + b * (t-1)      (2)
Subtracting equation (2) from equation (1):
Log Yt - Log Y(t-1) = b
or
Log (Yt / Y(t-1)) = b
Thus, b is a measure of the relative change of Y over time, and hence it is the exponential growth rate. With natural logarithms, the corresponding annual compound growth rate is exp(b) - 1.

Partial Elasticities

Double log functions are appropriate where there is a need to estimate elasticities. The β coefficients or explanatory parameters directly measure the partial elasticities. In the following equation:
Log Q = a + b1 * Log X1 + b2 * Log X2 + u
b1 = ∂Log Q / ∂Log X1 and b2 = ∂Log Q / ∂Log X2
Hence, b1 and b2 are the partial elasticities.

Non-Parametric Analysis

This consists of indices and non-parametric methods. If no prediction needs to be made and the analysis just consists of studying trends in data or making certain comparisons, then non-parametric methods can be used.

Simple Indices

Relative index for inter-firm / inter-country / inter-regional comparisons: treat one country / firm / region as the base and express the indicators of the other countries / firms / regions as a percentage of the base (say, GNP comparisons of South Asian countries).
Ri = [(GNP)i / (GNP)b] * 100
where i = ith country (say, Sri Lanka) and b = base country (say, India).
The base country / firm / region is also called the numeraire.

Efficiency Index

Efficiency index for a parametric model: suppose a regression equation is used for predicting some dependent variable. Then the following index can be used for testing the efficiency of the regression model.
Ei = (Ya / Yh) * 100
where Ya = actual value of the dependent variable and Yh = estimated or predicted value of the dependent variable.

Multiple Indices

Given below is a set of indices used for non-parametric estimation. I developed these for an exercise studying the Public Distribution System (PDS) in India.
It involves comparing the monthly per capita consumption of different states in India with the allocation (or provision) of PDS on the one hand and lifting (off-take or sale) on the other hand. The set of indices can be applied to any similar situation. (MPCCi = Monthly Per Capita Consumption in the ith Indian state; μ denotes a national average.)

Relative Indices

1. RmCi = (MPCCi / μ) * 100 (Ratio of consumption of the ith state to the national average)
2. RaCi = (Ai / MPCCi) * 100 (Ratio of allocation of the ith state to its per capita consumption)
3. RlCi = (Li / MPCCi) * 100 (Ratio of lifting of the ith state to its per capita consumption)
4. RaA = (RaCi / μRaC) * 100 (Ratio of RaCi to the national average of RaC)
5. RlA = (RlCi / μRlC) * 100 (Ratio of RlCi to the national average of RlC)
6. DiAC = (RaA / RmCi) * 100 (Distortion index of allocation to consumption)
7. DiLC = (RlA / RmCi) * 100 (Distortion index of lifting to consumption)
Basically, the technique is to use a ratio of ratios.

Nonparametric statistical methods

Nonparametric statistical methods provide alternatives to classical analyses without making the usual assumption that the data come from a normal distribution (or any other specific distribution). Some of the nonparametric procedures included are:

Some Tests

- One Sample Analysis: sign test and signed rank test for location.
- Two Sample Comparison: Mann-Whitney (Wilcoxon) test and two-sample Kolmogorov-Smirnov test.
- Oneway Analysis of Variance: Kruskal-Wallis and Friedman tests.
- Multiple Variable Analysis: Spearman and Kendall rank correlation coefficients.
- Life Tables: Kaplan-Meier estimation of survivor functions.

Some More Tests

- Life Data Regression: Cox proportional hazards models.
- Distribution Fitting: chi-squared and one-sample Kolmogorov-Smirnov tests.
- Run Charts: runs tests for non-random behavior.
- Contingency Tables: Kendall's tau and other measures of association.
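As an illustration of the simplest procedure in the lists above, the sketch below implements a two-sided sign test for location using only the Python standard library. It is a minimal sketch under stated assumptions: the function name `sign_test` and the sample of weekly prices are hypothetical, observations equal to the hypothesized median are dropped (one common convention), and the p-value comes from the exact Binomial(n, 0.5) distribution.

```python
from math import comb


def sign_test(sample, hypothesized_median):
    """Two-sided exact sign test.

    Returns (number of positive signs, number of non-tied observations,
    two-sided p-value under the null that the median equals the
    hypothesized value).
    """
    # Drop ties with the hypothesized median.
    diffs = [x - hypothesized_median for x in sample if x != hypothesized_median]
    n = len(diffs)
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # Probability of k or fewer signs of the rarer kind under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, n, min(1.0, 2.0 * tail)


# Invented weekly prices; null hypothesis: the median price is 13.
prices = [12, 15, 11, 18, 20, 17, 16, 19, 14, 21]
positives, n, p_value = sign_test(prices, 13)
print(positives, "of", n, "observations lie above 13; p =", p_value)
```

Only the signs of the deviations are used, so no assumption about the shape of the distribution is needed beyond independence of the observations.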
Advantages and disadvantages

Advantages and disadvantages of nonparametric methods: inevitably there are advantages and disadvantages to nonparametric versus parametric methods, and the decision regarding which method is most appropriate depends very much on individual circumstances. As a general guide, the following (not exhaustive) guidelines are provided.

Advantages of nonparametric methods

- Nonparametric methods require no or very limited assumptions to be made about the format of the data, and they may therefore be preferable when the assumptions required for parametric methods are not valid.
- Nonparametric methods can be useful for dealing with unexpected, outlying observations that might be problematic with a parametric approach.
- Nonparametric methods are intuitive and are simple to carry out by hand, for small samples at least.

Advantages of nonparametric methods (contd.)

- Nonparametric methods are often useful in the analysis of ordered categorical data in which assignation of scores to individual categories may be inappropriate. For example, nonparametric methods can be used to analyse alcohol consumption directly using the categories never, a few times per year, monthly, weekly, a few times per week, daily and a few times per day. In contrast, parametric methods require scores (i.e. 1–7) to be assigned to each category, with the implicit assumption that the effect of moving from one category to the next is fixed.

Disadvantages of nonparametric methods

- Nonparametric methods may lack power as compared with more traditional approaches [3]. This is a particular concern if the sample size is small or if the assumptions for the corresponding parametric method (e.g. Normality of the data) hold.
- Nonparametric methods are geared toward hypothesis testing rather than estimation of effects.
It is often possible to obtain nonparametric estimates and associated confidence intervals, but this is not generally straightforward.

Disadvantages of nonparametric methods (contd.)

- Tied values can be problematic when these are common, and adjustments to the test statistic may be necessary.
- Appropriate computer software for nonparametric methods can be limited, although the situation is improving. In addition, how a software package deals with tied values or how it obtains appropriate P values may not always be obvious.

End

Thank you! Any questions – shoot to
