by Irina
2. January 2020 00:00
Here I am gathering information that helps to quickly look up concepts and methods in the areas of Marketing, Risk Management, Data Mining, Theoretical and Practical Statistics, and SAS tips.
Some of the material was found on the internet and adapted, and some I wrote myself.
My apologies to anyone whose material appears here without their name being mentioned.
Please direct content-related questions, or suggest any topic related to data analysis, at this link: http://forum.gotstat.com/
by Irina
17. February 2009 08:33
Nominal Data
With nominal data, as the name implies, the numbers function as a name or label and do not have numeric meaning. For instance, you might create a variable for gender, which takes the value 1 if the person is male and 0 if the person is female.
There are two main reasons to choose numeric rather than text values to code nominal data: data is more easily processed by some computer systems as numbers, and using numbers bypasses some issues in data entry such as the conflict between upper- and lowercase letters.
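A minimal SAS sketch of this coding scheme: gender is stored as 1/0, and a user-defined format supplies the labels for display only. The format name genderfmt, the data set people, and the three sample rows are hypothetical and exist only for this illustration.

```sas
/* Hypothetical format: maps the numeric codes to display labels */
proc format;
  value genderfmt 1 = 'Male'
                  0 = 'Female';
run;

/* Hypothetical sample data: gender is stored as a number */
data people;
  input id gender;
  format gender genderfmt.;
  datalines;
1 1
2 0
3 1
;
run;

/* The frequency table shows the labels, while the stored values stay 1 and 0 */
proc freq data=people;
  tables gender;
run;
```

The numbers here are only labels, so a frequency count is meaningful while an average of the codes would not describe anything about gender itself.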
Ordinal Data
Ordinal data refers to data that has some meaningful order, so that higher values represent more of some characteristic than lower values. For instance, in medical practice burns are commonly described by their degree, which describes the amount of tissue damage caused by the burn. A first-degree burn is characterized by redness of the skin, minor pain, and damage to the epidermis only, while a second-degree burn includes blistering and involves the dermis, and a third-degree burn is characterized by charring of the skin and possibly destroyed nerve endings. These categories may be ranked in a logical order: first-degree burns are the least serious in terms of tissue damage, third-degree burns the most serious.
However, there is no metric analogous to a ruler or scale to quantify how great the distance between categories is, nor is it possible to determine if the difference between first- and second-degree burns is the same as the difference between second- and third-degree burns.
Interval Data
Interval data has a meaningful order and also has the quality that equal intervals between measurements represent equal changes in the quantity of whatever is being measured. An example is the Fahrenheit temperature scale, which, like all interval scales, has no natural zero point: 0 on the Fahrenheit scale does not represent an absence of temperature but simply a location relative to other temperatures.
Addition and subtraction are appropriate with interval data, but multiplication and division are not: 80 degrees Fahrenheit is 40 degrees warmer than 40 degrees Fahrenheit, yet it is not twice as hot.
Ratio Data
Ratio data has all the qualities of interval data (natural order, equal intervals) plus a natural zero point. Many physical measurements are ratio data: for instance, height, weight, and age all qualify.
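As a small illustration of the difference between the interval and ratio scales described above, the following SAS sketch compares the ratio of two temperatures in Fahrenheit and in Kelvin. The data set and variable names are made up for this example; Kelvin is used as the ratio-scale counterpart because it has a true zero point.

```sas
/* Illustration only: ratios of interval-scale values are not meaningful */
data temp_ratio;
  f_low  = 40;
  f_high = 80;

  /* Convert to Kelvin, a ratio scale with a natural zero point */
  k_low  = (f_low  - 32) * 5 / 9 + 273.15;
  k_high = (f_high - 32) * 5 / 9 + 273.15;

  diff_f  = f_high - f_low;   /* differences are meaningful: 40 degrees   */
  ratio_f = f_high / f_low;   /* 2, but this says nothing about heat      */
  ratio_k = k_high / k_low;   /* about 1.08 on the ratio (Kelvin) scale   */
run;

proc print data=temp_ratio noobs;
run;
```

The Fahrenheit ratio of 2 is an artifact of where the scale places its zero; on the Kelvin scale the same two temperatures differ by only about 8 percent.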
Continuous and Discrete Data
Another distinction often made is that between continuous and discrete data.
Continuous data can take any value, or any value within a range. Most data measured by interval and ratio scales, other than that based on counting, is continuous: for instance, weight, height, distance, and income are all continuous.
Discrete data can only take on particular values, and has clear boundaries. As the old joke goes, you can have 2 children or 3 children, but not 2.37 children, so "number of children" is a discrete variable.
Nominal data is also discrete, as are binary and rank-ordered data.
From Statistics in a Nutshell, O'Reilly
by Irina
25. October 2007 01:44
I. Purpose of the Proposed Research Project: Includes a clear expression of the decision problem, information research problem, and specific research objectives.
II. Type of Study: Includes discussions of the type of research design (i.e., exploratory, descriptive, causal) and secondary versus primary data requirements, with some justification of choice.
III. Definition of the Target Population and Sample Size: Describes the overall target population to be studied and determination of the appropriate sample size, including a justification of the size.
IV. Sample Design, Technique, and Data Collection Method: Includes a substantial discussion regarding the sampling technique used to draw the required sample, the actual method for collecting the data (i.e., observation, survey, experiment), incentive plans, and justifications.
V. Specific Research Instruments: Discusses the method used to collect the needed raw data; includes discussions of the various types of scale measurement requirements.
VI. Potential Managerial Benefits of the Proposed Study: Discusses the expected values of the information to management and how the initial problem might be resolved; includes a separate discussion on the possible limitations of the study.
VII. Proposed Cost Structure for the Total Project: Itemizes the expected costs associated with conducting the research project; includes a total cost figure and any pricing policy for changes, as well as appropriate completion time frames (of specific tasks and/or total project).
VIII. Profile of the Researcher and Company: Briefly describes the main researchers and their qualifications; includes a general assessment of the company.
IX. Optional Dummy Tables of the Projected Results: Offers examples of how the data might be presented in the final report.
From Marketing Research, by Joseph F. Hair, Jr., Robert P. Bush, and David J. Ortinau
by Irina
25. August 2007 08:58
Scorecards
Scorecards are a common way of displaying the patterns found by a logistic regression model. They display the regression coefficients in a clear, intuitive way and can be used to perform risk evaluation operations (simplified predictions). For one particular state, y1, we start by extracting the coefficients (c0, c1, ...) that describe the logistic regression formula for that state.
Within each variable, we shift the coefficients so that the minimal coefficient becomes 0 and the differences between the coefficients stay the same. The shifted coefficients are then normalized to a range of, say, 0 to 1000, giving an intuitive perspective on the relative importance of each coefficient. As each coefficient corresponds to a state of an input attribute, the normalized values also describe the relative importance of each input attribute state. The scorecard presented here computes these relative importance scores. A scorecard checks certain conditions and, if these conditions are met, adds points to an overall score.
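Read together with the PROC SQL step in this post, the shift-and-normalize rule can be written as a single formula. The notation is introduced here for illustration only: c_{v,k} is the coefficient of state k of input variable v, and s_{v,k} is the resulting score.

```latex
\[
  s_{v,k} \;=\; \operatorname{round}\!\left(
    1000 \cdot
    \frac{c_{v,k} - \min_{j} c_{v,j}}
         {\sum_{u} \max_{j}\bigl( c_{u,j} - \min_{i} c_{u,i} \bigr)}
  \right)
\]
```

For example, with made-up coefficients 0.8, 0.3, 0 for one variable and -0.5, 0.2, 0 for another, the shifted values are 0.8, 0.3, 0 and 0, 0.7, 0.5. The per-variable maxima sum to 0.8 + 0.7 = 1.5, so the scores are 533, 200, 0 and 0, 467, 333, and the best state of each variable together adds up to 1000.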
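/* Fit a stepwise logistic regression.                                  */
/* The macro variable &groupp lists the class (grouped) input variables. */
proc logistic data=Panel outmodel=ModelParam namelen=200 descending;
  class &groupp / param=glm;
  model target = &groupp / selection=stepwise;
  output out=toz_LOGISTIC_2 p=phat_new xbeta=xb;
  /* Save the coefficient table for the scorecard step */
  ods output ParameterEstimates=coeff_est;
run;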
proc sql;
  create table score_card as
  select b.*,
         /* each variable has counter rows, so this adds max_est1 once per variable */
         sum(max_est1/counter) as sum_max,
         /* flag the state with the highest coefficient within its variable */
         case when est1 = max_est1 then 1 else 0 end as max_cat,
         /* rescale so that the best state of every variable adds up to about 1000 */
         round(1000 * est1 / calculated sum_max) as score
  from (select a.*,
               max(est1) as max_est1
        from (select *,
                     min(Estimate) as min_est,
                     count(*) as counter,
                     /* shift so the smallest coefficient of each variable becomes 0 */
                     Estimate - calculated min_est as est1
              from coeff_est
              where variable ne 'Intercept'
              group by variable) as a
        group by variable) as b;
quit;
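Once the scorecard exists, applying it is a lookup and a sum. The following is only a sketch under stated assumptions: score_card_demo is a hand-typed stand-in for the score_card table, applicant_states and its rows are hypothetical, and the level column is assumed to be named ClassVal0, as in the ParameterEstimates table from PROC LOGISTIC (check your own output before relying on it).

```sas
/* Hypothetical scorecard rows, standing in for the score_card table */
/* built above (only the columns needed for scoring are kept).       */
data score_card_demo;
  length variable $32 classval0 $32;
  input variable $ classval0 $ score;
  datalines;
age_grp young 533
age_grp middle 200
age_grp old 0
income_grp low 0
income_grp mid 467
income_grp high 333
;
run;

/* Hypothetical applicants in long format: one row per applicant     */
/* and input attribute, giving the state the applicant falls into.   */
data applicant_states;
  length id 8 variable $32 classval0 $32;
  input id variable $ classval0 $;
  datalines;
1 age_grp young
1 income_grp low
2 age_grp old
2 income_grp high
;
run;

/* Look up the points for each condition met and add them up */
proc sql;
  create table applicant_score as
  select s.id,
         sum(c.score) as total_score
  from applicant_states as s
       inner join score_card_demo as c
         on  s.variable  = c.variable
         and s.classval0 = c.classval0
  group by s.id;
quit;
```

With these made-up rows, applicant 1 receives 533 + 0 = 533 points and applicant 2 receives 0 + 333 = 333 points.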