
Comparison of Statistical Programs


18. Statistical Analysis


1. SAS
2. SPSS
3. R-PROJECT
4. S-PLUS
5. PROC SQL

 


1. SAS


 

* SAS program for basic statistical tests.;

data temp;

 set BACK.mydata;

myDiff=q2-q1;

run;

 

* Basic statistics in compact form;

proc means data=temp;

     var q1-q4;

run;

Variable   N            Mean         Std Dev         Minimum         Maximum

-------------------------------------------------------------------------

q1      8       3.2500000       1.4880476       1.0000000       5.0000000

q2      8       2.7500000       1.7525492       1.0000000       5.0000000

q3      7       4.1428571       1.0690450       2.0000000       5.0000000

q4      8       3.2500000       1.5811388       1.0000000       5.0000000

-------------------------------------------------------------------------

 

* Detailed descriptive statistics;

proc univariate data=temp;

      var q1-q4;

run;

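The UNIVARIATE Procedure
Variable:  q1

                          Moments
N                          8    Sum Weights                8
Mean                    3.25    Sum Observations          26
Std Deviation     1.48804762    Variance          2.21428571
Skewness          -0.2167811    Kurtosis          -1.4101977
Uncorrected SS           100    Corrected SS            15.5
Coeff Variation   45.7860806    Std Error Mean    0.52610428

              Basic Statistical Measures
    Location                    Variability
Mean     3.250000     Std Deviation            1.48805
Median   3.500000     Variance                 2.21429
Mode     2.000000     Range                    4.00000
                      Interquartile Range      2.50000

NOTE: The mode displayed is the smallest of 3 modes with a count of 2.

           Tests for Location: Mu0=0
Test           -Statistic-    -----p Value------
Student's t    t  6.177483    Pr > |t|    0.0005
Sign           M         4    Pr >= |M|   0.0078
Signed Rank    S        18    Pr >= |S|   0.0078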
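The t statistic reported for q1 (6.177483) can be reproduced by hand from the mean, standard deviation, and N printed in the same output: it is the mean divided by its standard error. The sketch below uses Python purely as a neutral calculator (it is not one of the packages compared on this page), and the variable names are illustrative.

```python
import math

# Values copied from the PROC UNIVARIATE output for q1.
n = 8
mean = 3.25
std_dev = 1.48804762

# Standard error of the mean: s / sqrt(n).
std_err = std_dev / math.sqrt(n)

# One-sample t statistic against the null value Mu0 = 0.
mu0 = 0.0
t_stat = (mean - mu0) / std_err

print(f"std error of the mean = {std_err:.8f}")
print(f"t statistic           = {t_stat:.6f}")
```

Both results agree with the standard error of the mean (0.52610428) and the t value printed by SAS.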

 

* Frequencies and percentages;

proc freq data=temp;

     tables workshop--q4;

run;

The FREQ Procedure

                                   Cumulative Cumulative
workshop    Frequency    Percent    Frequency    Percent
--------------------------------------------------------
       1            4      50.00            4      50.00
       2            4      50.00            8     100.00

                                   Cumulative Cumulative
gender      Frequency    Percent    Frequency    Percent
--------------------------------------------------------
f                   4      50.00            4      50.00
m                   4      50.00            8     100.00

                                   Cumulative Cumulative
q1          Frequency    Percent    Frequency    Percent
--------------------------------------------------------
       1            1      12.50            1      12.50
       2            2      25.00            3      37.50
       3            1      12.50            4      50.00
       4            2      25.00            6      75.00
       5            2      25.00            8     100.00

 

*---Measures of association.;

* Pearson correlations;

proc corr data=temp;

     var q1-q4;

run;

Pearson Correlation Coefficients

              Prob > |r| under H0: Rho=0

                   Number of Observations

               q1            q2            q3            q4

 q1       1.00000       0.73952      -0.12500       0.88041

                         0.0360        0.7894        0.0039

                8             8             7             8

 q2       0.73952       1.00000      -0.27003       0.85064

           0.0360                      0.5581        0.0074

                8             8             7             8

 q3      -0.12500      -0.27003       1.00000      -0.02614

           0.7894        0.5581                      0.9556

                7             7             7             7

 q4       0.88041       0.85064      -0.02614       1.00000

           0.0039        0.0074        0.9556

                8             8             7             8

 

* Spearman correlations;

proc corr data=temp spearman;

     var q1-q4;

run;

Spearman Correlation Coefficients

             Prob > |r| under H0: Rho=0

                  Number of Observations

              q1            q2            q3            q4

q1       1.00000       0.70005      -0.03965       0.86958

                        0.0532        0.9327        0.0050

               8             8             7             8

q2       0.70005       1.00000      -0.07857       0.88052

          0.0532                      0.8670        0.0039

               8             8             7             8

q3      -0.03965      -0.07857       1.00000       0.25774

          0.9327        0.8670                      0.5768

               7             7             7             7

q4       0.86958       0.88052       0.25774       1.00000

          0.0050        0.0039        0.5768

               8             8             7             8         

 

* Linear regression;

proc reg data=temp;

     model q4=q1-q3;

run;

The REG Procedure

                                Model: MODEL1

                           Dependent Variable: q4

            Number of Observations Read                          8

            Number of Observations Used                          7

            Number of Observations with Missing Values           1

 

                             Analysis of Variance

                                    Sum of           Mean

Source                   DF        Squares         Square    F Value    Pr > F

Model                     3       16.20684        5.40228      13.27    0.0308

Error                     3        1.22173        0.40724

Corrected Total           6       17.42857

             Root MSE              0.63816    R-Square     0.9299

             Dependent Mean        3.28571    Adj R-Sq     0.8598

             Coeff Var            19.42214

                             Parameter Estimates

                          Parameter       Standard

     Variable     DF       Estimate          Error    t Value    Pr > |t|

     Intercept     1       -1.32426        1.28774      -1.03      0.3794

     q1            1        0.42975        0.26233       1.64      0.1999

     q2            1        0.63101        0.25027       2.52      0.0861

     q3            1        0.31498        0.25570       1.23      0.3058
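The summary figures in the PROC REG output above (R-Square, Adj R-Sq, F Value, Root MSE) all follow from the sums of squares and degrees of freedom in the Analysis of Variance table. The following is a minimal sketch of that arithmetic, written in Python only as a calculator; the variable names are illustrative and not part of any package shown here.

```python
import math

# Sums of squares and degrees of freedom from the Analysis of Variance table.
model_ss, model_df = 16.20684, 3
error_ss, error_df = 1.22173, 3
total_ss, total_df = 17.42857, 6

# Mean squares and the overall F test.
model_ms = model_ss / model_df
error_ms = error_ss / error_df
f_value = model_ms / error_ms

# Proportion of variance explained, and its degrees-of-freedom adjustment.
r_square = model_ss / total_ss
adj_r_square = 1 - (1 - r_square) * total_df / error_df

# Root MSE is the residual standard error.
root_mse = math.sqrt(error_ms)

print(f"F = {f_value:.2f}, R-Square = {r_square:.4f}, "
      f"Adj R-Sq = {adj_r_square:.4f}, Root MSE = {root_mse:.5f}")
```

The same quantities appear in the R and S-PLUS `summary(myRegModel)` output as the F-statistic, Multiple R-Squared, and residual standard error.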

 

*---Group comparisons;

* Chi-square;

proc freq data=temp;

     tables workshop*gender/chisq;

run;

The FREQ Procedure

Table of workshop by gender

workshop     gender

Frequency|
Percent  |
Row Pct  |
Col Pct  |f       |m       |  Total
---------+--------+--------+
        1|      2 |      2 |      4
         |  25.00 |  25.00 |  50.00
         |  50.00 |  50.00 |
         |  50.00 |  50.00 |
---------+--------+--------+
        2|      2 |      2 |      4
         |  25.00 |  25.00 |  50.00
         |  50.00 |  50.00 |
         |  50.00 |  50.00 |
---------+--------+--------+
Total           4        4        8
            50.00    50.00   100.00

Statistics for Table of workshop by gender

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      0.0000    1.0000
Likelihood Ratio Chi-Square    1      0.0000    1.0000
Continuity Adj. Chi-Square     1      0.0000    1.0000
Mantel-Haenszel Chi-Square     1      0.0000    1.0000
Phi Coefficient                       0.0000
Contingency Coefficient               0.0000
Cramer's V                            0.0000

 

* Independent samples t-test;

proc ttest data=temp;

     class gender;

     var q4;

run;

The TTEST Procedure

                                          Statistics

                             Lower CL          Upper CL  Lower CL           Upper CL

Variable  gender          N      Mean    Mean      Mean   Std Dev  Std Dev   Std Dev  Std Err

q4        f               4    0.1626       2    3.8374    0.6541   1.1547    4.3054   0.5774

q4        m               4    3.5813     4.5    5.4187    0.3271   0.5774    2.1527   0.2887

q4        Diff (1-2)           -4.079    -2.5    -0.921    0.5882   0.9129    2.0102   0.6455

                                            T-Tests

             Variable    Method           Variances      DF    t Value    Pr > |t|

             q4          Pooled           Equal           6      -3.87      0.0082

             q4          Satterthwaite    Unequal      4.41      -3.87      0.0149

 

                                     Equality of Variances

                 Variable    Method      Num DF    Den DF    F Value    Pr > F

                 q4          Folded F         3         3       4.00    0.2848
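The two rows of the T-Tests table differ only in how the degrees of freedom are obtained: the pooled test uses n1 + n2 - 2, while the Satterthwaite test estimates them from the two group variances. With equal group sizes the t value is the same for both, which is why only DF and the p-value change. The sketch below reproduces the t value, both DF figures, and the folded F from the group statistics printed above. It is written in Python as a neutral calculator, and the variable names are illustrative.

```python
import math

# Group statistics from the Statistics table (N, Mean, Std Dev).
n_f, mean_f, sd_f = 4, 2.0, 1.1547
n_m, mean_m, sd_m = 4, 4.5, 0.5774
diff = mean_f - mean_m  # Diff (1-2) as SAS reports it

# Pooled (equal variances): one variance estimate shared by both groups.
df_pooled = n_f + n_m - 2
pooled_var = ((n_f - 1) * sd_f**2 + (n_m - 1) * sd_m**2) / df_pooled
t_pooled = diff / math.sqrt(pooled_var * (1 / n_f + 1 / n_m))

# Satterthwaite (unequal variances): each group keeps its own variance.
var_mean_f = sd_f**2 / n_f
var_mean_m = sd_m**2 / n_m
t_welch = diff / math.sqrt(var_mean_f + var_mean_m)
df_welch = (var_mean_f + var_mean_m) ** 2 / (
    var_mean_f**2 / (n_f - 1) + var_mean_m**2 / (n_m - 1)
)

# Folded F: larger sample variance over the smaller one.
folded_f = max(sd_f, sd_m) ** 2 / min(sd_f, sd_m) ** 2

print(f"pooled:        t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Satterthwaite: t = {t_welch:.2f}, df = {df_welch:.2f}")
print(f"folded F = {folded_f:.2f}")
```

This also explains the difference between the R and S-PLUS sections for the same `t.test` call: the R output is a Welch test (df = 4.412), while the S-PLUS output is the standard pooled test (df = 6).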

 

* Nonparametric version of the above using the Wilcoxon/Mann-Whitney test;

proc npar1way data=temp;

     class gender;

     var q4;

run;

The NPAR1WAY Procedure

        Kolmogorov-Smirnov Test for Variable q4

             Classified by Variable gender

                        EDF at    Deviation from Mean

  gender       N       Maximum        at Maximum

  ---------------------------------------------------

  f            4          1.00               1.0

  m            4          0.00              -1.0

  Total        8          0.50

      Maximum Deviation Occurred at Observation 4

             Value of q4 at Maximum = 3.0

    Kolmogorov-Smirnov Two-Sample Test (Asymptotic)

          KS   0.500000    D         1.000000

          KSa  1.414214    Pr > KSa  0.0366

 

         Cramer-von Mises Test for Variable q4

             Classified by Variable gender

                               Summed Deviation

         gender         N          from Mean

         --------------------------------------

         f              4             0.3750

         m              4             0.3750

 

        Cramer-von Mises Statistics (Asymptotic)

             CM  0.093750    CMa  0.750000

 

              Kuiper Test for Variable q4

             Classified by Variable gender

                                  Deviation

             gender        N      from Mean

             ------------------------------

             f             4            1.0

             m             4            0.0

 

          Kuiper Two-Sample Test (Asymptotic)

     K  1.000000    Ka  1.414214    Pr > Ka  0.2564

 

* Paired samples t-test (silly one);

proc ttest data=temp;

     paired q1*q2;

run;

The TTEST Procedure

                                          Statistics

                     Lower CL            Upper CL   Lower CL             Upper CL

Difference       N       Mean     Mean       Mean    Std Dev   Std Dev    Std Dev   Std Err

q1 - q2          8     -0.499      0.5     1.4992     0.7903    1.1952     2.4326    0.4226

 

                                           T-Tests

                          Difference      DF    t Value    Pr > |t|

                          q1 - q2          7       1.18      0.2753

 

* Nonparametric version of the above using the signed rank test.;

proc univariate data=temp;

     var myDiff;

run;

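The UNIVARIATE Procedure
Variable:  myDiff

                          Moments
N                          8    Sum Weights                8
Mean                    -0.5    Sum Observations          -4
Std Deviation     1.19522861    Variance          1.42857143
Skewness                   0    Kurtosis              -1.456
Uncorrected SS            12    Corrected SS              10
Coeff Variation   -239.04572    Std Error Mean    0.42257713

              Basic Statistical Measures
    Location                    Variability
Mean     -0.50000     Std Deviation            1.19523
Median   -0.50000     Variance                 1.42857
Mode     -2.00000     Range                    3.00000
                      Interquartile Range      2.00000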

 

* Oneway Analysis of Variance (ANOVA);

proc glm data=temp;

     class workshop;

     model q4=workshop;

     means workshop / tukey;

run;

The GLM Procedure

Dependent Variable: q4

                                              Sum of

      Source                      DF         Squares     Mean Square    F Value    Pr > F

      Model                        1      0.50000000      0.50000000       0.18    0.6891

      Error                        6     17.00000000      2.83333333

      Corrected Total              7     17.50000000

 

                       R-Square     Coeff Var      Root MSE       q4 Mean

                       0.028571      51.79233      1.683251      3.250000

 

      Source                      DF       Type I SS     Mean Square    F Value    Pr > F

      workshop                     1      0.50000000      0.50000000       0.18    0.6891

 

      Source                      DF     Type III SS     Mean Square    F Value    Pr > F

      workshop                     1      0.50000000      0.50000000       0.18    0.6891

 

* Nonparametric version of the above using the Kruskal-Wallis test;

proc npar1way data=temp;

     class workshop;

     var q4;

run;

The NPAR1WAY Procedure

 

                Analysis of Variance for Variable q4

                  Classified by Variable workshop

 

                workshop          N             Mean

                ------------------------------------

                1                 4             3.00

                2                 4             3.50

 

 

Source    DF    Sum of Squares    Mean Square     F Value    Pr > F

-------------------------------------------------------------------

Among      1              0.50       0.500000      0.1765    0.6891

Within     6             17.00       2.833333

 

                Average scores were used for ties.

 

Wilcoxon Scores (Rank Sums) for Variable q4

                    Classified by Variable workshop

 

                        Sum of      Expected       Std Dev          Mean

workshop       N        Scores      Under H0      Under H0         Score

------------------------------------------------------------------------

1              4          16.0          18.0      3.380617           4.0

2              4          20.0          18.0      3.380617           5.0

 

                   Average scores were used for ties.

 

                       Wilcoxon Two-Sample Test

                     Statistic             16.0000

 

                     Normal Approximation

                     Z                     -0.4437

                     One-Sided Pr <  Z      0.3286

                     Two-Sided Pr > |Z|     0.6573

 

                     t Approximation

                     One-Sided Pr <  Z      0.3353

                     Two-Sided Pr > |Z|     0.6706

 

               Z includes a continuity correction of 0.5.

                          Kruskal-Wallis Test

                     Chi-Square             0.3500

                     DF                          1

                     Pr > Chi-Square        0.5541
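The Wilcoxon Z and the Kruskal-Wallis chi-square in the PROC NPAR1WAY output above can both be rebuilt from the rank sums in the Wilcoxon Scores table, provided the ties are accounted for. The sketch below does that in Python, used here only as a calculator. It assumes that q4 has four tied groups of two observations each (values 1, 3, 4, and 5), as shown by the q4 frequency table in the R section; the variable names are illustrative.

```python
import math

# Rank sums and group sizes from the Wilcoxon Scores table.
rank_sums = [16.0, 20.0]
sizes = [4, 4]
n_total = sum(sizes)

# Assumption: four tied groups of size 2 in q4 (values 1, 3, 4, 5).
tie_sizes = [2, 2, 2, 2]
tie_term = sum(t**3 - t for t in tie_sizes)

# Wilcoxon two-sample test for group 1, normal approximation with ties.
n1, n2 = sizes
expected = n1 * (n_total + 1) / 2
variance = n1 * n2 / 12 * ((n_total + 1) - tie_term / (n_total * (n_total - 1)))
std_dev = math.sqrt(variance)
# Continuity correction of 0.5, applied toward the expected value.
correction = 0.5 if rank_sums[0] < expected else -0.5
z = (rank_sums[0] - expected + correction) / std_dev
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

# Kruskal-Wallis H, divided by the tie correction factor.
h_raw = (12 / (n_total * (n_total + 1))
         * sum(r**2 / n for r, n in zip(rank_sums, sizes))
         - 3 * (n_total + 1))
tie_factor = 1 - tie_term / (n_total**3 - n_total)
h = h_raw / tie_factor
# With 1 degree of freedom, the chi-square upper tail is erfc(sqrt(h / 2)).
p_kw = math.erfc(math.sqrt(h / 2))

print(f"expected = {expected}, std dev = {std_dev:.6f}, Z = {z:.4f}")
print(f"two-sided p = {p_two_sided:.4f}")
print(f"Kruskal-Wallis chi-square = {h:.4f}, p = {p_kw:.4f}")
```

Without the tie correction the Kruskal-Wallis statistic would be about 0.3333 rather than the 0.3500 that SAS, R, and S-PLUS all report.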

 



2. SPSS


 

* SPSS Program of Basic Statistical Tests.

 

GET FILE='C:\mydata.sav'.

DATASET NAME DataSet2 WINDOW=FRONT.

 

* Descriptive stats in compact form.

DESCRIPTIVES VARIABLES=q1 q2 q3 q4

  /STATISTICS=MEAN STDDEV VARIANCE MIN MAX SEMEAN .

 

* Descriptive stats of every sort.

EXAMINE VARIABLES=q1 q2 q3 q4

  /PLOT BOXPLOT STEMLEAF NPPLOT

  /COMPARE GROUP

  /STATISTICS DESCRIPTIVES EXTREME

  /CINTERVAL 95

  /MISSING PAIRWISE

  /NOTOTAL.

EXECUTE.

 

* Frequencies and percents.

FREQUENCIES VARIABLES=workshop gender q1 q2 q3 q4

  /ORDER=  ANALYSIS .

EXECUTE.

 

*---Measures of association.

 

* Pearson correlations.

CORRELATIONS

  /VARIABLES=q1 q2 q3 q4

  /PRINT=TWOTAIL NOSIG

  /MISSING=PAIRWISE .

EXECUTE.

 

* Spearman correlations.

NONPAR CORR

  /VARIABLES=q1 q2 q3 q4

  /PRINT=SPEARMAN TWOTAIL NOSIG

  /MISSING=PAIRWISE .

EXECUTE.

 

* Linear regression.

REGRESSION

  /MISSING LISTWISE

  /STATISTICS COEFF OUTS R ANOVA

  /CRITERIA=PIN(.05) POUT(.10)

  /NOORIGIN

  /DEPENDENT q4

  /METHOD=ENTER q1 q2 q3  .

EXECUTE.

 

*---Group comparisons.

 

* Chi-square.

CROSSTABS

  /TABLES=workshop  BY gender

  /FORMAT= AVALUE TABLES

  /STATISTIC=CHISQ

  /CELLS= COUNT ROW

  /COUNT ROUND CELL .

EXECUTE.

 

* Independent samples t-test.

T-TEST

  GROUPS = gender('m' 'f')

  /MISSING = ANALYSIS

  /VARIABLES = q4

  /CRITERIA = CI(.95) .

EXECUTE.

 

* Nonparametric version of above using

  Wilcoxon/Mann-Whitney test.

* SPSS requires a numeric form of gender for this procedure.

AUTORECODE

  VARIABLES=gender  /INTO genderN

  /PRINT.

NPAR TESTS

  /M-W= q4   BY genderN(1 2)

  /MISSING ANALYSIS.

EXECUTE.

 

* Paired samples t-test.

T-TEST

  PAIRS = q1  WITH q2 (PAIRED)

  /CRITERIA = CI(.95)

  /MISSING = ANALYSIS.

EXECUTE.

 

* Nonparametric version of above using

  Wilcoxon Signed Rank test.

NPAR TESTS

  /WILCOXON=q1  WITH q2 (PAIRED)

  /MISSING ANALYSIS.

EXECUTE.

 

* Oneway analysis of variance (ANOVA).

UNIANOVA q4  BY workshop

  /METHOD = SSTYPE(3)

  /INTERCEPT = INCLUDE

  /POSTHOC = workshop ( TUKEY )

  /PRINT = ETASQ HOMOGENEITY

  /CRITERIA = ALPHA(.05)

  /DESIGN = workshop .

 

* Nonparametric version of above using

  Kruskal Wallis test.

NPAR TESTS

  /K-W=q4   BY workshop(1 3)

  /MISSING ANALYSIS.

EXECUTE.

 


3. R-PROJECT


mydata<-read.table ("c:/data/mydata.csv",header=TRUE,

  sep=",",row.names="id")

print(mydata)

 

# Install the Hmisc and prettyR packages before running.

library(foreign)

library(Hmisc)

install.packages("prettyR")

library(prettyR)

 

# Descriptive statistics and frequencies.

summary(mydata)

workshop   gender       q1             q2             q3              q4     

 Min.   :1.0   f:4    Min.   :1.00   Min.   :1.00   Min.   :2.000   Min.   :1.00 

 1st Qu.:1.0   m:4    1st Qu.:2.00   1st Qu.:1.00   1st Qu.:4.000   1st Qu.:2.50 

 Median :1.5          Median :3.50   Median :2.50   Median :4.000   Median :3.50 

 Mean   :1.5          Mean   :3.25   Mean   :2.75   Mean   :4.143   Mean   :3.25 

 3rd Qu.:2.0          3rd Qu.:4.25   3rd Qu.:4.25   3rd Qu.:5.000   3rd Qu.:4.25 

 Max.   :2.0          Max.   :5.00   Max.   :5.00   Max.   :5.000   Max.   :5.00

                                                    NA's   :1.000 

 

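# The describe function reports means, frequencies, and percentages.
# Both Hmisc and prettyR define describe; prettyR was loaded last,
# so the output below is in prettyR's format.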

describe(mydata)

Description of mydata

 

Numeric

               mean    median       var        sd   valid.n

workshop        1.5       1.5    0.2857    0.5345         8

q1             3.25       3.5     2.214     1.488         8

q2             2.75       2.5     3.071     1.753         8

q3            4.143         4     1.143     1.069         7

q4             3.25       3.5       2.5     1.581         8

 

Factor

               f       m

gender         4       4

mode = >1 mode  Valid n = 8 

 

# Frequencies and percentages using the freq function from the prettyR package.

freq(mydata)

Frequencies for workshop

        1    2

        4    4

%      50   50

 

Frequencies for gender

        f    m

        4    4

%      50   50

 

Frequencies for q1

        1    2    3    4    5

        1    2    1    2    2

%    12.5   25 12.5   25   25

 

Frequencies for q2

        1    2    3    4    5

        3    1    1    1    2

%    37.5 12.5 12.5 12.5   25

 

Frequencies for q3

        2    4    5   NA

        1    3    3    1

%    12.5 37.5 37.5 12.5

%!NA 14.3 42.9 42.9

 

Frequencies for q4

        1    3    4    5

        2    2    2    2

%      25   25   25   25

 

#---Measures of association.

# Pearson correlations.

# The rcorr function from the Hmisc package gives output similar to SAS and SPSS: correlations, n, and p-values.

rcorr( cbind(mydata$q1,mydata$q2,mydata$q3,mydata$q4) )

      [,1]  [,2]  [,3]  [,4]

[1,]  1.00  0.74 -0.12  0.88

[2,]  0.74  1.00 -0.27  0.85

[3,] -0.12 -0.27  1.00 -0.03

[4,]  0.88  0.85 -0.03  1.00

 

n

     [,1] [,2] [,3] [,4]

[1,]    8    8    7    8

[2,]    8    8    7    8

[3,]    7    7    7    7

[4,]    8    8    7    8

 

P

     [,1]   [,2]   [,3]   [,4] 

[1,]        0.0360 0.7894 0.0039

[2,] 0.0360        0.5581 0.0074

[3,] 0.7894 0.5581        0.9556

[4,] 0.0039 0.0074 0.9556

 

# The cor function is built in, but it does not report significance tests.

cor(data.frame(mydata$q1,mydata$q2,mydata$q3,mydata$q4),method="pearson",use="pairwise")

           mydata.q1  mydata.q2   mydata.q3   mydata.q4

mydata.q1  1.0000000  0.7395179 -0.12500000  0.88040627

mydata.q2  0.7395179  1.0000000 -0.27003086  0.85063978

mydata.q3 -0.1250000 -0.2700309  1.00000000 -0.02613542

mydata.q4  0.8804063  0.8506398 -0.02613542  1.00000000

 

# The cor.test function gives the correlation, p-value, and confidence interval, but it tests only two variables at a time.

cor.test(mydata$q1,mydata$q2)

Pearson's product-moment correlation

 

data:  mydata$q1 and mydata$q2

t = 2.691, df = 6, p-value = 0.036

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

 0.0727632 0.9494271

sample estimates:

      cor

0.7395179
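The t statistic and the 95 percent confidence interval in the `cor.test` output above both follow from the correlation and the sample size alone. The t statistic is r * sqrt(n - 2) / sqrt(1 - r^2), and the interval comes from Fisher's z transformation (atanh of r, with standard error 1 / sqrt(n - 3)). The following is a sketch of that arithmetic in Python, used only as a calculator; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Correlation and sample size from the cor.test output (df = n - 2 = 6).
r = 0.7395179
n = 8

# t statistic for H0: true correlation is 0.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# Fisher z transformation: atanh(r) is roughly normal with SE 1/sqrt(n - 3).
z = math.atanh(r)
se_z = 1 / math.sqrt(n - 3)
z_crit = NormalDist().inv_cdf(0.975)  # two-sided 95 percent

# Transform the interval endpoints back to the correlation scale.
ci_low = math.tanh(z - z_crit * se_z)
ci_high = math.tanh(z + z_crit * se_z)

print(f"t = {t_stat:.3f}")
print(f"95% CI = ({ci_low:.7f}, {ci_high:.7f})")
```

The interval is wide because n - 3 is only 5; the S-PLUS output for the same call reports the same t and p-value but prints no interval.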

 

# Spearman correlations using the rcorr function from the Hmisc package.

rcorr( cbind(mydata$q1,mydata$q2,mydata$q3,mydata$q4), type='spearman')

      [,1]  [,2]  [,3] [,4]

[1,]  1.00  0.70 -0.04 0.87

[2,]  0.70  1.00 -0.08 0.88

[3,] -0.04 -0.08  1.00 0.26

[4,]  0.87  0.88  0.26 1.00

 

n

     [,1] [,2] [,3] [,4]

[1,]    8    8    7    8

[2,]    8    8    7    8

[3,]    7    7    7    7

[4,]    8    8    7    8

 

P

     [,1]   [,2]   [,3]   [,4] 

[1,]        0.0532 0.9327 0.0050

[2,] 0.0532        0.8670 0.0039

[3,] 0.9327 0.8670        0.5768

[4,] 0.0050 0.0039 0.5768

 

# Linear regression.

myRegModel<-lm(q4~q1+q2+q3,data=mydata)

summary(myRegModel)

Call:

lm(formula = q4 ~ q1 + q2 + q3, data = mydata)

 

Residuals:

       1        2        3        5        6        7        8

-0.31139 -0.42616  0.94283 -0.17975  0.07658  0.02257 -0.12468

 

Coefficients:

            Estimate Std. Error t value Pr(>|t|) 

(Intercept)  -1.3243     1.2877  -1.028    0.379 

q1            0.4297     0.2623   1.638    0.200 

q2            0.6310     0.2503   2.521    0.086 .

q3            0.3150     0.2557   1.232    0.306 

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

Residual standard error: 0.6382 on 3 degrees of freedom

  (1 observation deleted due to missingness)

Multiple R-Squared: 0.9299,     Adjusted R-squared: 0.8598

F-statistic: 13.27 on 3 and 3 DF,  p-value: 0.03084

 

# Analysis of variance.

anova(myRegModel)

Analysis of Variance Table

 

Response: q4

          Df  Sum Sq Mean Sq F value  Pr(>F) 

q1         1 13.4934 13.4934 33.1335 0.01042 *

q2         1  2.0955  2.0955  5.1456 0.10809 

q3         1  0.6180  0.6180  1.5174 0.30576 

Residuals  3  1.2217  0.4072                 

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

plot(myRegModel)

termplot(myRegModel)

#---그룹 비교.

 

# Crosstabulation and chi-square using the xtab function from the prettyR package.

xtab(~workshop+gender,data=mydata,chisq=TRUE)

Crosstabulation of workshop by gender

         gender

workshop        f        m

1               2        2        4

               50       50       50

               50       50

 

2               2        2        4

               50       50       50

               50       50

 

                4        4        8

               50       50

X2[1] = 0.5, p = 0.4795001

 

Warning message:

Chi-squared approximation may be incorrect in: chisq.test(x$counts, ...)

 

# Crosstabulation and chi-square using R's built-in table function.

myWG<-table(mydata$workshop,mydata$gender)

print(myWG)

chisq.test(myWG)

Pearson's Chi-squared test with Yates' continuity correction

 

data:  myWG

X-squared = 0.5, df = 1, p-value = 0.4795

 

Warning message:

Chi-squared approximation may be incorrect in: chisq.test(myWG)
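The X-squared of 0.5 reported here differs from the chi-square of 0.0000 that SAS reports for the same 2 by 2 table, in which every observed count equals its expected count. One explanation consistent with the numbers, offered as an assumption rather than something this page states, is that the Yates continuity correction was applied as a flat 0.5 to every |O - E| difference, even though each difference is 0. The current R documentation for `chisq.test` says the correction will not be bigger than the differences themselves, so a recent R version may report 0 for this table. The sketch below compares the three variants in Python, used only as a calculator; the function names are illustrative.

```python
import math

# The workshop-by-gender table: every cell holds 2 observations.
observed = [[2, 2], [2, 2]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)
expected = [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(correction):
    """Sum of (|O - E| - correction(|O - E|))^2 / E over all cells."""
    stat = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            gap = abs(o - e)
            stat += (gap - correction(gap)) ** 2 / e
    return stat


def p_value_df1(stat):
    """Upper-tail chi-square probability for 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


pearson = chi_square(lambda gap: 0.0)                  # no correction
flat_yates = chi_square(lambda gap: 0.5)               # always subtract 0.5
capped_yates = chi_square(lambda gap: min(0.5, gap))   # never exceed the gap

for name, stat in [("Pearson", pearson), ("flat Yates", flat_yates),
                   ("capped Yates", capped_yates)]:
    print(f"{name:13s} X-squared = {stat:.4f}, p = {p_value_df1(stat):.4f}")
```

Only the flat correction reproduces the 0.5 and the p-value of 0.4795 shown above; the other two variants give 0, matching the SAS output and the S-PLUS `crosstabs` result.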

 

# Independent samples t-test.

t.test(mydata$q4[mydata$gender=='m'],mydata$q4[mydata$gender=='f'] )

Welch Two Sample t-test

 

data:  mydata$q4[mydata$gender == "m"] and mydata$q4[mydata$gender == "f"]

t = 3.873, df = 4.412, p-value = 0.01491

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

 0.7718919 4.2281081

sample estimates:

mean of x mean of y

      4.5       2.0

 

# Nonparametric version of the above using the Wilcoxon/Mann-Whitney test.

wilcox.test(mydata$q4[mydata$gender=='m'],mydata$q4[mydata$gender=='f'] )

Wilcoxon rank sum test with continuity correction

 

data:  mydata$q4[mydata$gender == "m"] and mydata$q4[mydata$gender == "f"]

W = 16, p-value = 0.02652

alternative hypothesis: true location shift is not equal to 0

 

Warning message:

cannot compute exact p-value with ties in: wilcox.test.default(mydata$q4[mydata$gender == "m"], mydata$q4[mydata$gender == "f"])

 

# Paired samples t-test (silly one).

t.test(mydata$q1,mydata$q2,paired=TRUE)

Paired t-test

 

data:  mydata$q1 and mydata$q2

t = 1.1832, df = 7, p-value = 0.2753

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

 -0.4992361  1.4992361

sample estimates:

mean of the differences

                    0.5

 

# Nonparametric version of the above.

wilcox.test(mydata$q1,mydata$q2,paired=TRUE)

Wilcoxon signed rank test with continuity correction

 

data:  mydata$q1 and mydata$q2

V = 16, p-value = 0.2795

alternative hypothesis: true location shift is not equal to 0

 

Warning messages:

1: cannot compute exact p-value with ties in: wilcox.test.default(mydata$q1, mydata$q2, paired = TRUE)

2: cannot compute exact p-value with zeroes in: wilcox.test.default(mydata$q1, mydata$q2, paired = TRUE)

 

# Oneway Analysis of Variance (ANOVA).

myModel<-aov(q4~workshop,data=mydata)

summary(myModel)

             Df  Sum Sq Mean Sq F value Pr(>F)

workshop     1  0.5000  0.5000  0.1765  0.689

Residuals    6 17.0000  2.8333

 

anova(myModel)

Analysis of Variance Table

 

Response: q4

          Df  Sum Sq Mean Sq F value Pr(>F)

workshop   1  0.5000  0.5000  0.1765  0.689

Residuals  6 17.0000  2.8333

 

plot(myModel)

termplot(myModel)

# Nonparametric oneway ANOVA using the Kruskal-Wallis test.

kruskal.test(mydata$q4,mydata$workshop)

Kruskal-Wallis rank sum test

 

data:  mydata$q4 and mydata$workshop

Kruskal-Wallis chi-squared = 0.35, df = 1, p-value = 0.5541

 

 


4. S-PLUS


 

mydata<-read.table ("c:/data/mydata.csv",header=TRUE,

  sep=",",row.names="id")

print(mydata)

 

# Descriptive statistics and frequencies.

summary(mydata)

 workshop   gender       q1             q2             q3              q4     

 Min.   :1.0   f:4    Min.   :1.00   Min.   :1.00   Min.   :2.000   Min.   :1.00 

 1st Qu.:1.0   m:4    1st Qu.:2.00   1st Qu.:1.00   1st Qu.:4.000   1st Qu.:2.50 

 Median :1.5          Median :3.50   Median :2.50   Median :4.000   Median :3.50 

 Mean   :1.5          Mean   :3.25   Mean   :2.75   Mean   :4.143   Mean   :3.25 

 3rd Qu.:2.0          3rd Qu.:4.25   3rd Qu.:4.25   3rd Qu.:5.000   3rd Qu.:4.25 

 Max.   :2.0          Max.   :5.00   Max.   :5.00   Max.   :5.000   Max.   :5.00

                                                    NA's   :1.000 

 

#---Measures of association.

# Pearson correlations.

# The cor function is built in, but it does not report significance tests.

cor(cbind(mydata$q1,mydata$q2,mydata$q3,mydata$q4),na.method="available")

          [,1]       [,2]        [,3]        [,4]

[1,]  1.0000000  0.7395179 -0.13470398  0.88040627

[2,]  0.7395179  1.0000000 -0.26687250  0.85063978

[3,] -0.1347040 -0.2668725  1.00000000 -0.02817181

[4,]  0.8804063  0.8506398 -0.02817181  1.00000000

 

# The cor.test function gives the correlation and p-value, but it tests only two variables at a time.

cor.test(mydata$q1,mydata$q2,use="pairwise")

Pearson's product-moment correlation

 

data:  mydata$q1 and mydata$q2

t = 2.691, df = 6, p-value = 0.036

alternative hypothesis:  coef is not equal to 0

sample estimates:

       cor

 0.7395179

 

# Linear regression.

myRegModel<-lm(q4~q1+q2+q3,data=mydata,na.action=na.exclude)

summary(myRegModel)

Call: lm(formula = q4 ~ q1 + q2 + q3, data = mydata, na.action = na.exclude)

Residuals:

       1       2      3       5       6       7       8

 -0.3114 -0.4262 0.9428 -0.1797 0.07658 0.02257 -0.1247

 

Coefficients:

              Value Std. Error t value Pr(>|t|)

(Intercept) -1.3243  1.2877    -1.0284  0.3794

         q1  0.4297  0.2623     1.6382  0.1999

         q2  0.6310  0.2503     2.5214  0.0861

         q3  0.3150  0.2557     1.2318  0.3058

 

Residual standard error: 0.6382 on 3 degrees of freedom

Multiple R-Squared: 0.9299

F-statistic: 13.27 on 3 and 3 degrees of freedom, the p-value is 0.03084

1 observations deleted due to missing values

 

Correlation of Coefficients:

   (Intercept)      q1      q2

q1 -0.0969                   

q2 -0.2887     -0.7813       

q3 -0.8895     -0.1422  0.2779

 

# Analysis of variance.

anova(myRegModel)

Analysis of Variance Table

 

Response: q4

 

Terms added sequentially (first to last)

          Df Sum of Sq  Mean Sq  F Value     Pr(F)

       q1  1  13.49339 13.49339 33.13347 0.0104181

       q2  1   2.09550  2.09550  5.14557 0.1080859

       q3  1   0.61795  0.61795  1.51741 0.3057616

Residuals  3   1.22173  0.40724

 

plot(myRegModel)

#---Group comparisons.

 crosstabs(~mydata$workshop+mydata$gender)

Call:

crosstabs(formula =  ~ mydata$workshop + mydata$gender)

8 cases in table

+----------+

|N         |

|N/RowTotal|

|N/ColTotal|

|N/Total   |

+----------+

mydata$workshop|mydata$gender

       |f      |m      |RowTotl|

-------+-------+-------+-------+

1      |2      |2      |4      |

       |0.5    |0.5    |0.5    |

       |0.5    |0.5    |       |

       |0.25   |0.25   |       |

-------+-------+-------+-------+

2      |2      |2      |4      |

       |0.5    |0.5    |0.5    |

       |0.5    |0.5    |       |

       |0.25   |0.25   |       |

-------+-------+-------+-------+

ColTotl|4      |4      |8      |

       |0.5    |0.5    |       |

-------+-------+-------+-------+

Test for independence of all factors

        Chi^2 = 0 d.f.= 1 (p=1)

        Yates' correction not used

        Some expected values are less than 5, don't trust stated p-value

 

# Crosstabulation and chi-square using the built-in table function.

myWG<-table(mydata$workshop,mydata$gender)

print(myWG)

  f m
1 2 2
2 2 2

 

chisq.test(myWG)

Pearson's chi-square test with Yates' continuity correction

 

data:  myWG

X-square = 0.5, df = 1, p-value = 0.4795

 

Warning messages:

    Expected counts < 5. Chi-square approximation may not be appropriate. in:

        chisq.test(myWG)

 

# Independent samples t-test.

t.test(mydata$q4[mydata$gender=='m'],mydata$q4[mydata$gender=='f'] )

        Standard Two-Sample t-Test

 

data:  mydata$q4[mydata$gender == "m"] and mydata$q4[mydata$gender == "f"]

t = 3.873, df = 6, p-value = 0.0082

alternative hypothesis:  difference in means is not equal to 0

95 percent confidence interval:

 0.9205252 4.0794748

sample estimates:

 mean of x mean of y

       4.5         2

 

# Nonparametric version of the above using the Wilcoxon/Mann-Whitney test.

wilcox.test(mydata$q4[mydata$gender=='m'],mydata$q4[mydata$gender=='f'] )

Wilcoxon rank-sum test

 

data:  mydata$q4[mydata$gender == "m"] and mydata$q4[mydata$gender == "f"]

rank-sum normal statistic with correction Z = 2.2185, p-value = 0.0265

alternative hypothesis:  mu is not equal to 0

 

Warning messages:

    cannot compute exact p-value with ties in: wil.rank.sum(x, y, alternative,

        exact, correct)

 

# Paired samples t-test (silly one).

t.test(mydata$q1,mydata$q2,paired=TRUE)

        Paired t-Test

 

data:  mydata$q1 and mydata$q2

t = 1.1832, df = 7, p-value = 0.2753

alternative hypothesis:  mean of differences is not equal to 0

95 percent confidence interval:

 -0.4992361  1.4992361

sample estimates:

 mean of x - y

           0.5

 

# Nonparametric version of the above.

wilcox.test(mydata$q1,mydata$q2,paired=TRUE)

        Wilcoxon signed-rank test

 

data:  mydata$q1 and mydata$q2

signed-rank normal statistic with correction Z = 1.0064, p-value = 0.3142

alternative hypothesis:  mu is not equal to 0

 

Warning messages:

  1: cannot compute exact p-value with ties in: wil.sign.rank(dff, alternative,

        exact, correct)

  2: cannot compute exact p-value for zero differences in: wil.sign.rank(dff,

        alternative, exact, correct)

 

# Oneway Analysis of Variance (ANOVA).

myModel<-aov(q4~workshop,data=mydata)

summary(myModel)

          Df Sum of Sq  Mean Sq   F Value     Pr(F)

 workshop  1       0.5 0.500000 0.1764706 0.6890522

Residuals  6      17.0 2.833333

 

anova(myModel)

Analysis of Variance Table

 

Response: q4

 

Terms added sequentially (first to last)

          Df Sum of Sq  Mean Sq   F Value     Pr(F)

 workshop  1       0.5 0.500000 0.1764706 0.6890522

Residuals  6      17.0 2.833333

 

plot(myModel)

# Nonparametric oneway ANOVA using the Kruskal-Wallis test.

kruskal.test(mydata$q4,mydata$workshop)

        Kruskal-Wallis rank sum test

 

data:  mydata$q4 and mydata$workshop

Kruskal-Wallis chi-square = 0.35, df = 1, p-value = 0.5541

alternative hypothesis: two.sided

 

 


5. PROC SQL
