
* This is a do-file for the analysis for B632 lecture 10, March 28 2006
clear
set mem 1000
* You need to set this file path correctly before running
use "…B632_scientist.dta"
* first check the individual variable distributions
tab c4_31_tc
tab c5_3_age
tab c4_1_ide
tab c4_3_env
* Now look at the internationalism measures
tab c4_7_un_
tab c4_14_co
tab c4_25_io
* Now run the simple model
regress c4_31_tc c4_1_ide c4_3_env c5_3_age
* Now run the more complex model
regress c4_31_tc c4_1_ide c4_3_env c5_3_age c4_7_un_ c4_14_co c4_25_io
* These model results produce the information needed to conduct the nested-F test
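* Added sketch (not in the original lecture): Stata's test command gives the same
* nested F-test directly, as a joint test that the three added coefficients are zero.
* It uses the estimation sample of the complex model just run; the by-hand version
* matches only if the simple model is fit on that same sample.
test c4_7_un_ c4_14_co c4_25_io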
* Prior model run showed all added variables to produce statistically insignificant
* estimated coefficients, but the F-test was significant. Check for multicollinearity
* in the added variables
vif
correlate c4_7_un_ c4_14_co c4_25_io
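* Added sketch (not in the original lecture): pwcorr reports the same pairwise
* correlations with significance levels and the number of observations per pair,
* which helps judge whether a correlation is more than noise.
pwcorr c4_7_un_ c4_14_co c4_25_io, sig obs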
* It appears that c4_7 and c4_14 are related. So we can try making an index of those
* two, and drop c4_25.
* Possible solution: make an index of c4_7 and c4_14 (c4_25 is left out):
generate internat=(c4_7_un_ + c4_14_co)/2
tab internat
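* Added sketch (not in the original lecture): a quick check on the new index.
* The average is missing whenever either item is missing, so count those cases
* and compare the ranges of the two items with the range of the index.
count if missing(internat)
summarize c4_7_un_ c4_14_co internat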
* Now run the regression with the new index
regress c4_31_tc c4_1_ide c4_3_env c5_3_age internat
* Now recheck the VIF scores
vif
* Some improvement -- max VIF drops from 1.34 to 1.19; and the coefficients are significant
* in a one-tailed hypothesis test (p/2).
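* Added sketch (not in the original lecture): compute the one-tailed p-value for the
* index from the stored results of the last regression. This equals half the
* two-tailed p-value and is only appropriate when the sign of the coefficient
* matches the hypothesized direction.
display "t = " _b[internat]/_se[internat]
display "one-tailed p = " ttail(e(df_r), abs(_b[internat]/_se[internat]))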
* Dummy variable analysis
regress c4_1_ide c5_3_age c5_4_gen
* Interaction terms. Does education operate the same way for US and EU scientists? First run
* a base-case model without the interaction:
regress c5_5a c5_3_age c5_1a us0_eu1, beta
* Next, generate an education by EU interaction term:
gen eu_edu = us0_eu1 * c5_1a
tab c5_1a
tab eu_edu us0_eu1
* Now re-run the model with the interaction term included:
regress c5_5a c5_3_age c5_1a us0_eu1 eu_edu
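* Added sketch (not in the original lecture): with the interaction included, the
* coefficient on c5_1a is the education slope for US scientists (us0_eu1 = 0), and
* the slope for EU scientists is the sum of the c5_1a and eu_edu coefficients.
* This assumes us0_eu1 is coded 0 = US, 1 = EU, as its name suggests.
lincom c5_1a + eu_edu
* Joint test of whether EU scientists differ from US scientists in intercept or slope:
test us0_eu1 eu_edu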