In preparing graphs for a book I am working on, I have found a Stata package called ‘combomarginsplot’ very valuable. I wanted to share the experience in case it might help other researchers. The package was written by Nicholas Winter, and I am indebted to him for this terrific tool.
The book is on how parties assign their MPs to legislative committees. The committees are divided into categories, including “high policy” (things like justice, foreign affairs, and defense), “public goods” (such as health and education), and “distributive” (principally agriculture, transport, and construction). The typology is based on one developed some years ago by Pekkanen, Nyblade, and Krauss (2006).
Covariates that we hypothesize are associated with assignment to a given committee category include gender, occupation, seniority, and various other personal and electoral variables.
One of the ways in which ‘combomarginsplot’ has come in handy is in setting covariates for purposes of simulating how various attributes of a politician are associated with increased or decreased probability that a member might be assigned to a given committee category.
Normally it would be fine to use the standard ‘margins’ and ‘marginsplot’ commands for such purposes. For instance, if for a given party (let’s call it “Likud”) we want to know the probability that a candidate of a given gender is assigned to a public goods (PG) committee, we would run the logistic regression for this category and then the ‘margins’ command. It might look like this:
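```stata
* Predicted probability of PG assignment for men (0) and women (1), with 90% CIs
margins, at(female=(0 1)) level(90)

* Plot the estimates as unconnected markers with capped confidence bars
marginsplot, recast(scatter) plotopts(msiz(vsmall) mc(gs4)) ciopts(lw(medthick)) ///
    xscale(range(0 1)) xsc(r(-.99 1.7)) ylabel(0(.2)1) ysc(r(0 1)) scheme(s1mono) ///
    aspectratio(3) ytitle("") title("Likud: PG") name(Likud_pg_fem, replace)
```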
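The snippet above assumes the logit model has already been fitted. For anyone who wants to run the whole sequence end to end, here is a minimal sketch on simulated data. The variable names pg, female, and occu_hi follow this post, but seniority, the sample size, and every coefficient are invented for illustration; none of this is the book's data or model.

```stata
* Simulated stand-in data (hypothetical; not the book's data)
clear
set seed 2024
set obs 300
generate byte female  = runiform() < 0.30
* Women are far more likely than men to hold a high-policy occupation
generate byte occu_hi = runiform() < cond(female, 0.60, 0.10)
generate seniority    = floor(10*runiform())
generate byte pg      = runiform() < invlogit(-0.5 + 0.2*female + 0.8*occu_hi - 0.05*seniority)

* Logit for assignment to a public goods (PG) committee
logit pg i.female i.occu_hi seniority

* Predicted probability of PG assignment with female set to 0 and then to 1;
* by default the other covariates keep their observed values and the
* predictions are averaged (add the atmeans option to hold them at their means)
margins, at(female=(0 1)) level(90)
```

Following this with the ‘marginsplot’ call shown above draws the graph.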
A formatting note: notice the “recast” option. The standard ‘marginsplot’ can produce some really dreadfully ugly graphs! The “recast(scatter)” option plots the estimates as unconnected markers, which leaves you with these clean capped bars. (You can also use “recast(bar)”, but I find it less appealing.) The x-axis option “xsc(r(-.99 1.7))” is also useful: without something like it, the default puts the bars right next to the box borders, with lots of white space in between. Obviously, Stata graph commands can be adjusted to user preference. This is mine; yours might vary.
The command above produces a graph like this:
Nice, right? No, not really. Please never accept a “result” like this! Look at the confidence interval for female MPs: it goes above 1.00. But this is a logistic regression. The quantity being plotted is a predicted probability, which by definition must lie between 0 and 1. It can’t be 1.2! Just because Stata says so doesn’t make it so!
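Sometimes setting ‘margins’ scenarios a certain way leads to the estimate being generated for a hypothetical case that is quite unrealistic, given the data. That is, there may not be many real politicians who meet the criteria. Then, especially if the overall sample is not very large, the standard errors balloon and you can get utterly impossible “predictions”. The point estimate itself always stays between 0 and 1, but by default ‘margins’ computes a symmetric, delta-method confidence interval on the probability scale, so a large enough standard error pushes the interval past 1 (or below 0).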
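Because an interval poking past 1 is easy to miss in a crowded graph, one can also check the ‘margins’ results table directly. After any ‘margins’ call, r(table) holds the estimates in row 1 and the lower and upper confidence bounds in rows 5 and 6 (named ll and ul). A minimal sketch, using Stata’s bundled auto dataset purely as a stand-in (foreign and mpg have nothing to do with committees):

```stata
* Stand-in model: probability that a car is foreign, given its mileage
sysuse auto, clear
logit foreign mpg
margins, at(mpg=(15 25 35 41)) level(90)

* r(table): row 1 = estimate, rows 5-6 = lower/upper confidence bounds
matrix t = r(table)
forvalues j = 1/`=colsof(t)' {
    * Flag any at() scenario whose interval leaves the [0, 1] range
    if t[5,`j'] < 0 | t[6,`j'] > 1 {
        display as error "at() scenario `j': interval [" %5.3f t[5,`j'] ", " %5.3f t[6,`j'] "] leaves [0, 1]"
    }
}
```

A flagged scenario is a prompt to go back to the data, not something to trim away in the graph.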
Go look at your data and see what’s going on. In this case, it turns out that a very small percentage of men in this party had what we term “high-policy occupations” (mostly lawyers), but a high percentage of the women did. When we run ‘margins’ without specifying values for the other covariates, Stata by default keeps those covariates at their observed values across the whole sample and averages the predictions (or, with the ‘atmeans’ option, holds them at their sample means). Either way, men and women are estimated with the same likelihood of also being in a high-policy occupation, even though the male and female subsamples are rather different.
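A quick way to do this check is to cross-tabulate the suspect covariate against the variable being varied in ‘margins’. A minimal sketch on simulated data with invented proportions; with real data one would add a condition such as if party == "Likud", where party is an assumed variable name:

```stata
* Simulated stand-in data: high-policy occupations are common among
* women and rare among men (hypothetical proportions)
clear
set seed 2024
set obs 300
generate byte female  = runiform() < 0.30
generate byte occu_hi = runiform() < cond(female, 0.60, 0.10)

* Column percentages: share of men (0) and women (1) in high-policy occupations
tabulate occu_hi female, column

* Store each group's share so the gap can be compared directly
quietly summarize occu_hi if female == 1
scalar share_f = r(mean)
quietly summarize occu_hi if female == 0
scalar share_m = r(mean)
display "Women: " %4.2f share_f "   Men: " %4.2f share_m
```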
What we need is to estimate the men and women in separate ‘margins’ commands, each being more realistic on other covariates. However, we want the estimates for men and women to appear in the same box in our final graph. It won’t do to run separate ‘marginsplot’ commands and then ‘graph combine’, because that will put two separate boxes in the space of one. So here is how you can make the result look like the first graph, despite being based on two separate calls to ‘margins’:
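```stata
* Women with a high-policy occupation (the realistic female profile)
margins, at(female=1 occu_hi=1) level(90) saving("File1", replace)

* Men without a high-policy occupation (the realistic male profile)
margins, at(female=0 occu_hi=0) level(90) saving("File2", replace)

* Combine both saved results in one plot region, men first
combomarginsplot "File2" "File1", ///
    recast(scatter) plotopts(msiz(vsmall) mc(gs4)) ciopts(lw(medthick)) ///
    ylabel(0(.2)1) ysc(r(0 1)) scheme(s1mono) ///
    labels(0 1) xscale(r(.5 2.5)) xtitle("Female MP") ///
    aspectratio(3) ytitle("") title("Likud: PG") name(Likud_pg_fem, replace)
```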
When we do all the above, we get:
We see more plausible confidence intervals, because we are estimating on realistic politicians. There is essentially no difference in this party between the probabilities of men and women getting PG committees (or, more to the point, between women with high-policy occupations and men without them). We already knew from the first example plot above that there was not a significant difference. It was the confidence intervals that went haywire, due to the unrealistic scenario.
A challenging part of this was getting the bars in the right place within the box. First, one needs the ‘labels’ option so that they are marked “0” and “1” instead of with the names of the saved files (e.g., “File1”, although you can name them just about anything you want). Then, with a little (OK, a lot) of trial and error, it turned out that “xscale(r(.5 2.5))” worked about right, presumably because ‘combomarginsplot’ places the two single estimates at positions 1 and 2 on the x-axis, so this range centers them in the box.
I have found several other convenient uses for ‘combomarginsplot’ in this and other projects. A perhaps more common use than the one I demonstrated here would be when the plotted curves come from different regressions. You can save the results from each, then combine them into a single plot area. Another, which I have used, is combining multiple outcomes from one regression, such as a multinomial logit.
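For the first of these uses, estimates from different regressions, the workflow is the same as above: save each ‘margins’ result to a file, then hand all the files to ‘combomarginsplot’. A minimal sketch, assuming ‘combomarginsplot’ is available from SSC (the code installs it if it is missing) and using simulated data; the second outcome hipol, the file names, and all coefficients are hypothetical:

```stata
* Install combomarginsplot from SSC only if it is not already available
capture which combomarginsplot
if _rc ssc install combomarginsplot

* Simulated stand-in data; hipol (assignment to a high-policy committee)
* is a hypothetical second outcome with invented coefficients
clear
set seed 2024
set obs 300
generate byte female  = runiform() < 0.30
generate byte occu_hi = runiform() < cond(female, 0.60, 0.10)
generate byte pg      = runiform() < invlogit(-0.5 + 0.2*female + 0.3*occu_hi)
generate byte hipol   = runiform() < invlogit(-1.0 - 0.3*female + 0.9*occu_hi)

* Regression 1 (public goods): save the margins for men and women to a file
logit pg i.female i.occu_hi
margins female, level(90) saving(pg_fem, replace)

* Regression 2 (high policy): save its margins to a second file
logit hipol i.female i.occu_hi
margins female, level(90) saving(hp_fem, replace)

* Draw both sets of estimates in a single plot region, one series per file
combomarginsplot pg_fem hp_fem, labels("Public goods" "High policy") ///
    ylabel(0(.2)1) xtitle("Female MP") ytitle("") title("Committee assignment by gender")
```

For the multinomial case mentioned above, the same pattern should carry over: fit ‘mlogit’ once, save a separate ‘margins’ file for each outcome using predict(outcome(#)), and combine those files.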
It is terrific that Stata has such a community of public goods providers to create tools like this!