{smcl}
{* *! version 1.0 23 October 2020}{...}
{viewerjumpto "Syntax" "bracketfill##syntax"}{...}
{viewerjumpto "Description" "bracketfill##description"}{...}
{viewerjumpto "Options" "bracketfill##options"}{...}
{viewerjumpto "Remarks" "bracketfill##remarks"}{...}
{viewerjumpto "Examples" "bracketfill##examples"}{...}
{viewerjumpto "Author" "bracketfill##author"}{...}
{viewerjumpto "Also see" "bracketfill##alsosee"}{...}
help for {cmd:bracketfill}{right:version 1.0 (23 October 2020)}
{hline}
{title:Title}
{phang}
{bf:bracketfill} {hline 2}
Fill a continuous variable with coarse data from a bracket variable.
{p_end}
{title:Table of contents}
{help bracketfill##syntax:Syntax}
{help bracketfill##description:Description}
{help bracketfill##options:Options}
{help bracketfill##remarks:Remarks}
{help bracketfill##examples:Examples}
{help bracketfill##author:Author}
{help bracketfill##alsosee:Also see}
{marker syntax}
{title:Syntax}
{p 8 12 2}
{cmd:bracketfill}
{newvar}={it:function}
{ifin}
{cmd:,}
{opt brackets(bracketspec [bracketspec] [...])}
[{opt options}]{p_end}
{p 10 14 2}
where valid {it:function}s are:{break}
{input:center}: the class center of a bracket (i.e. {it:lower bound}+({it:upper bound}-{it:lower bound})/2){break}
{input:median}: the median of observations inside a bracket (as calculated by {help summarize}){break}
{input:mean}: the mean of observations inside a bracket (as calculated by {help summarize}){p_end}
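{p 10 14 2}
As a worked illustration of the {input:center} formula above (a sketch only, not taken from the program's source): a bracket specified as {input:500<=continuousvar<1000} has the class center 500+(1000-500)/2, which is 750. The arithmetic can be checked with {help display}:{break}
{input:. display 500+(1000-500)/2}{p_end}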
{p 10 14 2}
Each {it:bracketspec} consists of two space-separated elements defining a bracket,
the {input:category definition} and the {input:range specification}:{p_end}
{p 12 16 2}
A {input:category definition} is a regular Stata {help exp:expression} that identifies
a bracket of the categorical variable, such as:{break}
{input:categoricalvar==category}{p_end}
{p 12 16 2}
A {input:range specification} defines the range of the continuous variable that observations from the named category are assumed to fall into.{break}
It is defined as a lower bound, the name of the continuous variable, and an upper bound, with one of the {help operators} {input:<} or {input:<=} in between. For instance:{break}
{input:100<=continuousvar<1000}{p_end}
{p 10 14 2}If either the {input:category definition} or the {input:range specification} contains space characters, it must be enclosed in (compound) double quotes.{p_end}
{synoptset 25 tabbed}{...}
{synopthdr}
{synoptline}
{synopt:{opt replace}}replace {newvar}, if it already exists, with the result{p_end}
{synopt:{opt m:issing(value)}}use {it:value} for observations where the result variable cannot be filled with values (defaults to {it:.} if omitted){p_end}
{synopt:{opt copy:rest(varname)}}copy values for observations not defined in a {it:bracketspec} from {it:varname} to the result variable{p_end}
{synopt:{opt e:xclude(ifexpression)}}exclude observations that match {it:ifexpression} from calculation{p_end}
{synopt:{opt v:erbose}}give verbose output{p_end}
{p2colreset}{...}
{marker description}
{title:Description}
{pstd}
Imagine your dataset contains two variables, a categorical one and a continuous one, that measure the same quantity. For some reason, such as item non-response in a survey, the continuous variable contains missing values for some observations, while the categorical variable does contain information for these observations.{p_end}
{pstd}
A typical example from a survey is the question about a person's income. Respondents might be unwilling or unable to state their exact income (measured continuously). They might, however, give information about their income category, for instance in brackets of 500 Euros.{p_end}
{pstd}
{cmd:bracketfill} serves the seemingly trivial purpose of filling the continuous variable with values derived from the bracketed categorical variable. It does so by picking either the {bf:center} value of a bracket, or the {bf:mean} or {bf:median} of the continuous variable's empirical values inside that bracket, and inserting it into the continuous variable. This is the simplest way to obtain data that can be analyzed as if full data were present. Far more elaborate ways of dealing with missing data exist, and are implemented in Stata itself or in user-contributed commands.{p_end}
{pstd}
Note that {cmd:bracketfill} does {it:not} aim to replace (multiple) imputation methods or FIML estimation. These methods are far more suitable for running analyses. {cmd:bracketfill} is rather a tool to quickly insert artificial values that are reasonably close to the original value, for instance as an intermediate step in producing further variables for an analysis dataset. The data {cmd:bracketfill} produces are {it:no} replacement for real data, and are {it:no} replacement for a proper approximation to real data. Inserting a single value for all observations in a bracket cannot reflect the variation of the real distribution, and will reduce the variation of the result variable.{p_end}
{marker options}{...}
{title:Options}
{dlgtab:Default}
{phang}
{opt replace} if {newvar} already exists, replace it with the result instead of generating a new variable.
{p_end}
{phang}
{opt missing(value)} use {it:value} as missing value for all observations that are not covered by any {it:bracketspec}. If omitted, {it:value} defaults to the system missing value ("."). Extended missing values are allowed.
{p_end}
{phang}
{opt copyrest(varname)} after populating the result variable, copy values from {it:varname} to the result variable for all observations that have not been covered by any {it:bracketspec}, as well as for out-of-sample observations excluded by {help if} or {help in}.
{p_end}
{phang}
{opt exclude(ifexpression)} exclude all observations that match {it:ifexpression} from the calculation of the bracket value. The specified {it:ifexpression} has to be a valid Stata {help exp:expression}. Note that using the {opt exclude()} option is not equivalent to excluding observations using the {help if} or {help in} qualifiers. The latter exclude observations completely from the generation of the new variable; the {opt exclude()} option only excludes observations from the calculation of the bracket value, not from populating the target variable.
{p_end}
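{pmore}
As a hedged illustration of this difference, assume two hypothetical variables, {it:wage} (continuous) and {it:wagecat} (categorical); both names are invented for this sketch. In the call below, the {help if} qualifier restricts which observations are filled, while {opt exclude()} only keeps implausibly high wages out of the median calculation:{p_end}
{pmore}{input}. bracketfill wage_filled=median if missing(wage), ///{break}
brackets( /// {break}
wagecat==1 "0<=wage<1000" /// {break}
wagecat==2 "1000<=wage<." /// {break}
) /// {break}
exclude(wage>100000 & !missing(wage)){p_end}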
{phang}
{opt verbose} during calculation, give verbose output on what's happening. This may help in debugging errors.
{p_end}
{marker remarks}
{title:Remarks}
{pstd}
The source code of the program is licensed under the
GNU General Public License version 3 or later.
The corresponding license text can be found on the internet at
{browse "http://www.gnu.org/licenses/"} or in {help gnugpl}.
{p_end}
{marker examples}
{title:Examples}
{phang}Reconstruct a variable resembling the integrated (household) income variable in NEPS SUFs:{p_end}
{phang}{input}. bracketfill income=median if (t510010<0 | missing(t510010)), ///{break}
brackets( /// {break}
t510012==1 "0<=t510010<500" /// {break}
t510012==2 "500<=t510010<1000" /// {break}
t510012==3 "1000<=t510010<1500" /// {break}
t510013==1 "1500<=t510010<2000" /// {break}
t510013==2 "2000<=t510010<2500" /// {break}
t510013==3 "2500<=t510010<3000" /// {break}
t510014==1 "3000<=t510010<4000" /// {break}
t510014==2 "4000<=t510010<5000" /// {break}
t510014==3 "5000<=t510010<." /// {break}
t510011==1 "0<=t510010<1500" /// {break}
t510011==2 "1500<=t510010<3000" /// {break}
t510011==3 "3000<=t510010<." /// {break}
) /// {break}
verbose copyrest(t510010) missing(.z){p_end}
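{phang}A further minimal sketch, assuming hypothetical variables {it:wage} (continuous) and {it:wagecat} (categorical, coded 1 to 3) that do not come from any real dataset. It fills missing wages with the class center of each bracket, and quotes the category definitions because they contain spaces:{p_end}
{phang}{input}. bracketfill wage_filled=center if missing(wage), ///{break}
brackets( /// {break}
"wagecat == 1" "0<=wage<500" /// {break}
"wagecat == 2" "500<=wage<1000" /// {break}
"wagecat == 3" "1000<=wage<2000" /// {break}
) /// {break}
copyrest(wage) missing(.a){p_end}
{phang}Under the {input:center} formula given in the syntax section, observations in the second bracket would receive the value 750.{p_end}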
{marker author}
{title:Author}
{pstd}
Daniel Bela ({browse "mailto:daniel.bela@lifbi.de":daniel.bela@lifbi.de}),
Leibniz Institute for Educational Trajectories (LIfBi), Germany.
{p_end}
{marker alsosee}
{title:Also see}
{psee}
{help NEPSmgmt} (if installed), {help recode}, {help generate}, {help replace}, {help if}
{p_end}