SPSS macro for removing duplicates
SPSS has a built-in GUI function for removing duplicates; however, it lacks some of the key options I like to use (performing a case-insensitive dedupe, dropping the duplicates, dropping the flag variable, etc.). I built my own macros, which are very easy to use and have all the options I typically need.
The following video demonstrates them in use.
Here are the SPSS macros:
*///////////////.
*////SPSS macro to remove duplicates on a single variable (exact match)///////////.
DEFINE !DDN (DupVar !TOKENS (1))
SORT CASES BY !DupVar (A).
MATCH FILES /FILE = * /BY !DupVar /FIRST = First.
var label first "Unique and Duplicates".
value label first 0'Duplicates' 1'Unique'.
Freq First /FORMAT=DVALUE.
select if First=1.
match files file= * / drop First.
exe.
!ENDDEFINE.
*///////////////.
/* !DDN DupVar=Number.

*////SPSS macro to remove duplicates on String variable///////////.
* The lowercase key duh is computed before sorting so the file is in the BY order that MATCH FILES expects.
DEFINE !DDA (DupVar !TOKENS (1))
String duh (A1000).
compute duh=Lowcase(!DupVar).
SORT CASES BY duh (A).
MATCH FILES /FILE = * /BY duh /FIRST = First.
var label first "Unique and Duplicates".
value label first 0'Duplicates' 1'Unique'.
Freq First /FORMAT=DVALUE.
select if First=1.
match files file= * / drop First duh.
exe.
!ENDDEFINE.
*///////////////.
/* !DDA DupVar=email.

*/////SPSS macro to remove duplicates on two string variables//////////.
DEFINE !DDM (Var1 !TOKENS (1) / Var2 !TOKENS (1))
String duh1 duh2 (A1000).
Compute duh1=Lowcase(!Var1).
Compute duh2=Lowcase(!Var2).
SORT CASES BY duh1 (A) duh2 (A).
MATCH FILES /FILE = * /BY duh1 duh2 /FIRST = First.
var label first "Unique and Duplicates".
value label first 0'Duplicates' 1'Unique'.
Freq First /FORMAT=DVALUE.
select if First=1.
match files file= * / drop First duh1 duh2.
exe.
!ENDDEFINE.
*///////////////.
/* !DDM Var1=Corporate Var2=gend.

*/////Alternative SPSS macro for one string variable, using LAG instead of MATCH FILES//////////.
* Cases are sorted on the original values, so case variants are caught only when they happen to sort next to each other.
DEFINE !DDA2 (DupVar !TOKENS (1))
SORT CASES BY !DupVar.
Compute Unique=1.
if Upcase(!DupVar)=Upcase(Lag(!DupVar)) Unique=0.
Freq Unique.
Select if Unique=1.
Match files file=* / DROP Unique.
exe.
!ENDDEFINE.
*///////////////.
/* !DDA2 DupVar=email.
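To see what the case-insensitive approach does step by step, here is a minimal standalone sketch that does not call the macros. The sample data and the variable names id, email and key are made up for illustration, and it uses the documented LOWER function to build the comparison key. Assuming it runs as written, it should leave three cases: one for each distinct address regardless of case.

```spss
* Hypothetical sample data, for illustration only.
DATA LIST FREE / id (F2) email (A30).
BEGIN DATA
1 ann@example.com
2 ANN@example.com
3 bob@example.com
4 Bob@Example.com
5 cy@example.com
END DATA.

* Build a lowercase key so that case variants compare equal.
STRING key (A30).
COMPUTE key = LOWER(email).

* MATCH FILES with BY expects the file to be sorted on the BY variable.
SORT CASES BY key (A).

* Flag the first case in each key group (1 = first occurrence, 0 = duplicate).
MATCH FILES /FILE = * /BY key /FIRST = First.
FREQUENCIES VARIABLES = First.

* Keep only the first occurrences, then drop the helper variables.
SELECT IF First = 1.
MATCH FILES /FILE = * /DROP = First key.
EXECUTE.
LIST.
```

If you would rather keep the flag and review the duplicates before deleting anything, stop after the FREQUENCIES command and leave out the SELECT IF and the final MATCH FILES.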