Package: sdcMicro 5.8.2

Matthias Templ

sdcMicro: Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation

Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used to generate anonymized (micro)data, i.e. to create public- and scientific-use files. The theoretical basis for the implemented methods can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface, published in Meindl and Templ (2019) <doi:10.3390/a12090191>, that provides access to many of the package's methods.

Authors: Matthias Templ [aut, cre], Bernhard Meindl [aut], Alexander Kowarik [aut], Johannes Gussenbauer [aut], Organisation For Economic Co-Operation And Development [cph], Statistics Netherlands [cph], Pascal Heus [cph]

Downloads:
  • Source: sdcMicro_5.8.2.tar.gz
  • Windows: sdcMicro_5.8.2.zip (r-4.7) | sdcMicro_5.8.2.zip (r-4.6) | sdcMicro_5.8.2.zip (r-4.5)
  • macOS: sdcMicro_5.8.2.tgz (r-4.6-x86_64) | sdcMicro_5.8.2.tgz (r-4.6-arm64) | sdcMicro_5.8.2.tgz (r-4.5-x86_64) | sdcMicro_5.8.2.tgz (r-4.5-arm64)
  • Linux: sdcMicro_5.8.2.tar.gz (r-4.7-arm64) | sdcMicro_5.8.2.tar.gz (r-4.7-x86_64) | sdcMicro_5.8.2.tar.gz (r-4.6-arm64) | sdcMicro_5.8.2.tar.gz (r-4.6-x86_64)
  • WebAssembly: sdcMicro_5.8.2.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
sdcMicro/json (API)

# Install 'sdcMicro' in R:
install.packages('sdcMicro', repos = c('https://sdctools.r-universe.dev', 'https://cloud.r-project.org'))
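Once installed, a typical session builds an sdcMicroObj, applies a method, and re-checks the risk. The sketch below is modeled on the documented createSdcObj() workflow; the selection of key, numerical, and weight variables from the bundled testdata set is illustrative, and the print() types "kAnon" and "risk" are assumed from the print method for sdcMicroObj objects, so check ?createSdcObj and ?print.sdcMicroObj for the current signatures.

```r
library(sdcMicro)

# Household survey test data shipped with the package
data(testdata)

# Declare categorical key variables (quasi-identifiers),
# numerical variables, and the sampling weight (illustrative choice)
sdc <- createSdcObj(testdata,
  keyVars   = c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex"),
  numVars   = c("expend", "income", "savings"),
  weightVar = "sampling_weight")

# Local suppression until each key-variable combination occurs at least 3 times
sdc <- kAnon(sdc, k = 3)

# Inspect remaining k-anonymity violations and the re-estimated risk
print(sdc, "kAnon")
print(sdc, "risk")
```

Because every method returns an updated sdcMicroObj, steps such as microaggregation() or pram() can be chained on the same object before exporting the result.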

Bug tracker: https://github.com/sdctools/sdcmicro/issues

Uses libs:
  • c++ – GNU Standard C++ Library v3
10.42 score | 92 stars | 318 scripts | 1.3k downloads | 4 mentions | 64 exports | 158 dependencies

Last updated from: 0403340f89. Checks: 11 NOTE, 2 OK. Indexed: yes.

Target                  Result  Time
linux-devel-arm64       NOTE    211
linux-devel-x86_64      NOTE    201
source / vignettes      OK      304
linux-release-arm64     NOTE    207
linux-release-x86_64    NOTE    238
macos-release-arm64     NOTE    208
macos-release-x86_64    NOTE    606
macos-oldrel-arm64      NOTE    177
macos-oldrel-x86_64     NOTE    569
windows-devel           NOTE    304
windows-release         NOTE    308
windows-oldrel          NOTE    230
wasm-release            OK      174

Exports: addGhostVars, addNoise, AI_applyAnonymization, AI_createSdcObj, argus_microaggregation, argus_rankswap, calcRisks, createDat, createNewID, createSdcObj, dataGen, dRisk, dRiskRMD, dUtility, extractManipData, freq, freqCalc, generateStrata, get.sdcMicroObj, globalRecode, groupAndRename, IL_correl, IL_variables, importProblem, indivRisk, infoLoss, kAnon, kAnon_violations, ldiversity, LocalRecProg, localSupp, localSuppression, mafast, measure_risk, mergeHouseholdData, microaggregation, microaggrGower, modRisk, mvTopCoding, nextSdcObj, plot.localSuppression, plotMicro, pram, print, rankSwap, readMicrodata, recordLinkage, recordSwap, removeDirectID, report, riskyCells, sdcApp, selectHouseholdData, set.sdcMicroObj, show, shuffle, strataVar<-, suda2, topBotCoding, undolast, valTable, varToFactor, varToNumeric, writeSafeFile

Dependencies: abind, askpass, backports, base64enc, bbotk, bit, bit64, boot, broom, bslib, cachem, car, carData, checkmate, class, cli, clipr, cluster, codetools, colorspace, commonmark, cowplot, cpp11, crayon, crosstalk, curl, data.table, DEoptimR, Deriv, digest, doBy, dplyr, DT, e1071, evaluate, farver, fastmap, fontawesome, forcats, forecast, Formula, fracdiff, fs, future, future.apply, generics, ggplot2, globals, glue, gtable, haven, highr, hms, htmltools, htmlwidgets, httpuv, httr, isoband, jquerylib, jsonlite, knitr, labeling, laeken, later, lattice, lazyeval, lgr, lifecycle, listenv, lme4, lmtest, magrittr, MASS, Matrix, MatrixModels, matrixStats, memoise, mgcv, microbenchmark, mime, minqa, mirai, mlbench, mlr3, mlr3learners, mlr3measures, mlr3misc, mlr3pipelines, mlr3tuning, modelr, moocore, nanonext, nlme, nloptr, nnet, numDeriv, openssl, otel, palmerpenguins, paradox, parallelly, pbkrtest, pillar, pkgconfig, prettydoc, prettyunits, progress, promises, proxy, PRROC, purrr, quantreg, R6, ranger, rappdirs, rbibutils, RColorBrewer, Rcpp, RcppArmadillo, RcppEigen, Rdpack, readr, reformulas, rhandsontable, rlang, rmarkdown, robustbase, S7, sass, scales, shiny, sourcetools, sp, SparseM, stringi, stringr, survival, sys, tibble, tidyr, tidyselect, timeDate, tinytex, tzdb, urca, utf8, uuid, vcd, vctrs, VIM, viridisLite, vroom, withr, xfun, xgboost, xtable, yaml, zoo

AI-Assisted Statistical Disclosure Control with sdcMicro
Abstract | Introduction | Prerequisites | Quick start | Background | Statistical disclosure control | LLMs in statistical workflows | Provider landscape | Software design | Architecture overview | Privacy by design | Provider-agnostic LLM access | Structured tool calling | Combined utility measure | LLM-assisted variable classification | The AI_createSdcObj() function | Reasoning transparency | Interactive confirmation | LLM-assisted anonymization | The AI_applyAnonymization() function | Agentic loop: batch and refinement | Example session | Adjusting utility weights | Using different LLM providers | Graphical user interface | AI variable suggestion | AI-Assisted anonymization panel | Reproducibility | Discussion | Advantages and limitations | Privacy considerations | Comparison with related work | Summary and outlook | Computational details | References

Last update: 2026-04-22
Started: 2026-03-08

Targeted Record Swapping
Overview | Functionality | Some differences to SAS-Code | Risk definition | Sampling probability | Swapping Records | Application | Supplying index vectors | Similarity profiles | Carry along variables | Supplying your own risk values | Information loss | sdcMicro Objects

Last update: 2024-02-18
Started: 2022-02-22

Using the interactive GUI - sdcApp
Introduction and Main Features | About/Help | Microdata | Upload microdata | Testdata/internal data | R-dataset (.rdata) | SPSS-file (.sav) | SAS-file (.sas7bdat) | CSV-file (.csv, .txt) | STATA-file (.dta) | Additional options | Modify microdata | Display microdata | Explore variables | Reset variables | Use subset of microdata | Convert numeric to factor | Convert variables to numeric | Modify factor variable | Create a stratification variable | Set specific values to NA | Hierarchical data | Anonymize | Set up a problem | Anonymization Methods | View/Analyze existing sdcProblem | Show summary | Explore variables | Add linked variables | Create new IDs | Anonymize categorical variables | Recoding | k-Anonymity | PRAM (simple) | PRAM (expert) | Suppress values with high risks | Anonymize numerical variables | Top-/Bottom Coding | Microaggregation | Adding Noise | Rank Swapping | Risk/Utility | Risk measures | Information of risk | Suda2 risk measure | l-Diversity risk measure | Visualizations | Barplot/Mosaicplot | Tabulations | Information loss | Obs violating k-Anon | Numerical risk measures | Compare summary statistics | Disclosure Risk | Information loss | Export Data | Anonymized Data | Anonymization Report | Change Stata Labels | Reproducibility | View/Save the current script | Import a previously saved sdcProblem | Export/Save the current sdcProblem | Undo

Last update: 2019-10-30
Started: 2018-03-23

Readme and manuals

Help Manual

Help page | Topics
addGhostVars | addGhostVars
Adding noise to perturb data | addNoise
AI_applyAnonymization: Automatically apply anonymization strategy using an LLM | AI_applyAnonymization
Create an sdcMicro Object with LLM Assistance | AI_createSdcObj
argus_microaggregation | argus_microaggregation
argus_rankswap | argus_rankswap
Recompute Risk and Frequencies for an sdcMicroObj | calcRisks
Small Artificial Data set | casc1
Census data set | CASCrefmicrodata
Dummy Dataset for Record Swapping | createDat
Creates new randomized IDs | createNewID
Fast generation of synthetic data | dataGen
Distribute number of swaps | distributeDraws_cpp
Distribute | distributeRandom_cpp
Overall disclosure risk | dRisk
RMD-based disclosure risk | dRiskRMD
Data-Utility measures | dUtility
EIA data set | EIA
Remove certain variables from the data set inside a sdc object. | extractManipData
data from the casc project | francdat
Demo data set from mu-Argus | free1
Freq | freq
Frequencies calculation for risk estimation | freqCalc
Generate one strata variable from multiple factors | generateStrata
get.sdcMicroObj | get.sdcMicroObj
Global Recoding | globalRecode
Join levels of a variable in an object of class 'sdcMicroObj-class' or 'factor' or 'data.frame' | groupAndRename
Additional Information-Loss measures | IL_correl IL_variables print.il_correl print.il_variables
importProblem | importProblem
Individual Risk computation | indivRisk
Calculate information loss after targeted record swapping | infoLoss
'kAnon_violations' | kAnon_violations kAnon_violations,sdcMicroObj,logical,numeric-method
Local recoding via Edmonds' maximum weighted matching algorithm | LocalRecProg
Local Suppression | localSupp
Local Suppression to obtain k-anonymity | kAnon localSuppression
Fast and Simple Microaggregation | mafast
Disclosure Risk for Categorical Variables | ldiversity measure_risk print.ldiversity print.measure_risk
Replaces the raw household-level data with the anonymized household-level data in the full dataset for anonymization of data with a household structure (or other hierarchical structure). Requires a matching household ID in both files. | mergeHouseholdData
Microaggregation | microaggregation
Microaggregation for numerical and categorical key variables based on a distance similar to the Gower Distance | microaggrGower
microData | microData
Global risk using log-linear models. | modRisk
Detection and winsorization of multivariate outliers | mvTopCoding
nextSdcObj | nextSdcObj
Reorder data | orderData_cpp
Plots for localSuppression objects | plot.localSuppression
Plot functions for objects of class sdcMicroObj | plot.sdcMicroObj
Comparison plots | plotMicro
Post Randomization | pram
Print method for objects from class freqCalc. | print.freqCalc
Print method for objects from class indivRisk | print.indivRisk
Print method for objects from class localSuppression | print.localSuppression
Print method for objects from class micro | print.micro
Print method for objects from class modrisk | modrisk print.modrisk
Print method for objects from class pram | print.pram
Print and Extractor Functions for objects of class 'sdcMicroObj-class' | print,sdcMicroObj-method print.sdcMicroObj
Print method for objects from class suda2 | print.suda2
Random Sampling | randSample_cpp
Rank Swapping | rankSwap
readMicrodata | readMicrodata
Record linkage via Global Distance-Based Record Linkage | recordLinkage
Targeted Record Swapping | recordSwap recordSwap.default recordSwap.sdcMicroObj
Targeted Record Swapping | recordSwap_cpp
Remove certain variables from the data set inside a sdc object. | removeDirectID
Generate an Html-report from an sdcMicroObj | report
riskyCells | riskyCells
Random sample for donor records | sampleDonor_cpp
sdcApp | sdcApp
Class '"sdcMicroObj"' | createSdcObj sdcMicroObj-class strataVar<- strataVar<-,sdcMicroObj,characterOrNULL-method undolast
Creates a household level file from a dataset with a household structure. | selectHouseholdData
set.sdcMicroObj | set.sdcMicroObj
Define Swap-Levels | setLevels_cpp
Calculate Risk | setRisk_cpp
Show | show,sdcMicroObj-method
Shuffling and EGADP | shuffle
subsetMicrodata | subsetMicrodata
Suda2: Detecting Special Uniques | suda2
Summary method for objects from class freqCalc | summary.freqCalc
Summary method for objects from class micro | summary.micro
Summary method for objects from class pram | summary.pram
Tarragona data set | Tarragona
A real-world data set on household income and expenditures | testdata testdata2
Top and Bottom Coding | topBotCoding
Comparison of different microaggregation methods | valTable
Change a keyVariable of an object of class 'sdcMicroObj-class' from Numeric to Factor or from Factor to Numeric | varToFactor varToNumeric
writeSafeFile | writeSafeFile
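Several of the topics above (freqCalc, kAnon, kAnon_violations) rest on the sample frequency f_k: the number of records sharing a record's combination of key variables. A record violates k-anonymity when f_k < k. The base-R sketch below illustrates only that counting idea; count_kanon_violations() is a hypothetical helper, not part of sdcMicro, and it assumes the key variables contain no missing values (sdcMicro treats missing values in key variables in its own way, which this sketch does not replicate).

```r
# Hypothetical helper (not part of sdcMicro): count records whose
# key-variable combination occurs fewer than k times in the sample.
# Assumes no NA values in the key variables.
count_kanon_violations <- function(dat, keyVars, k = 3) {
  # One factor level per observed combination of the key variables
  key <- interaction(dat[keyVars], drop = TRUE)
  # Sample frequency f_k of each record's own key combination
  fk <- as.vector(table(key)[as.character(key)])
  # Records in combinations seen fewer than k times violate k-anonymity
  sum(fk < k)
}

# Toy data: the combination (f, A) occurs only once
toy <- data.frame(
  sex    = c("m", "m", "f", "f", "f"),
  region = c("A", "A", "A", "B", "B")
)
count_kanon_violations(toy, c("sex", "region"), k = 2)  # 1
```

Raising k to 3 flags all five records, since no combination in the toy data occurs three times; on a real sdcMicroObj, kAnon_violations() is the function to use.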