Title: | Generation of Perturbation Tables for the Cell-Key Method |
---|---|
Description: | Tabular data from statistical institutes and agencies are mostly confidential and must be protected prior to publication. The cell-key method is a post-tabular Statistical Disclosure Control perturbation technique that adds random noise to tabular data. The statistical properties of the perturbations are defined by noise probability distributions, also referred to as perturbation tables. This tool can be used to create the perturbation tables based on a maximum entropy approach as described, for example, in Giessing (2016) <doi:10.1007/978-3-319-45381-1_18>. The resulting perturbation tables can then be used to apply a cell-key method to frequency count or magnitude tables. |
Authors: | Tobias Enderle [aut, cre] |
Maintainer: | Tobias Enderle <[email protected]> |
License: | EUPL |
Version: | 1.0.0.9000 |
Built: | 2025-02-20 04:15:12 UTC |
Source: | https://github.com/sdctools/ptable |
ptable makes it easy to create perturbation tables that can be used for applying noise to statistical tables with any cell-key method implementation, for example the cellKey package or the standalone tool TauArgus.
The package provides four main functions to create the perturbation tables:
create_ptable()
: generic function that creates a ptable for either frequency count or magnitude tables, with a variety of options.
create_cnt_ptable()
: creates a ptable suitable for frequency count tables.
create_num_ptable()
: creates a ptable suitable for magnitude tables (i.e. with numerical variables).
modify_cnt_ptable()
: modifies the ptable for a higher level of protection.
create_ptable(
  D, V, js = 0, pstay = NULL, optim = 1, mono = TRUE, step = 1,
  icat = NULL, table = "cnts", type = "all",
  label = paste0("D", D, "V", V * 100),
  monitoring = FALSE, debugging = FALSE, create = TRUE, params = NULL
)

create_cnt_ptable(
  D, V, js = 0, pstay = NULL, optim = 1, mono = TRUE,
  label = paste0("D", D, "V", V * 100),
  monitoring = FALSE, create = TRUE
)

create_num_ptable(
  D, V, pstay = NULL, optim = 1, mono = TRUE, step = 2,
  icat = NULL, type = "all",
  label = paste0("D", D, "V", V * 100),
  monitoring = FALSE, create = TRUE
)
D |
perturbation parameter for maximum noise (scalar integer) |
V |
perturbation parameter for variance (scalar double) |
js |
threshold value for blocking of small frequencies (i.e. the perturbation will not produce positive cell values that are equal to or smaller than the threshold value). (scalar integer) |
pstay |
optional parameter to set the probability (0 < p < 1) of an original frequency to remain unperturbed: NA (default) no preset probability (i.e. produces the maximum entropy solution) |
optim |
optimization parameter: |
mono |
(logical) vector specifying optimization parameter for monotony condition |
step |
(integer) number of steps for the noise (between two integer values). Whereas the cell-key approach for frequency count tables only allows integer noise values, the noise distribution for magnitude values does not have to be integer-valued. The reciprocal of step (the 'step width') is computed and used internally for the perturbation table. |
icat |
(integer) categorized original frequencies i |
table |
(character) type of the table: frequency count ("cnts") or magnitude ("nums") |
type |
(character) type indicator for the extra column 'type' used for magnitude tables: 'even', 'odd' or 'all' |
label |
(character) label of the output |
monitoring |
(logical) output monitoring on/off |
debugging |
(logical) debug monitoring on/off |
create |
(logical) scalar specifying whether to create only the input parameters of class ptable_params (FALSE) or also the perturbation table of class ptable (TRUE, default) |
params |
object of class ptable_params can be used as input instead of the remaining parameters |
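To make the step argument above concrete, the following minimal base R sketch lists the noise values that are possible for a given D and step. The helper noise_grid() is hypothetical and not part of the package; it only assumes, as described above, that the step width is the reciprocal of step.

```r
# Hypothetical helper (not part of the ptable package): possible noise values
# between -D and D when each unit interval is divided into `step` steps.
noise_grid <- function(D, step = 1) {
  seq(-D, D, by = 1 / step)  # the step width is the reciprocal of step
}

grid_cnt <- noise_grid(D = 2, step = 1)  # integer noise only (count tables)
grid_num <- noise_grid(D = 2, step = 2)  # half steps (magnitude tables)

print(grid_cnt)
print(grid_num)
```

With step = 1 the grid contains only integers, which matches the requirement for frequency count tables; larger values of step give a finer grid for magnitude tables.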
The perturbation probabilities are constructed given the following constraints:
Maximum noise
Zero mean (unbiased noise)
Fixed noise variance
Transition probabilities are between zero and one and sum to 1
Perturbations will not produce negative cell values or positive cell values equal to or less than a specific threshold value
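The constraints listed above can be checked by hand for a single original frequency. The sketch below uses base R only; the distribution p, the parameter values and the helper check_constraints() are illustrative assumptions, not output or functions of the package.

```r
# Hypothetical parameters and noise distribution for one original frequency i
# (illustrative values, not output of the ptable package)
D  <- 2      # maximum noise
V  <- 1.2    # target noise variance
js <- 0      # threshold for blocked small frequencies
i  <- 5      # original frequency
v  <- -D:D   # possible noise values
p  <- c(0.1, 0.2, 0.4, 0.2, 0.1)  # probabilities of moving from i to i + v

# Hypothetical helper: evaluates each constraint and returns a named
# logical vector (TRUE means the constraint holds)
check_constraints <- function(i, v, p, D, V, js, tol = 1e-8) {
  target <- i + v   # perturbed cell values
  used   <- p > 0   # noise values that can actually occur
  c(
    max_noise   = all(abs(v[used]) <= D),
    zero_mean   = abs(sum(v * p)) < tol,
    variance    = abs(sum(v^2 * p) - V) < tol,
    valid_probs = all(p >= 0 & p <= 1) && abs(sum(p) - 1) < tol,
    no_blocked  = all(target[used] == 0 | target[used] > js)
  )
}

print(check_constraints(i, v, p, D, V, js))
```

For small original frequencies the same distribution would violate the last constraint, because some noise values would lead to negative or blocked cell values. This is why the transition probabilities depend on the original frequency.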
Returns a ptable object including the created perturbation table by default. If the argument create = FALSE, a ptable_params object is returned.
plot()
to analyze the created perturbation table visually
pt_export()
to export the perturbation table for external sdcTools like TauArgus or SAS.
# create ptable for frequency count tables
create_cnt_ptable(D = 3, V = 1.08, js = 1, label = "ptable_frequency_tab")

# create ptable for magnitude tables
create_num_ptable(D = 5, V = 2, step = 4, icat = c(1, 3, 5))

# create ptable for frequency or magnitude tables
create_ptable(D = 3, V = 1.08, js = 1, table = "cnts")
create_ptable(D = 5, V = 2, step = 4, icat = c(1, 4, 5), table = "nums")
modify_cnt_ptable()
is a function to modify the standard ptable for count variables that is generated by create_cnt_ptable() or within the 'cellKey' package. The noise intervals in the standard ptable are ordered from -D to D. A modified ptable still has the same properties as the standard ptable but can ensure a higher protection of perturbed frequency tables since the noise probabilities are split and the intervals are rearranged.
modify_cnt_ptable(input, threshold = 0.2, seed = NULL)
input |
The ptable-object of class 'ptable', 'ck_params' or data.table |
threshold |
The maximum width of the intervals after modification |
seed |
A seed for the rearrangement of the split intervals |
In a first step, the noise probabilities larger than the threshold value are split. Then, the split noise probabilities are randomly rearranged using a seed, so that the modification is reproducible. Finally, the intervals of the ptable are adjusted.
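The three steps can be illustrated with a small base R sketch for a single original frequency. The splitting rule used here (equal pieces no larger than the threshold), the column names and the example distribution are assumptions made for illustration; the actual rule used by modify_cnt_ptable() may differ.

```r
# Hypothetical noise distribution for one original frequency
# (illustrative values, not output of the ptable package)
v <- c(-1, 0, 1)          # noise values
p <- c(0.25, 0.5, 0.25)   # noise probabilities
threshold <- 0.2          # maximum interval width after modification

# Step 1: split each probability into equal pieces no larger than the threshold
pieces  <- pmax(1, ceiling(p / threshold - 1e-9))  # number of pieces per value
split_v <- rep(v, times = pieces)
split_p <- rep(p / pieces, times = pieces)

# Step 2: rearrange the split pieces at random; the seed makes this reproducible
set.seed(5467)
ord <- sample(length(split_p))
split_v <- split_v[ord]
split_p <- split_p[ord]

# Step 3: recompute the interval bounds of the rearranged pieces
upper <- cumsum(split_p)
lower <- upper - split_p
modified <- data.frame(v = split_v, p = split_p, lower = lower, upper = upper)

print(modified)
```

Summing the pieces per noise value gives back the original probabilities, so the noise distribution itself is unchanged; only the layout of the intervals differs.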
Returns an object of class ptable or a data.table.
Tobias Enderle, [email protected]
# Original ptable
ptab <- create_cnt_ptable(3, 1)

# modified ptable
ptab_mod <- modify_cnt_ptable(ptab, 0.3, seed = 5467)
ptab_mod@pTable
plot()
makes it easy to visualize the ptable object that has been created by create_cnt_ptable(), create_num_ptable() or modify_cnt_ptable().
plot(obj, type = "d", file = NULL, ...)
obj |
an object of class ptable |
type |
(character) type of graph: distribution ("d", standard), perturbation panel ("p"), transition matrix ("t") |
file |
if not NULL, the path of a file to which the graph is saved |
... |
additional parameters passed to methods |
The selected graph is displayed, but there is no direct return value. The output can also be assigned to an object to obtain an object of class ggplot.
Tobias Enderle
# Create a ptable for frequency count tables and modify the intervals
ptab <- create_cnt_ptable(D = 3, V = 1.05, js = 1, label = "Example")
ptab_mod <- modify_cnt_ptable(ptab, threshold = 0.3, seed = 5432)

# Distribution Plot of the Noise
plot(ptab_mod, type = "d")

# Perturbations Panel of the Noise
plot(ptab_mod, type = "p")

# Plot and Save the Transition Matrix
plot(ptab_mod, type = "t", file = tempfile("example_tMatrix", fileext = ".pdf"))
pt_check()
checks the constraints of the ptable
pt_check(ptab)
ptab |
a ptable object |
Returns a data.table object.
Tobias Enderle, [email protected]
# create ptable
ptab1 <- create_cnt_ptable(D = 5, V = 3, js = 2, label = "test2")

# check ptable
pt_check(ptab1)
pt_ex_cnts()
returns a perturbation table object from create_cnt_ptable() with some default parameters. This is useful for quickly creating ptables to demonstrate usage in other tools.
pt_ex_cnts()
Returns a ptable object.
ptab <- pt_ex_cnts()
plot(ptab, type = "t")
pt_ex_nums()
returns perturbation table objects from create_num_ptable() with some default parameters. This is useful for quickly creating ptables to demonstrate usage in other tools.
pt_ex_nums(parity = TRUE, separation = FALSE)
parity |
a scalar logical; if TRUE, different ptables are returned for even and odd cells |
separation |
a scalar logical; if TRUE, an extra ptable is returned for small cells |
Returns a ptable object if both parity and separation are FALSE, else a named list.
# extra ptable for small cells
names(pt_ex_nums(parity = FALSE, separation = TRUE))

# different ptables for even/odd cells
names(pt_ex_nums(parity = TRUE, separation = TRUE))
Function to export a perturbation table to TauArgus, SAS or any other CKM tool (as a txt file).
pt_export(..., file, SDCtool = "TauArgus")
... |
1 or 2 input objects of class ptable |
file |
(character) filename (only 'txt' is possible as file extension) |
SDCtool |
(character) either "TauArgus" or "SAS" |
Returns NULL; the ptable is saved to the file in the format of the specified tool.
Tobias Enderle
ptab <- create_cnt_ptable(D = 5, V = 3, js = 2, label = "test")
pt_export(ptab, file = tempfile("ptable_example"), SDCtool = "TauArgus")
Function to solve the non-linear optimization problem used within ptable().
pt_optim_entropy(
  optim = optim,
  mono = mono,
  v = v,
  variance = variance,
  lb = p_lb,
  ub = p_ub,
  ndigits
)
optim |
optimization parameter (1=default, 2-4=further test implementations) |
mono |
(logical) monotony parameter |
v |
(integer) vector with perturbation values (i.e. deviations to the original frequency) |
variance |
(numeric) variance parameter |
lb |
(integer) vector with lower bounds of the controls |
ub |
(integer) vector with upper bounds of the controls |
ndigits |
(integer) number of digits |
The main parameter is 'optim': for 'optim = 1' to 'optim = 3' the variance condition is stated as an inequality constraint, and for 'optim = 4' it is stated as an equality constraint.
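For intuition about the underlying optimization, the following base R sketch computes a maximum entropy noise distribution in the simplest setting: all noise values from -D to D are admissible, the variance is imposed as an equality, and no bounds or monotony conditions apply. In that case the solution has the form p(v) proportional to exp(-lambda * v^2), and only lambda has to be found numerically. This is a simplified illustration with hypothetical names, not the routine implemented in pt_optim_entropy().

```r
# Illustrative parameters (assumed values)
D <- 3       # maximum noise
V <- 1.08    # target noise variance
v <- -D:D    # admissible noise values

# Maximum entropy form for a zero-mean distribution with fixed variance
# on a symmetric support: p(v) proportional to exp(-lambda * v^2)
maxent_p <- function(lambda) {
  w <- exp(-lambda * v^2)
  w / sum(w)
}

# Difference between the variance implied by lambda and the target variance
var_gap <- function(lambda) sum(v^2 * maxent_p(lambda)) - V

# lambda = 0 gives the uniform distribution (variance too large here),
# a large lambda concentrates all mass on zero noise (variance too small)
lambda <- uniroot(var_gap, interval = c(0, 50), tol = 1e-12)$root
p <- maxent_p(lambda)

print(round(p, 4))
```

The resulting probabilities are symmetric around zero noise, so the zero-mean condition holds automatically. Small original frequencies, a preset pstay or monotony conditions add constraints that this closed form does not cover, which is where a general non-linear solver is needed.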
The return value is a list with two elements:
result
optimal values of the controls
iter
number of iterations that were executed
Tobias Enderle, Sarah Giessing, Jonas Peter
Giessing, S. (2016), 'Computational Issues in the Design of Transition Probabilities and Disclosure Risk Estimation for Additive Noise'. In: Domingo-Ferrer, J. and Pejic-Bach, M. (Eds.), Privacy in Statistical Databases, pp. 237-251, Springer International Publishing, LNCS, vol. 9867.
Fraser, B. and Wooton, J.: A proposed method for confidentialising tabular output to protect against differencing. In: Monographs of Official Statistics. Work session on Statistical Data Confidentiality, Eurostat-Office for Official Publications of the European Communities, Luxembourg, 2006, pp. 299-302
Opens the package vignette, which provides an introduction to the package.
pt_vignette()
Opens a browser window/tab showing the vignette.
pt_vignette()
The ptable package includes a shiny app for first-time users and visual-style learners. ptable() makes it easy to experiment with different parameter settings while getting direct feedback by means of graphical plots and summaries. The different result output tabs are:
Perturbation Table
shows the output used for applying CKM methods.
Constraints Check
can be used to check the main constraints (e.g., zero mean, fixed variance).
Input Code
can be used to replicate the results (i.e. copy and paste the code into your R script).
Input Object
shows the input object derived from the parameters set by the user.
Legend
gives an overview of the parameters used.
Users can also learn visually how the input parameters affect the perturbation table:
Transition Matrix
Distribution Plot
Perturbation Panel Plot
ptable()
No return value, the dashboard is opened in the default browser.
After usage (e.g. closing the browser tab), interrupt R to stop the application (usually by pressing Ctrl+C or Esc in the console or by using the stop button in RStudio).
Tobias Enderle, [email protected]
See create_cnt_ptable() to get more help or pt_vignette() for an introduction.
# Run the dashboard in your default browser
ptable()
An S4 class to represent perturbation parameters
D
(integer) parameter for maximum perturbation / noise
V
(numeric) parameter for perturbation variance
js
(integer) parameter for original counts not to be perturbed
ncat
(integer) number of perturbation classes
pstay
numeric vector specifying parameter for non-perturbation
optim
(integer) specifying optimization parameter for optimization function
mono
(logical) vector specifying optimization parameter for monotony condition
label
(character) label for output
icat
(integer) categorized original frequencies i
table
(character) type of table: frequency counts (cnts) or magnitude (nums)
step
(integer) step
type
(character) indicator for the extra column 'type' used for magnitude tables: 'even', 'odd' or 'all'
An S4 class to represent a perturbation table
tMatrix
(matrix) transition matrix with perturbation probabilities
pClasses
(numeric) numeric classes
pTable
(data.table) perturbation table with probabilities
empResults
(data.table) ...
pParams
a ptable_params object
tStamp
(character) ...
type
(character) type indicator for magnitude tables
table
(character) type of table: frequency counts (cnts) or magnitude (nums)