Abstract
Linear regression is one of the most prevalent techniques in machine learning; however, linear regression is also commonly used for its explanatory capabilities rather than for label prediction. Ordinary Least Squares (OLS) is often used in statistics to establish a correlation between an attribute (e.g. gender) and a label (e.g. income) in the presence of other (potentially correlated) features. OLS assumes a particular model that randomly generates the data and derives t-values, which quantify how likely each real value is to be the true correlation. Using t-values, OLS can release a confidence interval: an interval on the reals that is likely to contain the true correlation; when this interval does not intersect the origin, we can reject the null hypothesis, since the true correlation is likely non-zero. Our work aims at achieving similar guarantees on data under differentially private estimators. First, we show that for well-spread data, the Gaussian Johnson-Lindenstrauss Transform (JLT) gives a very good approximation of t-values; second, when the JLT approximates Ridge regression (linear regression with ℓ2-regularization), we derive, under certain conditions, confidence intervals using the projected data; lastly, we derive, under different conditions, confidence intervals for the “Analyze Gauss” algorithm [14].
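For readers unfamiliar with the quantities named in the abstract, the sketch below is a hypothetical illustration, not the paper's private estimator: it runs the standard non-private OLS pipeline (coefficient estimates, t-values, confidence intervals) and then re-runs the same estimator on data compressed with a Gaussian Johnson-Lindenstrauss Transform. The synthetic data, the dimensions `n`, `d`, the sketch size `r`, and the helper `ols_t_values` are all illustrative assumptions, and no differential-privacy noise is added here.

```python
import numpy as np
from scipy import stats

# Hypothetical synthetic data: n points, d features (sizes are illustrative only).
rng = np.random.default_rng(0)
n, d = 1000, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.5, 0.0, -2.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

def ols_t_values(X, y, alpha=0.05):
    """Non-private OLS: coefficient estimates, t-values, and (1-alpha) confidence intervals."""
    n, d = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    residuals = y - X @ beta_hat
    sigma2 = residuals @ residuals / (n - d)       # unbiased estimate of the noise variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))        # standard error of each coefficient
    t_vals = beta_hat / se
    half_width = stats.t.ppf(1 - alpha / 2, df=n - d) * se
    ci = np.column_stack([beta_hat - half_width, beta_hat + half_width])
    return beta_hat, t_vals, ci

# Gaussian Johnson-Lindenstrauss Transform: project [X | y] down to r rows and rerun
# the same estimator on the compressed data (a sketch of the idea, not the DP mechanism).
r = 200
R = rng.normal(scale=1.0 / np.sqrt(r), size=(r, n))
X_proj, y_proj = R @ X, R @ y

beta_hat, t_vals, ci = ols_t_values(X, y)
beta_jlt, t_jlt, ci_jlt = ols_t_values(X_proj, y_proj)
print("OLS t-values:          ", np.round(t_vals, 2))
print("JLT-sketched t-values: ", np.round(t_jlt, 2))
```

Scaling the Gaussian projection by 1/√r keeps inner products of the projected columns close to the originals in expectation, which is the intuition behind comparing the sketched t-values and confidence intervals to the exact ones when the data are well spread.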
| Original language | English |
| --- | --- |
| Journal | Journal of Privacy and Confidentiality |
| Volume | 9 |
| Issue number | 1 Special Issue |
| State | Published - 31 Mar 2019 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2019, Cornell University. All rights reserved.
Funding
The bulk of this work was done while the author was a postdoctoral fellow at Harvard University, supported by NSF grant CNS-123723, and an unpaid collaborator on NSF grant 1565387. The author wishes to wholeheartedly thank Prof. Salil Vadhan for his tremendous help in shaping this paper. The author would also like to thank Prof. Jelani Nelson and the members of the “Privacy Tools for Sharing Research Data” project at Harvard University (especially James Honaker, Vito D’Orazio, Vishesh Karwa, Prof. Kobbi Nissim, and Prof. Gary King) for many helpful discussions and suggestions, as well as Abhradeep Thakurta for clarifying the similarity between our result and general DP-ERM bounds. Lastly, the author thanks the anonymous referees for many helpful suggestions in general and for the reference to [41] in particular.
| Funders | Funder number |
| --- | --- |
| Abhradeep Thakurta | |
| National Science Foundation | CNS-123723, 1565387 |
| Harvard University | |
Keywords
- Differential Privacy
- Ordinary Least Squares
- p-Value
- t-Value