DIFFERENTIALLY PRIVATE ORDINARY LEAST SQUARES

Research output: Contribution to journal › Article › peer-review


Abstract

Linear regression is one of the most prevalent techniques in machine learning; however, it is also commonly used for its explanatory capabilities rather than for label prediction. Ordinary Least Squares (OLS) is often used in statistics to establish a correlation between an attribute (e.g. gender) and a label (e.g. income) in the presence of other (potentially correlated) features. OLS assumes a particular model that randomly generates the data and derives t-values, which represent how likely each real value is to be the true correlation. Using t-values, OLS can release a confidence interval: an interval on the reals that is likely to contain the true correlation; when this interval does not intersect the origin, we can reject the null hypothesis, as the true correlation is likely non-zero. Our work aims to achieve similar guarantees under differentially private estimators. First, we show that for well-spread data, the Gaussian Johnson-Lindenstrauss Transform (JLT) gives a very good approximation of t-values; second, when the JLT approximates Ridge regression (linear regression with ℓ2-regularization), we derive, under certain conditions, confidence intervals using the projected data; lastly, we derive, under different conditions, confidence intervals for the "Analyze Gauss" algorithm [14].
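To make the abstract's pipeline concrete, the sketch below (Python with NumPy/SciPy, not code from the paper) shows the three ingredients it refers to: non-private OLS t-values and confidence intervals, OLS rerun on a Gaussian JLT projection of the data, and an "Analyze Gauss"-style Gaussian perturbation of the second-moment matrix. The function names, the row-norm bound B, and the noise calibration are illustrative assumptions; the paper's own conditions and constants differ.

    # Minimal sketch (illustrative assumptions, not the paper's calibration).
    import numpy as np
    from scipy import stats

    def ols_t_values(X, y):
        """Non-private OLS: coefficients, t-values and 95% confidence intervals."""
        n, d = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - d)          # unbiased noise-variance estimate
        se = np.sqrt(sigma2 * np.diag(XtX_inv))   # standard error of each coefficient
        t_vals = beta / se
        t_crit = stats.t.ppf(0.975, df=n - d)     # two-sided 95% critical value
        ci = np.stack([beta - t_crit * se, beta + t_crit * se], axis=1)
        return beta, t_vals, ci

    def gaussian_jlt(X, y, r, rng):
        """Project (X, y) with an r x n Gaussian JLT; OLS is then rerun on the sketch."""
        n = X.shape[0]
        R = rng.normal(0.0, 1.0 / np.sqrt(r), size=(r, n))
        return R @ X, R @ y

    def analyze_gauss_second_moment(X, y, eps, delta, B, rng):
        """Perturb the second-moment matrix of [X | y] with symmetric Gaussian noise.
        Gaussian-mechanism-style calibration assuming rows of norm at most B;
        the exact constants and conditions in the paper may differ."""
        A = np.column_stack([X, y])
        M = A.T @ A
        sigma = (B ** 2) * np.sqrt(2 * np.log(1.25 / delta)) / eps
        Z = rng.normal(0.0, sigma, size=M.shape)
        Z = np.triu(Z) + np.triu(Z, 1).T          # symmetrize the noise
        return M + Z

    # Toy usage: the last true coefficient is 0, so its CI should straddle the origin.
    rng = np.random.default_rng(0)
    n, d = 5000, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
    y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)
    beta, t_vals, ci = ols_t_values(X, y)
    print(t_vals, ci)

    Xp, yp = gaussian_jlt(X, y, r=500, rng=rng)
    print(ols_t_values(Xp, yp)[1])                # t-values recomputed on the projected data

    # B must be a data-independent bound on row norms in a real private deployment.
    M_priv = analyze_gauss_second_moment(X, y, eps=1.0, delta=1e-5, B=10.0, rng=rng)

In the paper's setting the JLT projection itself can serve as the privacy mechanism when the data are sufficiently well spread; here it is shown only as the projection step, with the calibration questions left to the paper.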

Original language: English
Journal: Journal of Privacy and Confidentiality
Volume: 9
Issue number: 1 Special Issue
DOIs
State: Published - 31 Mar 2019
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2019, Cornell University. All rights reserved.

Funding

The bulk of this work was done when the author was a postdoctoral fellow at Harvard University, supported by NSF grant CNS-123723, and also an unpaid collaborator on NSF grant 1565387. The author wishes to wholeheartedly thank Prof. Salil Vadhan for his tremendous help in shaping this paper. The author would also like to thank Prof. Jelani Nelson and the members of the "Privacy Tools for Sharing Research Data" project at Harvard University (especially James Honaker, Vito D'Orazio, Vishesh Karwa, Prof. Kobbi Nissim and Prof. Gary King) for many helpful discussions and suggestions, as well as Abhradeep Thakurta for clarifying the similarity between our result and general DP-ERM bounds. Lastly, the author thanks the anonymous referees for many helpful suggestions in general and for a reference to [41] in particular.

Funders                          Funder number
Abhradeep Thakurta
National Science Foundation      CNS-123723, 1565387
Harvard University

Keywords

• Differential Privacy
• Ordinary Least Squares
• p-Value
• t-Value
