      Is Open Access

      Options for encoding names for data linking at the Australian Bureau of Statistics

      Preprint


          Abstract

          Publicly, ABS has said it would use a cryptographic hash function to convert names collected in the 2016 Census of Population and Housing into an unrecognisable value in a way that is not reversible. In 2016, the ABS engaged the University of Melbourne to provide expert advice on cryptographic hash functions to meet this objective.

          For complex unit-record level data, including Census data, auxiliary data can often be used to link individual records, even without names. This is the basis of ABS's existing bronze linking. It means that records can probably be re-identified without the encoded name anyway, so protection against re-identification depends on good processes within ABS. The undertaking on the encoding of names should therefore be considered in the full context of auxiliary data and ABS processes. There are several reasonable interpretations of that undertaking:

          1. That the encoding cannot be reversed except with a secret key held by ABS. This is the property achieved by encryption (Option 1), if properly implemented;
          2. That the encoding, taken alone without auxiliary data, cannot be reversed to a single value. This is the property achieved by lossy encoding (Option 2), if properly implemented;
          3. That the encoding doesn't make re-identification easier, or increase the number of records that can be re-identified, except with a secret key held by ABS. This is the property achieved by HMAC-based linkage key derivation using subsets of attributes (Option 3), if properly implemented.

          We explain and compare the privacy and accuracy guarantees of five possible approaches. Options 4 and 5 investigate more sophisticated options for future data linking. We also explain how some commonly-advocated techniques can be reversed, and hence should not be used.
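          As a rough illustration of the third interpretation, the sketch below derives a linkage key by applying HMAC-SHA256 to a chosen subset of attributes, and contrasts it with an unkeyed hash of a name, which anyone holding a list of candidate names can reverse by guessing. This is not the report's actual construction: the abstract does not specify field choices, normalisation rules, or key handling, so `normalise`, `linkage_key`, `unkeyed_hash`, `dictionary_attack`, the field names, and the length-prefixed message layout are all hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalise(value: str) -> str:
    """Hypothetical cleaning rule: trim, case-fold, and strip accents."""
    decomposed = unicodedata.normalize("NFKD", value.strip().casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def linkage_key(secret_key: bytes, record: dict, fields: tuple) -> str:
    """HMAC-SHA256 over a subset of attributes (assumed layout, hex output)."""
    parts = []
    for name in fields:
        value = normalise(str(record[name])).encode("utf-8")
        # Length-prefix each value so adjacent fields cannot run together.
        parts.append(name.encode("utf-8") + b"=" + len(value).to_bytes(4, "big") + value)
    return hmac.new(secret_key, b"|".join(parts), hashlib.sha256).hexdigest()


def unkeyed_hash(name: str) -> str:
    """Plain SHA-256 of a name, with no secret key."""
    return hashlib.sha256(normalise(name).encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, candidate_names: list):
    """Recover a name from an unkeyed hash by hashing every candidate."""
    for candidate in candidate_names:
        if unkeyed_hash(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    key = b"example-secret-held-by-the-data-custodian"  # placeholder, not a real key
    a = {"surname": "Nguyen", "birth_year": 1980, "postcode": "3000"}
    b = {"surname": " nguyen ", "birth_year": 1980, "postcode": "3000"}
    fields = ("surname", "birth_year", "postcode")
    # Records that agree after normalisation receive the same linkage key.
    print(linkage_key(key, a, fields) == linkage_key(key, b, fields))
    # Without a key, a short list of guesses is enough to reverse the hash.
    print(dictionary_attack(unkeyed_hash("Nguyen"), ["smith", "nguyen"]))
```

          In this sketch, linking on several attribute subsets would mean calling `linkage_key` once per subset. Without the secret key the guessing attack on the HMAC output does not go through, whereas the unkeyed hash offers no such barrier.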


                Author and article information

                Date: 22 February 2018
                arXiv: 1802.07975
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Subject: cs.CR
                Note: University of Melbourne Research Contract 85449779. After receiving a draft of this report, ABS conducted a further assessment of Options 2 and 3, which will be published on their website.
