Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunks in a computing environment are provided. In one embodiment, by way of example only, for each of the small data chunks, a signature is generated based on a combination of a representation of characters that appear in the small data chunk with a representation of frequencies of the small data chunk. The signature is used to help in selecting the data to be deduplicated. Additional system and computer program product embodiments are disclosed and provide related advantages.
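By way of illustration only, the signature described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed embodiment: the function name `chunk_signature`, the parameter `top_k`, the use of a 256-bit presence bitmap as the representation of characters, and the use of the most frequent byte values as the representation of frequencies are all hypothetical choices made for this example.

```python
from collections import Counter
from typing import Tuple


def chunk_signature(chunk: bytes, top_k: int = 4) -> Tuple[int, Tuple[int, ...]]:
    """Hypothetical signature for a small data chunk.

    Combines (a) which byte values appear in the chunk with
    (b) a coarse view of how often they appear.
    """
    counts = Counter(chunk)

    # Character part (assumed encoding): one bit per byte value that occurs.
    presence = 0
    for byte_value in counts:
        presence |= 1 << byte_value

    # Frequency part (assumed encoding): the top_k most frequent byte values,
    # most frequent first, with ties broken by the smaller byte value.
    ranked = sorted(counts, key=lambda b: (-counts[b], b))

    return presence, tuple(ranked[:top_k])
```

Under these assumptions, chunks that produce equal signatures could be grouped as candidates for deduplication, and a byte-wise comparison would still be needed to confirm that two candidates are actually identical.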
|State||Published - 2017|