PoS - Proceedings of Science
Volume 299 - The 7th International Conference on Computer Engineering and Networks (CENet2017) - Session III - Information Theory
The SNM Algorithm Based on a Variety of Edit Distance and Variable Window
Q. Yang*, Z. Guo and K. Wang
Pre-published on: July 17, 2017
Published on: September 06, 2017
Abstract
To address a drawback of the SNM (Sorted Neighborhood Method) algorithm, namely its low efficiency and precision in detecting approximately duplicate records when multiple data sources must be integrated, this paper proposes an improved SNM algorithm based on a variant of the edit distance and a variable window. Approximately duplicate records are first removed according to the record pattern and the magnitude of the edit distance, which reduces the number of comparisons. A variable-window mechanism is then introduced to address the difficulty of choosing a suitable window size.
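As background for the approach above, the classic SNM sorts records by a key and compares each record only with its neighbours inside a fixed-size window. The sketch below assumes plain Levenshtein distance normalised by the longer string as the similarity measure; the paper's own edit-distance variant is not specified in the abstract, so this is a stand-in. The function names, the `threshold` value and the sorting key are hypothetical choices for illustration.

```python
from typing import Callable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute, each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def snm_fixed(records: List[str], window: int = 3, threshold: float = 0.8,
              key: Callable[[str], str] = str.lower) -> List[Tuple[str, str]]:
    """Hypothetical fixed-window SNM: sort by key, then compare each record
    with the next window - 1 records and keep pairs above the threshold."""
    ordered = sorted(records, key=key)
    pairs: List[Tuple[str, str]] = []
    for i in range(len(ordered)):
        for j in range(i + 1, min(i + window, len(ordered))):
            if similarity(ordered[i], ordered[j]) >= threshold:
                pairs.append((ordered[i], ordered[j]))
    return pairs
```

For example, `snm_fixed(["john smith", "jon smith", "jonh smith", "mary jones", "mary jone"], window=3, threshold=0.75)` reports four candidate duplicate pairs. Sorting first is what keeps the cost near linear in the number of records: only records that land close together after sorting are ever compared.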
Empirical results show that the algorithm effectively handles windows that are too large or too small, the latter of which causes duplicate records to be missed. The experiments show that the improved SNM algorithm addresses many problems that arise in multi-source data integration and has clear advantages in both precision and efficiency.
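The abstract does not state the exact rule by which the window size varies. The sketch below assumes one plausible, hypothetical rule: each record is compared with at least the next `w_min - 1` records, and beyond that the window grows one record at a time (never past `w_max - 1` neighbours) and closes at the first non-matching record. Plain Levenshtein similarity again stands in for the paper's edit-distance variant, and all names and parameters are illustrative assumptions.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute, each cost 1)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def snm_variable(records: List[str], w_min: int = 2, w_max: int = 6,
                 threshold: float = 0.8) -> Tuple[List[Tuple[str, str]], int]:
    """Hypothetical variable-window SNM; returns (duplicate pairs, comparisons made)."""
    ordered = sorted(records, key=str.lower)
    pairs: List[Tuple[str, str]] = []
    comparisons = 0
    n = len(ordered)
    for i in range(n):
        j = i + 1
        # j - i is the neighbour offset; it never reaches w_max (upper window bound)
        while j < n and j - i < w_max:
            comparisons += 1
            if similarity(ordered[i], ordered[j]) >= threshold:
                pairs.append((ordered[i], ordered[j]))  # match: keep extending
            elif j - i >= w_min - 1:
                break  # minimum window covered and no match: close the window
            j += 1
    return pairs, comparisons
```

On the five sample names `["john smith", "jon smith", "jonh smith", "mary jones", "mary jone"]` with `w_min=2`, `w_max=4` and `threshold=0.75`, this sketch finds the same four pairs as a fixed window of 4 but makes 7 comparisons instead of 9, because windows shrink where neighbours clearly differ and stay open where duplicates cluster.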
DOI: https://doi.org/10.22323/1.299.0063

Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.