تعداد نشریات | 43 |
تعداد شمارهها | 1,675 |
تعداد مقالات | 13,674 |
تعداد مشاهده مقاله | 31,691,095 |
تعداد دریافت فایل اصل مقاله | 12,520,614 |
Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality | ||
Transactions on Combinatorics | ||
مقاله 2، دوره 14، شماره 3، آذر 2025، صفحه 135-156 اصل مقاله (577.5 K) | ||
نوع مقاله: Research Paper | ||
شناسه دیجیتال (DOI): 10.22108/toc.2024.138377.2091 | ||
نویسندگان | ||
Sepideh Aghamolaei* 1؛ Mohammad Ghodsi2 | ||
1Department of Computer Engineering, Sharif University of Technology, Tehran, Iran | ||
2Department of Computer Engineering, Sharif University of Technology, Tehran, Iran. School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran. | ||
چکیده | ||
A well-known clustering problem called Density-Based Spatial Clustering of Applications with Noise~(DBSCAN) involves computing the solutions of at least one disk range query per input point, computing the connected components of a graph, and bichromatic fixed-radius nearest neighbor. MapReduce class is a model where a sublinear number of machines, each with sublinear memory, run for a polylogarithmic number of parallel rounds. Most of these problems either require quadratic time in the sequential model or are hard to compute in a constant number of rounds in MapReduce. In the Euclidean plane, DBSCAN algorithms with near-linear time and a randomized parallel algorithm with a polylogarithmic number of rounds exist. We solve DBSCAN in the Euclidean plane in a constant number of rounds in MapReduce, assuming the minimum number of points in range queries is constant and each connected component fits inside the memory of a single machine and has a constant diameter. | ||
کلیدواژهها | ||
Massively parallel algorithms؛ range searching؛ unit disk graph؛ near neighbors؛ clustering | ||
مراجع | ||
[1] J. Gan and Y. Tao, On the hardness and approximation of Euclidean DBSCAN, ACM Trans. Database Syst., 42 no. 3 (2017) 1–45. [2] Y. Wang, Y. Gu and J. Shun, Theoretically-efficient and practical parallel DBSCAN, In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, (2020) 2555–2571. [3] N. Bansal, A. Blum and S. Chawla, Correlation clustering, Mach. Learn., 56 (2004) 89–113. [4] N. Ailon, M. Charikar and A. Newman, Proofs of conjectures in “aggregating inconsistent information: Ranking and clustering”, Technical report, Technical Report TR-719-05, Department of Computer Science, Princeton University, 2005. [5] S. Chawla, K. Makarychev, T. Schramm and G. Yaroslavtsev, Near optimal LP rounding algorithm for correlation clustering on complete and complete k-partite graphs, In Proceedings of the 2015 ACM Symposium on Theory of Computing, ACM, New York, (2015) 219–228. [6] M. Ester, H.-P. Kriegel, J. Sander and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, (1996) 226–231. [7] J. H. Friedman, J. L. Bentley and R. A. Finkel, An algorithm for finding best matches in logarithmic expected time, ACM Transactions on Mathematical Software, 3 no. 3 (1977) 209–226. [8] J. L. Bentley, Survey of techniques for fixed radius near neighbor searching, Technical report, Stanford Linear Accelerator Center, Calif.(USA), 1975. [9] J. Han, M. Kamber and J. Pei, Data mining concepts and techniques third edition, University of Illinois at Urbana-Champaign Micheline Kamber Jian Pei Simon Fraser University, 2012. [10] R. C. Hoetzlein, Fast fixed-radius nearest neighbors: interactive million-particle fluids, In GPU Technology Conference, 18 (2014) pp. 2. [11] M. de Berg, A. Gunawan and M. Roeloffzen, Faster DBSCAN and HDBSCAN in low-dimensional Euclidean spaces, Internat. J. Comput. Geom. Appl., 29 no. 1 (2019) 21–47. [12] E. Schubert, J. Sander, M. Ester, H. P. Kriegel and X. Xu, DBSCAN revisited, revisited: why and how you should (still) use DBSCAN, ACM Trans. Database Syst., 42 no. 3 (2017) 21 pp. [13] P. K. Agarwal, K. Fox, K. Munagala and A. Nath, Parallel algorithms for constructing range and nearest-neighbor searching data structures, In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, (2016) 429–440. [14] S. Aghamolaei, F. Baharifard and M. Ghodsi, Geometric spanners in the MapReduce model, Computing and combinatorics, Lecture Notes in Comput. Sci., Springer, Cham, 2018 675–687. [15] S. Aghamolaei, V. Keikha, M. Ghodsi and A. Mohades, Windowing queries using Minkowski sum and their extension to MapReduce, J. Supercomput., 77 (2021) 936–972. [16] S. Aghamolaei, V. Keikha, M. Ghodsi and A. Mohades, Sampling and sparsification for approximating the packedness of trajectories and detecting gatherings, Int. J. Data Sci. Anal., 15 no. 2 (2023) 201–216. [17] J. H. Reif and S. Sen, Optimal randomized parallel algorithms for computational geometry, Algorithmica, 7 no. 1 (1992) 91–117. [18] F. Frei and K. Wada, Efficient circuit simulation in MapReduce, In Proceedings of the 30th International Symposium on Algorithms and Computation, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019 pp. 21. [19] F. Frei and K. Wada, Efficient deterministic MapReduce algorithms for parallelizable problems, Journal of Parallel and Distributed Computing, 177 (2023) 28–38. [20] M. Garofalakis, J. Gehrke and R. Rastogi, Data stream management: A brave new world, In Data Stream Management: Processing High-Speed Data Streams, (2016) 1–9. [21] L. Arge, External memory data structures, Handbook of massive data sets, Massive Comput., 4, Kluwer Acad. Publ., Dordrecht, 2002 313–357. [22] G. Yaroslavtsev and A. Vadapalli, Massively parallel algorithms and hardness for single-linkage clustering under ℓp -distances, In Proceedings of the 35th International Conference on Machine Learning, (2018). [23] D. Nanongkai and M. Scquizzato, Equivalence classes and conditional hardness in massively parallel computations, Distrib. Comput., 35 no. 2 (2022) 165–183. [24] A. Andoni, A. Nikolov, K. Onak and G. Yaroslavtsev, Parallel algorithms for geometric graph problems, In STOC’14—Proceedings of the 2014 ACM Symposium on Theory of Computing, ACM, New York, (2014) 574–583. [25] G. Narasimhan and M. Smid, Geometric spanner networks, Cambridge University Press, 2007. [26] C. D. Toth, J. O’Rourke and J. E. Goodman, Handbook of discrete and computational geometry, CRC press, second edition edition, 2004. [27] M. T. Goodrich, N. Sitchinava and Q. Zhang, Sorting, searching, and simulation in the MapReduce frame-work, In Proceedings of the 22nd International Symposium on Algorithms and Computation, Springer Science & Business Media, 7074 (2011) 374–383. [28] L. Barba, P. Bose, M. Damian, R. Fagerberg, W. L. Keng, J. O’Rourke, A. van Renssen, P. Taslakian, S. Verdonschot and G. Xia, New and improved spanning ratios for yao graphs, J. Comput. Geom., 6 no. 2 (2015) 19–53. [29] D. Bakhshesh and M. Farshi, A lower bound on the stretch factor of yao graph y4, Scientia Iranica, 29 no. 6 (2022) 3244–3248. [30] R. E. Tarjan and U. Vishkin, An efficient parallel biconnectivity algorithm, SIAM J. Comput., 14 no. 4 (1985) 862–874 (1985). [31] K. Bringmann, Fine-grained complexity theory: Conditional lower bounds for computational geometry, In Proceedings of the 17th conference on computability in Europe, Springer, (2021) 60–70. [32] S. Guha and S. Khuller, Approximation algorithms for connected dominating sets, Algorithmica, 20 (1998) 374–387. [33] H. Karloff, S. Suri and S. Vassilvitskii, A model of computation for MapReduce, In Proceedings of the 21st annual ACM-SIAM symposium on Discrete Algorithms, SIAM, Philadelphia, PA, (2010) 938–948. [34] P. Beame, P. Koutris and D. Suciu, Communication steps for parallel query processing, J. ACM, 64 no. 6 (2017) 58 pp. [35] J. L. Bentley, D. F. Stanat and E. H. Williams Jr, The complexity of finding fixed-radius near neighbors, Information Processing Lett., 6 no. 6 (1977) 209–212. [36] S. Har-Peled, Geometric approximation algorithms, Mathematical Surveys and Monographs, 173, American Mathematical Society, Providence, RI, 2011. [37] S. Suri and S. Vassilvitskii, Counting triangles and the curse of the last reducer, In Proceedings of the 20th international conference on World wide web, (2011) 607–614. | ||
آمار تعداد مشاهده مقاله: 99 تعداد دریافت فایل اصل مقاله: 90 |