Cross-Domain Transfer Learning for Demand Forecasting: Using Social Media Sentiment from Related Industries
DOI:
https://doi.org/10.55544/jrasb.1.2.12
Keywords:
Data extracts, Window-based refresh, ETL optimization, Data warehousing, Big data, Performance tuning, Incremental updates
Abstract
This paper investigates window-based refresh strategies for improving the performance of data extracts in large-scale data management systems. Traditional extract, transform, load (ETL) processes often struggle with the increasing volume and velocity of data in modern environments. Window-based refresh strategies address this by restricting each refresh cycle to a specific subset of the data. The study examines time-based, size-based, and hybrid window techniques and evaluates their effectiveness in improving extract performance. The primary research goals are (1) to propose a general framework for applying window-based refresh strategies during the data extract process, and (2) to assess the performance benefits of the different window-based approaches relative to conventional full and incremental extracts. Through extensive analysis and empirical testing, we demonstrate that window-based strategies can significantly reduce processing time and resource utilization while maintaining data consistency and integrity.
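To make the three window types concrete, the following is a minimal sketch of how a refresh cycle might choose the subset of rows to re-extract. It is an illustration only and not the framework proposed in the paper: the `Row` type, the `select_window` function, and the `max_age` and `max_rows` parameters are all hypothetical names introduced here, and the sketch assumes each row carries a last-modified timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical source row with a last-modified timestamp."""
    key: int
    updated_at: datetime


def select_window(
    rows: List[Row],
    now: datetime,
    max_age: Optional[timedelta] = None,
    max_rows: Optional[int] = None,
) -> List[Row]:
    """Pick the rows to refresh in one cycle (assumed semantics).

    max_age only   -> time-based window (rows modified within max_age)
    max_rows only  -> size-based window (the max_rows most recent rows)
    both           -> hybrid window (recent rows, capped at max_rows)
    """
    if max_age is None and max_rows is None:
        raise ValueError("specify max_age, max_rows, or both")

    # Consider the most recently modified rows first.
    candidates = sorted(rows, key=lambda r: r.updated_at, reverse=True)

    if max_age is not None:
        cutoff = now - max_age
        candidates = [r for r in candidates if r.updated_at >= cutoff]

    if max_rows is not None:
        candidates = candidates[:max_rows]

    return candidates


# Example data: five rows modified 0, 1, 2, 3, and 4 days before `now`.
now = datetime(2022, 1, 10)
rows = [Row(key=i, updated_at=now - timedelta(days=i)) for i in range(5)]

time_based = select_window(rows, now, max_age=timedelta(days=2))
size_based = select_window(rows, now, max_rows=2)
hybrid = select_window(rows, now, max_age=timedelta(days=3), max_rows=2)
```

In this sketch the hybrid window applies the time bound first and the size cap second, so a burst of recent changes cannot push a single cycle past `max_rows`. Other orderings are possible and would change which rows are deferred to a later cycle.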
License
Copyright (c) 2022 Sweta Kumari
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.