International Journal For Multidisciplinary Research
E-ISSN: 2582-2160 • Impact Factor: 9.24
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
The Making of an Data Pipeline
Author(s) | Harsh Kaushik, Avnish Rai, Gaurav Kapasiya, Jai Prakash Bhati |
---|---|
Country | India |
Abstract | This paper details the development and implementation of a data engineering pipeline for the extraction, transformation, and loading (ETL) of data from a web-based directory. The project uses asynchronous web scraping to gather user details from a local business directory, transforms the data into a structured format, and loads it into a storage solution. The pipeline uses Python, the HTTPX library for asynchronous HTTP requests, BeautifulSoup for HTML parsing, and Amazon S3 for data storage. With these technologies, the pipeline demonstrates an efficient approach to large-scale web data extraction and processing, significantly reducing the time required to gather and organise data from multiple web pages. The paper describes the architecture, implementation, and performance of the ETL pipeline, and highlights the benefits and challenges of asynchronous programming in data engineering. |
Keywords | ETL, Data engineering, Python, Async, Web Scraping, local.ch |
Field | Engineering |
Published In | Volume 6, Issue 3, May-June 2024 |
Published On | 2024-05-21 |
Cite This | The Making of an Data Pipeline - Harsh Kaushik, Avnish Rai, Gaurav Kapasiya, Jai Prakash Bhati - IJFMR Volume 6, Issue 3, May-June 2024. DOI 10.36948/ijfmr.2024.v06i03.20849 |
DOI | https://doi.org/10.36948/ijfmr.2024.v06i03.20849 |
Short DOI | https://doi.org/gtwmsm |
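
The ETL flow described in the abstract (fetch pages concurrently, parse them into structured records, then serialise the records for storage) can be sketched as follows. This is a minimal offline illustration under stated assumptions, not the authors' implementation. To keep it runnable without network access or third-party packages, it swaps in a stub `fetch` coroutine where the paper uses HTTPX, the standard-library `html.parser` where the paper uses BeautifulSoup, and an in-memory JSON Lines string where the paper uses Amazon S3. The sample pages, the `example.invalid` URLs, the HTML class names, and all function and class names are hypothetical.

```python
import asyncio
import json
from html.parser import HTMLParser

# Hypothetical sample pages standing in for directory result pages.
FAKE_PAGES = {
    "https://example.invalid/page/1": (
        '<div class="entry"><span class="name">Alice</span>'
        '<span class="phone">111</span></div>'
        '<div class="entry"><span class="name">Bob</span>'
        '<span class="phone">222</span></div>'
    ),
    "https://example.invalid/page/2": (
        '<div class="entry"><span class="name">Carol</span>'
        '<span class="phone">333</span></div>'
    ),
}


class EntryParser(HTMLParser):
    """Collects one dict per <div class="entry"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # name of the field currently being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "entry":
            self.records.append({})
        elif tag == "span" and cls in ("name", "phone") and self.records:
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


async def fetch(url):
    """Extract step: stub for an asynchronous HTTP GET (assumption)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return FAKE_PAGES[url]


def transform(html):
    """Transform step: turn one HTML page into a list of records."""
    parser = EntryParser()
    parser.feed(html)
    return parser.records


def load(records):
    """Load step: serialise to JSON Lines (stand-in for an upload)."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


async def run_pipeline(urls, max_concurrency=5):
    # The semaphore caps how many requests are in flight at once.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with sem:
            return await fetch(url)

    # gather() runs the fetches concurrently and keeps input order.
    pages = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    records = [rec for page in pages for rec in transform(page)]
    return load(records)


output = asyncio.run(run_pipeline(sorted(FAKE_PAGES)))
print(output)
```

With the stack named in the abstract, the stub `fetch` would presumably wrap a request made through an `httpx.AsyncClient`, `transform` would use BeautifulSoup selectors, and `load` would upload the serialised records to an S3 bucket. The concurrency cap is a design choice of this sketch, useful for limiting load on the target site; the source does not say whether the authors used one.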
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.