Code a Python script to scrape sales data from multiple PDF files, clean the data, and consolidate the results into a single CSV file that is sent to Google Drive
Description
This is the third data source for the Data Pipeline Project. We designed a Python script to scrape hundreds of PDF files that share the same layout; however, some of their graphs generate unstructured data that we need to capture and refine.
Using Python libraries, we parse the content and enrich it to feed a data structure, which is finally consolidated into a single CSV file. That file is subsequently sent to the assigned Google Drive folder as a Google Sheet.
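The parse, clean, and consolidate steps could look roughly like the sketch below. It is only an illustration under assumptions, not the project's actual script: it assumes the text of each PDF has already been extracted by a PDF library, and the line pattern, the column names (source_file, label, amount), and the function names parse_sales_lines and consolidate are all hypothetical. The upload to Google Drive is left out because it depends on the credentials and client library in use.

```python
import csv
import io
import re
from typing import Dict, List

# Hypothetical layout: a text label followed by an amount, e.g. "North Region: $1,234.50".
# The real PDFs may need a different pattern.
LINE_PATTERN = re.compile(
    r"^\s*(?P<label>[A-Za-z][A-Za-z ]*?)\s*[:\-]?\s*\$?(?P<amount>\d[\d,]*(?:\.\d+)?)\s*$"
)

FIELDNAMES = ["source_file", "label", "amount"]  # assumed column names


def parse_sales_lines(text: str, source: str) -> List[Dict[str, object]]:
    """Turn the extracted text of one PDF into cleaned rows, skipping unmatched lines."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # headers, page numbers, and other noise
        rows.append({
            "source_file": source,
            # Collapse repeated spaces and normalize capitalization.
            "label": " ".join(match.group("label").split()).title(),
            # Drop thousands separators before converting to a number.
            "amount": float(match.group("amount").replace(",", "")),
        })
    return rows


def consolidate(texts: Dict[str, str]) -> str:
    """Merge rows from every file (in file-name order) into one CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for source, text in sorted(texts.items()):
        writer.writerows(parse_sales_lines(text, source))
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in for text extracted from two PDFs.
    sample = {
        "b.pdf": "Total: 10",
        "a.pdf": "North Region:  $1,234.50\nPage 1 of 3\nsouth region - 300",
    }
    print(consolidate(sample))
```

Returning the CSV as a string keeps the sketch independent of where the file ends up; the same content can be written to disk or handed to whatever upload step sends it to the Drive folder.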
In Progress
Details
Created: 30 May 2022, 23:52
Updated: 5 June 2022, 05:21