Code Python script to scrape sales data from multiple PDF files, clean the data, and consolidate the results into a single CSV file that is sent to Google Drive
Description
This is the third data source for the Data Pipeline Project. We designed a Python script to scrape hundreds of PDF files that share the same layout; however, some of their graphs produce unstructured data that we need to capture and refine.
Using Python libraries, we parse and enrich the content to populate a data structure, which is then consolidated into a single CSV file and sent to the assigned Google Drive folder as a Google Sheet.
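The clean-and-consolidate step described above could look roughly like the sketch below. It is an illustration only, not the script from this ticket: the column names in FIELDS, the helper names clean_number, clean_row and consolidate, and the extract_rows callable (which stands in for whatever PDF parsing library the script uses) are all assumptions. The Google Drive upload is left out because the ticket does not say which client is used.

```python
import csv
import re
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical column names; the real report layout is not given in this ticket.
FIELDS = ["source_file", "product", "units", "revenue"]


def clean_number(raw: str) -> float:
    """Turn text such as ' $1,234.50 ' into 1234.5; unparseable text becomes 0.0."""
    digits = re.sub(r"[^0-9.\-]", "", raw or "")
    try:
        return float(digits)
    except ValueError:
        return 0.0


def clean_row(source: str, raw: dict) -> dict:
    """Normalize one scraped record into the consolidated schema."""
    return {
        "source_file": source,
        # Collapse runs of whitespace left over from PDF text extraction.
        "product": " ".join(str(raw.get("product", "")).split()),
        "units": int(clean_number(str(raw.get("units", "")))),
        "revenue": clean_number(str(raw.get("revenue", ""))),
    }


def consolidate(
    pdf_paths: Iterable[str],
    extract_rows: Callable[[Path], Iterable[dict]],
    out_csv: str,
) -> int:
    """Write cleaned rows from every PDF into one CSV; return the row count.

    extract_rows is a placeholder (assumption) for the real PDF parser: it
    receives a path and yields one dict of raw strings per scraped record.
    """
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        # Sort the paths so the output order is reproducible between runs.
        for path in sorted(Path(p) for p in pdf_paths):
            for raw in extract_rows(path):
                writer.writerow(clean_row(path.name, raw))
                count += 1
    return count
```

Passing the extractor in as a callable keeps the cleaning and CSV logic independent of the PDF library, so it can be exercised with a fake extractor before any real files are parsed.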
Child issues
Done
Done
Done
To Do
In Progress
Created: 30 May 2022, 23:52
Updated: 5 June 2022, 05:21