Ingestion engine with PySpark
I worked on a project that required multiple data ingestions into an on-premise data lake. Until then, the client had treated every new ingestion as a 100% new development, implementing the validations and processing defined by the users from scratch each time. Working this way mainly caused the following problems:

- Repeated, poorly scalable code.
- Repeated validations and processing, so a single change had to be propagated to every process already developed.
- …
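To make the alternative concrete, here is a minimal, hypothetical sketch of what a configuration-driven ingestion entry point could look like in PySpark. The paths, table names, and validation rule are illustrative assumptions, not the actual engine described in this article; the point is only that each new source becomes a config entry while the read, validation, and write logic lives in one shared place.

```python
# Hypothetical sketch of a config-driven ingestion step (names and paths are assumptions).
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

# One configuration per source instead of one bespoke development per source.
INGEST_CONFIG = {
    "source_path": "/landing/sales/2024/",   # assumed landing-zone path
    "format": "csv",
    "options": {"header": "true", "sep": ";"},
    "target_table": "raw.sales",             # assumed data lake table
    "not_null_columns": ["invoice_id", "amount"],
}

def read_source(spark: SparkSession, cfg: dict) -> DataFrame:
    """Read the raw files described by the configuration."""
    return (spark.read.format(cfg["format"])
            .options(**cfg["options"])
            .load(cfg["source_path"]))

def validate(df: DataFrame, cfg: dict) -> DataFrame:
    """Shared validation: keep only rows with values in the mandatory columns."""
    condition = None
    for col in cfg["not_null_columns"]:
        check = F.col(col).isNotNull()
        condition = check if condition is None else condition & check
    return df.filter(condition) if condition is not None else df

def write_target(df: DataFrame, cfg: dict) -> None:
    """Persist the validated data to the target table."""
    df.write.mode("append").saveAsTable(cfg["target_table"])

if __name__ == "__main__":
    spark = SparkSession.builder.appName("ingestion-engine-sketch").getOrCreate()
    df = read_source(spark, INGEST_CONFIG)
    write_target(validate(df, INGEST_CONFIG), INGEST_CONFIG)
```

With a shape like this, a change to a shared validation is made once in the engine instead of being re-implemented in every existing process.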