At a higher level, you can select, from both the file and the database, either all rows or just the keys of all rows, sort both sets in strict binary order, and use comm -3 to find which keys are not loaded. That way you know exactly what to load, with no duplicates and no missing rows. You can then use UNIX join on the key file and the sorted records file to get the missing records. The stock join does not do pipes (it wants to seek), so I wrote one, m1join, that can do many-to-one joins without ever seeking.
https://www.unix.com/shell-programmin...ent-value.html
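For example, here is a rough sketch of that key-diff step (the file names, key column, and delimiter are made up for illustration; LC_ALL=C keeps sort, comm, and join all in strict binary order):

    export LC_ALL=C
    # Keys from the flat file (column 1, pipe-delimited) and from a database unload
    cut -d'|' -f1 flatfile.txt | sort -u > file_keys.txt
    sort -u db_keys.txt > db_keys.sorted
    # comm -23 keeps only keys in the first file, i.e. not yet loaded
    comm -23 file_keys.txt db_keys.sorted > missing_keys.txt
    # Join the missing keys back to the sorted records to get the rows to load
    sort -t'|' -k1,1 flatfile.txt > flatfile.sorted
    join -t'|' missing_keys.txt flatfile.sorted > missing_records.txt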
The comm and sort commands are fine with pipes, which is especially handy if your UNIX allows ksh process substitution <(...) (the /dev/fd/N devices), and means you are not juggling and waiting on so many temporary files.
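For example, with process substitution the same comparison needs no intermediate key files (same made-up names as above):

    comm -23 <(cut -d'|' -f1 flatfile.txt | sort -u) \
             <(sort -u db_keys.txt) > missing_keys.txt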
I wrote a batching utility called tailbatch that chops off batches of N lines (or whatever arrives in M seconds) from the end of a growing flat file or pipe and runs a child script of your writing to process each batch. If the child exits nonzero, tailbatch exits the same way, so the child controls the process! This keeps transaction size down, which helps with restartability, near-real-time loading, and locality-of-reference speedup.
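The tool itself is not shown here, but the idea can be sketched in ksh (the batch size and child script name are placeholders, and the M-seconds flush is left out):

    #!/bin/ksh
    # Sketch of the batching idea only, not the real tailbatch.
    # Reads stdin (a pipe, or tail -f of a growing file), collects up to
    # N lines per batch, and hands each batch to a child script.
    N=1000
    CHILD=./load_batch.sh          # hypothetical child script
    while :
    do
        i=0
        : > batch.tmp
        while [ "$i" -lt "$N" ] && IFS= read -r line
        do
            print -r -- "$line" >> batch.tmp
            i=$((i + 1))
        done
        [ "$i" -eq 0 ] && break            # nothing left to read
        "$CHILD" < batch.tmp || exit $?    # nonzero child status stops everything
        [ "$i" -lt "$N" ] && break         # short batch means input ended
    done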