Recently we found ourselves needing to purchase and download Zip+4 from the USPS. Zip+4 provides listing of mailable addresses in the US. We intend to use it for address validation.
Each file has one single line with no linefeeds or carriage returns! From spec, each 182-character segment constitutes a record. USPS was nice enough to provide a Java graphical app called CRLF that can inject breaks at specified intervals. That's all nice and well, but with hundreds of files to parse, using their interactive graphical CRLF tool is too tedious.
How could we compose a PostgreSQL function to handle the parsing? Unsure of the performance among procedural languages, we wrote the function in PL/pgSQL, SQL, and PL/V8 to compare. PL/V8 processed the files an astounding 100 times faster than the rest.
In our casual use of PL/V8, we found that when it comes to string, array, and mathematical operations, PL/V8 outshines PL/pgSQL, SQL, and in many cases PL/R and PL/Python.