You have a very aggravated person who demands you purge their email from any table you have in your system.
You have lots of tables that have email addresses. How do you find which tables have this person's email address.
Below is a handy plpgsql function we wrote that does the following. Given a search criteria, field name pattern, table_name pattern,
schema name pattern, data type pattern, and max length of field to check, it will search all fields in the database fitting those
patterns and return to you the names of these schema.table.field names that contain the search phrase.
To use the below you would do something like:
SELECT pc_search_tablefield('%john@hotmail%', '%email%', '%', '%', '%', null);
The above will return all database field names that have the phrase email in the field name and that contain the term john@hotmail
Code of the function looks like this:
CREATEORREPLACEFUNCTIONpc_search_tablefield(param_search text, param_field_like text, param_table_like text,
param_schema_like text, param_datatype_like text, param_max_length integer)RETURNStextAS
result text :='';
var_match text :='';
searchsql text :='';
searchsql :=array_to_string(ARRAY(SELECT'SELECT'||quote_literal(quote_ident(c.table_schema)||'.'||quote_ident(c.table_name)||'.'||quote_ident(c.column_name))||'WHEREEXISTS(SELECT'||quote_ident(c.column_name)||'FROM'||quote_ident(c.table_schema)||'.'||quote_ident(c.table_name)||'WHERE'||CASEWHEN c.data_type IN('character', 'charactervarying', 'text')THENquote_ident(c.column_name)ELSE'CAST('||quote_ident(c.column_name)||'Astext)'END||'LIKE'||quote_literal(param_search)||')'As subsql
FROM information_schema.columns c
WHERE c.table_schema NOTIN('pg_catalog', 'information_schema')AND c.table_name LIKE param_table_like
AND c.table_schema LIKE param_schema_like
AND c.column_name LIKE param_field_like
AND c.data_type IN('"char"','character', 'charactervarying', 'integer', 'numeric', 'real', 'text')AND c.data_type LIKE param_datatype_like
AND(param_max_length ISNULLOR param_max_length =0OR character_maximum_length <= param_max_length)ANDEXISTS(SELECT t.table_name
FROM information_schema.tables t
WHERE t.table_type ='BASETABLE'AND t.table_name = c.table_name AND t.table_schema = c.table_schema)),
RAISE NOTICE '%', searchsql;
IF searchsql >''THENFOR var_match INEXECUTE(searchsql) LOOP
IF result >''THEN
result := result ||';'|| var_match;
result := var_match;
LANGUAGE 'plpgsql' VOLATILE SECURITY DEFINER;
does this take into account that the email address might be in someone elses data?
say you were storing all emails in pg. john wants all references to him deleted but jack and john have been talking. you can just delete all emails that jack has that has john's email in it headers.
Caleb Cushing (xenoterracide)
What it really doesn't take into account is that in many countries it is now illegal to delete any emails that have passed through your system within the last 5-10 years. (Yeah, that's a business issue, not so much a code issues).
One solution we developed at OmniTI was to use plperl and some perl modules to ingest, parse, and then store a breakdown of all email messages. One example portion of that search's all parts of the email (envelope, headers, and body) to stores any email addresses found in a table linked back to the original message. Hmm... I wonder if that code is hiding on labs...
Actually this script just returns the table field names that the email address appears in your database (or search phrase appears in) and will limit the search to only the pattern of tables, schemas, fields you specify.
It also by the way doesn't do a case insensitive search, so would miss JOHN@hotmail. That would be easy enough to change by doing an ILIKE or upper,lower check or a regular expression check at a potentially significant speed penalty.
As Robert said it is illegal to delete emails in many countries and it gets even more messy as each government agency/company has thier own rules too.
I guess the main point of this exercise for us - was to say are we culpable (e.g. is this person simply blaming us because someone is using our address to spam people which annoyingly happens a lot and a lot of mail servers ignore SPF rules).
If we find him in our system then we are likely to blame, but if not - we should investigate further or kindly tell him he is wrong.
You would say that wouldn't you? Its a very fetterish thing to say :).
I think people are all caught up in the stupid example I provided. The reason I wouldn't grep my pg_dump is because
1) I only have grep on my linux box not my windows box.
2) I'm not a grep hacker and I only want to search certain specific fields in my database. For example I may not care to check bodies of messages and only check short fields we use to spam people with.
3) Not really using this for email that much. I just thought that was an example people would be more likely to relate to than my real sinister intensions :).