Finding Lost Parents
Finding Missing Parents in a Child-Parent Self-Referencing Table¶
In [1]:
import pymysql
from sqlalchemy import *
import getpass
Pattern: Find items in one list which does not occur in a second list¶
In [2]:
# To find parent_tids which do not have corresponding tids,
# convert lists to sets and calculate the difference.
tid = ["a", "b", "c", "d", "e"]
parent_tid = ["a", "f", "c", "m"]
list(set(parent_tid) - set(tid))
Out[2]:
Repeat using data from database¶
In [3]:
password = getpass.getpass('Database password: ')
In [5]:
s = 'mysql+pymysql://aubreymoore:{}@localhost/pestlist'.format(password)
db = create_engine(s)
In [6]:
rs = db.execute("select tid, parent_tid from taxon2;")
tid_list = []
parent_tid_list = []
for r in rs:
tid_list.append(r.tid)
parent_tid_list.append(r.parent_tid)
In [7]:
tid_list[:10]
Out[7]:
In [8]:
parent_tid_list[:10]
Out[8]:
In [9]:
missing_parents = list(set(parent_tid_list) - set(tid_list))
missing_parents
Out[9]:
Results¶
There are two missing parent_tids. Note that '#' is the parent of root nodes, so this is OK.
A constraint needs to be added to the taxon2 table to prevent entry of a parent_tid which does not match an existing tid.
In [ ]: