Data Cleansing Module
Embedded XML Cleanser
Note: This module searches the XML file for embedded tags, then searches the MySQL database for entries that contain the content found within each embedded tag.
For this to work, you must put the XML file in the directory: jlex/nahuatl/php5.
Project Name: e.g. nahuatl
XML file: e.g. ActiveNahuatl_2005.xml
Head Tag: e.g. refgroup
Id Field: e.g. ref
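As a rough illustration of the first step described in the note above (finding embedded tags inside each record), here is a minimal Python sketch. It is not the module's actual code. The sample XML layout, the `sense` element, the placeholder contents and the function name `embedded_tags` are all assumptions made up for this example; only the Head Tag (refgroup), Id Field (ref) and the tag names nlao and nla come from this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real XML file may be laid out differently.
SAMPLE = """<database>
  <refgroup><ref>1</ref><sense>See <nlao>example1</nlao> and <nla>example2</nla></sense></refgroup>
  <refgroup><ref>2</ref><sense>No embedded tags here</sense></refgroup>
</database>"""

def embedded_tags(xml_text, head_tag, id_field, tag_names):
    """Map each record id to the (tag, content) pairs found inside that record."""
    root = ET.fromstring(xml_text)
    found = {}
    for record in root.iter(head_tag):
        # Assumes the id field is a direct child of the head tag.
        record_id = record.findtext(id_field)
        found[record_id] = [
            (el.tag, (el.text or "").strip())
            for el in record.iter()
            if el.tag in tag_names
        ]
    return found

result = embedded_tags(SAMPLE, "refgroup", "ref", {"nlao", "nla"})
```

Each content string collected this way would then be looked up in the database, which is the step the field rules further down describe.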
Please provide a start ID and an end ID to limit the set of entries that are cleansed. Remember that you have over 70,000 tags.
Checking the content of every tag is very time-consuming for the computer, so it is better to check only a subset at a time.
If you leave both the Start ID and the End ID empty, all records will be checked.
Start ID: e.g. 1
End ID: e.g. 100
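The effect of the Start ID and End ID fields can be pictured with a short Python sketch. It assumes that IDs are whole numbers, that both bounds are inclusive, and that an empty field means "no bound on that side"; none of this is confirmed by this page beyond the statement that leaving both empty checks everything. The function names are hypothetical.

```python
def parse_bound(raw):
    """Return an int for a filled-in field, or None when the field is empty."""
    raw = raw.strip()
    return int(raw) if raw else None

def in_range(record_id, start_raw, end_raw):
    """True if record_id lies within the (assumed inclusive) bounds."""
    start, end = parse_bound(start_raw), parse_bound(end_raw)
    if start is not None and record_id < start:
        return False
    if end is not None and record_id > end:
        return False
    return True

ids = [1, 50, 100, 101, 70000]
selected = [i for i in ids if in_range(i, "1", "100")]   # subset only
everything = [i for i in ids if in_range(i, "", "")]     # both fields empty
```

With Start ID 1 and End ID 100, only the first three sample IDs are kept; with both fields empty, every ID is kept.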
Fields to search for each embedded XML tag: Create rules that associate an embedded XML tag with the existence of an entry that has a value in one or more particular columns.
For example, for the embedded tag <nlao>, there should be an entry containing a value for the lxam field and the lxoa field.
To express this rule, write: nlao = lxam,lxoa
An embedded XML tag may also signify that the matching entry should NOT contain a value for a specified field. Put an ! in front of a field to signify that the column should be empty.
For example, for the embedded tag <nla>, there should be an entry containing a value for the lxam field but NOT for the lxoa field.
To express this rule, write: nla = lxam,!lxoa
NOTE: Separate fields with commas (with no spaces in between), e.g. lxam,lxoa
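To make the rule syntax concrete, the following Python sketch shows one way a rule such as nla = lxam,!lxoa could be read and applied to a database entry. It is only an illustration: the function names `parse_rule` and `entry_matches` are hypothetical, an entry is modelled as a plain dictionary of column name to value, and an empty or missing value is assumed to mean "no value".

```python
def parse_rule(rule):
    """Split 'tag = a,!b' into (tag, required_fields, forbidden_fields)."""
    tag, _, field_list = rule.partition("=")
    required, forbidden = [], []
    for name in field_list.strip().split(","):
        if name.startswith("!"):
            forbidden.append(name[1:])   # column must be empty
        elif name:
            required.append(name)        # column must hold a value
    return tag.strip(), required, forbidden

def entry_matches(entry, required, forbidden):
    """Check a dict of column -> value against the parsed rule."""
    return (all(entry.get(f) for f in required)
            and not any(entry.get(f) for f in forbidden))

tag, required, forbidden = parse_rule("nla = lxam,!lxoa")
ok = entry_matches({"lxam": "some value", "lxoa": ""}, required, forbidden)
bad = entry_matches({"lxam": "some value", "lxoa": "other value"}, required, forbidden)
```

Here `ok` is true because lxam has a value and lxoa is empty, while `bad` is false because lxoa is filled in.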
i
l
n
na
nao
nam
namoa
nba
nbo
nbao
nl
nla
nlo
nlao
nlam
nlamoa
no
nr
nt
spn
r