Data Cleansing Module

Project Name: e.g. nahuatl
Head Tag: e.g. refgroup
Id Field: e.g. ref
Cross Reference Field: e.g. mref
Cross Reference Dialect Field: e.g. mref_d
Fields to search for each dialect: Create rules that associate the value of a cross-reference's dialect field with a headword field. These rules must be in the form X = Y, e.g. Am = lxam.
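
As an illustration only, the X = Y rule format could be read into a lookup table as in the Python sketch below. The function name parse_dialect_rules and the second rule (Oa = lxoa) are assumptions made for this example; the module's actual parsing code is not shown here and may behave differently.

```python
def parse_dialect_rules(text):
    """Parse lines of the form 'X = Y' into a {dialect value: headword field} dict."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between rules
        dialect, sep, field = line.partition("=")
        dialect, field = dialect.strip(), field.strip()
        if not sep or not dialect or not field:
            raise ValueError(f"Rule must be in the form X = Y: {line!r}")
        rules[dialect] = field
    return rules


# 'Am = lxam' is the example from the form; 'Oa = lxoa' is a hypothetical second rule.
rules = parse_dialect_rules("Am = lxam\nOa = lxoa")
print(rules["Am"])
```

With such a table, a cross-reference whose dialect field holds Am would be looked up against the lxam headword field.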


Embedded XML Cleanser

Note: This module searches the XML file for embedded tags, then searches the MySQL database for entries containing the content within those tags.
For this to work, you must put the XML file in the directory: jlex/nahuatl/php4.
Project Name: e.g. nahuatl
XML file: e.g. ActiveNahuatl_2005.xml
Head Tag: e.g. refgroup
Id Field: e.g. ref

Please provide a Start ID and an End ID to limit the set of entries that are cleansed. Remember that you have over 70,000 tags.
Checking the content of every tag is very time-consuming for the computer, so it's better to check only a subset at a time.
However, if you leave the Start ID and End ID empty, all records will be checked.
Start ID: e.g. 1
End ID: e.g. 100
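
The effect of the Start ID and End ID fields can be pictured with the short Python sketch below. It assumes that IDs are integers and that both bounds are inclusive; neither is stated on this form, and the function name in_id_range is hypothetical.

```python
def in_id_range(entry_id, start_id=None, end_id=None):
    """Return True if entry_id should be checked.

    A bound left as None (an empty form field) places no limit on that side,
    so leaving both empty selects every record.
    """
    if start_id is not None and entry_id < start_id:
        return False
    if end_id is not None and entry_id > end_id:
        return False
    return True


# Check only the first 100 entries out of a hypothetical list of IDs.
ids_to_check = [i for i in range(1, 501) if in_id_range(i, 1, 100)]
print(len(ids_to_check))
```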

Fields to search for each embedded XML tag: Create rules that associate an embedded XML tag with the existence of an entry containing values in one or more particular columns.
For example, for the embedded tag <nlao>, there should be an entry containing a value for the lxam field and the lxoa field.
To express this rule, write: nlao = lxam,lxoa

Some embedded XML tags may signify that the matching entry should not contain a value for a specified field. Use a ! to mark a column that should be empty.
For example, for the embedded tag <nla>, there should be an entry containing a value for the lxam field but NOT for the lxoa field.
To express this rule, write: nla = lxam,!lxoa
NOTE: Separate fields with commas (with no spaces in between), e.g. lxam,lxoa
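
To make the rule syntax concrete, the Python sketch below splits a rule such as nla = lxam,!lxoa into the fields that must have a value and the fields that must be empty, then tests one entry against them. The names parse_tag_rule and entry_matches, and the representation of an entry as a dictionary, are assumptions for illustration and do not describe the module's own code. The sketch also tolerates stray whitespace, which the form itself asks you to avoid.

```python
def parse_tag_rule(rule):
    """Split 'tag = field,!field' into (tag, required fields, fields that must be empty)."""
    tag, sep, fields = rule.partition("=")
    tag = tag.strip()
    if not sep or not tag:
        raise ValueError(f"Rule must be in the form tag = field,field: {rule!r}")
    required, must_be_empty = [], []
    for name in fields.split(","):
        name = name.strip()
        if not name:
            continue
        if name.startswith("!"):
            must_be_empty.append(name[1:])  # '!' marks a column that should be empty
        else:
            required.append(name)
    return tag, required, must_be_empty


def entry_matches(entry, required, must_be_empty):
    """True if every required field has a value and every excluded field is empty or missing."""
    has_required = all(entry.get(field) for field in required)
    has_excluded = any(entry.get(field) for field in must_be_empty)
    return has_required and not has_excluded


tag, required, must_be_empty = parse_tag_rule("nla = lxam,!lxoa")
# Hypothetical entry: a value in lxam and nothing in lxoa.
print(entry_matches({"lxam": "some headword", "lxoa": ""}, required, must_be_empty))
```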