<html> <head> <title>NLE : Tutorial</title> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> </head> <body bgcolor="#FFFFCC" text="#000000"> <p align="center"><font face="Arial, Helvetica, sans-serif" size="6"><b><font color="#800000">TUTORIAL</font></b></font></p> <p align="left"><font color="#800000"><b><font face="Arial, Helvetica, sans-serif" size="6"><b><font size="5">Table of Contents (click on desired section):</font></b></font></b></font></p> <ul> <li><b><font face="Arial, Helvetica, sans-serif" size="5" color="#000000">Lexicon</font> </b> <ul> <li><font face="Arial, Helvetica, sans-serif" size="5" color="#FFFFFF"><a href="#basic_search"><font color="#000000">Basic searching</font></a></font><font size="5"> (the search template)</font></li> <li><font face="Arial, Helvetica, sans-serif" size="5" color="#000000">Word searches in Nahuatl</font></li> <li><font face="Arial, Helvetica, sans-serif" size="5" color="#000000">Word searches in English</font></li> <li><font face="Arial, Helvetica, sans-serif" size="5" color="#FFFFFF"><a href="#regexp"><font color="#000000">Regular expression searches</font></a></font></li> </ul> </li> <ul> <li><font face="Arial, Helvetica, sans-serif" size="5" color="#FFFFFF"><a href="#ch_class"><font color="#000000">Character classes</font></a></font></li> <li><a href="#VLN"><font face="Arial, Helvetica, sans-serif" size="5" color="#000000">VLN and PRN</font></a></li> <li><font face="Arial, Helvetica, sans-serif" size="5" color="#FFFFFF"><a href="#Sound"><font color="#000000">Sounds</font></a></font></li> <li><font face="Arial, Helvetica, sans-serif" size="5" color="#FFFFFF"><a href="#Sound"><font color="#000000">Selecting specific fields for display</font></a></font></li> <li><font face="Arial, Helvetica, sans-serif" size="5" color="#FFFFFF"><a href="#Sound"><font color="#000000">Data cleansing </font></a></font></li> </ul> <li><b><font face="Arial, Helvetica, sans-serif" size="5" 
color="#000000">Grammar</font></b></li> <li><a href="#ency"><b><font face="Arial, Helvetica, sans-serif" size="5" color="#000000">Encyclopedia</font></b></a></li> </ul> <p align="left"> </p> <p><font face="Arial, Helvetica, sans-serif" size="6"><b><font size="5" color="#800000">Lexicon</font></b></font> </p> <p><font size="5" face="Arial, Helvetica, sans-serif" color="#800000"><a name="structure"></a>Structure</font> </p> <p> </p> <p><font size="5" face="Arial, Helvetica, sans-serif" color="#800000"><a name="searching"></a>Searching</font> </p> <blockquote> <p><font size="5" face="Arial, Helvetica, sans-serif" color="#800000"><a name="basic_search"></a>Basic searching</font> </p> <p><font face="Times New Roman, Times, serif" size="4">Basic searches are carried out by specifying values in each of the three columns of the search template. Up to five rows may be combined using the logical operators <i>and</i> and <i>or</i>. Thus one can search for words that begin with <i>cho:</i> and end with <i>ka</i> by using two rows joined by <i>and</i>. There are several things to remember in any search:</font></p> <ol> <li><font face="Times New Roman, Times, serif" size="4">The field column lists, in plain text, the fields that are searched with any submission. Note that often more than one field is searched. For example, when searching for <i>Ameyaltepec word</i>, three fields are searched: <i>lxa</i> (the headword), <i>lxaa</i> (an alternate pronunciation of the headword), and <i>lxap</i> (a practical orthography of the headword).</font></li> </ol> <p><font size="5" face="Arial, Helvetica, sans-serif" color="#800000">Word searching in Nahuatl</font></p> <p><font size="5" face="Arial, Helvetica, sans-serif" color="#800000">Word searching in English</font></p> <p><font face="Times New Roman, Times, serif" size="4">Because of the immense amount of work it would involve, there are at present no simple glosses or single-word definitions for Nahuatl words. 
A search for the Nahuatl equivalent of any English word must be conducted in an English sense field (<i>/sea, /seo, /seao</i>) with the logical operator <i>contains word</i>. This operator looks in the various sense fields for the character string as a whole word (that is, preceded by a space and followed by a space or punctuation). A user could, therefore, search:</font></p> <ul> <li><font face="Times New Roman, Times, serif" size="4"><i>English sense—contains word—cry</i></font></li> </ul> <p><font face="Times New Roman, Times, serif" size="4">This will return 5 hits, including:</font></p> <blockquote> <p><font face="Times New Roman, Times, serif" size="4"><i>yo:ltepistik</i> : 1 : to be tough of character; to be hard-hearted; to be tenacious; to be able to endure adversity (e.g., a person who does not cry or break down when scolded or beaten, or who shows little tendency to back down when their compassion is appealed to)</font></p> </blockquote> <p><font face="Times New Roman, Times, serif" size="4">Clearly <i>yo:ltepistik</i> is not what most users would expect; it was listed simply because <i>cry</i> is contained in the definition: <i>person who does not cry or break down when scolded or beaten</i>.</font></p> <p><font face="Times New Roman, Times, serif" size="4">There are, however, reasons for not writing a keyword search or simple English word-finder function. The first is simply one of resources. Because many entries still have to be redefined, elaborated, and otherwise checked, creating a word-finder list now, while the dictionary is still in progress, would mean neglecting other tasks that are probably more urgent. A second reason is that many Nahuatl words cannot be summarized in English precisely enough to permit such searches. Finally, there would be a considerable risk of leaving out basic English words. 
</font></p> <p><font face="Times New Roman, Times, serif" size="4">The benefit of the present system is that, in searching for any word, users are presented with a more complete semantic domain. One need only search for <i>English sense—contains word—happy</i> to see this advantage. Moreover, with a little ingenuity, users can combine the multiple search rows to narrow their searches. For example, a search for the word <i>order</i> returns hundreds of hits, since any definition with the phrase <i>in order to</i> or <i>in order that</i> is pulled up. One could therefore search for <i>English sense—contains word—order</i> and <i>English sense—does not contain sequence—in order</i>. This yields 8 hits; by further specifying <i>Part of speech—contains—N</i>, only two results appear.</font></p> <p><font face="Times New Roman, Times, serif" size="4">Searches can also be narrowed by placing limits on the length of the sense definition. For example, to find the word for <i>and</i>, one can search <i>English sense—contains word—and</i> together with <i>English sense—regular expression—^[a-z]{1,15}$</i>, that is, requiring that the entire sense field consist of between 1 and 15 letters. </font></p> <p> </p> <p><font face="Arial, Helvetica, sans-serif" size="5" color="#800000"><a name="regexp"></a>Regular expression searches</font></p> <p><font face="Times New Roman, Times, serif" size="4">The NLE : Lexicon search engine works by submitting regular expression queries to the MySQL database. (A regular expression is a sequence of symbols that describes a pattern matching one or more strings of text.) The regular expression submitted for any query is displayed at the bottom of the search results page. 
Thus if one submits <i>Ameyaltepec word—begins with—cho:ka</i>, the regexp submitted (and displayed at the foot of the results page) is as follows:</font></p> <blockquote> <blockquote> <p> <font face="Times New Roman, Times, serif" size="4">(lxa_REGEXP_'^(%?cho:ka)'_OR_lxa_REGEXP_'%cho:ka[a-zA-Z]*%?'<br> _OR_lxaa_REGEXP_'^(%?cho:ka)'_OR_lxa_REGEXP_'%cho:ka[a-zA-Z]*%?'<br> _OR_lxap_REGEXP_'^(%?cho:ka)'_OR_lxa_REGEXP_'%cho:ka[a-zA-Z]*%?'<br> )__ORDER_BY_alpha</font></p> </blockquote> </blockquote> <p><font face="Times New Roman, Times, serif" size="4">The <i>begins with</i> part of the query is represented by the ^ symbol, which signifies <i>start of line</i>. If the query is changed to <i>Ameyaltepec word—ends with—cho:ka</i>, the regexp submitted (as displayed) is as follows: </font></p> <blockquote> <blockquote> <p><font size="4" face="Times New Roman, Times, serif">(lxa_REGEXP_'(cho:ka)$'_OR_lxa_REGEXP_'%[a-zA-Z]*(cho:ka%?)$'<br> _OR_lxaa_REGEXP_'(cho:ka)$'_OR_lxa_REGEXP_'%[a-zA-Z]*(cho:ka%?)$'<br> _OR_lxap_REGEXP_'(cho:ka)$'_OR_lxa_REGEXP_'%[a-zA-Z]*(cho:ka%?)$'<br> )__ORDER_BY_alpha</font></p> </blockquote> </blockquote> <p><font face="Times New Roman, Times, serif" size="4">In this regexp the $ symbol signifies <i>end of line</i> (though literally it means <i>up to a newline character</i>).</font></p> <p><font face="Times New Roman, Times, serif" size="4">The search template, therefore, converts each row (e.g., <i>Ameyaltepec word, ends with, cho:ka</i>) into a regexp. The expression <i>Ameyaltepec word</i> is set up to prompt a search in three fields: <i>lxa</i> (the lexical headword entry), <i>lxaa</i> (an alternate pronunciation of the headword entry), and <i>lxap</i> (a practical orthography of the headword entry). The search is actually carried out in fields that have been stripped of diacritics (e.g., accents); however, a corresponding display field that retains the diacritics is maintained in the database for online display. 
Thus the MySQL database (in which the information is stored) has a field (or column) named lxa, which holds the Ameyaltepec headword stripped of diacritics, as well as a field named lxa_d, which holds the original headword with all its diacritics. The search is performed on the stripped-down field (lxa); the display is of the original field (lxa_d).</font></p> <p><font face="Times New Roman, Times, serif" size="4">Users may also enter their own regular expressions. To do this, they select the fields to search in the pull-down menu of the first column of the search template, select <i>regular expression</i> in the second column, and then type a regular expression in the third column. For example, users who want to find all words that begin with /t/ or /k/ followed by a long /a:/ have two options. The first is to use two rows of the search template joined by <i>or</i>:</font></p> <ul> <li><font face="Times New Roman, Times, serif" size="4"><i>Ameyaltepec word—begins with—ta:</i></font></li> <li><font face="Times New Roman, Times, serif" size="4">or</font></li> <li><font face="Times New Roman, Times, serif" size="4"><i>Ameyaltepec word—begins with—ka:</i></font></li> </ul> <p><font face="Times New Roman, Times, serif" size="4">However, the same result can be accomplished with a regexp. The user could search:</font></p> <ul> <li><font face="Times New Roman, Times, serif" size="4"><i>Ameyaltepec word—regular expression—^[tk]a:</i></font></li> </ul> <p><font face="Times New Roman, Times, serif" size="4">In this case the user-entered regexp offers little advantage over letting the search engine construct the same query. In other cases, however, regular expressions are a powerful tool. 
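</font></p> <p><font face="Times New Roman, Times, serif" size="4">The two options can be compared with a minimal sketch. It uses Python's <i>re</i> module rather than MySQL's REGEXP operator; the two engines differ in general, but they treat the symbols used here (^, [ ], and a literal colon) the same way. The word list is hypothetical and merely stands in for diacritic-stripped <i>lxa</i> values; it is not drawn from the NLE database.</font></p>
<pre>
```python
import re

# Hypothetical headwords, already stripped of diacritics as in the lxa field.
# These strings are illustrative only; they are not taken from the NLE database.
words = ["ta:ta", "ka:li", "kali", "tlaka", "cho:ka", "ta:kat"]

# Option 1: two "begins with" rows joined by "or".
two_rows = [w for w in words if w.startswith("ta:") or w.startswith("ka:")]

# Option 2: a single row with the user-entered regexp ^[tk]a:
# ([tk] matches one /t/ or /k/; the colon marks the long vowel).
pattern = re.compile(r"^[tk]a:")
one_regexp = [w for w in words if pattern.search(w)]

print(two_rows)    # ['ta:ta', 'ka:li', 'ta:kat']
print(one_regexp)  # ['ta:ta', 'ka:li', 'ta:kat']
```
</pre>
<p><font face="Times New Roman, Times, serif" size="4">Both lists come out the same: the bracket expression [tk] does the work of the two rows joined by <i>or</i>, and further single-character alternatives can be added inside the brackets without using up additional rows. 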
</font></p> <p><font face="Times New Roman, Times, serif" size="4">What follows is a brief explanation of the most important symbols used in regular expressions:</font></p> <p><font face="Times New Roman, Times, serif" size="4"> </font></p> <table width="75%" align="center"> <tr> <td width="11%"><b>Symbol</b></td> <td width="40%"> <div align="left"><b>Meaning</b></div> </td> <td width="15%"> <div align="left"><b>Example</b></div> </td> <td width="34%"> <div align="left"><b>Explanation</b></div> </td> </tr> <tr valign="top"> <td width="11%"> <div align="left">^</div> </td> <td width="40%"> <div align="left">begins with</div> </td> <td width="15%"> <div align="left">^k</div> </td> <td width="34%"> <div align="left">searches for all fields that begin with /k/</div> </td> </tr> <tr valign="top"> <td width="11%"> <div align="left">$</div> </td> <td width="40%"> <div align="left">ends with</div> </td> <td width="15%"> <div align="left">k$</div> </td> <td width="34%"> <div align="left">searches for fields that end with /k/</div> </td> </tr> <tr valign="top"> <td width="11%"> <div align="left">*</div> </td> <td width="40%"> <div align="left">preceding character may be absent or repeated any number of times (zero or more occurrences)</div> </td> <td width="15%"> <div align="left">^ka*</div> </td> <td width="34%"> <div align="left">searches for fields that begin with /k/ followed by zero or more /a/'s</div> </td> </tr> <tr valign="top"> <td width="11%"> <div align="left">+</div> </td> <td width="40%"> <div align="left">preceding character must occur at least once and may be repeated any number of times (one or more occurrences)</div> </td> <td width="15%"> <div align="left">^ka+</div> </td> <td width="34%"> <div align="left">searches for fields that begin with /k/ followed by one or more /a/'s</div> </td> </tr> <tr valign="top"> <td width="11%"> <div align="left">?</div> </td> <td width="40%"> <div align="left">preceding character may or may not occur (zero or one occurrence)</div> </td> <td width="15%"> <div align="left">^ka:?</div> 
</td> <td width="34%"> <div align="left">searches for fields that begin with /k/ followed by /a/ that may or may not be long (i.e., may or may not have a colon after it)</div> </td> </tr> <tr valign="top"> <td width="11%"> <div align="left">{#}</div> </td> <td width="40%"> <div align="left">used with a number inside to indicate the exact number of occurrences of the preceding character</div> </td> <td width="15%"> <div align="left">^ka{2}</div> </td> <td width="34%"> <div align="left">searches for fields that begin with /k/ followed by 2 /a/'s</div> </td> </tr> <tr valign="top"> <td width="11%"> <div align="left">{#,}</div> </td> <td width="40%"> <div align="left">used with a number inside to indicate the minimum number of occurrences of the preceding character</div> </td> <td width="15%"> <div align="left">^ka{2,}</div> </td> <td width="34%"> <div align="left">searches for fields that begin with /k/ followed by at least 2 /a/'s</div> </td> </tr> <tr valign="top"> <td width="11%"> <div align="left">{#,#}</div> </td> <td width="40%"> <div align="left">used with two numbers inside to indicate the minimum and maximum number of occurrences of the preceding character</div> </td> <td width="15%"> <div align="left">^ka{2,4}</div> </td> <td width="34%"> <div align="left">searches for fields that begin with /k/ followed by between 2 and 4 /a/'s</div> </td> </tr> <tr valign="top"> <td width="11%">[]</td> <td width="40%">matches any one of the characters or digits listed within the brackets</td> <td width="15%">^k[ie]:</td> <td width="34%">searches for fields that begin with /k/ followed by a long /i:/ or a long /e:/</td> </tr> <tr valign="top"> <td width="11%">.</td> <td width="40%">matches any single character (including punctuation and digits)</td> <td width="15%">^..k</td> <td width="34%">searches for fields whose third character is /k/</td> </tr> <tr valign="top"> <td width="11%">-</td> <td width="40%">when used within brackets, matches any character within 
the range expressed by the characters before and after the dash</td> <td width="15%">^[a-c]</td> <td width="34%">searches for fields that begin with /a/, /b/, or /c/ (this is equivalent to ^[abc] as well as ^(a|b|c))</td> </tr> <tr valign="top"> <td width="11%">|</td> <td width="40%">used to express "or"</td> <td width="15%">^(cho:ka|to:ka)</td> <td width="34%">searches for fields that begin with <i>cho:ka</i> or <i>to:ka</i>; note that the alternatives must be enclosed in parentheses, since without them ^cho:ka|to:ka would also match any field that merely contains <i>to:ka</i></td> </tr> </table> <p> </p> <p><font size="5" face="Arial, Helvetica, sans-serif" color="#800000"><a name="ch_class"></a>Character classes</font></p> <p><font face="Times New Roman, Times, serif" size="4">A character class is a set of characters represented by a single, unique symbol. Classes let the user search for any member of the set at a given position with one symbol. The database is always queried through a regular expression, but convenient shortcuts may be established by selecting a single symbol to represent the entire class. For example, to search for any word that begins with the sequence <i>t-vowel-t</i>, one would write <b>^t[aeiou]:?t</b> as the regular expression. The <b>^</b> indicates 'beginning of field,' the <b>[ ]</b> match any one of the characters within the brackets (in this case any of the five Nahuatl vowels), the colon marks vowel length in the present orthography, and the <b>?</b> makes the preceding character (here the colon) optional. Finally, the <b>t</b> ends the sequence.</font></p> <p><font face="Times New Roman, Times, serif" size="4">A predefined character class, symbolized by V, represents any vowel. This symbol inserts the regular expression [aeiou]:? into any search string. Thus, a user who wants to search for any initial sequence of <i>t-vowel-t</i> could simply write <i>tVt. 
</i>At present there are three predefined character sets hard-coded into the program:</font></p> <ul> <li><font face="Times New Roman, Times, serif" size="4">C = any consonant</font></li> <li><font face="Times New Roman, Times, serif" size="4">V = any vowel, long or short</font></li> <li><font face="Times New Roman, Times, serif" size="4">S = semi-vowels, i.e., /w/ and /y/</font></li> </ul> <p><font face="Times New Roman, Times, serif" size="4">In addition, users may define their own character sets. To do this they select a character (capital letters are used) and set it equal to a regular expression. For most classes this simply involves the symbol, the equals sign, and a series of the characters to be included in the class. For example, one might want to establish a symbol for the front vowels (/i/ and /e/), regardless of length. This would be done within the box in the search form as follows:</font></p> <ul> <li><font face="Times New Roman, Times, serif" size="4">F = [ie]:?</font></li> </ul> <p><font face="Times New Roman, Times, serif" size="4">In this case it is not necessary to use parentheses, though one could write</font></p> <ul> <li><font face="Times New Roman, Times, serif" size="4">F = ([ie]:?)</font></li> </ul> <p><font face="Times New Roman, Times, serif" size="4">Parentheses are necessary, however, when the class includes sequences of more than one character, since square brackets match only a single character. Thus if one wanted a symbol, e.g., T, to represent all alveolar stops and affricates, one would write</font></p> <ul> <li><font face="Times New Roman, Times, serif" size="4">T = (t|tl|ts)</font></li> </ul> <p><font face="Times New Roman, Times, serif" size="4">Note that multiple character classes cannot be enclosed in square brackets. Instead, the pipe symbol | should be used within parentheses to list the alternatives in the regular expression. 
For example, if one wishes to find all the words that begin with /kw/ in a closed (consonant-final) syllable, one cannot use</font></p> </blockquote> <ul> </ul> <blockquote> <ul> <li><font face="Times New Roman, Times, serif" size="4">^kw[aeiou]:?[CS][CS]</font></li> </ul> <p><font face="Times New Roman, Times, serif" size="4">rather, one must write</font></p> <ul> <li><font face="Times New Roman, Times, serif" size="4">^kw[aeiou]:?(C|S)(C|S)</font></li> </ul> <p><font face="Times New Roman, Times, serif" size="4">The reason for the above limitation is that the character classes are expanded into regular expressions before the query is submitted (V, for example, becomes [aeiou]:?), and such expressions cannot be placed inside square brackets, which match only single characters.</font></p> </blockquote> <p> </p> <p><font face="Arial, Helvetica, sans-serif" size="5" color="#800000"><a name="VLN"></a>VLN and PRN</font></p> <p><font face="Times New Roman, Times, serif" size="4">These two buttons on the side of the search form are neutralization switches designed to make searches easier for users unfamiliar with the location of vowel-length distinctions in Nahuatl or with the specific phonological rules of Ameyaltepec and Oapan. </font></p> <p><font face="Times New Roman, Times, serif" size="4"><b>VLN:</b> When this box is checked on any line, vowel-length distinctions are "neutralized" for all string searches in the box to its left. The regular expression submitted to the MySQL database has <b>:?</b> inserted after every vowel (even word-final vowels). Vowel length is still shown in the display as usual. The effect of the VLN function can be seen at the bottom of the results page: if one submits <i>Nahuatl word—begins with—toka</i>, the regular expression submitted to query the relevant fields is <b>^(%?to:?ka:?)</b>.</font></p> <p><font face="Arial, Helvetica, sans-serif" size="5" color="#800000"><a name="Sound"></a>Sound files</font></p> <p><font face="Times New Roman, Times, serif" size="4">At present, sound files are linked to most headwords. In the future, illustrative sentences will also have accompanying sound files. 
The headword sound files may be accessed by clicking on one of the two icons after the headwords. The diamond-shaped icon links to an mp3 file; the musical-note icon links to a downsampled WAV file. </font></p> <p><font face="Arial, Helvetica, sans-serif" size="5" color="#800000">Selecting specific fields for display</font></p> <p><font face="Times New Roman, Times, serif" size="4">Advanced users who wish to select specific fields for display may do so by going directly to the following website: http://www.ldc.upenn.edu/hyperlex2/nahuatl/main_search.php4?user_lang=english&entry_template=generic</font></p> <p><font face="Arial, Helvetica, sans-serif" size="5" color="#800000">Data cleansing</font></p> <p><font face="Times New Roman, Times, serif" size="4">Cross-referenced fields can be checked for broken links, and XML-tagged fields for invalid tags, at http://www.ldc.upenn.edu/hyperlex2/nahuatl/cleanse_form.html</font></p> <p><font face="Times New Roman, Times, serif" size="4">The first row allows a user to check that the contents of a field such as \xvca, which should link to an \lxa field, do in fact link. Since \xvca is used only when the link is valid for the Ameyaltepec headword (\lxa) but not for the Oapan headword (\lxo), the cleansing parameters should be set as "xvca matches lxa should not match lxo."</font></p> <p><font face="Times New Roman, Times, serif" size="4">To test for valid (and invalid) XML tags, use the second row.</font></p> <p> </p> <p> </p> <p><font face="Arial, Helvetica, sans-serif" size="6"><b><font size="5" color="#800000"><a name="ency"></a>Encyclopedia</font></b></font></p> </body> </html>