<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>nltk on Shylock Hg</title>
    <link>/tags/nltk/</link>
    <description>Recent content in nltk on Shylock Hg</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 19 Feb 2018 00:00:00 +0000</lastBuildDate>
    
	<atom:link href="/tags/nltk/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Terms in NLP</title>
      <link>/post/2018/02/19/terms-in-nlp/</link>
      <pubDate>Mon, 19 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018/02/19/terms-in-nlp/</guid>
      <description>   Terms
lexicon: a wordbook or dictionary.
homonym: a word pronounced the same as another.
stopword: any of a number of very common words, such as 'the', 'to' and 'also'.
synonym: a word with the same or nearly the same meaning as another word in the language.
hyponym: a more specific word (e.g. 'motorcar' is a hyponym of 'vehicle').
hypernym: a more general word (e.g. 'vehicle' is a hypernym of 'motorcar').
meronym: a word naming a part of a whole (e.g. 'wheel' is a meronym of 'car').
holonym: a word naming the whole that contains a part (e.g. 'car' is a holonym of 'wheel').
code point: the numeric value of a character (such as its ASCII or Unicode value).</description>
    </item>
    
    <item>
      <title>Accessing web and local text</title>
      <link>/post/2018/02/17/accessing-web-and-local-text/</link>
      <pubDate>Sat, 17 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018/02/17/accessing-web-and-local-text/</guid>
      <description>1.Handling plain web text
1.1.Accessing web text
Accessing web text as below:
import urllib
url = 'http://www.gutenberg.org/files/2554/2554.txt'
# get the raw string of the text file
raw = urllib.urlopen(url).read()
# fetch through a proxy
proxy = {'http': 'http://www.yourproxy.com:443'}
raw = urllib.urlopen(url, proxies=proxy).read()

1.2.Tokenizing the text
Tokenizing a text (string) produces a list of tokens.
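The snippet above uses the Python 2 urllib API. Under Python 3 the same fetch would look roughly like this (a sketch; the User-Agent header is illustrative, not from the original post):

```python
from urllib.request import Request, urlopen

url = 'http://www.gutenberg.org/files/2554/2554.txt'
# build the request up front so headers can be attached if needed
req = Request(url, headers={'User-Agent': 'nltk-example'})
# raw = urlopen(req).read().decode('utf-8')  # performs the network fetch
```

Proxy handling moved too: in Python 3 you would install a ProxyHandler via urllib.request.build_opener rather than pass a proxies argument.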
# continuing from the previous snippet
import nltk
# tokenize the raw text string
tokens = nltk.word_tokenize(raw)

1.3.Creating nltk.Text object
We can handle text with nltk after creating an nltk.Text object.</description>
    </item>
    
    <item>
      <title>Lexical Resource</title>
      <link>/post/2018/02/17/lexical-resource/</link>
      <pubDate>Sat, 17 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018/02/17/lexical-resource/</guid>
      <description>1.Wordlist Corpora
Some corpora are nothing more than wordlists, such as the Unix wordlist /usr/dict/words. These are not running text, just collections of words from different areas. For example:
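As a concrete illustration of what a stopword list is for, a minimal filtering sketch (using a small hand-rolled stopword set rather than nltk.corpus.stopwords, so it runs without the NLTK data download):

```python
# a tiny stand-in for nltk.corpus.stopwords.words('english')
stopwords = {'the', 'to', 'and', 'a', 'of', 'is'}

words = ['the', 'cat', 'sat', 'on', 'the', 'mat']
# drop the high-frequency function words, keep the content words
content = [w for w in words if w not in stopwords]
# content -> ['cat', 'sat', 'on', 'mat']
```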
Corpora and descriptions:
nltk.corpus.words: a corpus of common English words.
nltk.corpus.stopwords: high-frequency words with little lexical content, such as 'the'.
nltk.corpus.names: first names categorised by gender, stored in two separate files.</description>
    </item>
    
    <item>
      <title>wordnet</title>
      <link>/post/2018/02/17/wordnet/</link>
      <pubDate>Sat, 17 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018/02/17/wordnet/</guid>
      <description>1.Overview
WordNet is a semantically oriented dictionary of English.
2.Senses and Synonyms
Accessing synonyms as below:
from nltk.corpus import wordnet
# the synsets that the word 'motorcar' belongs to
wordnet.synsets('motorcar')
# the list of lemma names of 'car.n.01'
wordnet.synset('car.n.01').lemma_names()
# the list of lemmas of 'car.n.01'
wordnet.synset('car.n.01').lemmas()
# the definition string of 'car.n.01'
wordnet.synset('car.n.01').definition()
# example sentences for 'car.n.01'
wordnet.synset('car.n.01').examples()

3.The WordNet Hierarchy
Synsets are linked by parent &amp; child (is-a) relations, as shown in the figure in the full post. Accessing hyponyms as below:</description>
    </item>
    
    <item>
      <title>FreqDist in NLTK</title>
      <link>/post/2018/02/16/freqdist-in-nltk/</link>
      <pubDate>Fri, 16 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018/02/16/freqdist-in-nltk/</guid>
      <description>1.FreqDist.
Usage as below:
from nltk import FreqDist
fdist = FreqDist(samples)

FreqDist counts how often each item occurs in a sequence and behaves like a dictionary mapping each item to its count. It is a one-dimensional frequency distribution, so it can only tally one kind of item at a time.
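For plain counting, FreqDist behaves much like the standard library's collections.Counter, so the idea can be sketched without NLTK installed:

```python
from collections import Counter

# counting word tokens, much as FreqDist would
tokens = ['the', 'cat', 'and', 'the', 'hat']
fdist = Counter(tokens)
fdist['the']          # -> 2
fdist.most_common(1)  # -> [('the', 2)]
```

FreqDist adds NLP conveniences on top of this (plots, frequency proportions), but the dictionary-like counting interface is the same.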
API of FreqDist:
FreqDist(samples): create the FreqDist object.
fdist[sample] += 1: increment the count of this sample (written fdist.inc(sample) in old NLTK 2 code).
fdist['sample']: return the count of 'sample'.
fdist.</description>
    </item>
    
    <item>
      <title>Accessing Corpus by NLTK</title>
      <link>/post/2018/02/15/accessing-corpus-by-nltk/</link>
      <pubDate>Thu, 15 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018/02/15/accessing-corpus-by-nltk/</guid>
      <description> 1.Built-in corpora.
For the corpora bundled with NLTK, just import them as below:
from nltk.corpus import &lt;****&gt;

2.User corpora.
For your own corpora, load them with an NLTK corpus reader, as below:
# for plain text
from nltk.corpus import PlaintextCorpusReader
path = '~/text'
pattern = r'.*\.txt'
texts = PlaintextCorpusReader(path, pattern)
# for treebank .mrg files
from nltk.corpus.reader import BracketParseCorpusReader
mrg_path = '~/mrg'
mrg_pattern = r'.*/wsj_.*\.mrg'
mrgs = BracketParseCorpusReader(mrg_path, mrg_pattern)

3.API of corpus objects
fileids(): return the list of all file ids of this corpus.
fileids([categories]): return the list of file ids belonging to the given categories.
categories(): return the list of all categories of this corpus.
categories([fileids]): return the list of categories of the given file ids.
raw(): return the raw text of the corpus as one string.
raw(fileids=[...]): return the raw text of the given file ids.
raw(categories=[...]): return the raw text of the given categories.
words(): return the corpus as a list of words; accepts the same fileids/categories arguments as raw().
sents(): return the corpus as a list of sentences; same arguments again.
abspath(fileid): return the absolute path of a file id.
encoding(fileid): return the encoding of a file id.
open(fileid): open the file and return a file object.
root: the root path of the corpus.
readme(): return the contents of the corpus README file.</description>
    </item>
    
    <item>
      <title>Natural Language Understanding</title>
      <link>/post/2018/02/14/natural-language-understanding/</link>
      <pubDate>Wed, 14 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018/02/14/natural-language-understanding/</guid>
      <description> 1.Word Sense Disambiguation
2.Pronoun Resolution
2.1.Anaphora Resolution
2.2.Semantic Role Labeling
3.Generating Language Output
3.1.Question Answering
3.2.Machine Translation
4.Spoken Dialogue Systems
Understanding the meaning of a question and composing an answer in reply.
5.Textual Entailment </description>
    </item>
    
    <item>
      <title>Simple statistics in NLP</title>
      <link>/post/2018/02/14/simple-statistics-in-nlp/</link>
      <pubDate>Wed, 14 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018/02/14/simple-statistics-in-nlp/</guid>
      <description>1.Frequency Distributions of words.
Much of the time the frequent words, or the notably infrequent ones, reveal the topic of a text. So in this case we want the frequency distribution of its words or collocations.
We can do this with nltk as below:
fdist = FreqDist(text)

2.Selecting words by length.
Sometimes word length also tells us something about a text, especially the distribution of word lengths. We can do this with nltk as below:</description>
    </item>
    
    <item>
      <title>Tips on button detection</title>
      <link>/post/2018/02/06/tips-of-button-detect/</link>
      <pubDate>Tue, 06 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018/02/06/tips-of-button-detect/</guid>
      <description>1.Problems
In embedded systems, user button event detection gets tricky when you need to detect multiple events (such as click, double click and long press) on one button.
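The timing decision can be sketched in a few lines (a hypothetical Python model of the logic, with made-up thresholds; an embedded implementation would do the same comparisons in C against a hardware timer):

```python
# Classify one button's press pattern from timestamps (milliseconds).
# Thresholds are illustrative, not taken from the original post.
LONG_PRESS_MS = 800   # held at least this long -> long press
DOUBLE_GAP_MS = 300   # second press within this gap -> double click

def classify(press_ms, release_ms, next_press_ms=None):
    """Classify a press given its press/release times and the time of
    the next press, if one arrived before the double-click gap expired."""
    if release_ms - press_ms >= LONG_PRESS_MS:
        return 'long_press'
    # a plain click can only be confirmed after the gap expires
    if next_press_ms is not None and next_press_ms - release_ms <= DOUBLE_GAP_MS:
        return 'double_click'
    return 'click'

classify(0, 1000)      # -> 'long_press'
classify(0, 100, 250)  # -> 'double_click'
classify(0, 100)       # -> 'click'
```

The point of the sketch is the third case: 'click' can only be returned once DOUBLE_GAP_MS has elapsed with no second press, which is exactly the reporting delay the post describes.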
As shown below (in the picture), there is a gap in time between the moment a click could be reported and the moment a double click or long press can be ruled out. So you cannot handle the click event immediately; you must wait until double click and long press have been ruled out.</description>
    </item>
    
  </channel>
</rss>