
    Setting up a Python development environment on the win10 subsystem, with a kenlm and nltk tutorial (2)

    Category: Code  Date: 2019-11-07 21:04

    3.zlib

    wget http://zlib.net/zlib-1.2.11.tar.gz
    tar xzf zlib-1.2.11.tar.gz
    cd zlib-1.2.11
    ./configure
    make
    make install

    4.bzip2

    wget https://fossies.org/linux/misc/bzip2-1.0.6.tar.gz
    tar xzvf bzip2-1.0.6.tar.gz
    cd bzip2-1.0.6/
    make
    make install
    
    

    5.libbz2-dev

    apt-get install libbz2-dev
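
    After steps 3 to 5, it can be useful to confirm that the compression libraries are visible on the system. The following is only a minimal sketch using Python's standard `ctypes.util.find_library`; the helper name `locate_libraries` is made up for illustration. Note that `find_library` looks for shared libraries only, so a "not found" result does not necessarily mean a build failed (for example, when only a static library was installed).

```python
from ctypes.util import find_library


def locate_libraries(names):
    """Map each short library name (e.g. "z") to what the system linker finds, or None."""
    return {name: find_library(name) for name in names}


# "z" is zlib and "bz2" is the bzip2 library (libz / libbz2 on Linux).
found = locate_libraries(["z", "bz2"])
for name, location in found.items():
    print(f"lib{name}: {location if location else 'not found'}")
```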

    6.kenlm

    The GitHub repository has detailed instructions: https://github.com/kpu/kenlm. After downloading and extracting the source:

    cd kenlm
    mkdir -p build
    cd build
    cmake ..
    make -j 4 # run 4 parallel jobs to speed up the build
    cd ..
    python setup.py install

    To test, import kenlm in a Python environment; if there is no error, kenlm was installed successfully. Alternatively, run the file kenlm/python/example.py.
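
    The import test described above can be sketched as a small script. This is an illustration, not part of the kenlm distribution: `can_import` and `score_sentence` are hypothetical helper names, and `model_path` stands for a language model file of your own (ARPA or binary), which this tutorial does not provide.

```python
import importlib


def can_import(module_name):
    """Try to import a module by name and report whether it succeeded."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        print(f"{module_name}: import failed ({exc})")
        return False
    print(f"{module_name}: import OK")
    return True


def score_sentence(model_path, sentence):
    """Return the log10 probability kenlm assigns to a sentence.

    model_path is an assumption: supply your own ARPA or binary model file.
    """
    import kenlm  # imported lazily so the import check above still runs without kenlm

    model = kenlm.Model(model_path)
    return model.score(sentence, bos=True, eos=True)


can_import("kenlm")
```

    Once a model file is available, a call such as score_sentence("your_model.arpa", "this is a sentence .") would return its score; the file name here is only a placeholder.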

    Installing nltk

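    nltk can be installed directly with pip. The nltk_data files are large, so you can download them offline and add them to the search path. On win10, simply place nltk_data on drive D and nltk will find it automatically. On Linux, however, you need to add the nltk_data location to nltk.data.path, or move the directory to one of the paths printed below. For convenience, I created a symlink instead:

    sudo ln -s /mnt/d/nltk_data /usr/local/nltk_data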

    import nltk
    nltk.data.find(".")
    
    # Searched in:
    #   - '/root/nltk_data'
    #   - '/usr/local/nltk_data'
    #   - '/usr/local/share/nltk_data'
    #   - '/usr/local/lib/nltk_data'
    #   - '/usr/share/nltk_data'
    #   - '/usr/local/share/nltk_data'
    #   - '/usr/lib/nltk_data'
    #   - '/usr/local/lib/nltk_data'

    Add the path to nltk.data.path for the current session

    from nltk import data
    data.path.append(r"path to your downloaded nltk_data")

    After adding the path, inspect nltk.data.path to see the paths currently registered
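
    As a rough sketch of the same idea, the snippet below appends a directory only if it is not already listed and then shows which registered directories actually exist on disk. The helpers `add_data_dir` and `existing_dirs` are hypothetical names introduced here; the directory `/usr/local/nltk_data` is the symlink target used earlier in this tutorial, so adjust it to your own setup.

```python
import os


def add_data_dir(search_path, new_dir):
    """Append new_dir to search_path unless an equivalent entry exists; return True if added."""
    normalized = os.path.normpath(new_dir)
    if any(os.path.normpath(p) == normalized for p in search_path):
        return False
    search_path.append(new_dir)
    return True


def existing_dirs(search_path):
    """Return only the entries of search_path that are real directories."""
    return [p for p in search_path if os.path.isdir(p)]


try:
    from nltk import data
except ImportError:
    data = None  # nltk is not installed; the helpers above still work on plain lists

if data is not None:
    add_data_dir(data.path, "/usr/local/nltk_data")  # path taken from the symlink example
    print(existing_dirs(data.path))
```

    Like data.path.append, this only affects the current Python session.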

    Simple test

    from nltk.tokenize import word_tokenize
    sentence = "since the 1890s , and beginning in france , the term ''libertarianism '' has often been used as an synonym for anarchism and was used almost exclusively in this sense until the 1950s in the united states ; its use as an synonym is still common outside the united states ."
    print(word_tokenize(sentence))
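
    If nltk_data is not on the search path, word_tokenize raises a LookupError because the tokenizer data cannot be found. A defensive version of the test might look like the sketch below; `try_tokenize` is a hypothetical wrapper, not an nltk function.

```python
def try_tokenize(sentence):
    """Tokenize with nltk if possible; return None when nltk or its data is unavailable."""
    try:
        from nltk.tokenize import word_tokenize
    except ImportError:
        print("nltk is not installed")
        return None
    try:
        return word_tokenize(sentence)
    except LookupError:
        # Raised when the tokenizer data is missing from every directory in nltk.data.path.
        print("tokenizer data not found; check nltk.data.path")
        return None


tokens = try_tokenize("its use as a synonym is still common .")
print(tokens)
```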

    Summary

    This tutorial covered preparing a Python development environment on the win10 subsystem and using kenlm and nltk. If you have any questions, please leave a comment.