当前位置 博文首页 > LY的博客:量化策略多因子选股之抓取不同时段的历史价格,及数据

    LY的博客:量化策略多因子选股之抓取不同时段的历史价格,及数据

    作者:[db:作者] 时间:2021-08-10 09:53

    #获取20170831,20180323的沪深三百成分及中证500的成分股当日收盘价
    
    
    # coding=utf-8
    import pandas as pd
    import tushare as ts
    import sys
    from sqlalchemy import create_engine
    import numpy as np
    
    reload(sys)
    sys.setdefaultencoding("utf-8")
    hs300=ts.get_hs300s()
    
    code=[]
    price=[]
    
    hs300_code= hs300['code']
    for stock in hs300_code:
        try:
            df=ts.get_k_data(stock,ktype='M',start='2017-08-01', end='2018-03-24')
    
            #print df
    
    
            df1= df.iloc[0,6]
    
    
            df2= df.iloc[0,2]
            #print df1
            #print df2
            code.append(df1)
            price.append(df2)
        except:
            pass
    
    print code
    print len(code)
    print price
    print len(price)
    
    price_201709=pd.DataFrame({'price':price},index=code
                              )
    print price_201709
    
    
    engine = create_engine('mysql://root:********@127.0.0.1/stock_report?charset=utf8')
    price_201709.to_sql('price_201709',engine,if_exists='append')
    
    

    然后在MYSQL进行表连接,将需要的财务数据(2017年第二季度)、历史股价连接

    计算市值 (总市值和流通市值)


    计算PE


    CREATE TABLE final

    SELECT p.index, b.name, p.price AS 20170901price, b.`outstanding`, b.`totals`
    , esp, p.price * outstanding AS marketvalue, totals * p.price AS totalvalue
    , p.price / esp AS pe, industry, g1.roe
    , g1.`net_profits`, g1.`profits_yoy`, g.`net_profit_ratio`, g.`gross_profit_rate`, g.business_income
    , `mbrg`, `nprg`, p2.price AS 20180323price
    FROM `price_201709` p, `get_report_basic_nochange` b, `get_report2_201706` g, `get_report_201706` g1, `get_report3_201706` g3, `price_20180328` p2
    WHERE p.index = b.code
    AND g.code = p.index
    AND g1.code = p.index
    AND g3.code = p.index
    AND p2.index = p.index

    浏览结果:


    接下来就是对部分财务数据进行取对数,检查数据的缺失值。。。

    cs