    卜居: Caffe Code Guide (4): Dataset Preparation

    Date: 2021-06-04 11:26

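    Caffe ships with two fairly simple examples: MNIST and CIFAR-10. The former is used for handwritten digit recognition, the latter for classifying small images. Both datasets can be downloaded with scripts in the Caffe source tree (CAFFE_ROOT/data/mnist/get_mnist.sh and CAFFE_ROOT/data/cifar10/get_cifar10.sh), as shown below: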

    $ ./get_cifar10.sh
    Downloading...
    --2014-12-02 01:20:12--  http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
    Resolving www.cs.toronto.edu... 128.100.3.30
    Connecting to www.cs.toronto.edu|128.100.3.30|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 170052171 (162M) [application/x-gzip]
    Saving to: “cifar-10-binary.tar.gz”
    
    
    100%[===========================================================================================================================================================>] 170,052,171  859K/s   in 2m 16s
    
    
    2014-12-02 01:22:28 (1.20 MB/s) - “cifar-10-binary.tar.gz” saved [170052171/170052171]
    
    
    Unzipping...
    Done.
    $ ls
    batches.meta.txt  data_batch_1.bin  data_batch_2.bin  data_batch_3.bin  data_batch_4.bin  data_batch_5.bin  get_cifar10.sh  readme.html  test_batch.bin
    
    
    $ ./get_mnist.sh
    Downloading...
    --2014-12-02 01:24:25--  http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
    Resolving yann.lecun.com... 128.122.47.89
    Connecting to yann.lecun.com|128.122.47.89|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 9912422 (9.5M) [application/x-gzip]
    Saving to: “train-images-idx3-ubyte.gz”
    
    
    100%[===========================================================================================================================================================>] 9,912,422   2.09M/s   in 6.7s
    
    
    2014-12-02 01:24:33 (1.42 MB/s) - “train-images-idx3-ubyte.gz” saved [9912422/9912422]
    
    
    --2014-12-02 01:24:33--  http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
    Resolving yann.lecun.com... 128.122.47.89
    Connecting to yann.lecun.com|128.122.47.89|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 28881 (28K) [application/x-gzip]
    Saving to: “train-labels-idx1-ubyte.gz”
    
    
    100%[===========================================================================================================================================================>] 28,881      42.0K/s   in 0.7s
    
    
    2014-12-02 01:24:34 (42.0 KB/s) - “train-labels-idx1-ubyte.gz” saved [28881/28881]
    
    
    --2014-12-02 01:24:34--  http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
    Resolving yann.lecun.com... 128.122.47.89
    Connecting to yann.lecun.com|128.122.47.89|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 1648877 (1.6M) [application/x-gzip]
    Saving to: “t10k-images-idx3-ubyte.gz”
    
    
    100%[===========================================================================================================================================================>] 1,648,877    552K/s   in 2.9s
    
    
    2014-12-02 01:24:39 (552 KB/s) - “t10k-images-idx3-ubyte.gz” saved [1648877/1648877]
    
    
    --2014-12-02 01:24:39--  http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
    Resolving yann.lecun.com... 128.122.47.89
    Connecting to yann.lecun.com|128.122.47.89|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 4542 (4.4K) [application/x-gzip]
    Saving to: “t10k-labels-idx1-ubyte.gz”
    
    
    100%[===========================================================================================================================================================>] 4,542       19.8K/s   in 0.2s
    
    
    2014-12-02 01:24:40 (19.8 KB/s) - “t10k-labels-idx1-ubyte.gz” saved [4542/4542]
    
    
    Unzipping...
    Done.
    $ ls
    get_mnist.sh  t10k-images-idx3-ubyte  t10k-labels-idx1-ubyte  train-images-idx3-ubyte  train-labels-idx1-ubyte
    

    If the download fails, you can get the files from my resources page at http://download.csdn.net/detail/kkk584520/8213463.
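A quick way to sanity-check the downloaded CIFAR-10 files is to walk their records. The sketch below assumes the published CIFAR-10 binary layout: each record is 1 label byte followed by 3072 pixel bytes (the 32x32 red, green, and blue planes in that order), with 10000 records per batch file. The helper names `iter_cifar10_records` and `read_cifar10_file` are hypothetical and are not part of Caffe.

```python
from typing import Iterator, List, Tuple

CIFAR10_IMAGE_BYTES = 3 * 32 * 32               # 3072: R, G, B planes of 32x32
CIFAR10_RECORD_BYTES = 1 + CIFAR10_IMAGE_BYTES  # 1 label byte + pixel bytes


def iter_cifar10_records(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (label, pixels) pairs from the contents of one CIFAR-10 .bin file."""
    if len(buf) % CIFAR10_RECORD_BYTES != 0:
        raise ValueError("size is not a multiple of 3073 bytes; file may be truncated")
    for offset in range(0, len(buf), CIFAR10_RECORD_BYTES):
        label = buf[offset]  # first byte of the record: class index
        pixels = buf[offset + 1 : offset + CIFAR10_RECORD_BYTES]
        yield label, pixels


def read_cifar10_file(path: str) -> List[Tuple[int, bytes]]:
    """Hypothetical convenience wrapper: load every record of one batch file."""
    with open(path, "rb") as f:
        return list(iter_cifar10_records(f.read()))
```

For example, `len(read_cifar10_file("data_batch_1.bin"))` should be 10000 for an intact batch file, since each one is 10000 x 3073 = 30,730,000 bytes.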

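    The raw datasets are binary files and must be converted to LevelDB or LMDB before Caffe can read them. The conversion tools are already included in the Caffe source: see CAFFE_ROOT/examples/mnist/convert_mnist_data.cpp and CAFFE_ROOT/examples/cifar10/convert_cifar_data.cpp. If you are not familiar with LevelDB or LMDB operations, these two source files are a good place to learn. We only need to run two commands from the CAFFE_ROOT directory: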

    ./examples/mnist/create_mnist.sh

    ./examples/cifar10/create_cifar10.sh
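Before running the conversion it can help to see what the raw MNIST binaries contain. The following is a minimal Python sketch of reading an IDX header, assuming the layout documented for the MNIST files: a big-endian 32-bit magic number (2051 for image files, 2049 for label files) whose low byte is the number of dimensions, followed by one big-endian 32-bit size per dimension, then one unsigned byte per element. The function names are hypothetical, and this is not the code used by convert_mnist_data.cpp.

```python
import struct
from typing import Sequence, Tuple

IDX_IMAGE_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions
IDX_LABEL_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension


def parse_idx_header(buf: bytes) -> Tuple[int, Tuple[int, ...]]:
    """Return (magic, dims) read from the start of an uncompressed IDX file."""
    (magic,) = struct.unpack_from(">I", buf, 0)
    ndim = magic & 0xFF  # low byte of the magic number: dimension count
    dims = struct.unpack_from(">" + "I" * ndim, buf, 4)
    return magic, dims


def expected_idx_size(dims: Sequence[int]) -> int:
    """Header bytes plus one unsigned byte per element."""
    count = 1
    for d in dims:
        count *= d
    return 4 + 4 * len(dims) + count


if __name__ == "__main__":
    # Assumes the unzipped file from get_mnist.sh is in the current directory.
    with open("train-images-idx3-ubyte", "rb") as f:
        magic, dims = parse_idx_header(f.read(16))
    print(magic, dims, expected_idx_size(dims))
```

Comparing `expected_idx_size(dims)` with the actual file size is a cheap check that a download was not truncated before the data is handed to the conversion scripts.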