Ag Data Commons
8 files

Proximal Hyperspectral Image Dataset of Various Crops and Weeds for Classification via Machine Learning and Deep Learning Techniques

posted on 2024-03-06, 17:35 authored by Billy Ram, Xin Sun, Kirk Howatt, Michael Ostlie, Joseph Mettler

About the Data

The data consists of proximal hyperspectral images of canola, soybean, sugarbeet, kochia, ragweed, redroot pigweed and waterhemp. The data was collected in the visible and near-infrared range of 400–1000 nm using a Specim FX10 hyperspectral sensor under a controlled halogen light source. The platform and data acquisition software used for data collection were SPECIM's LabScanner system and Lumo Scanner, respectively. The raw hyperspectral images were reference calibrated using the white and dark reference images. The hyperspectral images are saved as NumPy array (.npy) files in their respective directories. Supporting Jupyter notebooks provide additional tools for augmentation, region of interest selection, and spectral preprocessing.
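The reference calibration mentioned above is commonly computed as (raw − dark) / (white − dark) for each pixel and band. The following is a minimal sketch of that standard formula, not necessarily the exact procedure used for this dataset; the function name, the array layout (lines, samples, bands), and the synthetic values are assumptions made for illustration.

```python
import numpy as np


def calibrate_reflectance(raw, white, dark, eps=1e-8):
    """Return (raw - dark) / (white - dark), computed per pixel and band."""
    raw = np.asarray(raw, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)
    denom = white - dark
    # Guard against division by zero where white and dark readings coincide.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom


# Synthetic cube standing in for a real scan (assumed layout: lines, samples, bands).
rng = np.random.default_rng(0)
dark = np.full((1, 4, 5), 100.0)      # hypothetical dark reference
white = np.full((1, 4, 5), 4000.0)    # hypothetical white reference
true_reflectance = rng.uniform(0.0, 1.0, size=(3, 4, 5))
raw = dark + true_reflectance * (white - dark)

# The references broadcast along the scan-line axis.
reflectance = calibrate_reflectance(raw, white, dark)
print(reflectance.shape)
```

Because the published .npy files are already calibrated, this step would only be needed for newly acquired raw scans. An existing file can be read with `np.load(path)`, where `path` points to one of the dataset's .npy files.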

Benefit of Data

  1. The data can increase the number of samples available to machine learning and deep learning models, aiding in classification or identification tasks.
  2. It can serve as a valuable instrument for studies in spectroscopy.
  3. It can assist in the development and testing of three-dimensional data models.

Dataset Information

Each plant species is represented by 20 images, with four plants per image. The exception is redroot pigweed, which has one plant per image and 40 images.

Number of images:

  1. canola = 20
  2. soybean = 20
  3. sugarbeet = 20
  4. kochia = 20
  5. ragweed = 20
  6. redroot pigweed = 40
  7. waterhemp = 20
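As a quick check on the counts above, the totals can be tallied as follows. This is a small sketch: the dictionary keys are illustrative labels rather than confirmed directory names, and the plant total counts plant instances across images, assuming the stated plants-per-image figures hold for every image.

```python
# Images per class, as listed in the dataset information.
image_counts = {
    "canola": 20,
    "soybean": 20,
    "sugarbeet": 20,
    "kochia": 20,
    "ragweed": 20,
    "redroot_pigweed": 40,
    "waterhemp": 20,
}

# Four plants per image, except redroot pigweed with one plant per image.
plants_per_image = {name: 4 for name in image_counts}
plants_per_image["redroot_pigweed"] = 1

total_images = sum(image_counts.values())
total_plant_instances = sum(
    image_counts[name] * plants_per_image[name] for name in image_counts
)

print(total_images)           # 160
print(total_plant_instances)  # 520
```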


USDA: 58-6064-8-023

Imaging technologies in precision agriculture can be used to address crop and livestock production issues in North Dakota

National Institute of Food and Agriculture



Data contact name

Ram, Billy, G.

Data contact email


Ag Data Commons

Intended use

This dataset serves multiple purposes, including validating weed classification and identification models. Additionally, it can be utilized for model development, analysis pipelines, and creating tools for handling three-dimensional plant canopy data.

Use limitations

  1. The dataset includes noise in specific wavelengths.
  2. The lighting conditions are not consistent throughout.
  3. Leaves that occlude other parts of the plant are present in the dataset.

Temporal Extent Start Date


Temporal Extent End Date



  • notPlanned


  • Non-geospatial

Geographic Coverage

{ "type": "FeatureCollection", "features": [ { "type": "Feature", "properties": {}, "geometry": { "coordinates": [ -96.80557612516652, 46.895113651951846 ], "type": "Point" } }, { "type": "Feature", "properties": {}, "geometry": { "coordinates": [ -99.12064822911218, 47.50827829087268 ], "type": "Point" } } ] }

Geographic location - description

  1. Greenhouse, North Dakota State University
     • Latitude and longitude: 46°53'42.4"N 96°48'19.6"W
     • City/town/region: Fargo
     • State: North Dakota
     • Country: USA
  2. Carrington Research Extension Center
     • Latitude and longitude: 47°30'30.0"N 99°07'25.0"W
     • City/town/region: Carrington
     • State: North Dakota
     • Country: USA
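The geographic coverage is given twice: as GeoJSON points and as degrees, minutes, and seconds in the description. The sketch below cross-checks the two. Note that GeoJSON positions are ordered longitude first, then latitude. The helper `dms_to_decimal`, the site labels, and the assumption that the GeoJSON features appear in the same order as the described locations are introduced here for illustration.

```python
import json

# GeoJSON from the Geographic Coverage field; positions are [longitude, latitude].
GEOJSON_TEXT = """
{ "type": "FeatureCollection", "features": [
  { "type": "Feature", "properties": {},
    "geometry": { "coordinates": [ -96.80557612516652, 46.895113651951846 ], "type": "Point" } },
  { "type": "Feature", "properties": {},
    "geometry": { "coordinates": [ -99.12064822911218, 47.50827829087268 ], "type": "Point" } }
] }
"""


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# (latitude, longitude) from the location description.
described_sites = {
    "Fargo": (dms_to_decimal(46, 53, 42.4, "N"), dms_to_decimal(96, 48, 19.6, "W")),
    "Carrington": (dms_to_decimal(47, 30, 30.0, "N"), dms_to_decimal(99, 7, 25.0, "W")),
}

# Assumption: features are listed in the same order as the described sites.
features = json.loads(GEOJSON_TEXT)["features"]
geojson_sites = {}
for name, feature in zip(described_sites, features):
    lon, lat = feature["geometry"]["coordinates"]
    geojson_sites[name] = (lat, lon)

max_difference = max(
    abs(a - b)
    for name in described_sites
    for a, b in zip(described_sites[name], geojson_sites[name])
)
print(f"largest difference: {max_difference:.4f} degrees")
```

The two representations should agree to within about 0.01 degrees, which is enough to confirm that both refer to the same sites.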

ISO Topic Category

  • farming

Ag Data Commons Group

  • AgBioData

National Agricultural Library Thesaurus terms

hyperspectral imagery; data collection; crops; weeds; artificial intelligence; canola; soybeans; sugar beet; Bassia (Amaranthaceae); halogens; computer software; scanners; models; spectroscopy; hemp; canopy; wavelengths; lighting; leaves

Pending citation

  • Yes

Public Access Level

  • Public
