i Parsing CSV Data Using Ruby | Eric Chrobak

Parsing CSV Data Using Ruby

03 Jun 2025 - Eric Chrobak


When I need to extract large sets of data from an API there are 2 main ways I could go about doing this. I could make get API calls or I could do a bulk data export via the API.

Let’s go over the pros and cons of each method and then jump into how to parse CSV data using Ruby.

Get API Calls

Bulk Data Export

Pros

All of the pros lean in favor of using a bulk data export when the API provides it.

Parsing CSV Data Using Ruby

There are a several Ruby gems I found to parse CSV data. I went to Ruby Toolbox do some initial analysis of the Ruby Gems. Then I compared with the Ruby Gems website. I landed on RubyXL as it has been recently maintained.

Steps

Install Gem

  1. Cd to project root directory and Install to environment ‘'’bash $ gem install rubyXL -v 3.3.21 ‘’’
  2. Add to Gemfile ‘'’ruby gem ‘rubyXL’, ‘~> 3.3’, ‘>= 3.3.21’ ‘’’
  3. Bundle Gems in project root directory ‘'’bash $ bundle install ‘’’
  4. Add to relevant files ‘'’ruby require ‘rubyXL’ ‘’’

Parse File

We’re not goint to reinvent the wheel as the documentation provides most of what we need.

  1. Extract Workbook and Worksheet ‘'’ruby workbook = RubyXL::Parser.parse(‘directory/project/path/to/file’) worksheet = workbook[0] ‘’’
  2. Iterate over and retrieve cell data - this is where we add on top of what the documentation provides ‘’’ retrieved_data = [] worksheet.sheet_data.rows.each do |row| retrieved_row = [] row.cells.each do |cell| retrieved_row.push(cell.value) end retrieve_data.push(retrieved_row) end ‘’’
  3. Create a new array for the retrieved data
  4. Use sheet_data method which is not explicitely defined in the []Ruby Docs](https://www.rubydoc.info/gems/rubyXL/3.3.21/RubyXL/Worksheet) however in the schema it is defined.
  5. Get the rows which again is found in the schema.
  6. Iterate over the sheet data which is an array of rows.
  7. Create a new array to how to the cells
  8. Iterate over the row data which is an array of cells
  9. Push the cell data to the retrieved_row array.
  10. Push the retrieved_row array to the retrieved_data row
  11. Assign headers and row data to variable - optional if needed ‘'’ruby retrieved_headers = retrieved_data.slice!(0) retrieved_rows = retrieved_data ‘’’ Slice! will return the data at the index and modify the array the slice is called on

Write to .xlsx

  1. Create .xlsx workbook ‘'’ruby workbook = RubyXL::Workbook.new worksheet = workbook.worksheets[0] ‘’’
  2. Iterate over data and write to worksheet ‘’’ data.each_with_index do |row, row_index| row.each_with_index do |cell, cell_index| worksheet.add_cell(row_index, cell_index, cell) end end ‘’’
  3. Iterate over data as rows and row index
  4. Iterate over data in row as cell and cell index
  5. Write cell to worksheet
  6. Save Workbook ‘'’ruby file_name = ““file_name.xlsx” workbook.save(“folder/that/file/goes/#{file_name}”) ‘’’

Common issues & Troubleshooting

  1. Don’t have the correct path You can run into this with Rails No Using The Right Ruby version - Error - “Cannot load such file” - Local Ruby version doesn’t match Lambda Ruby version**
    • You will see an error saying “cannot load such file”. Here you can see that the Lambda Ruby version is set to 3.3.0. If you are using a different version, you will need to set your local Ruby version to 3.3.0, bundle the gems again, and push the changes to version control.

    /var/lang/lib/ruby/3.3.0/rubygems/core_ext/kernel_require.rb:59:in `require

     {
       "errorMessage": "cannot load such file -- lambda_function (LoadError)",
       "errorType": "Init<LoadError>",
       "stackTrace": [
         "/var/lang/lib/ruby/3.3.0/rubygems/core_ext/kernel_require.rb:59:in `require'",
         "/var/lang/lib/ruby/3.3.0/rubygems/core_ext/kernel_require.rb:59:in `require'",
         "/var/task/hello_ruby_record.rb:1:in `<top (required)>'",
         "/var/lang/lib/ruby/3.3.0/rubygems/core_ext/kernel_require.rb:59:in `require'",
         "/var/lang/lib/ruby/3.3.0/rubygems/core_ext/kernel_require.rb:59:in `require'"
       ]
     }
    
  2. Trying to iterate over worksheet without assigning it to a variable ‘'’ruby workbook = RubyXL::Parser.parse(“file/location”) workbook[0] |sheet| sheet.sheet_data.rows.each |do| end end ‘’’ I was getting nil values for sheet_data when I didn’t assign the worksheet to a value.

Conclusion

Parsing and writing data from a .xlsx is straight forward. All you need with Ruby is to add a gem, extract the data, iterate over the rows and cells, create a workbook, once again iterate over the parsed data, right to the new workbook while iterting over the parsed data, and then save the workbook. Now you can go forth and parse your data!