Parsing CSV Data Using Ruby
03 Jun 2025 - Eric Chrobak
When I need to extract large sets of data from an API there are 2 main ways I could go about doing this. I could make get API calls or I could do a bulk data export via the API.
Let’s go over the pros and cons of each method and then jump into how to parse CSV data using Ruby.
Get API Calls
- Pros
- Real-time data - You get the most up-to-date data.
- Data is in JSON format - Easy to work with if you’re familiar with JSON and don’t need to add extra gems to parse the data.
- Cons
- Rate limits - You can only make so many calls in a given time period that could be throttled per second, minute, hour, and/or day.
- Data is paginated - You have to make multiple calls to get all the data.
- Data is in JSON format - Can be difficult to work with if you’re not familiar with JSON.
- Time Consuming - It can take a long time to get all the data you need.
- Data Limitations - Some endpoints don’t provide all the record data in a bulk call. If you run into this then the endpoint is pretty useless to retrieve large sets of data from.
- Application and API timeouts - Your application’s process to retrieve that data could run into a timeout. The 3rd party API could timeout or not respond in a given time.
Bulk Data Export
Pros
- All data in one file - You get all the data in one file.
- Minimal API Calls - You only need to make a coule of calls to get all the data. Initiate job, check job status, download file.
- Less Time consuming - Depending on the limit of records per page you could be spending a lot of time making API calls to get all the data.
-
Reduced likelyhood of timeouts and API Failures - Your application is only waiting on the the API to process the export. There is a reduced likelyhood of timeouts or API failures.
- Cons
- Data is in CSV format - You need a gem to parse the csv.
All of the pros lean in favor of using a bulk data export when the API provides it.
Parsing CSV Data Using Ruby
There are a several Ruby gems I found to parse CSV data. I went to Ruby Toolbox do some initial analysis of the Ruby Gems. Then I compared with the Ruby Gems website. I landed on RubyXL as it has been recently maintained.
Steps
- install gem
- require gem
- Parse Data
- Into an Array
- Into another spreadsheet
Install Gem
- Cd to project root directory and Install to environment ‘'’bash $ gem install rubyXL -v 3.3.21 ‘’’
- Add to Gemfile ‘'’ruby gem ‘rubyXL’, ‘~> 3.3’, ‘>= 3.3.21’ ‘’’
- Bundle Gems in project root directory ‘'’bash $ bundle install ‘’’
- Add to relevant files ‘'’ruby require ‘rubyXL’ ‘’’
Parse File
We’re not goint to reinvent the wheel as the documentation provides most of what we need.
- Extract Workbook and Worksheet ‘'’ruby workbook = RubyXL::Parser.parse(‘directory/project/path/to/file’) worksheet = workbook[0] ‘’’
- Iterate over and retrieve cell data - this is where we add on top of what the documentation provides ‘’’ retrieved_data = [] worksheet.sheet_data.rows.each do |row| retrieved_row = [] row.cells.each do |cell| retrieved_row.push(cell.value) end retrieve_data.push(retrieved_row) end ‘’’
- Create a new array for the retrieved data
- Use sheet_data method which is not explicitely defined in the []Ruby Docs](https://www.rubydoc.info/gems/rubyXL/3.3.21/RubyXL/Worksheet) however in the schema it is defined.
- Get the rows which again is found in the schema.
- Iterate over the sheet data which is an array of rows.
- Create a new array to how to the cells
- Iterate over the row data which is an array of cells
- Push the cell data to the retrieved_row array.
- Push the retrieved_row array to the retrieved_data row
- Assign headers and row data to variable - optional if needed ‘'’ruby retrieved_headers = retrieved_data.slice!(0) retrieved_rows = retrieved_data ‘’’ Slice! will return the data at the index and modify the array the slice is called on
Write to .xlsx
- Create .xlsx workbook ‘'’ruby workbook = RubyXL::Workbook.new worksheet = workbook.worksheets[0] ‘’’
- Iterate over data and write to worksheet ‘’’ data.each_with_index do |row, row_index| row.each_with_index do |cell, cell_index| worksheet.add_cell(row_index, cell_index, cell) end end ‘’’
- Iterate over data as rows and row index
- Iterate over data in row as cell and cell index
- Write cell to worksheet
- Save Workbook ‘'’ruby file_name = ““file_name.xlsx” workbook.save(“folder/that/file/goes/#{file_name}”) ‘’’
Common issues & Troubleshooting
- Don’t have the correct path
You can run into this with Rails No Using The Right Ruby version - Error - “Cannot load such file” - Local Ruby version doesn’t match Lambda Ruby version**
- You will see an error saying “cannot load such file”. Here you can see that the Lambda Ruby version is set to 3.3.0. If you are using a different version, you will need to set your local Ruby version to 3.3.0, bundle the gems again, and push the changes to version control.
/var/lang/lib/ruby/
3.3.0
/rubygems/core_ext/kernel_require.rb:59:in `require{ "errorMessage": "cannot load such file -- lambda_function (LoadError)", "errorType": "Init<LoadError>", "stackTrace": [ "/var/lang/lib/ruby/3.3.0/rubygems/core_ext/kernel_require.rb:59:in `require'", "/var/lang/lib/ruby/3.3.0/rubygems/core_ext/kernel_require.rb:59:in `require'", "/var/task/hello_ruby_record.rb:1:in `<top (required)>'", "/var/lang/lib/ruby/3.3.0/rubygems/core_ext/kernel_require.rb:59:in `require'", "/var/lang/lib/ruby/3.3.0/rubygems/core_ext/kernel_require.rb:59:in `require'" ] }
- Trying to iterate over worksheet without assigning it to a variable ‘'’ruby workbook = RubyXL::Parser.parse(“file/location”) workbook[0] |sheet| sheet.sheet_data.rows.each |do| end end ‘’’ I was getting nil values for sheet_data when I didn’t assign the worksheet to a value.
Conclusion
Parsing and writing data from a .xlsx is straight forward. All you need with Ruby is to add a gem, extract the data, iterate over the rows and cells, create a workbook, once again iterate over the parsed data, right to the new workbook while iterting over the parsed data, and then save the workbook. Now you can go forth and parse your data!