Data scrubbing is something we all have to deal with from time to time. Ruby is a great language for the job: most of the time it is already installed on your operating system, and a script like this one needs nothing beyond the standard library. This little example scrubs a CSV file by reformatting the dates it contains.
Given an original CSV file:
text;text;text;Mon 14 Nov 2016 13:07:30
text;text;text;Mon 15 Nov 2016 13:07:30
text;text;text;Mon 16 Nov 2016 13:07:30
We are going to scrub this and change the date to the format DD-MM-YYYY HH:MM:SS. Save the following as parser.rb. It assumes that the original data is in a file called original.csv in the same directory:
require 'date'

# Open the output file; the block form closes it automatically
File.open('new.csv', 'w') do |new_file|
  # Read the input one line at a time
  File.foreach('original.csv') do |line|
    fields = line.chomp.split(';')
    # Parse the fourth field and rewrite it as DD-MM-YYYY HH:MM:SS
    fields[3] = DateTime.parse(fields[3]).strftime('%d-%m-%Y %H:%M:%S')
    new_file.puts fields.join(';')
  end
end
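The loop above assumes every line has a parseable date in its fourth field. Real data is rarely that tidy, so here is a minimal sketch of one way to guard against bad rows. The helper name reformat_line is hypothetical and not part of the original script, and passing bad lines through unchanged is just one possible policy.

```ruby
require 'date'

# Hypothetical helper (an assumption, not from the original script):
# reformat the date in the fourth field, or return the line unchanged
# when that field is missing or cannot be parsed.
def reformat_line(line)
  fields = line.chomp.split(';')
  return line.chomp if fields[3].nil?

  fields[3] = DateTime.parse(fields[3]).strftime('%d-%m-%Y %H:%M:%S')
  fields.join(';')
rescue ArgumentError
  # DateTime.parse raises ArgumentError (or a subclass of it) on unparseable input
  line.chomp
end

puts reformat_line("text;text;text;Mon 14 Nov 2016 13:07:30\n")
puts reformat_line("text;text;text;unknown\n")
```

You could call this helper from inside the loop instead of parsing inline, and log or collect the lines that came back unchanged if you want to review them later.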
Now all we have to do is run this with ruby parser.rb, and it will create a new file, new.csv, in the same directory with our scrubbed output!
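If you want to double-check the result, a quick sanity pass over new.csv might look like the sketch below. The helper name scrubbed? is hypothetical, and the check only confirms the shape DD-MM-YYYY HH:MM:SS, not that the date itself is valid.

```ruby
# Hypothetical helper (an assumption for illustration): true when the fourth
# field of a line has the shape DD-MM-YYYY HH:MM:SS.
def scrubbed?(line)
  field = line.chomp.split(';')[3]
  !field.nil? && field.match?(/\A\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\z/)
end

# Report any lines in new.csv that were not converted (skipped if the file is absent).
if File.exist?('new.csv')
  File.foreach('new.csv').with_index(1) do |line, number|
    puts "line #{number} not converted: #{line}" unless scrubbed?(line)
  end
end
```

No output means every line in new.csv has a date in the new format.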