Read CSV files in Common Lisp (cl-csv, data-table)

#lisp #commonlisp

I just helped friends for some CSV manipulation and did it for the first time from CL, here's a quick usage overview.

If you know a better© way please comment.

EDIT: lisp-stat's data-frame is pretty awesome.

I added the data-table utility in CIEL: http://ciel-lang.org/#/libraries?id=csv

cl-csv and a table-like data structure

We'll use cl-csv, data-table and cl-csv-data-tables (a system defined in cl-csv).

cl-csv allows to read and write a CSV file, string or stream.

data-table allows to work with a "table-like" data-structure where
you can access columns by name (instead of only by index), coerce
column types, and more.

cl-csv-data-tables brings a couple helpers.

WARN: below, csv is a nickname to ciel-csv which exports symbols from both cl-csv and data-table.

Here are the first cl-csv examples:

Read a file into a list of lists:

(csv:read-csv #P"file.csv")   ;; <--- note the #P
=> (("1" "2" "3") ("4" "5" "6"))

;; read csv from a string (streams are also supported)
(csv:read-csv "1,2,3
4,5,6")
=> (("1" "2" "3") ("4" "5" "6"))

Read a file that's TAB delimited:

(csv:read-csv #P"file.tab" :separator #\Tab)

loop over a CSV for side effects with do-csv:

(let ((sum 0))
  (csv:do-csv (row #P"file.csv")
    (incf sum (parse-integer (nth 0 row))))
  sum)

We can use map-fn to do something at each row.

Below we read a file and return a list of objects created from each row. We could create instances of our own objects like this.

(cl-csv:read-csv #P"file.csv"
                 :map-fn #'(lambda (row)
                             (make-instance 'object
                                            :foo (nth 0 row)
                                            :baz (nth 2 row))))

note that we have to access each column by index.

Use a table, guess column types

Read a CSV, create a data-table object, assume headers are on the
first row (:has-column-names), guess the column types (:munge-types):

(csv:get-data-table-from-csv #p"file.csv")   ;; <--- still the #P
;; #<DATA-TABLE:DATA-TABLE {10018A9F63}>

(describe *)
;; =>
  COLUMN-NAMES                   = ("Date" "Type" "Quantity" "Total")
  COLUMN-TYPES                   = (STRING STRING INTEGER DOUBLE-FLOAT)
  ROWS                           = (("9 jan. 1975" "Sell" 1 9.90) …)

This function is roughly as if you read the CSV file with cl-csv:read-csv, checked that the first row contains column names, created the data-table object with (make-instance 'data-table:data-table :column-names (first rows) :rows (rest rows), and coerced the columns' types with (data-table:coerce-data-table-of-strings-to-types dt) and (data-table::ensure-column-data-types dt) (unexported function).

Access rows and columns

do that with:

csv:rows
csv:data-table-value dt &key row row-idx col-name col-idx
and more

Write to file (or streams)

Write the data-table to a file with data-table-to-csv dt &optional stream).

Real-world example

That's my CIEL script that I did to help friends of a non-profit.

;;;;
;;;; Run this script on many CSV files:
;;;;
;;;; $ ciel sumuputils.lisp Rapports*
;;;;
;;;;

(in-package :ciel-user)

;; Needs the latest CIEL, or this library.
;; (ql:quickload "data-table"  :silent t)

;;; Download the CSV file from Sum Up
(defvar *file*  #p"/path/to/Rapport-ventes-2024-11-01_2024-11-30.csv"
  "Only for testing.")

(defvar *dt* nil "devel only")

(defun parse-csv (file)
  "Parse CSV, return a data-table object with column names and rows.

  file: a pathname (not just a string)."
  ;; This takes headers as the first row
  ;; and guesses the columns' types (string, int, float).
  (csv:get-data-table-from-csv (pathname file)))

(defun get-all-days (dt)
  (remove-duplicates
   (loop for row in (data-table:rows dt)
         for date/time = (data-table:data-table-value dt :row row :col-name "Date")
         for day = (str:unwords
                    (split-sequence #\Space date/time :count 3))
         collect day)
   :test #'equal))

;; (get-all-days (parse-csv *FILE*))
;; ("1 nov. 2024" "2 nov. 2024" "5 nov. 2024" "7 nov. 2024" "8 nov. 2024" …)

(defun get-all-conso-types (dt)
  (sort
   (remove-duplicates
    (loop for row in (data-table:rows dt)
          for description = (data-table:data-table-value dt :row row :col-name "Description")
          when (str:non-blank-string-p description)
            collect description)
    :test #'equal)
   #'string<))

(defun get-all-offert-types (types)
  (filter (^ (it) (str:containsp "offert" it)) types))
#++
(get-all-offert-types (get-all-conso-types (parse-csv *FILE*)))

(defun rows-offerts-for-day (day dt)
  "day: string, like '23 nov"
  (loop for row in (data-table:rows dt)
        when (and (str:containsp day (data-table:data-table-value dt :row row :col-name "Date"))
                  (str:containsp "offert" (data-table:data-table-value dt :row row :col-name "Description")))
          collect row))

(defun sum-quantities-offerts-for-day (day dt)
  ;; optionnel: pour chq jour, combien de tartines, alcool, soft… d'offerts? (en nombre)
  (loop for row in (rows-offerts-for-day day dt)
        for qty = (data-table:data-table-value dt :row row :col-name "Quantité")
        when qty
          sum qty))

;; (sum-quantities-offerts-for-day "23 nov" *dt*)
;; 15

(defun sum-quantities-offerts-for-day/by-type (day dt desc)
  ;; optionnel: pour chq jour, combien de tartines, alcool, soft… d'offerts? (en nombre)
  (loop for row in (rows-offerts-for-day day dt)
        for qty = (data-table:data-table-value dt :row row :col-name "Quantité")
        when (and qty
                  (str:containsp desc (data-table:data-table-value dt :row row :col-name "Description")))
          sum qty))

(defun report-sum-quantities-offerts-for-day/by-type (file &key (stream t) &aux dt)
  (setf dt (parse-csv file))
  (loop for day in (get-all-days dt)
        do
           (format stream "~&~%Nombre de consos offertes le ~a~&~%" day)
           (loop for desc in (get-all-offert-types (get-all-conso-types dt))
                 do
                    (format stream "~a: ~a~&" desc (sum-quantities-offerts-for-day/by-type day dt desc)))))


(defun sum-total-offerts-for-day (day dt)
  ;; le + important
  (loop for row in (rows-offerts-for-day day dt)
        for qty = (data-table:data-table-value dt :row row :col-name "Prix (TTC)")
        when qty
          sum qty))

;; (sum-total-offerts-for-day "23 nov" *dt*)
;; 49.0d0

(defun report-totals-offert-for-days (file &key (stream t))
  (let ((dt (parse-csv file)))
    (format stream "~&~%Total TTC des offerts~&~%")
    (format stream "~&(somme des colonnes Prix (TTC) pour toutes les lignes du jour comportant 'conso offerte')~&")
    (loop for day in (get-all-days dt)
         do (format stream "~20a: ~4f~&" day (sum-total-offerts-for-day day dt)))))

(defun report-number-offert-for-days (file &key (stream t))
  (let ((dt (parse-csv file)))
    (format stream "~&Nombre des consos offertes par jour~&")
    (format stream "~&(somme des colonnes 'Quantité' pour toutes les lignes du jour comportant 'conso offerte')~&")
    (loop for day in (get-all-days dt)
         do (format stream "~20a: ~4a~&" day (sum-quantities-offerts-for-day day dt)))))

#+ciel

#+ciel
(let ((files (mapcar #'pathname (rest ciel-user:*script-args*))))
  (mapcar #'report-totals-offert-for-days files)
  (mapcar #'report-sum-quantities-offerts-for-day/by-type files))

Call it from the command line:

$ ciel sumuputils.lisp Rapport*csv > rapport.txt

Show data in table

This is our CSV, by the way:

Date,Type,Réf. transaction,Moyen de paiement,Quantité,Description,Devise,Prix avant réduction,Réduction,Prix (TTC),Prix (HT),TVA,Taux de TVA,Compte
1 nov. 2024 22:09,Vente,T3YKZYZANT,Mastercard - Crédit,1,Montant personnalisé,EUR,3.00,0.00,3.00,3.00,0.00,,[email protected]
…

Can we display it nicely in a table? We'll use our resurrected cl-ansi-term library.

We'll use the "raw" CSV data, where the headers are on the first line, instead of the data-value object.

That way we just have to pass it to term:table *csv*, and we choose the columns to display:

(setf *rawcsv* (csv:read-csv #p"test.csv"))

(term:table *RAWCSV* :keys '("Date" "Quantité" "Prix (HT)"))

Result:

+-----------------+--------+---------+
|Date             |Quantité|Prix (HT)|
+-----------------+--------+---------+
|1 nov. 2024 22:09|1       |3.00     |
+-----------------+--------+---------+
|2 nov. 2024 19:10|1       |6.00     |
+-----------------+--------+---------+
|2 nov. 2024 20:25|1       |6.00     |
+-----------------+--------+---------+
|2 nov. 2024 20:44|1       |10.00    |
+-----------------+--------+---------+
|2 nov. 2024 21:05|1       |6.00     |
+-----------------+--------+---------+
|2 nov. 2024 21:13|1       |3.00     |
+-----------------+--------+---------+
|2 nov. 2024 21:45|1       |7.00     |
+-----------------+--------+---------+
|2 nov. 2024 21:49|1       |6.00     |
+-----------------+--------+---------+

Lisp-stat's data-frame

lisp-stat's data-frame is the smart way.

(ql:quickload :lisp-stat)
;; …

(in-package :ls-user)

;; Give a name to our dataframe,
;; read CSV.
(defdf sumup (read-csv #p"sells.csv"))
;; similar to (defparameter *data-frame* …) but with future goodies.

;; The goodies are that each column is created as a variable for our "sumup" dataframe.
;; Here we get all the "date" columns:
LS-USER> sumup:date
;; #("1 nov. 2024 19:36" "1 nov. 2024 19:36" …)

;; We didn't need to LOOP.

;; We can also use 
(select sumup t 'date)

;; We have rows an column manipulation functions.
;; Let's get the data of only 2 columns:
(columns sumup '(date quantité))

;; Let's inspect our data:
(head sumup)

;;   DATE              TYPE  RÉF--TRANSACTION MOYEN-DE-PAIEMENT  QUANTITÉ DESCRIPTION          DEVISE PRIX-AVANT-RÉDUCTION RÉDUCTION PRIX-(TTC) PRIX-(HT) TVA TAUX-DE-TVA COMPTE                     
;; 0 1 nov. 2024 19:36 Vente TE4PL6DA9X       Mastercard - Débit        1 Montant personnalisé EUR                     9.0       0.0        9.0       9.0 0.0          NA [email protected]

;; the data is printed as a nice table, respecting the columns width out of the box.

;; print-data would print all the data.

;; DESCRIBE output is pimped:
(describe sumup)

SUMUP
  A data-frame with 863 observations of 14 variables

Variable             | Type | Unit | Label      
--------             | ---- | ---- | -----------
DATE                 | NIL  | NIL  | NIL        
TYPE                 | NIL  | NIL  | NIL        
RÉF--TRANSACTION     | NIL  | NIL  | NIL        
MOYEN-DE-PAIEMENT    | NIL  | NIL  | NIL        
QUANTITÉ             | NIL  | NIL  | NIL        
…

;; We can guess the column types:
(heuristicate-types sumup)

;; Our types are not NIL now:
(describe sumup)
SUMUP
  A data-frame with 863 observations of 14 variables

Variable             | Type         | Unit | Label      
--------             | ----         | ---- | -----------
DATE                 | STRING       | NIL  | NIL        
TYPE                 | STRING       | NIL  | NIL        
RÉF--TRANSACTION     | STRING       | NIL  | NIL        
MOYEN-DE-PAIEMENT    | STRING       | NIL  | NIL        
QUANTITÉ             | INTEGER      | NIL  | NIL        
PRIX-AVANT-RÉDUCTION | DOUBLE-FLOAT | NIL  | NIL        
…