I just helped friends for some CSV manipulation and did it for the first time from CL, here's a quick usage overview.
If you know a better© way please comment.
EDIT: lisp-stat's data-frame is pretty awesome.
I added the data-table utility in CIEL: http://ciel-lang.org/#/libraries?id=csv
cl-csv and a table-like data structure
We'll use cl-csv, data-table and cl-csv-data-tables
(a system defined in cl-csv).
cl-csv
allows to read and write a CSV file, string or stream.
data-table
allows to work with a "table-like" data-structure where
you can access columns by name (instead of only by index), coerce
column types, and more.
cl-csv-data-tables
brings a couple helpers.
WARN: below, csv
is a nickname to ciel-csv
which exports symbols from both cl-csv and data-table.
Here are the first cl-csv examples:
Read a file into a list of lists:
(csv:read-csv #P"file.csv") ;; <--- note the #P
=> (("1" "2" "3") ("4" "5" "6"))
;; read csv from a string (streams are also supported)
(csv:read-csv "1,2,3
4,5,6")
=> (("1" "2" "3") ("4" "5" "6"))
Read a file that's TAB delimited:
(csv:read-csv #P"file.tab" :separator #\Tab)
loop over a CSV for side effects with do-csv
:
(let ((sum 0))
(csv:do-csv (row #P"file.csv")
(incf sum (parse-integer (nth 0 row))))
sum)
We can use map-fn
to do something at each row.
Below we read a file and return a list of objects created from each row. We could create instances of our own objects like this.
(cl-csv:read-csv #P"file.csv"
:map-fn #'(lambda (row)
(make-instance 'object
:foo (nth 0 row)
:baz (nth 2 row))))
note that we have to access each column by index.
Use a table, guess column types
Read a CSV, create a data-table object, assume headers are on the
first row (:has-column-names
), guess the column types (:munge-types
):
(csv:get-data-table-from-csv #p"file.csv") ;; <--- still the #P
;; #<DATA-TABLE:DATA-TABLE {10018A9F63}>
(describe *)
;; =>
COLUMN-NAMES = ("Date" "Type" "Quantity" "Total")
COLUMN-TYPES = (STRING STRING INTEGER DOUBLE-FLOAT)
ROWS = (("9 jan. 1975" "Sell" 1 9.90) …)
This function is roughly as if you read the CSV file with cl-csv:read-csv
, checked that the first row contains column names, created the data-table object with (make-instance 'data-table:data-table :column-names (first rows) :rows (rest rows)
, and coerced the columns' types with (data-table:coerce-data-table-of-strings-to-types dt)
and (data-table::ensure-column-data-types dt)
(unexported function).
Access rows and columns
do that with:
csv:rows
csv:data-table-value dt &key row row-idx col-name col-idx
- and more
Write to file (or streams)
Write the data-table to a file with data-table-to-csv dt &optional stream)
.
Real-world example
That's my CIEL script that I did to help friends of a non-profit.
;;;;
;;;; Run this script on many CSV files:
;;;;
;;;; $ ciel sumuputils.lisp Rapports*
;;;;
;;;;
(in-package :ciel-user)
;; Needs the latest CIEL, or this library.
;; (ql:quickload "data-table" :silent t)
;;; Download the CSV file from Sum Up
(defvar *file* #p"/path/to/Rapport-ventes-2024-11-01_2024-11-30.csv"
"Only for testing.")
(defvar *dt* nil "devel only")
(defun parse-csv (file)
"Parse CSV, return a data-table object with column names and rows.
file: a pathname (not just a string)."
;; This takes headers as the first row
;; and guesses the columns' types (string, int, float).
(csv:get-data-table-from-csv (pathname file)))
(defun get-all-days (dt)
(remove-duplicates
(loop for row in (data-table:rows dt)
for date/time = (data-table:data-table-value dt :row row :col-name "Date")
for day = (str:unwords
(split-sequence #\Space date/time :count 3))
collect day)
:test #'equal))
;; (get-all-days (parse-csv *FILE*))
;; ("1 nov. 2024" "2 nov. 2024" "5 nov. 2024" "7 nov. 2024" "8 nov. 2024" …)
(defun get-all-conso-types (dt)
(sort
(remove-duplicates
(loop for row in (data-table:rows dt)
for description = (data-table:data-table-value dt :row row :col-name "Description")
when (str:non-blank-string-p description)
collect description)
:test #'equal)
#'string<))
(defun get-all-offert-types (types)
(filter (^ (it) (str:containsp "offert" it)) types))
#++
(get-all-offert-types (get-all-conso-types (parse-csv *FILE*)))
(defun rows-offerts-for-day (day dt)
"day: string, like '23 nov"
(loop for row in (data-table:rows dt)
when (and (str:containsp day (data-table:data-table-value dt :row row :col-name "Date"))
(str:containsp "offert" (data-table:data-table-value dt :row row :col-name "Description")))
collect row))
(defun sum-quantities-offerts-for-day (day dt)
;; optionnel: pour chq jour, combien de tartines, alcool, soft… d'offerts? (en nombre)
(loop for row in (rows-offerts-for-day day dt)
for qty = (data-table:data-table-value dt :row row :col-name "Quantité")
when qty
sum qty))
;; (sum-quantities-offerts-for-day "23 nov" *dt*)
;; 15
(defun sum-quantities-offerts-for-day/by-type (day dt desc)
;; optionnel: pour chq jour, combien de tartines, alcool, soft… d'offerts? (en nombre)
(loop for row in (rows-offerts-for-day day dt)
for qty = (data-table:data-table-value dt :row row :col-name "Quantité")
when (and qty
(str:containsp desc (data-table:data-table-value dt :row row :col-name "Description")))
sum qty))
(defun report-sum-quantities-offerts-for-day/by-type (file &key (stream t) &aux dt)
(setf dt (parse-csv file))
(loop for day in (get-all-days dt)
do
(format stream "~&~%Nombre de consos offertes le ~a~&~%" day)
(loop for desc in (get-all-offert-types (get-all-conso-types dt))
do
(format stream "~a: ~a~&" desc (sum-quantities-offerts-for-day/by-type day dt desc)))))
(defun sum-total-offerts-for-day (day dt)
;; le + important
(loop for row in (rows-offerts-for-day day dt)
for qty = (data-table:data-table-value dt :row row :col-name "Prix (TTC)")
when qty
sum qty))
;; (sum-total-offerts-for-day "23 nov" *dt*)
;; 49.0d0
(defun report-totals-offert-for-days (file &key (stream t))
(let ((dt (parse-csv file)))
(format stream "~&~%Total TTC des offerts~&~%")
(format stream "~&(somme des colonnes Prix (TTC) pour toutes les lignes du jour comportant 'conso offerte')~&")
(loop for day in (get-all-days dt)
do (format stream "~20a: ~4f~&" day (sum-total-offerts-for-day day dt)))))
(defun report-number-offert-for-days (file &key (stream t))
(let ((dt (parse-csv file)))
(format stream "~&Nombre des consos offertes par jour~&")
(format stream "~&(somme des colonnes 'Quantité' pour toutes les lignes du jour comportant 'conso offerte')~&")
(loop for day in (get-all-days dt)
do (format stream "~20a: ~4a~&" day (sum-quantities-offerts-for-day day dt)))))
#+ciel
#+ciel
(let ((files (mapcar #'pathname (rest ciel-user:*script-args*))))
(mapcar #'report-totals-offert-for-days files)
(mapcar #'report-sum-quantities-offerts-for-day/by-type files))
Call it from the command line:
$ ciel sumuputils.lisp Rapport*csv > rapport.txt
Show data in table
This is our CSV, by the way:
Date,Type,Réf. transaction,Moyen de paiement,Quantité,Description,Devise,Prix avant réduction,Réduction,Prix (TTC),Prix (HT),TVA,Taux de TVA,Compte
1 nov. 2024 22:09,Vente,T3YKZYZANT,Mastercard - Crédit,1,Montant personnalisé,EUR,3.00,0.00,3.00,3.00,0.00,,[email protected]
…
Can we display it nicely in a table? We'll use our resurrected cl-ansi-term library.
We'll use the "raw" CSV data, where the headers are on the first line, instead of the data-value object.
That way we just have to pass it to term:table *csv*
, and we choose the columns to display:
(setf *rawcsv* (csv:read-csv #p"test.csv"))
(term:table *RAWCSV* :keys '("Date" "Quantité" "Prix (HT)"))
Result:
+-----------------+--------+---------+
|Date |Quantité|Prix (HT)|
+-----------------+--------+---------+
|1 nov. 2024 22:09|1 |3.00 |
+-----------------+--------+---------+
|2 nov. 2024 19:10|1 |6.00 |
+-----------------+--------+---------+
|2 nov. 2024 20:25|1 |6.00 |
+-----------------+--------+---------+
|2 nov. 2024 20:44|1 |10.00 |
+-----------------+--------+---------+
|2 nov. 2024 21:05|1 |6.00 |
+-----------------+--------+---------+
|2 nov. 2024 21:13|1 |3.00 |
+-----------------+--------+---------+
|2 nov. 2024 21:45|1 |7.00 |
+-----------------+--------+---------+
|2 nov. 2024 21:49|1 |6.00 |
+-----------------+--------+---------+
Lisp-stat's data-frame
lisp-stat's data-frame is the smart way.
(ql:quickload :lisp-stat)
;; …
(in-package :ls-user)
;; Give a name to our dataframe,
;; read CSV.
(defdf sumup (read-csv #p"sells.csv"))
;; similar to (defparameter *data-frame* …) but with future goodies.
;; The goodies are that each column is created as a variable for our "sumup" dataframe.
;; Here we get all the "date" columns:
LS-USER> sumup:date
;; #("1 nov. 2024 19:36" "1 nov. 2024 19:36" …)
;; We didn't need to LOOP.
;; We can also use
(select sumup t 'date)
;; We have rows an column manipulation functions.
;; Let's get the data of only 2 columns:
(columns sumup '(date quantité))
;; Let's inspect our data:
(head sumup)
;; DATE TYPE RÉF--TRANSACTION MOYEN-DE-PAIEMENT QUANTITÉ DESCRIPTION DEVISE PRIX-AVANT-RÉDUCTION RÉDUCTION PRIX-(TTC) PRIX-(HT) TVA TAUX-DE-TVA COMPTE
;; 0 1 nov. 2024 19:36 Vente TE4PL6DA9X Mastercard - Débit 1 Montant personnalisé EUR 9.0 0.0 9.0 9.0 0.0 NA [email protected]
;; the data is printed as a nice table, respecting the columns width out of the box.
;; print-data would print all the data.
;; DESCRIBE output is pimped:
(describe sumup)
SUMUP
A data-frame with 863 observations of 14 variables
Variable | Type | Unit | Label
-------- | ---- | ---- | -----------
DATE | NIL | NIL | NIL
TYPE | NIL | NIL | NIL
RÉF--TRANSACTION | NIL | NIL | NIL
MOYEN-DE-PAIEMENT | NIL | NIL | NIL
QUANTITÉ | NIL | NIL | NIL
…
;; We can guess the column types:
(heuristicate-types sumup)
;; Our types are not NIL now:
(describe sumup)
SUMUP
A data-frame with 863 observations of 14 variables
Variable | Type | Unit | Label
-------- | ---- | ---- | -----------
DATE | STRING | NIL | NIL
TYPE | STRING | NIL | NIL
RÉF--TRANSACTION | STRING | NIL | NIL
MOYEN-DE-PAIEMENT | STRING | NIL | NIL
QUANTITÉ | INTEGER | NIL | NIL
PRIX-AVANT-RÉDUCTION | DOUBLE-FLOAT | NIL | NIL
…
And we have rows and colums manipulation functions at our disposal. Much more, without loop
ing.
See also
discussion: https://www.reddit.com/r/Common_Lisp/comments/1kht5ht/read_csv_files_in_common_lisp_clcsv_datatable/
If all you want is fast CSV parsing, redditors mentioned that DuckDB through cl-duckdb is pretty good.
-
lisp-stat's dataframe can read CSV. It's the smart way.
- easy overview
- direct access to row and columns manipulation, with no need of loop-ing.
- auto-text, automatic detection for text files (encoding, end of line, column width, csv delimiter etc). inquisitor for detection of asian and far eastern languages.
- CLAWK, an AWK implementation embedded into Common Lisp, to parse files line-by-line.
Lisp?!
Top comments (0)