1

I have a csv from a solar inverter and I need to input the data into a SQL database. The problem I have is that two inverters are in the same csv, so I need to split the csv into two or "do something" with the data from the first, then the second.

This is a sample:

#SmartLogger ESN:102469042181
#INV1 ESN:ES22B0048634
#Time;Upv1;Upv2;Upv3;Upv4;Upv5;Upv6;Upv7;Upv8;Ipv1;Ipv2;Ipv3;Ipv4;Ipv5;Ipv6;Ipv7;Ipv8;Uac1;Uac2;Uac3;Iac1;Iac2;Iac3;Status;Error;Temp;cos;fac;Pac;Qac;Eac;E-Day;E-Total;Cycle Time
08-12-2024 15:30:00;504.3;504.3;502.8;502.8;620.3;620.3;493.0;493.0;0.11;-0.04;0.13;-0.05;0.06;0.00;0.09;0.00;228.7;229.7;228.0;0.640;0.607;0.637;512;0;19.4;0.975;50.00;0.030;0.007;0.01;6.42;162.22;5;
08-12-2024 15:25:00;506.5;506.5;500.2;500.2;631.5;631.5;460.9;460.9;0.10;-0.04;0.12;-0.06;0.04;0.00;0.09;0.00;228.7;229.7;228.0;0.552;0.541;0.563;512;0;19.6;0.994;49.99;0.026;0.003;0.00;6.41;162.21;5;
#INV2 ESN:ES22B0048591
#Time;Upv1;Upv2;Upv3;Upv4;Upv5;Upv6;Upv7;Upv8;Ipv1;Ipv2;Ipv3;Ipv4;Ipv5;Ipv6;Ipv7;Ipv8;Uac1;Uac2;Uac3;Iac1;Iac2;Iac3;Status;Error;Temp;cos;fac;Pac;Qac;Eac;E-Day;E-Total;Cycle Time
08-12-2024 15:30:00;480.3;480.3;492.7;492.7;377.1;377.1;386.9;386.9;-0.07;0.13;0.02;0.05;-0.01;0.07;0.02;0.00;229.6;231.3;231.7;0.510;0.469;0.523;512;0;19.5;0.999;50.00;0.045;-0.002;0.01;6.65;164.65;5;
08-12-2024 15:25:00;478.8;478.8;484.7;484.7;385.1;385.1;410.9;410.9;-0.07;0.12;0.02;0.04;-0.02;0.06;0.00;0.00;229.6;232.3;231.7;0.486;0.451;0.522;512;0;19.6;0.993;49.99;0.036;0.004;0.00;6.64;164.64;5;

so I need to do soemthing with those lines:

08-12-2024 15:30:00;504.3;504.3;502.8;502.8;620.3;620.3;493.0;493.0;0.11;-0.04;0.13;-0.05;0.06;0.00;0.09;0.00;228.7;229.7;228.0;0.640;0.607;0.637;512;0;19.4;0.975;50.00;0.030;0.007;0.01;6.42;162.22;5;
08-12-2024 15:25:00;506.5;506.5;500.2;500.2;631.5;631.5;460.9;460.9;0.10;-0.04;0.12;-0.06;0.04;0.00;0.09;0.00;228.7;229.7;228.0;0.552;0.541;0.563;512;0;19.6;0.994;49.99;0.026;0.003;0.00;6.41;162.21;5;

and then I need to do something with those lines:

08-12-2024 15:30:00;480.3;480.3;492.7;492.7;377.1;377.1;386.9;386.9;-0.07;0.13;0.02;0.05;-0.01;0.07;0.02;0.00;229.6;231.3;231.7;0.510;0.469;0.523;512;0;19.5;0.999;50.00;0.045;-0.002;0.01;6.65;164.65;5;
08-12-2024 15:25:00;478.8;478.8;484.7;484.7;385.1;385.1;410.9;410.9;-0.07;0.12;0.02;0.04;-0.02;0.06;0.00;0.00;229.6;232.3;231.7;0.486;0.451;0.522;512;0;19.6;0.993;49.99;0.036;0.004;0.00;6.64;164.64;5;

nay idea how to distinguish between the both headers?

Header one: #INV1 ESN:ES22B0048634 Header two: #INV2 ESN:ES22B0048591

those should be ignored:

#Time;Upv1;Upv2;Upv3;Upv4;Upv5;Upv6;Upv7;Upv8;Ipv1;Ipv2;Ipv3;Ipv4;Ipv5;Ipv6;Ipv7;Ipv8;Uac1;Uac2;Uac3;Iac1;Iac2;Iac3;Status;Error;Temp;cos;fac;Pac;Qac;Eac;E-Day;E-Total;Cycle Time
1
  • Are files large, if not just use spread sheet to add first column, as inv1 or inv2. And delete the two lines. Then could be in one data file, if desired. Commented Jul 24 at 13:34

4 Answers 4

6

You could use awk to process the data and save the different streams to different files.

Assuming the data is in a file called data:

awk '
  $1 ~ /^#INV/
     { gsub(/^#/,"", $1) ; out = $1".out" ; next }
  $1 !~ /^#/
     { print $0 >> out }
  ' data

This will result in files for each inverter being created in the current (if you'll forgive the pun) directory.

NB -- if you were to run this multiple times for the same data, you will need to remove/truncate the output files each time.

Works as follows...

Filter for lines which define an inverter section

$1 ~ /^#INV/

Remove leading #, define the output file, stop processing this line.

{ gsub(/^#/,"", $1) ; out = $1".out" ; next }

Filter for lines that do not start with #

$1 !~ /^#/

Append the current line to the active output file

{ print $0 >> out }
2
  • Newlines matter in an awk script, /foo/ { bar } is not the same as /foo/<newline>{bar}. The former means "If foo then do bar" while the latter means "If foo then print the current line. Always do bar". So your script wouldn't do what you want given that. I think you're also misunderstanding what >> means in awk and assuming it means the same as it does in shell but it doesn't. Just change >> to > in your script and then you won't need to remove/truncate the output file before each run. Commented Aug 11 at 17:53
  • You could also get rid of most of the references to $1 and $0 and just do awk '/^#INV/ { sub(/^#/,"") ; out = $1".out" ; next } !/^#/ { print > out }' data but you should close() the output files as you finish writing to them to avoid a "too many open files" error from most awks given a large enough input file (or a dramatic slowdown from GNU awk). Commented Aug 11 at 17:58
6

If you're happy to split the file into parts you could use csplit. The command line here splits the file data.csv into as many parts as possible, each part starting either with a line matching the RE ^#INV or the text preceding the first match. (See man csplit for more details.)

csplit data.csv -ks '/^#INV/' '{*}'

For your described data set you'll get three files:

  • xx00 is the initial header
  • xx01 contains the first set of readings
  • xx02 contains the second set of readings

You can skip the CSV subsection headers if you need to, by first piping the data through grep -v '^#Time' or some such similar tool before passing it to csplit

1

Perl can read and process your input file (as can awk and several other tools) but it can also interact directly with an SQL database using the DBI module.

You could use perl to do something like this:

#!/usr/bin/perl

use strict;
use DBI;    # perl's SQL database module

# if the first argument is -n or --dry-run, then this is a dry-run:
my $dry_run = 0;
if ($ARGV[0] eq '-n' or $ARGV[0] eq '--dry-run') {
  $dry_run = 1;
  shift;
};

# Connect to the database. DBI can talk to sqlite, Mysql,
# Postgresql, Oracle, Microsoft SQL Server, and more.
#
# You didn't mention what database you'e using, OR info about the
# table schema, so I'm going to make the following assumptions:
#
# 1. sqlite, using a database file called solar.db
#
# 2. a single table called data with a field for the inverter
# name followed by fields for each of the ;-delimited values.

my $dbfile = './solar.db';
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","");

# Assume that there are 35 fields data fields, including one for
# the inverter name. Construct a string with 34 ? characters
# separated by commas plus one final ? without a comma.
# By using `?` placeholders in the SQL command string, we don't
# need to worry about field data types or quoting, DBI handles all
# that auto-magically.
my $placeholders = '?,' x 34 . '?';

# Use that with the sql insert command.
my $sql = "INSERT INTO data VALUES($placeholders)";

# Prepare the statement handle ($sth) with this $sql string
# for later use in the main loop.
my $sth = $dbh->prepare($sql);

my $inverter; # variable to hold the name of the current inverter.

while(<<>>) {
  chomp; # strip end-of-line character(s), e.g. \n, \r\n, or \n\r

  # This assumes you only want the inverter id AFTER the colon.
  # adjust to suit if you need the whole line or some other
  # part of it.
  if (s/^#INV\d ESN://i) {
    $inverter = $_;
    next;
  };

  # skip all other input lines unless they start with a date. 
  next unless (m/^\d\d-\d\d-\d\d\d\d/);

  my @data = split /;/;   # split the input line on semi-colons

  # NOTE: the first field is a date, but it's in a terrible
  # format, MM-DD-YYYY or DD-MM-YYYY.  Most SQL databases prefer
  # YYYY-MM-DD format for date fields because it's the ISO
  # standard and it's unambiguous.  You may need to transform
  # the first field, `$data[0]`, to YYYY-MM-DD by un-commenting
  # one of the next two lines:
  #
  #$data[0] =~ s/^(\d\d)-(\d\d)-(\d\d\d\d)/$3-$2-$1/; # if DD-MM-YYYY
  #$data[0] =~ s/^(\d\d)-(\d\d)-(\d\d\d\d)/$3-$1-$2/; # if MM-DD-YYYY
  #
  # I just used a TEXT field for the date in my dummy database so
  # didn't need to.

  # Insert any other code here that you might need to modify
  # or transform the various fields in the @data array.

  if ($dry_run) {
    # DEBUG/dry-run, only print what it would do
    print 'INSERT INTO data VALUES(' . join(",", $inverter, @data) . ");\n";
  } else {
    # actually insert the data into the database.
    $sth->execute($inverter, @data);
  };
}

This script looks long, but more than two-thirds of the 90 lines are comments and blank lines. According to sloccount, there's only 27 lines of actual perl code in there...most of which is "boilerplate".

I saved this as import-into-sqlite.pl, made it executable with chmod +x import-into-sqlite.pl, saved your sample data into a file called input.log, then ran the script like so:

$ ./import-into-sqlite.pl input.log

That ended up with the following data in my solar.db

$ sqlite3 solar.db 
SQLite version 3.46.1 2024-08-13 09:16:08
Enter ".help" for usage hints.
sqlite> select * from data;
ES22B0048634|08-12-2024 15:30:00|504.3|504.3|502.8|502.8|620.3|620.3|493.0|493.0|0.11|-0.04|0.13|-0.05|0.06|0.0|0.09|0.0|228.7|229.7|228.0|0.64|0.607|0.637|512.0|0.0|19.4|0.975|50.0|0.03|0.007|0.01|6.42|162.22|5.0
ES22B0048634|08-12-2024 15:25:00|506.5|506.5|500.2|500.2|631.5|631.5|460.9|460.9|0.1|-0.04|0.12|-0.06|0.04|0.0|0.09|0.0|228.7|229.7|228.0|0.552|0.541|0.563|512.0|0.0|19.6|0.994|49.99|0.026|0.003|0.0|6.41|162.21|5.0
ES22B0048591|08-12-2024 15:30:00|480.3|480.3|492.7|492.7|377.1|377.1|386.9|386.9|-0.07|0.13|0.02|0.05|-0.01|0.07|0.02|0.0|229.6|231.3|231.7|0.51|0.469|0.523|512.0|0.0|19.5|0.999|50.0|0.045|-0.002|0.01|6.65|164.65|5.0
ES22B0048591|08-12-2024 15:25:00|478.8|478.8|484.7|484.7|385.1|385.1|410.9|410.9|-0.07|0.12|0.02|0.04|-0.02|0.06|0.0|0.0|229.6|232.3|231.7|0.486|0.451|0.522|512.0|0.0|19.6|0.993|49.99|0.036|0.004|0.0|6.64|164.64|5.0
sqlite> 

For this, I created the following table in my dummy 'solar.db' database:

CREATE TABLE data (
    Inverter    TEXT,
    Time    TEXT,
    Upv1    REAL,
    Upv2    REAL,
    Upv3    REAL,
    Upv4    REAL,
    Upv5    REAL,
    Upv6    REAL,
    Upv7    REAL,
    Upv8    REAL,
    Ipv1    REAL,
    Ipv2    REAL,
    Ipv3    REAL,
    Ipv4    REAL,
    Ipv5    REAL,
    Ipv6    REAL,
    Ipv7    REAL,
    Ipv8    REAL,
    Uac1    REAL,
    Uac2    REAL,
    Uac3    REAL,
    Iac1    REAL,
    Iac2    REAL,
    Iac3    REAL,
    Status  REAL,
    Error   REAL,
    Temp    REAL,
    cos REAL,
    fac REAL,
    Pac REAL,
    Qac REAL,
    Eac REAL,
    E_Day   REAL,
    E_Total REAL,
    Cycle_Time  REAL
);

I wasn't too concerned about getting the field data types exactly right, I just wanted something I could insert data into without error. The Time field should probably be some kind of date / datetime type in most SQL databases, and the Cycle_Time field should maybe be INTEGER rather than REAL.


NOTE: DBI is NOT installed as standard with perl, so you'll need to install it along with the DBD driver for the SQL database you're using. The process of installing these modules depends on what kind of unix you're using, or which linux distribution. Most Linux distros have packages for DBI and the various DBD modules. Otherwise, you can install them directly from CPAN.

For example, to install DBI and the DBD::SQLite3 modules on Debian:

sudo apt-get install libdbi-perl libdbd-sqlite3-perl 

Debian also has DBD module packages for several other databases, and even for working with CSV and Excel files as if they were SQL databases. Most linux distributions will have a similar set of packages:

$ apt-cache search -n libdbd perl
libdbd-cassandra-perl - Perl DBI database backend for Cassandra
libdbd-csv-perl - DBI driver for CSV files
libdbd-excel-perl - SQL interface (via DBI) for accessing Excel files
libdbd-firebird-perl - Perl DBI driver for Firebird RDBMS server
libdbd-ldap-perl - Perl extension for LDAP access via an SQL/Perl DBI interface
libdbd-mariadb-perl - Perl5 database interface to the MariaDB/MySQL databases
libdbd-mock-perl - Mock database driver for testing
libdbd-multi-perl - failover and load balancing of DBI handles
libdbd-mysql-perl - Perl5 database interface to the MariaDB/MySQL database
libdbd-odbc-perl - Perl Database Driver implementing ODBC for DBI
libdbd-pg-perl - Perl DBI driver for the PostgreSQL database server
libdbd-sqlite3-perl - Perl DBI driver with a self-contained RDBMS
libdbd-sybase-perl - Sybase/MS SQL database driver for the DBI module
libdbd-xbase-perl - Perl module to access xbase files (optionally through DBI)
libdbd-oracle-perl - Perl DBI driver for Oracle
1
  • BTW, if you needed to insert the data into one of two different databases / tables depending on the inverter name, it would be fairly easy to modify this script to do that, either by opening two different database handles (e.g. $dbh1 and $dbh2) and/or creating two statement handles (e.g. $sth1 and $sth2) and using an if statement to check the value of $inverter in the main loop (in the else clause of the dry-run check would probably be the best place for that) . Commented Jul 24 at 3:36
1

Using any awk:

$ awk '/^#Time/{close(out); out="out"(++c)} !/^#/{print > out}' file

$ head out?
==> out1 <==
08-12-2024 15:30:00;504.3;504.3;502.8;502.8;620.3;620.3;493.0;493.0;0.11;-0.04;0.13;-0.05;0.06;0.00;0.09;0.00;228.7;229.7;228.0;0.640;0.607;0.637;512;0;19.4;0.975;50.00;0.030;0.007;0.01;6.42;162.22;5;
08-12-2024 15:25:00;506.5;506.5;500.2;500.2;631.5;631.5;460.9;460.9;0.10;-0.04;0.12;-0.06;0.04;0.00;0.09;0.00;228.7;229.7;228.0;0.552;0.541;0.563;512;0;19.6;0.994;49.99;0.026;0.003;0.00;6.41;162.21;5;

==> out2 <==
08-12-2024 15:30:00;480.3;480.3;492.7;492.7;377.1;377.1;386.9;386.9;-0.07;0.13;0.02;0.05;-0.01;0.07;0.02;0.00;229.6;231.3;231.7;0.510;0.469;0.523;512;0;19.5;0.999;50.00;0.045;-0.002;0.01;6.65;164.65;5;
08-12-2024 15:25:00;478.8;478.8;484.7;484.7;385.1;385.1;410.9;410.9;-0.07;0.12;0.02;0.04;-0.02;0.06;0.00;0.00;229.6;232.3;231.7;0.486;0.451;0.522;512;0;19.6;0.993;49.99;0.036;0.004;0.00;6.64;164.64;5;

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.