0

I have an SQL table comprised of stock data closing prices such as:

Date        AAOI  ABIL  ACIA  ACIW  ...  ZG   ZIXI 
2000-01-03   NaN   NaN   NaN  8.94  ... NaN  37.19
2000-01-04   NaN   NaN   NaN  8.33  ... NaN  36.50
2000-01-05   NaN   NaN   NaN  8.06  ... NaN  37.28
2000-01-06   NaN   NaN   NaN  7.98  ... NaN  35.25
2000-01-07   NaN   NaN   NaN  7.81  ... NaN  38.00

Is there a way to append columns to this table that equate to the ratio of each stock pair? I.e.

Date     AAOI  ABIL  ACIA  ACIW  ...  ZG   ZIXI  AAOI/ABIL  AAOI/ACIA  ... AAOI/ZIXI  ABIL/AAOI ... 

Or possibly create a new table?

I have thousands of columns of stock data and the number of ratio columns would run into the tens of thousands or more.

Is there a way to automate the creation of these columns with some sort of loop? This is my first SQLite project and am unsure how to proceed here.

Any additional information or advice would be appreciated. Thanks!

2 Answers 2

1

Only my personnal opinion, but I really think you are misusing SQLite here.

The primary goal of a database is to provide structured storage with little redundancy if any. Redundancy is considered bad in a database, because a poor update could leave incoherent data. It is indeed common to have denormalized columns (meaning columns that can be computed from other columns) but it is normally used when those computations are too complex to fit in a query, and you broadly advertise users that they are computed values.

Here you only have ratios which are easy to write in a select request, so IMHO it is no use to store them in the database: it just wastes space. Is easy enough to write:

SELECT AAOI,  ABIL,  ACIA, AAOI/ABIL, ACIA/ABIL, ABIL/ACIA
FROM ...

It certainly makes sense to have those ratios in a pandas dataframe, but not in a database. And it is easy to compute them in pandas:

cols = list(df.columns[1:])
for i, c in enumerate(cols[:-1]):
    for c2 in cols[i+1:]:
        df['{}/{}'.format(c, c2)] = df[c]/df[c2]

If for performance reasons you have to store those ratios, try to store them outside of the database (for example in a csv file) or in a separate table.

Sign up to request clarification or add additional context in comments.

6 Comments

I think you may be right. But how would I go about producing such a large amount of these ratios without referencing them manually? I mean we are talking about tens of thousands of ratios. I have a dataframe of the thousand plus stock prices actually: df = [AAOI, ABIL ... ZIXI] but no way to automate the ratio calculation process.
Do the ratio computations at pandas level, not at SQLite level. Please see my edit.
This worked perfectly! The only problem I have though is if I have 1000 columns it will add 500'000 columns of ratios which there isnt enough space for in the CSV (would be fine otherwise). What do you think of writing each ratio to its own CSV?
How do you intent to process 500000 column? Anyway 500000 files is probably not very efficient. Depending on the expected processing it could make sense to have about 1000 files of about 500 columns. But at that point it is really a matter of business usage. You certainly know it, but I don't...
Good point. This is being used to find optimal hedging pairs for stocks (one long and one short position). Maybe I can attribute a CSV file for each long position . Still not sure. But thank you so much, this has been a HUGE help.
|
0

Do you really need tens of thousands of ratios at one time? Not that you could have that number of columns as the default limit is 2000 columns per table it can be increased to up to 32767. Limits In SQLite - Maximum Number Of Columns

You can however have tens/hundreds of thousands rows per table. As such you might wish to consider a row per stockcode per date.

Perhaps consider the following, which doesn't add a column to the table BUT instead gets a ratio (no idea if this is the exact calculation you want) derived from the stockprice for a date range for two stockcodes :-

DROP TABLE IF EXISTS closing_price; 
CREATE TABLE IF NOT EXISTS closing_price (closingdate TEXT, stockcode TEXT, stockprice REAL, UNIQUE(closingdate, stockcode));
INSERT INTO closing_price VALUES
    ('2001-01-03','AAOI',null),('2001-01-03','ABIL',null),('2001-01-03','ACIA',null),('2001-01-03','ACIW',8.94),('2001-01-03','ZG',null),('2001-01-03','ZIXI',37.19),
    ('2001-01-04','AAOI',null),('2001-01-04','ABIL',null),('2001-01-04','ACIA',null),('2001-01-04','ACIW',8.33),('2001-01-04','ZG',null),('2001-01-04','ZIXI',36.50),
    ('2001-01-05','AAOI',null),('2001-01-05','ABIL',null),('2001-01-05','ACIA',null),('2001-01-05','ACIW',8.06),('2001-01-05','ZG',null),('2001-01-05','ZIXI',37.28),
    ('2001-01-06','AAOI',null),('2001-01-06','ABIL',null),('2001-01-06','ACIA',null),('2001-01-06','ACIW',7.98),('2001-01-06','ZG',null),('2001-01-06','ZIXI',35.25),
    ('2001-01-07','AAOI',null),('2001-01-07','ABIL',null),('2001-01-07','ACIA',null),('2001-01-07','ACIW',7.81),('2001-01-07','ZG',null),('2001-01-07','ZIXI',38.00)
;

-- Ratio for a single day between ACIW and ZIXI
SELECT (
    SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ACIW' AND closingdate BETWEEN '2001-01-03' AND '2001-01-03'
    ) 
    / (
    SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ZIXI' AND closingdate BETWEEN '2001-01-03' AND '2001-01-03'
    ) 
    AS ratio
;
-- Ratio for the 5 days between ACIW and ZIXI
SELECT (
    SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ACIW' AND closingdate BETWEEN '2001-01-03' AND '2001-01-07'
    ) 
    / (
    SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ZIXI' AND closingdate BETWEEN '2001-01-03' AND '2001-01-07'
    ) 
    AS ratio
;

The above utilises a single table but a row per stockcode/closingdate combination with a UNIQUE index comprised of the combination of the stockcode/closingdate.

The table would look like :-

enter image description here

It then, using a query, calculates the ratio for a specific pair of stockcodes for a given date range (first query is for a single day, the second for a 4 day range).

The results are :-

  1. (between ACIW and ZIXI for the 1 day 2001-01-03)

enter image description here

  1. (between ACIW and ZIXI for the 5 days 2001-01-03 through to 2001-01-07)

enter image description here

Additional

would it be possible to get the ratios using the method you listed without calling them individually (far too many combinations here)?

Ignoring nulls (at least for brevity/usefulness) the you could do something like the following (BUT beware the processing time) then perhaps the following would suit :-

WITH
allstocks AS (SELECT DISTINCT stockcode FROM closing_price),
combined AS (
SELECT DISTINCT closing_price.closingdate, closing_price.stockcode AS sc1, allstocks.stockcode AS sc2 
FROM closing_price JOIN allstocks ON closing_price.stockcode <> allstocks.stockcode
)
SELECT closingdate, sc1, sc2, 
    (SELECT stockprice FROM closing_price WHERE stockcode = sc1 AND closing_price.closingdate = combined.closingdate) /
    (SELECT stockprice FROM closing_price WHERE stockcode = sc2 AND closing_price.closingdate = combined.closingdate) AS ratio
FROM combined WHERE ratio IS NOT NULL;

This would result in :-

enter image description here

(that's from 150 combinations, the rest being nulls)

You could add a date range by amemding the above using something like :-

WITH
allstocks AS (SELECT DISTINCT stockcode FROM closing_price),
combined AS (
SELECT DISTINCT closing_price.closingdate, closing_price.stockcode AS sc1, allstocks.stockcode AS sc2 
FROM closing_price JOIN allstocks ON closing_price.stockcode <> allstocks.stockcode
WHERE closingdate BETWEEN '2001-01-04' AND '2001-01-06' --<<<<<<<<<< ADDED
)
SELECT closingdate, sc1, sc2, 
    (SELECT stockprice FROM closing_price WHERE stockcode = sc1 AND closing_price.closingdate = combined.closingdate) /
    (SELECT stockprice FROM closing_price WHERE stockcode = sc2 AND closing_price.closingdate = combined.closingdate) AS ratio
FROM combined 
WHERE ratio IS NOT NULL
;

which would result in :-

enter image description here

  • The above

    1. creates a CTE (Common Table Expression (a temporary table as such)) for each UNIQUE stockcode, the CTE is given the name allstocks.

    2. creates another CTE based upon the closing_price table joined with the allstocks CTE where the stockcode doesn't match (and for the given date range for the second example). The resultane CTE is named combined

    3. Each row in the combined CTE is then selected and as per the first example the ratio is derived according to the two stock codes in each row.

If you wanted to store the ratios in a table the you could define a table such as :-

CREATE TABLE IF NOT EXISTS ratio (closingdate TEXT, stockcode1 TEXT, stockcode2 TEXT, ratio, PRIMARY KEY(closingdate, stockcode1,stockcode2));

And use :-

WITH
allstocks AS (SELECT DISTINCT stockcode FROM closing_price),
combined AS (
SELECT DISTINCT closing_price.closingdate, closing_price.stockcode AS sc1, allstocks.stockcode AS sc2 
FROM closing_price JOIN allstocks ON closing_price.stockcode <> allstocks.stockcode
WHERE closingdate BETWEEN '2001-01-04' AND '2001-01-06'
)
INSERT OR IGNORE INTO ratio SELECT closingdate, sc1, sc2, 
    (SELECT stockprice FROM closing_price WHERE stockcode = sc1 AND closing_price.closingdate = combined.closingdate) /
    (SELECT stockprice FROM closing_price WHERE stockcode = sc2 AND closing_price.closingdate = combined.closingdate) AS ratio
FROM combined
;
  • Note that the PRIMARY KEY in conjunction with INSERT OR IGNORE removes the chance for duplicates being added.

2 Comments

This is actually very smart! The only thing is though that I have 1-2k stocks and therefore up to 2k rows per date. Which could still be fine I suppose, but would it be possible to get the ratios using the method you listed without calling them individually (far too many combinations here)? Say from a list or something with .format?
Thanks for the quick reply! This looks good to me and could really be my best bet going forward. Really appreciate the insight. You're an asset to the community. Answer accepted!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.