We are having a Analytics product. For each of our customer we give one JavaScript code, they put that in their web sites. If a user visit our customer site the java script code hit our server so that we store this page visit on behalf of our customer. Each of our customer contains unique domain name that means customer determined by domain nam
Database server : MySql 5.6 Table rows : 400 million
Following is our table schema.
+---------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| domain | varchar(50) | NO | MUL | NULL | |
| guid | binary(16) | YES | | NULL | |
| sid | binary(16) | YES | | NULL | |
| url | varchar(2500) | YES | | NULL | |
| ip | varbinary(16) | YES | | NULL | |
| is_new | tinyint(1) | YES | | NULL | |
| ref | varchar(2500) | YES | | NULL | |
| user_agent | varchar(255) | YES | | NULL | |
| stats_time | datetime | YES | | NULL | |
| country | char(2) | YES | | NULL | |
| region | char(3) | YES | | NULL | |
| city | varchar(80) | YES | | NULL | |
| city_lat_long | varchar(50) | YES | | NULL | |
| email | varchar(100) | YES | | NULL | |
+---------------+------------------+------+-----+---------+----------------+
In above table guid represents visitor of our customer site and sid represents visitor session of our customer site. That means for every sid there should be associated guid.
We need queries like following
Query 1 : Find unique,total visitors
SELECT count(DISTINCT guid) AS count,count(guid) AS total FROM page_views WHERE domain = 'abc' AND stats_time BETWEEN '2015-10-05 00:00:00' AND '2015-10-04 23:59:59'
composite index planning : domain,stats_time,sid
Query 2 : Find unique,total sessions
SELECT count(DISTINCT sid) AS count,count(sid) AS total FROM page_views WHERE domain = 'abc' AND stats_time BETWEEN '2015-10-05 00:00:00' AND '2015-10-04 23:59:59'
composite index planning : domain,stats_time,guid
Query 3: Find visitors,sessions by country ,by region, by city
composite index planning : domain,country
composite index planning : domain,region
Each combination is requiring new composite index. That means huge index file, we can't keep this in memory so performance of the queries are low.
Is there any way optimize this index combinations to reduce index size and improve performance.
SHOW CREATE TABLE; it is more descriptive thanDESCRIBE.BETWEEN '2015-10-05 00:00:00' AND '2015-10-04 23:59:59'is empty?? In real life, will you be looking only at "recent" data?