added 630 characters in body
Adrian

The first few lines of the 2003 CSV output are as follows:

n_firms n_users
1   2392550
2   478414
3   205789
4   115967
5   73688
6   51690
7   37297
8   28025
9   21959
10  17480
11  14295
12  11983
13  9937
14  8513
15  7451
16  6611
17  5749
18  4991
19  4702
20  4001
21  3668
22  3330
23  2971
24  2638
25  2462
26  2338
27  2177
28  2006

For example, in 2003 there were 2392550 users who accessed only 1 unique firm in a day, and 2006 users who accessed 28 unique firms in a day.
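To make the meaning of the two columns concrete, here is a minimal sketch of how such a table could be built from (ip, date, cik) records. It assumes that a "user" is one IP on one date, which is my reading of the description and not something confirmed above; the function name firms_per_user_histogram and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def firms_per_user_histogram(rows):
    """Map n_firms -> n_users for an iterable of (ip, date, cik) tuples.

    Assumption: one "user" is one (ip, date) pair.
    """
    firms_seen = defaultdict(set)
    for ip, date, cik in rows:
        # A set keeps each CIK once, so repeated hits on a firm count once.
        firms_seen[(ip, date)].add(cik)
    # Count how many users saw exactly k unique firms.
    return dict(Counter(len(ciks) for ciks in firms_seen.values()))

# Hypothetical rows shaped like the log records (ip, date, cik).
sample_rows = [
    ("67.82.239.bhe", "03/01/2004", "746838"),
    ("67.82.239.bhe", "03/01/2004", "1001082"),
    ("67.82.239.bhe", "03/01/2004", "746838"),
    ("199.43.32.edd", "03/01/2004", "78890"),
    ("151.196.250.ahd", "03/01/2004", "825411"),
]
histogram = firms_per_user_histogram(sample_rows)
for n_firms in sorted(histogram):
    print(n_firms, histogram[n_firms])
```

With these sample rows, two users accessed 1 unique firm and one user accessed 2 unique firms.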

Added example of input

Each CSV file looks like this (I only keep the IP and the CIK, the central index key. Each company has a unique CIK, and you can use that number to search for the company's filings on the Securities and Exchange Commission website https://www.sec.gov/edgar/searchedgar/companysearch.html):

ip              date        time        zone  cik
199.43.32.edd   03/01/2004  00:00:00    500 78890
67.82.239.bhe   03/01/2004  00:00:00    500 746838
67.82.239.bhe   03/01/2004  00:00:00    500 1001082
67.82.239.bhe   03/01/2004  00:00:00    500 746838
67.82.239.bhe   03/01/2004  00:00:00    500 752642
67.82.239.bhe   03/01/2004  00:00:00    500 1001082
151.196.250.ahd 03/01/2004  00:00:01    500 825411
208.61.82.abc   03/01/2004  00:00:01    500 106926
67.82.239.bhe   03/01/2004  00:00:01    500 82020
67.82.239.bhe   03/01/2004  00:00:01    500 1001082
67.82.239.bhe   03/01/2004  00:00:01    500 101829
67.82.239.bhe   03/01/2004  00:00:01    500 1001082
151.196.250.ahd 03/01/2004  00:00:02    500 825411
207.168.174.jdd 03/01/2004  00:00:02    500 714756
66.108.151.fgg  03/01/2004  00:00:02    500 1000180
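As an illustration of keeping only the IP and the CIK, here is a small sketch that parses rows laid out like the sample above. It assumes the fields are whitespace-separated as displayed here; the real files may well be comma-separated, in which case the csv module would be the better fit. The name keep_ip_and_cik is hypothetical.

```python
SAMPLE = """\
ip              date        time        zone  cik
199.43.32.edd   03/01/2004  00:00:00    500 78890
67.82.239.bhe   03/01/2004  00:00:00    500 746838
67.82.239.bhe   03/01/2004  00:00:00    500 1001082
"""

def keep_ip_and_cik(text):
    """Return (ip, cik) pairs from whitespace-separated log text.

    Assumption: the first line is a header naming the columns.
    """
    lines = text.strip().splitlines()
    header = lines[0].split()
    ip_col, cik_col = header.index("ip"), header.index("cik")
    pairs = []
    for line in lines[1:]:
        fields = line.split()
        pairs.append((fields[ip_col], int(fields[cik_col])))
    return pairs

pairs = keep_ip_and_cik(SAMPLE)
for ip, cik in pairs:
    print(ip, cik)
```

Looking the columns up by header name rather than by position keeps the sketch working if the column order differs between files.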
