Consider the following snippet of CSV data from "NASDAQ.csv":

"Symbol,""Name"",""LastSale"",""MarketCap"",""ADR TSO"",""IPOyear"",""Sector"",""industry"",""Summary Quote"",";;
"FLWS,""1-800 FLOWERS.COM, Inc."",""2.9"",""81745200"",""n/a"",""1999"",""Consumer Services"",""Other Specialty Stores"",""http://www.nasdaq.com/symbol/flws"",";;
"FCTY,""1st Century Bancshares, Inc"",""4"",""36172000"",""n/a"",""n/a"",""Finance"",""Major Banks"",""http://www.nasdaq.com/symbol/fcty"",";;
"FCCY,""1st Constitution Bancorp (NJ)"",""8.8999"",""44908895.4"",""n/a"",""n/a"",""Finance"",""Savings Institutions"",""http://www.nasdaq.com/symbol/fccy"",";;

I'm trying to import Symbol, Name, Sector, and Industry into a MySQL table with corresponding fields:

$path = "NASDAQ.csv";
$row = 1;
$entries = array();
if (($handle = fopen($path, "r")) !== FALSE) {
  while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $row++;
    $entries[] = $data;
  }
  fclose($handle);
}

foreach ($entries as $line) {
  // One %s placeholder per column: symbol, name, sector, industry.
  db_query("
     INSERT INTO us_stocks (symbol, name, sector, industry)
     VALUES ('%s', '%s', '%s', '%s')",
     $line[0], $line[1], $line[6], $line[7]
  );
}

The result, however, is not what I expected. In the database, only the Symbol field gets filled, and not even correctly:

symbol      name  sector  industry
----------------------------------
Symbol,"Na
FLWS,"1-80
FCTY,"1st
FCCY,"1st

What am I doing wrong?

[edit]

If I print_r($entries), the output looks like this:

Array (
  [0] => Array(
    [0] => Symbol,"Name","LastSale","MarketCap","ADR TSO","IPOyear","Sector","industry","Summary Quote",;;
  )
  [1] => Array(
    [0] => FLWS,"1-800 FLOWERS.COM, Inc.","2.9","81745200","n/a","1999","Consumer Services","Other Specialty Stores","http://www.nasdaq.com/symbol/flws",;;
  )
  [2] => Array(
    [0] => FCTY,"1st Century Bancshares, Inc","4","36172000","n/a","n/a","Finance","Major Banks","http://www.nasdaq.com/symbol/fcty",;;
  )
)

[edit2]

I have deleted the first line of the CSV, as suggested. I now have a very quick and dirty way of almost accomplishing what I want. Basically, the parsing goes wrong whenever a company name contains ", Inc", so I just "glue" the split-off part back onto the name ($data[1] = $data[1] . $data[2]):

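$path = "NASDAQ.csv";
$row = 1;
$entries = array();
if (($handle = fopen($path, "r")) !== FALSE) {
  // fgetcsv() only supports a single-character delimiter; PHP 5 used the
  // first character of ";;" (with a notice), newer versions reject ";;".
  while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
    if ($row < 100) {
      $row++;
      // $data[0] holds the whole record; split it on commas.
      $data = explode(',', $data[0]);
      // A piece starting with a space is the ", Inc" part of a company name.
      if (substr($data[2], 0, 1) == ' ') {
        $data[1] = $data[1] . $data[2];
        unset($data[2]);
      }
      $entries[] = $data;
    }
  }
  fclose($handle);
}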

A print_r($entries) now gives:

[0] => Array
    (
        [0] => FLWS
        [1] => "1-800 FLOWERS.COM Inc."
        [3] => "2.9"
        [4] => "81745200"
        [5] => "n/a"
        [6] => "1999"
        [7] => "Consumer Services"
        [8] => "Other Specialty Stores"
        [9] => "http://www.nasdaq.com/symbol/flws"
        [10] => 
    )

Final problem: I don't know how to renumber the keys (3 into 2, 4 into 3, etc.) so that the output looks like this:

[0] => Array
    (
        [0] => FLWS
        [1] => "1-800 FLOWERS.COM Inc."
        [2] => "2.9"
        [3] => "81745200"
        [4] => "n/a"
        [5] => "1999"
        [6] => "Consumer Services"
        [7] => "Other Specialty Stores"
        [8] => "http://www.nasdaq.com/symbol/flws"
        [9] => 
    )

Any help would be greatly appreciated!

I'd guess that it has to do with the doubled double quotes used in your CSV file. The fourth argument of fgetcsv() ($enclosure) could be set to "\"\"" to see if this is the case, although fgetcsv() only supports a single-character enclosure. Commented Feb 9, 2012 at 13:03

2 Answers

I'd say the data isn't "truly" CSV.

"FLWS,""1-800 FLOWERS.COM, Inc."",""2.9"", should be: "FLWS","1-800 FLOWERS.COM, Inc.","2.9". The quotes should wrap the individual fields, with commas separating each field. Usually numeric fields are not wrapped.

Depending on how you load the data, commas inside the data (e.g. the one in "FLOWERS.COM, Inc.") may confuse the parser.
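To illustrate the point about quoting: when each field is quoted on its own, PHP's str_getcsv() keeps the comma inside the company name as part of a single field. The line used below is a hand-made example of the corrected format, not actual content of NASDAQ.csv.

```php
<?php
// Hand-made example of the "proper" format: each text field quoted on its own,
// numeric fields left bare. This is NOT what NASDAQ.csv actually contains.
$proper = '"FLWS","1-800 FLOWERS.COM, Inc.",2.9,81745200,"n/a",1999,"Consumer Services","Other Specialty Stores","http://www.nasdaq.com/symbol/flws"';

// Delimiter, enclosure and escape passed explicitly.
$fields = str_getcsv($proper, ',', '"', '\\');

// The comma inside the quoted company name does not split the field.
echo count($fields), "\n";   // 9
echo $fields[1], "\n";       // 1-800 FLOWERS.COM, Inc.
echo $fields[6], "\n";       // Consumer Services
```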

By the way, if it's really CSV, look at LOAD DATA INFILE: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
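If the file can't be fixed at the source, one minimal sketch (assuming every line has exactly the shape shown in the question) is to parse each record twice: str_getcsv() with ';' as the delimiter unwraps the outer quotes and turns each "" into ", and a second str_getcsv() with ',' splits the real fields. The helper name parse_nasdaq_line() is hypothetical, and the records are written with fputcsv() to an in-memory stream purely for demonstration; writing to a real file instead should give a plain CSV that LOAD DATA INFILE can read (not tested here).

```php
<?php
// Hypothetical helper: turns one raw NASDAQ.csv line into an array of fields.
// Assumes the layout from the question: the whole record is wrapped in one
// pair of quotes (inner quotes doubled) and followed by ";;".
function parse_nasdaq_line($line)
{
    // Pass 1: with ';' as delimiter, the quoted record is a single field;
    // str_getcsv() strips the outer quotes and turns "" into ".
    $outer = str_getcsv(trim($line), ';', '"', '\\');

    // Pass 2: the unwrapped text is ordinary CSV, so split it on commas.
    $fields = str_getcsv($outer[0], ',', '"', '\\');

    // Each record ends with a trailing comma, which yields one empty
    // field at the end; drop it.
    if ((string) end($fields) === '') {
        array_pop($fields);
    }
    return $fields;
}

$raw = array(
    '"FLWS,""1-800 FLOWERS.COM, Inc."",""2.9"",""81745200"",""n/a"",""1999"",""Consumer Services"",""Other Specialty Stores"",""http://www.nasdaq.com/symbol/flws"",";;',
    '"FCTY,""1st Century Bancshares, Inc"",""4"",""36172000"",""n/a"",""n/a"",""Finance"",""Major Banks"",""http://www.nasdaq.com/symbol/fcty"",";;',
);

// Re-emit the records as plain CSV (in memory here; a real import would
// write to a file instead).
$out = fopen('php://memory', 'w+');
foreach ($raw as $line) {
    fputcsv($out, parse_nasdaq_line($line), ',', '"', '\\');
}
rewind($out);
$clean = stream_get_contents($out);
fclose($out);

echo $clean;
```

With this layout the symbol, name, sector and industry end up at indexes 0, 1, 6 and 7, the same positions the INSERT in the question uses.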

5 Comments

Well, it sure isn't the best CSV file I've ever seen... but it's what's available on nasdaq.com, and I couldn't find any other source for importing the ticker symbols of all US stocks (I have other CSVs, like AMEX and NYSE, from the same website). Couldn't I just strip all " and ' from all fields?
The first line must have a typo in it, as there's no separator between Symbol and Name outside the quotes. I'd just replace every "" with " (change or tr two quotes to one quote) and use LOAD DATA INFILE, skipping the first line and specifying the columns to load. I guarantee that if you go with LOAD DATA INFILE, your inserts will be monstrously fast.
Probably, but for now I think hacking something together in PHP works faster for me... almost there, btw :) Please have a look at my final question (how to renumber the keys) if you have time.
samuelkerr.com/?page_id=287 has an example of the LOAD DATA INFILE approach, just FYI. On the other question, look at array_values() and array_keys(); I'm sure they're what you're after.
Very quick and dirty, I used $data = array_merge(array(), $data); to reindex. Problem solved. Thanks :)
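
Both suggestions in this thread renumber the keys: array_values() returns the values with fresh integer keys starting at 0, and array_merge() renumbers integer keys as a side effect. A quick comparison on an array shaped like the edit2 print_r output (shortened; values copied from the question):

```php
<?php
// Shaped like the edit2 output: key 2 was unset after gluing the name.
$data = array(0 => 'FLWS', 1 => '"1-800 FLOWERS.COM Inc."', 3 => '"2.9"', 4 => '"81745200"');

// array_values() returns the values with fresh keys 0, 1, 2, ...
$a = array_values($data);

// array_merge() also renumbers integer keys, which is why the
// array_merge(array(), $data) trick works too.
$b = array_merge(array(), $data);

print_r($a);            // keys 0, 1, 2, 3
var_dump($a === $b);    // bool(true)
```

array_values() states the intent more directly; array_merge() gives the same result here only because all the keys are integers.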

As Crontab said, it's probably a problem with the quotes. Try:

foreach ($entries as $line) {

  // Escape (see mysql_real_escape_string too) and remove double quotes
  foreach ($line as $k => $v) {
    $line[$k] = mysql_escape_string(trim($v, '"'));
  }

  // Rebuild the array with consecutive keys 0, 1, 2, ...
  $line = array_values($line);

  // One %s placeholder per column: symbol, name, sector, industry.
  db_query("
    INSERT INTO us_stocks (symbol, name, sector, industry)
    VALUES ('%s', '%s', '%s', '%s')",
    $line[0], $line[1], $line[6], $line[7]
  );

}

PS: I don't know if you already escape strings in db_query().

3 Comments

I already do. However, it's not working. And neither is your code. It now just reads FLWS,\"1-8 etc. because of double escaping. Perhaps it would be better to just use regex to remove all single and double quotes from each $data line?
trim($v, '"') removes one or more double quotes from the beginning and end of the string. So I'm afraid it's fgetcsv() that can't parse that CSV correctly. Have you looked at the output of print_r($line) before the query, without my code? Are the fields split correctly?
@Reveller OK, now you've got it! You just have to strip the double quotes using trim($v, '"'), as I already showed you. Then, to eliminate the gaps in the keys and compact the array, you just need array_values(). See my updated answer :)
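
A minimal sketch of the fix described in this comment, applied to one row exactly as the edit2 loop produces it (values copied from the question's print_r output; the db_query() call is omitted). Note that the glue step in edit2 drops the comma from company names such as "1-800 FLOWERS.COM, Inc.", so the stored name differs slightly from the original.

```php
<?php
// One row as produced by the explode()/glue hack in edit2 (key 2 unset).
$line = array(
    0 => 'FLWS', 1 => '"1-800 FLOWERS.COM Inc."', 3 => '"2.9"', 4 => '"81745200"',
    5 => '"n/a"', 6 => '"1999"', 7 => '"Consumer Services"',
    8 => '"Other Specialty Stores"', 9 => '"http://www.nasdaq.com/symbol/flws"', 10 => '',
);

// Strip the surrounding double quotes from every field.
foreach ($line as $k => $v) {
    $line[$k] = trim($v, '"');
}

// Close the gap left by unset($data[2]).
$line = array_values($line);

// After reindexing, name is at 1, sector at 6 and industry at 7,
// matching $line[0], $line[1], $line[6], $line[7] in the INSERT.
printf("%s | %s | %s | %s\n", $line[0], $line[1], $line[6], $line[7]);
```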
