5

I have a script which captures tweets and puts them into a database. I will be running the script on a cronjob and then displaying the tweets on my site from the database to prevent hitting the limit on the twitter API.

So I don't want to have duplicate tweets in my database, I understand I can use 'INSERT...ON DUPLICATE KEY UPDATE' to achieve this, but I don't quite understand how to use it.

My database structure is as follows.

Table - Hash id (auto_increment) tweet user user_url

And currently my SQL to insert is as follows:

$tweet = $clean_content[0];
$user_url = $clean_uri[0];
$user = $clean_name[0];

$query='INSERT INTO hash (tweet, user, user_url) VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'")';
mysql_query($query);

How would I correctly use 'INSERT...ON DUPLICATE KEY UPDATE' to insert only if it doesn't exist, and update if it does?

Thanks

2
  • dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html Commented Apr 27, 2010 at 10:20
  • You need a "unique field" in list of fields to insert. If tweet, user and user_url are regular fields (i.e., they are not unique), it won't work. Commented Apr 27, 2010 at 10:21

3 Answers 3

6

you need some UNIQUE KEY on your table, if user_url is tweer_url, then this should fit (every tweet has a unique url, id would be better).

CREATE TABLE `hash` (
  `user_url` ...,
  ...,
  UNIQUE KEY `user_url` (`user_url`)
);

and its better to use INSERT IGNORE on your case

$query='INSERT IGNORE INTO hash (tweet, user, user_url) VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'")';

ON DUPLICATE KEY is useful when you need update existing row but you want to insert just once

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for your response, this seems like it would work for me. Although I think I will use the ID of the twitter post rather than the URL, I assume the ID is the 11 digit number, for example - 12935503743
0

Try using:

$query='INSERT INTO hash (tweet, user, user_url) 
        VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'") 
        ON DUPLICATE KEY UPDATE tweet = VALUES(tweet)';

Comments

0

ON DUPLICATE KEY UPDATE doesn't seem to be the right solution here, as you don't want to update if the value is already in the table.

I would use Twitter's own unique Status ID field (which should be unique for each tweet) instead of your hash id. Add that as a field on your table, and define it as the primary key (or as a unique index.) Then use REPLACE INTO, including the status ID from Twitter.

This has the advantage that you can always track your record back to a unique Tweet on twitter, so you could easily get more information about the Tweet later if you need to.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.