2

My problem: Code works great, but the speed is far too slow for the number of rows it needs to process.

What I am doing: I initiate a COUNT(*) to pull total number of rows (which last night was ~58000) and I use that to create the loop to do the following: pull two columns of data from that row, datamine one column for text patterns.

Once I had completed that, I search a second table to see if that individual by username is already existing - if they exist, I update their row. If not, I add a new one.

There are 44 columns of data, one being the name the other 43 storing values of my datamining results.

In about 8 hours, it has completed 26500 out of the 58000 when it first started (in that same period, the table has grown to ~100000, but I am not worried about that).

Is there a better method to increase read/write rate?

Portion of my code - I have removed many of the int declarations and Regex.Matches as they are copies of the first with altered Match values.

azCheck is to determine if the message even contains anything we are looking for, it is remains '0', then we don't bother with the last portion of the code.

using (new MySqlConnection(ConnectiongString))
            {
                using (MySqlCommand cmd = new MySqlCommand("select count(*) from messages", connection))
                {
                    using (MySqlDataReader reader = cmd.ExecuteReader())
                    {
                        StringBuilder sb = new StringBuilder();
                        while (reader.Read())
                        {
                            sb.Append(reader.GetInt32(0).ToString());
                        }
                        total_messages = int.Parse(sb.ToString());
                    }
                }
            }
            Console.WriteLine(total_messages.ToString());
            connection.Close();
            for (int i = 1; i <= total_messages; i++)
            {
                connection.Open();
                using (new MySqlConnection(ConnectiongString))
                {
                    using (MySqlCommand cmd = new MySqlCommand("select * from messages WHERE id="+i+"", connection))
                    {
                        using (MySqlDataReader reader = cmd.ExecuteReader())
                        {
                            StringBuilder sb = new StringBuilder();
                            while (reader.Read())
                            {
                                username = reader["username"].ToString();
                                message = reader["message"].ToString();
                            }
                        }
                    }
                }
                connection.Close();

                Console.Write("\r{0}   ", i);

                int aiCount = 0;
                aiCount += Regex.Matches(message, "ai", RegexOptions.IgnoreCase).Count;
                azCheck += aiCount;

//There are ~42 of the regex.matches after the first one.

MySqlCommand cmd1 = connection.CreateCommand();
                connection.Open();
                cmd1.CommandText = "SELECT username FROM users";
                cmd1.CommandType = CommandType.Text;
                cmd1.Connection = connection;
                MySqlDataReader dr = cmd1.ExecuteReader();
                while (dr.Read())
                {
                    if (dr[0].ToString() == username)
                    {
                        check++;
                    }
                }
                connection.Close();

if (check == 0)
                {
                    MySqlConnection connection2 = new MySqlConnection(ConnectiongString);
                    connection2.Open();
                    try
                    {
                        MySqlCommand cmd2 = connection2.CreateCommand();
                        cmd2.CommandText = "INSERT INTO users (username,aiCount) VALUES (@username,@aiCount)";
                        cmd2.Parameters.AddWithValue("@username", username);
                        cmd2.Parameters.AddWithValue("@aiCount", aiCount);
                        cmd2.ExecuteNonQuery();
                        connection2.Close();


                    }
                    catch (Exception)
                    {
                        throw;
                    }

} else {

int aiCount_old = 0;

if (azCheck > 0)
                    {

//Here we are taking the existing values from this users row, 
//which we then add the new values from above and save.

                        MySqlConnection connection4 = new MySqlConnection(ConnectiongString);
                        connection4.Open();
                        try
                        {
                            MySqlCommand cmd2 = connection4.CreateCommand();
                            cmd2.CommandType = CommandType.Text;
                            cmd2.CommandText = "SELECT * from users WHERE username = @username";
                            cmd2.Parameters.AddWithValue("@username", username);
                            MySqlDataReader reader = cmd2.ExecuteReader();
                            while (reader.Read())
                            {
                                aiCount_old = Convert.ToInt32(reader["aiCount"].ToString());
}

                        }
                        catch (Exception)
                        {
                            throw;
                        }
                        connection4.Close();

                        aiCount += aiCount_old;

MySqlConnection connection5 = new MySqlConnection(ConnectiongString);

                        connection5.Open();
                        try
                        {
                            MySqlCommand cmd4 = connection5.CreateCommand();
                            cmd4.CommandType = CommandType.Text;
                            cmd4.CommandText = "UPDATE users SET aiCount = @aiCount WHERE LOWER(LTRIM(RTRIM(username))) = @username";
                            cmd4.Parameters.AddWithValue("@username", username.Trim().ToLower());
                            cmd4.Parameters.AddWithValue("@aiCount", aiCount.ToString());
                              cmd4.ExecuteNonQuery();
                            Console.WriteLine("User updated.");

                        }
                        catch (Exception ex)
                        {
                            throw;
                        }
                        connection5.Close();
7
  • Is there a way to do your datamining in a query so you can use INSERT INTO..SELECT? The fastest way, of course, is to do it all in one query. Not sure if that's possible, though. Can you post some sample data? Commented Aug 27, 2015 at 16:07
  • 3
    Why not do one query to get all records select * from messages & then iterate through the result set with the MySqlDataReader, rather than perform 58000 queries to get a single row each time. Commented Aug 27, 2015 at 16:11
  • @McAdam331 I'm not sure that'll work with the data. Best way to think about it is, if my existing column has a value of 2, and in the message that i've looked there are 3 more instances of the text, then I will need to add 3 to that. INSERT INTO won't work since I need to pull the existing value in order to add. Commented Aug 27, 2015 at 16:15
  • @PaulF That is a good suggestion, I've never used the MySqlDataReader for this large amount of information so I was not sure how well it'd work. I'll assess and try that out! Commented Aug 27, 2015 at 16:15
  • @Larklen I see why the INSERT INTO won't work, my mistake. I agree with Paul, though. It's better to select all the rows and iterate through them, then to make tons of tiny queries. Going back and forth like that could bottleneck performance. Commented Aug 27, 2015 at 16:18

1 Answer 1

3

You have several inefficiencies that I can spot right off the bat.

You are constantly opening and closing your connection string. This is probably your biggest bottleneck. Open the connection once, then close it once when all your processing is done, and you'll probably see a massive increase in performance.

You also use different connection objects when one will do, which will reduce your need to open and close connections.

You also seem to have a misunderstanding of the use of "using" on connection objects. I see using (new MySqlConnection(ConnectiongString)), but that code is completely useless, as it doesn't do anything but initialize a connection object, which is immediately lost since it is not assigned to an object.

Since you are processing everything sequentially, use connection as your connection object in every case, opening it only at the start of your processing, and closing it when processing is complete, then executing the Dispose method (the point of the using statement).

This change alone might reduce processing time by an order of magnitude.

NOTE: You will need a separate connection for your datareader if you need to do updates or another query while the datareader is open.

Sign up to request clarification or add additional context in comments.

5 Comments

Hey Russ, thanks for the advice! The reason for the Open() and Close() of the connection strings is that, if I leave it open, I get the error There is already an open DataReader associated with this Connection which must be closed first., and closing then reopening resolved that. There may be an underlying issue. -- However, I have been able to remove some of these unneeded open/close (as well as the using portion), but the speed remains the same. I currently now only have two sets of connection.Open()/Close(). Note: It is slightly faster , from 1.3 seconds per record to about 0.9 secs
The reason you are getting that error is because you are trying to execute a SQL Command on the connection while the connection is locked by a datareader. One way to resolve this is to use a second connection that you also keep open until done that you use only for the datareader. The other way is to close the datareader (not the connection) before attempting any other SQL command.
Your error is probably caused by this line: MySqlDataReader reader = cmd2.ExecuteReader();. Since you don't have it within a using statement, the datareader never gets closed and disposed, so it locks the connection until the connection is closed.
Sounds good - i've updated the reader portion to reflect a using statement. I'm sitting about about 2 records per second if they are non-existing about about 1.4 records a second if they are both existing (reading and updating). I think, connection string-wise, this is the best it'll get for me. There is another suggestion above to pull all records from the messages table then sort, rather than getting each individually. I'll work out how I need it down and hopefully it'll add improvement on what you've provided me :)
Hey Russ, okay I have combined both your suggestion and PaulF's, and I think I am at the limit in terms of how fast this can go. It is slightly faster than what I was looking at originally, which is a win in my books. I will have to come up with a method to start where I left of if I end this process early. Thanks!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.