
I'm trying to add tens of thousands of Twitter profiles to a PostgreSQL database using the function below. The Twitter part works fine: I get batches of 5,000 IDs, push them onto a master array allFollowers, and pause for 15 minutes whenever the API rate limit hits (after fetching 75,000 profiles) before requesting the next batch of followers.

The problem, I think, is that by the time the function pauses for 15 minutes, the database connection closes. When the function finishes running, I get the error "Connection terminated unexpectedly". When I run this on a Twitter account with under 75,000 followers, so it never has to pause, there are no problems.

I'm connecting to the database using pg-promise with the default options. I've tried many of the suggestions in this post, namely a huge idleTimeoutMillis value (longer than the function's total run time) or setting it to 0, but no luck. They all seem to behave the same as the default settings.

I also tried explicitly calling db.connect() before the await addProfiles line, but that didn't change anything either.
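For context, a minimal sketch of the initialization I've been experimenting with (the connection string is a placeholder for my actual Railway credentials):

```javascript
const pgPromise = require('pg-promise');
const pgp = pgPromise(); // default library initialization options

// Pool settings here are forwarded to the underlying node-postgres pool.
const db = pgp({
    connectionString: process.env.DATABASE_URL, // placeholder for my Railway connection string
    idleTimeoutMillis: 0, // 0 disables the idle-client timeout; I also tried values longer than the run
});
```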

async function getTwitterFollowers() {
    const allFollowers = []
    const targetAccount = 'TwitterDev'

    await getGroupOfFollowers(-1) // Default cursor for the Twitter API is -1

    async function getGroupOfFollowers(cursor) {
        await T.get('followers/ids', {
            screen_name: targetAccount,
            stringify_ids: true,
            cursor: cursor,
            count: 5000,
        })
            .then(async (res) => {
                const followerIds = res.data.ids // plain array on the response; no await needed
                allFollowers.push(...followerIds)

                // Paginate and get the next 5000 followers if the first page is full
                if (followerIds.length === 5000) {
                    await getGroupOfFollowers(res.data.next_cursor)
                }
            })
            .catch(async (err) => {
                // This is expected to fail due to Twitter rate limiting, so we pause and try again
                // This is why the whole run takes so long for large accounts
                console.error('Trying again in 15 mins.', err.twitterReply.errors[0].message)

                // Pause for 15 mins
                await new Promise((resolve) => setTimeout(resolve, 900000))

                // Try again
                await getGroupOfFollowers(cursor)
            })
    }

    // It could take many hours to get to this point for large Twitter accounts
    try {
        await addProfiles(
            allFollowers.map((id) => ({
                id: id,
                added: new Date(),
            }))
        )

        console.log(`Added/updated ${allFollowers.length} profiles to the database`)
    } catch (err) {
        console.error("Error adding target accounts' followers to the database.", err)
        // This is the "Error: Connection terminated unexpectedly"
    }

    return allFollowers
}
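addProfiles isn't shown above; it's a thin wrapper around pg-promise's multi-row insert helpers, roughly like this (the table name profiles, the column names, and the ON CONFLICT upsert clause are placeholders for my actual schema):

```javascript
// 'profiles' and its columns are placeholders for my real schema.
const cs = new pgp.helpers.ColumnSet(['id', 'added'], { table: 'profiles' });

async function addProfiles(profiles) {
    // helpers.insert generates a single multi-row INSERT statement,
    // which is far faster than issuing one query per profile.
    const query = pgp.helpers.insert(profiles, cs) +
        ' ON CONFLICT (id) DO UPDATE SET added = EXCLUDED.added';
    await db.none(query);
}
```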

I'm wondering if my problem is in the database initialization options (again, all defaults, since the other suggested settings didn't help), or whether I need to keep the connection alive some other way. Would appreciate any ideas.

  • What does the database log file say? Commented Apr 17, 2022 at 12:34
  • @jjanes I'm using Railway's PostgreSQL service so not sure I can access a log file. Will try this same thing with a local db and get back Commented Apr 17, 2022 at 14:47
  • @jjanes update - it was just a Railway issue. It works as expected on a local database. Appreciate you trying to help! Commented Apr 18, 2022 at 17:48

1 Answer


Turns out it was just a Railway issue (the PostgreSQL host I was using). The same issue appears when using Render and other similar platforms. Using AWS or Google Cloud works as expected.
