12 events
when what by license comment
Jul 4, 2020 at 12:06 comment added Ed Morton The max calculation happens once per line of fileA, not once in total for all of fileA. If we can't have the dots string populated before we read fileA then, as you can see in the new script, we need to loop through every value we read from fileA a second time to populate the str2dots array instead of doing it as we read each line the first time. In that second loop we also have to access the str2lgth array to get the length of each string, whereas on the first pass we already had it in the lgth scalar variable. So it will impact performance, even if not by much.
Jul 4, 2020 at 11:49 comment added dizcza Thank you. But why are you asking "not even 100,000 chars"? I mean, it shouldn't impact the performance since you read fileA only once to determine the max length and fileA is relatively small.
Jul 4, 2020 at 11:45 comment added Ed Morton I added a version that does both.
Jul 4, 2020 at 11:44 history edited Ed Morton CC BY-SA 4.0
added 1002 characters in body
Jul 3, 2020 at 19:58 comment added dizcza How can I dynamically calculate the largest line length of fileA and use it in awk? Also, if I assume that fileA is already lowercased, can I remove the { lc = tolower($0) } block and use $0 in place of lc?
Jul 1, 2020 at 18:57 comment added dizcza Yeap. I should have googled it. Thanks.
Jul 1, 2020 at 16:53 comment added Ed Morton @dizcza google says... stackoverflow.com/q/40049546/1745001. So apparently you just need to set LC_ALL=C (which is almost always good advice anyway unless you have a specific reason not to).
Jul 1, 2020 at 16:50 comment added dizcza It throws a warning however saying "awk: tst.awk:7: (FILENAME=fileB FNR=606894) warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale."
Jul 1, 2020 at 16:49 vote accept dizcza
Jul 1, 2020 at 16:49 comment added dizcza Nice! 3 sec and 02:05 min compared to 10 sec and 9:30 min of Python.
Jul 1, 2020 at 15:37 history edited Ed Morton CC BY-SA 4.0
edited body
Jul 1, 2020 at 15:29 history answered Ed Morton CC BY-SA 4.0
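
The question raised in the comments, how to compute the largest line length of fileA inside awk, could be sketched roughly as below. This is a minimal illustration and not the script from the answer: the function name max_line_length is hypothetical, and LC_ALL=C is set as suggested in the comments, on the assumption that counting bytes rather than multibyte characters is acceptable for the data.

```shell
# Hypothetical helper (not from the answer): print the length of the
# longest line in the file given as $1.
# LC_ALL=C follows the advice in the comments; under the C locale awk
# treats each byte as one character.
max_line_length() {
    LC_ALL=C awk '
        # Track the largest line length seen so far.
        { lgth = length($0); if (lgth > max) max = lgth }
        # "max + 0" prints 0 instead of an empty string for an empty file.
        END { print max + 0 }
    ' "$1"
}
```

Called as `max_line_length fileA`, this reads the file once. As noted in the comments, any per-string values that depend on the maximum can only be filled in after that pass completes, which is why a second loop over the stored strings is needed.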