8 votes
Accepted

Integrating TeX into a Java desktop application

I have no experience with those specific libraries and tools, but since you asked this here on SE.SE, let me give you an answer on how to approach such an evaluation process as a software engineer. It ...
Doc Brown's user avatar
  • 220k
6 votes
Accepted

Database structure for word co-occurrence frequencies in a large corpus

"Do I need related SQL tables (e.g. with the book metadata in one table and word data in another)?" A separate table for words can be used, but it is probably not necessary. A "word" is identified by all ...
Doc Brown's user avatar
  • 220k
5 votes

How does the Arabic typographic layout system work at a high level?

I figured it out. Took me all day but got the alignment pretty decent for that. What I did was convert a font to SVG, manually copy-paste out the glyphs into JavaScript. Import each one individually ...
Lance Pollard's user avatar
4 votes

Find the least words that will use all given letters

If you throw out all the uninteresting letters and words, then reduce each word to the subset of interesting letters contained, you have a Set cover problem. This is NP-complete. If you don't want to ...
kevin cline's user avatar
  • 33.9k
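The set cover framing above has a well-known greedy approximation: repeatedly pick the word that covers the most letters still uncovered. The following is a minimal Python sketch of that idea, not the answerer's code. The function name `greedy_word_cover` and the sample inputs are hypothetical, and the result is not guaranteed to be the smallest possible set of words.

```python
def greedy_word_cover(words, letters):
    """Greedy set-cover approximation (hypothetical helper, for illustration)."""
    uncovered = set(letters)
    # Reduce each word to the subset of interesting letters it contains.
    reduced = {word: set(word) & uncovered for word in words}
    chosen = []
    while uncovered:
        # Pick the word covering the most still-uncovered letters.
        best = max(reduced, key=lambda w: len(reduced[w] & uncovered), default=None)
        if best is None or not reduced[best] & uncovered:
            raise ValueError("no remaining word covers: " + "".join(sorted(uncovered)))
        chosen.append(best)
        uncovered -= reduced[best]
    return chosen


if __name__ == "__main__":
    print(greedy_word_cover(["cab", "dog", "bad", "god"], "abcdgo"))
```

Because the problem is NP-complete, an exact answer would need exhaustive search or an integer-programming solver; the greedy pass trades optimality for speed.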
3 votes

Name and code to space between lines/paragraphs

There are a number of terms that might be what you're looking for. It's hard to tell from your description, but here are some suggestions for what to search for: Leading - AKA Line Spacing - This controls ...
user1118321's user avatar
  • 4,981
3 votes
Accepted

Find the least words that will use all given letters

While there may be a more ideal solution out there than this, I think it will get you closer. Right now your algorithm is rating words that have "rare" letters as more important than overall coverage ...
Becuzz's user avatar
  • 4,865
2 votes
Accepted

How to implement tracking of changes in text documents à la MS-Word/Apple Pages

Your use of CriticMarkup is flawed. It is suitable for showing the differences between two versions or for showing suggested changes. It is not a suitable format for representing the complete history ...
amon's user avatar
  • 136k
2 votes

How to identify whether or not 2 pieces of text are identical?

The purpose of a hash would be to save resources. The chances of a collision using a hash with good distribution would be very small. You would not be worried about someone having changed something ...
Martin Maat's user avatar
  • 18.6k
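The point about hashes saving resources can be illustrated with a short Python sketch: store or transmit a fixed-size digest instead of the full text, and compare digests. SHA-256 and UTF-8 encoding are assumptions chosen for illustration, and the function names are hypothetical; the excerpt does not name a specific hash.

```python
import hashlib


def text_digest(text: str) -> str:
    """Return a fixed-size fingerprint of the text (SHA-256 assumed here)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def probably_identical(a: str, b: str) -> bool:
    # Equal digests mean the texts are identical unless a collision occurred,
    # which is extremely unlikely for a hash with good distribution.
    return text_digest(a) == text_digest(b)


if __name__ == "__main__":
    print(probably_identical("same text", "same text"))
    print(probably_identical("same text", "Same text"))
```

If both full texts are already in memory, a direct `a == b` comparison is simpler; the digest pays off when only one side is available locally or when many comparisons are made against stored fingerprints.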
2 votes

Applying a file diff to a new file

"I believe git uses an analogous approach to track changes in a repository, apply stashes and so forth" Misbelieving detected. Git (and all other VCSes) operates with objects "whole string" (...
Lazy Badger's user avatar
  • 1,937
1 vote

Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?

As others have said, you must first define "distance". Once you have done so, however, standard approaches can be used. I have implemented Levenshtein this way--most changes were counted ...
Loren Pechtel's user avatar
1 vote

Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?

First: Define what the distance between two characters is. For example, p and b, g and k, d and t are similar. A and e are reasonably similar. If I had software converting speech to text, and the ...
gnasher729's user avatar
  • 49.4k
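Once a character-to-character distance is defined, it can be plugged into Levenshtein as the substitution cost. The following Python sketch shows one way to do that. The function names, the 0.5 cost for similar pairs, and the fixed cost of 1 for insertions and deletions are assumptions for illustration, not taken from either answer.

```python
def weighted_levenshtein(a, b, sub_cost):
    """Edit distance where substitutions are priced by sub_cost(x, y)."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,                   # delete ca
                curr[j - 1] + 1.0,               # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute ca -> cb
            ))
        prev = curr
    return prev[-1]


# Illustrative similarity table using the pairs mentioned above.
SIMILAR = {frozenset("pb"), frozenset("gk"), frozenset("dt")}


def example_cost(x, y):
    if x == y:
        return 0.0
    # Assumed cost: similar-sounding pairs are cheaper to swap.
    return 0.5 if frozenset((x, y)) in SIMILAR else 1.0


if __name__ == "__main__":
    print(weighted_levenshtein("pat", "bat", example_cost))
    print(weighted_levenshtein("pat", "cat", example_cost))
```

Passing the cost as a function keeps the dynamic-programming core unchanged, so a keyboard-distance or phonetic table can be swapped in without touching it.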
1 vote

How to identify whether or not 2 pieces of text are identical?

It's unclear what you mean by "anything else" but whitespace is simple. Just use a regex e.g. \s+ and replace with something deterministic like a single space or some other character of ...
JimmyJames's user avatar
  • 30.9k
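The regex approach above can be sketched in Python as follows. The function name is hypothetical, and trimming leading and trailing whitespace with `strip()` is an extra assumption that the excerpt does not mention.

```python
import re


def normalize_whitespace(text: str) -> str:
    # Replace every run of whitespace with a single space, then trim the ends
    # (the trim is an added assumption, not part of the quoted suggestion).
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    a = "two  pieces\nof\ttext"
    b = " two pieces of text "
    print(normalize_whitespace(a) == normalize_whitespace(b))
```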
1 vote

How does the Arabic typographic layout system work at a high level?

Letters in Arabic can use different glyphs as you saw (to make it possible to connect cleanly with other letters), and glyphs don't have a fixed shape - certain glyphs can and will be stretched for ...
gnasher729's user avatar
  • 49.4k
1 vote
Accepted

Database of big text documents many-to-many: one big relationship table, a lot of small ones, or a better way to link abstract text data?

If your document tables have all the same (or very similar) structure, do yourself a favor and use one document table for all of them (and a child table "Paragraph"). Using one table per ...
Doc Brown's user avatar
  • 220k
1 vote
Accepted

Convert RTF to HTML when it's saved to the database or when it's rendered?

As ever, the most important decision about storing data is not about how to store data. It's how you're going to use the data after you've stored it. If you're going to have to read the HTML, convert ...
Phill  W.'s user avatar
  • 13.1k
1 vote

Algorithm for line breaking in monospace text

How about: Scan forward in the text X characters. While scanning forward if you encounter a line break, stop immediately and respect it. While scanning forward count how many sections of white text ...
Kain0_0's user avatar
  • 16.6k
1 vote

Name and code to space between lines/paragraphs

For reasons beyond my "rookie-python level" understanding, the example that user8734617 gave me, which described passing the '\n\n' escape sequence through the print function along with the dictionary (my ...
Iam Pyre's user avatar
1 vote

Windows compatibility with Unix/Linux newline "\n"

This question is really about a software application's "customer base". To answer your question, you have to know whether your customers might be inconvenienced if your application generates output ...
rwong's user avatar
  • 17.2k
1 vote

Windows compatibility with Unix/Linux newline "\n"

As far as Windows and C# are involved you can always use Environment.NewLine to determine the default new line character of the system the program is run on. Also, you can use text.Replace("\n",...
Xeorge Xeorge's user avatar
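The answer above is about C#. As a rough analogue only, the same idea in Python uses `os.linesep` for the platform default. The function name `to_platform_newlines` is hypothetical, and unifying existing line endings before converting is an added assumption.

```python
import os


def to_platform_newlines(text: str, newline: str = os.linesep) -> str:
    # Collapse Windows (\r\n) and bare \r endings to \n first, so text that
    # already uses \r\n is not converted twice.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", newline)


if __name__ == "__main__":
    print(repr(to_platform_newlines("a\nb\r\nc", newline="\r\n")))
```

Files opened in Python's default text mode already translate "\n" to the platform line ending on write, so an explicit conversion like this is mainly useful for binary output or strings sent elsewhere.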
1 vote

Windows compatibility with Unix/Linux newline "\n"

Windows Notepad (notepad.exe) doesn't interpret a standalone \n as a new line. It's not necessarily "modern" but pretty much "mainstream". If you're writing text files, an everyday user should be able ...
Mario's user avatar
  • 1,509

Only top scored, non community-wiki answers of a minimum length are eligible