8
votes
Accepted
Integrating TeX into a Java desktop application
I have no experience with those specific libraries and tools, but since you asked this here on SE.SE, let me give you an answer on how to approach such an evaluation process as a software engineer.
It ...
6
votes
Accepted
Database structure for word co-occurrence frequencies in a large corpus
Do I need related SQL tables (e.g. with the book metadata in one table and word data in another)
A separate table for words can be used, but it is probably not necessary. A "word" is identified by all ...
5
votes
How does the Arabic typographic layout system work at a high level?
I figured it out. Took me all day but got the alignment pretty decent for that.
What I did was convert a font to SVG, manually copy-paste out the glyphs into JavaScript. Import each one individually ...
4
votes
Find the least words that will use all given letters
If you throw out all the uninteresting letters and words, then reduce each word to the subset of interesting letters contained, you have a Set cover problem. This is NP-complete. If you don't want to ...
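The reduction described in this excerpt can be sketched with the standard greedy approximation for set cover. This is only an illustration under assumptions: the excerpt is truncated before it names a strategy, so the greedy choice and the names `greedy_word_cover`, `letters` and `words` are hypothetical, and the result is not guaranteed to be the minimum number of words.

```python
def greedy_word_cover(letters, words):
    """Greedy set-cover approximation (an assumption, not the answer's stated method)."""
    remaining = set(letters)
    # Reduce each word to the subset of interesting letters it contains.
    candidates = {w: set(w) & remaining for w in words}
    chosen = []
    while remaining:
        # Pick the word covering the most still-uncovered letters.
        best = max(candidates,
                   key=lambda w: len(candidates[w] & remaining),
                   default=None)
        if best is None or not candidates[best] & remaining:
            raise ValueError("letters cannot be covered: %s" % sorted(remaining))
        chosen.append(best)
        remaining -= candidates[best]
    return chosen
```

For example, `greedy_word_cover("abcd", ["ab", "cd", "abc", "d"])` selects two words that together contain all four letters.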
3
votes
Name and code to space between lines/paragraphs
There are a number of terms that might be what you're looking for. It's hard to tell from your description, but here are some suggestions for what to search for:
Leading - AKA Line Spacing - This controls ...
3
votes
Accepted
Find the least words that will use all given letters
While there may be a more ideal solution out there than this, I think it will get you closer. Right now your algorithm is rating words that have "rare" letters as more important than overall coverage ...
2
votes
Accepted
How to implement tracking of changes in text documents à la MS-Word/Apple Pages
Your use of CriticMarkup is flawed. It is suitable for showing the differences between two versions or for showing suggested changes. It is not a suitable format for representing the complete history ...
2
votes
How to identify whether or not 2 pieces of text are identical?
The purpose of a hash would be to save resources. The chances of a collision using a hash with good distribution would be very small.
You would not be worried about someone having changed something ...
2
votes
Applying a file diff to a new file
I believe git uses an analogous approach to track changes in a repository, apply stashes and so forth
Misbelieving detected. Git (and all other VCSes) operates with objects "whole string" (...
1
vote
Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?
As others have said, you must first define "distance". Once you have done so, however, standard approaches can be used. I have implemented Levenshtein this way--most changes were counted ...
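A minimal sketch of that idea, assuming the character distance is supplied as a substitution-cost function: the usual Levenshtein dynamic program, with the flat substitution cost of 1 replaced by a caller-defined cost. The names `weighted_levenshtein`, `phonetic_cost` and `SIMILAR`, and the cost values 0.5 and 1.0, are hypothetical choices for illustration, not taken from the answer.

```python
def weighted_levenshtein(a, b, sub_cost, indel_cost=1.0):
    """Edit distance where substituting x by y costs sub_cost(x, y)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,            # delete ca
                curr[j - 1] + indel_cost,        # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute (0 if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical similarity table: pairs that should count as "close".
SIMILAR = {frozenset("pb"), frozenset("gk"), frozenset("dt")}


def phonetic_cost(x, y):
    """Example cost: 0 for identical, 0.5 for similar, 1 otherwise."""
    if x == y:
        return 0.0
    if frozenset((x, y)) in SIMILAR:
        return 0.5
    return 1.0
```

With this cost function, `weighted_levenshtein("pat", "bat", phonetic_cost)` is smaller than `weighted_levenshtein("pat", "cat", phonetic_cost)`, because p and b are treated as close.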
1
vote
Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?
First: Define what the distance between two characters is. For example, p and b, g and k, d and t are similar. A and e are reasonably similar. If I had software converting speech to text, and the ...
1
vote
How to identify whether or not 2 pieces of text are identical?
It's unclear what you mean by "anything else", but whitespace is simple. Just use a regex, e.g. \s+, and replace with something deterministic like a single space or some other character of ...
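A rough sketch of the whitespace step, assuming a single space as the deterministic replacement. Trimming the ends with `strip()` and comparing SHA-256 digests are additions for illustration, not part of the answer; the names `normalize`, `fingerprint` and `same_text` are hypothetical.

```python
import hashlib
import re


def normalize(text):
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def fingerprint(text):
    """Digest of the normalized text, cheap to store and compare."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def same_text(a, b):
    """True when the two texts differ only in whitespace layout."""
    return fingerprint(a) == fingerprint(b)
```

Here `same_text("a  b\n", "a b")` is true, while `same_text("a b", "ab")` is false because removing whitespace entirely is not treated as equivalent.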
1
vote
How does the Arabic typographic layout system work at a high level?
Letters in Arabic can use different glyphs as you saw (to make it possible to connect cleanly with other letters), and glyphs don't have a fixed shape - certain glyphs can and will be stretched for ...
1
vote
Accepted
Database of big text documents many-to-many: one big relationship table, a lot of small ones, or a better way to link abstract text data?
If your document tables have all the same (or very similar) structure, do yourself a favor and use one document table for all of them (and a child table "Paragraph"). Using one table per ...
1
vote
Accepted
Convert RTF to HTML when it's saved to the database or when it's rendered?
As ever, the most important decision about storing data is not about how to store data.
It's how you're going to use the data after you've stored it.
If you're going to have to read the HTML, convert ...
1
vote
Algorithm for line breaking in monospace text
How about:
Scan forward in the text X characters.
While scanning forward if you encounter a line break, stop immediately and respect it.
While scanning forward count how many sections of white text ...
1
vote
Name and code to space between lines/paragraphs
For reasons beyond my "rookie-python level" understanding, the example that user8734617 gave me, which described passing the '\n\n' escape sequence through the print function along with the dictionary (my ...
1
vote
Windows compatibility with Unix/Linux newline "\n"
This question is really about a software application's "customer base".
To answer your question, you have to know whether your customers might be inconvenienced if your application generates output ...
1
vote
Windows compatibility with Unix/Linux newline "\n"
As far as Windows and C# are concerned, you can always use Environment.NewLine
to determine the default newline sequence of the system the program is run on.
Also, you can use text.Replace("\n",...
1
vote
Windows compatibility with Unix/Linux newline "\n"
Windows Notepad (notepad.exe) doesn't interpret a standalone \n as a new line. It's not necessarily "modern" but pretty much "mainstream".
If you're writing text files, everyday users should be able ...
Related Tags
text-processing × 55
algorithms × 9
natural-language-processing × 6
java × 4
database × 4
parsing × 4
strings × 4
python × 3
html × 3
xml × 3
search × 3
machine-learning × 3
text-editor × 3
design × 2
c# × 2
architecture × 2
php × 2
programming-languages × 2
version-control × 2
data-structures × 2
data × 2
modeling × 2
sorting × 2
comparison × 2
design-patterns × 1