What is design? Making a lot of choices/decisions from the many options of the vastly larger design/problem space. Those choices lead to certain properties, advantages/benefits, limitations/constraints, etc., as most of them tend to be trade-offs (for example, convenience of implementation or use versus execution performance versus powerful features, etc.).
What is a programming language? Usually a textual notation as an interface for a human programmer to instruct some machine/device to operate in a certain way. Essentially a textual machine interface for a human to use. There are other media for the notation, like visual programming languages, but for now let's stay with text.
There are various types of (programming) languages. Typically, for general computation of formal logic to be done by an automatic machine, one model is "Turing completeness", a set of properties/features a programming language needs to have in order to allow the standard meta-circular, self-modifiable "programming per software" (as opposed to non-programmable hard-wired devices or simple replay machinery, which may have programming/software in the form of fixed runs of instructions that can be swapped out, but does not have software that can create or modify software). OK, that's the theoretical concept, but in practice it means a "sufficient" programming language needs to have at least the foundational elements of
- logic (means conditional logic)
- control (jumps)
- memory
- input
- output
as Robert C. Martin nicely condensed in his "Programming 101" video.
Logic means some mechanism to evaluate and check the result of an expression. Following Shannon, and because this is mostly about programming digital = binary computers, it comes down to comparing two bits = electrical signals flowing into a transistor, whether they are the same or not, to emit a comparison result (and then chaining a whole bunch of these together into more complex logic/arithmetic gates). In the programming language, such evaluation of expressions that produce a binary result is what goes into the conditional of an if, or determines whether a while/for loop should continue with the next iteration or abort.
Control is the mechanism to jump to other places of code located in memory. Depending on language design and/or programming style, of whether one subscribes to Dijkstra or not: in unstructured programming or machine/assembler code it's goto anyway, and under structured programming, once an if has evaluated/computed/compared the binary result of some logic/arithmetic expression, a jump is made to the sequence of code associated with the then or else case/branch. Loops are just a jump back to the starting point of a sequence. Functions are just jumps to somewhere else (while storing on a stack in memory where we left off, so at the end of the function's execution, control can return to the spot right after where the call was located).
Memory is for storing values that were the result of evaluating/computing an expression, so these values, per named variable reference/lookup, can go into other expressions to be evaluated. It also takes what's entered as input. It likely also contains other code/instructions, so control may jump to it, based on the automated, programmed logic/conditional rules that evaluate values as found in or stored into memory. The mechanisms described above also allow new code/software to be written & stored into memory, which can then be jumped to and executed.
Input could come from peripheral devices, onboard sensors like the clock, components hooked up to the bus, etc. This includes keyboard/mouse input or reading data into memory from a hard disk device (with the help of an operating system that provides a standardized "streams"/"files" interface to abstract driver details away, or without).
Output is needed to somehow obtain the results of the otherwise internal computation. Usually coloring pixels on a screen, printing output on paper, sending something out onto a network, things of that nature.
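To make these five elements concrete, here's a minimal sketch (plain Java, just as an illustration of the five roles, not a statement about how any particular language has to expose them):

```java
import java.util.Scanner;

public class FiveElements {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);   // input: read from a peripheral (keyboard/stdin)
        int limit = in.nextInt();              // memory: store the value under the name "limit"
        int count = 0;                         // memory: another named storage cell
        while (count < limit) {                // logic: compare two values; control: jump back while true
            count = count + 1;
        }
        System.out.println(count);             // output: make the internal result observable
    }
}
```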
Depending on your design decisions, if you want your language to be of the "functional" type, you may, for your language as a textual interface, not need to support "memory" per se. The Turing machine might refer to the computer/CPU hardware, but in a language, in addition to not supporting random jumps from one state context into a completely different sequence of instructions that assumes/expects a different state context to be set, your language could avoid the side effects of state/value sharing across function contexts (as found in random spaghetti-code goto jumps, or encapsulated inside object values in object-oriented programming), and instead have all values passed as parameter arguments on the stack into functions, each with its own localized value/state context. Let's call it "Church"-/"McCarthy"-complete. Instead of memory, it could all be input (hypothetically; likely very crude and requiring specialized hardware?).
Input/output is similar, just a difference of direction.
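A rough illustration of that no-shared-memory style (a sketch in plain Java, not tied to any concrete functional language): all state a function needs comes in as arguments and leaves as the return value, instead of living in memory shared between call sites.

```java
public class NoSharedState {
    // imperative style: shared, mutated memory
    static int total = 0;
    static void addToTotal(int x) { total = total + x; }

    // functional style: all state comes in as arguments, the result goes out as the return value
    static int add(int total, int x) { return total + x; }

    public static void main(String[] args) {
        int result = add(add(add(0, 1), 2), 3);  // state is threaded through calls, never stored globally
        System.out.println(result);              // prints 6
    }
}
```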
So regarding design, the fundamental question is how you want to go about these essential elements: what to allow and enable, and what to prevent by artificially not supporting it via a language construct. Language constructs, on top of these basic essentials, are then features added to a language to save the programmer from doing the lower-level logistics oneself. What about variable names and assignment (value copy or reference)? What about allocating and freeing memory? Support first-class functions as expressions for interpretation? Is there a notion of arrays with dedicated access-operator syntax? Which data types to represent, and how (how many, or why have any?); do some type checking, or dynamically determine or switch types? How much do you offer built-in, out of the box, taken care of in your language's implementation (as a base layer or "framework" for other software to build/rely upon), vs. how much do you leave out and therefore offload to app or library programmers? A lot of questions like these, because for every element you've seen in a language, you could design it differently, leave it out, or add some other ones.
Programming languages have the levels of syntax, which is the notation for structuring the text; semantics, which is the grammar of allowed/reasonable expressions; and constructs, which are your language features.
Syntax, as the textual human interface of notation and structuring, is what a programmer needs to write your language in, because your interpreter or compiler is going to read a source written in this syntax. In order to execute any of the instructions using the primitive elements of a target machine, the source syntax gets parsed/tokenized. For example, you can throw out all whitespace and comments (if your language supports any), because the machine doesn't understand and has no need for such information that targets other human readers of the source. Likely you're going to assign internal "types" to the tokens/words found in the source input, for example whether it's a name/"symbol" or a literal (say, the value 5 found somewhere in the code can clearly be identified as a number literal, and is not a constant/variable). This categorizes consecutive characters that belong together and determines their general type (independent of implementation and internal representation, just in their external textual representation), so you can later analyze whether the source code supplies the expected textual token categories in the required places & order. At this stage, you're "deserializing" the source code into your internal representation inside your compiler/interpreter front end, so you get an "abstract syntax tree" (like the DOM document object model inside the browser for a parsed HTML page), so your compiler/interpreter software can perform checks and translations on it (you're throwing away the syntax, which was just the human interface).
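As a sketch of what such an internal representation could look like (the Token and Node shapes here are made up for illustration; every real front end picks its own):

```java
// Hypothetical internal representation: a token keeps its category and its text,
// an AST node keeps its kind and its children; the original syntax is gone.
enum TokenType { NAME, NUMBER_LITERAL, OPERATOR }

class Token {
    final TokenType type;
    final String text;           // e.g. "a", "5", "+"
    Token(TokenType type, String text) { this.type = type; this.text = text; }
}

class Node {
    final String kind;           // e.g. "assignment", "addition", "numberLiteral", "name"
    final String value;          // the token text this node was built from, if any
    final java.util.List<Node> children = new java.util.ArrayList<>();
    Node(String kind, String value) { this.kind = kind; this.value = value; }
}
```

The point is only that after this stage, the compiler/interpreter works on these objects, not on the source text anymore.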
Semantics, as the grammar, is what you check next. Not all statements in your language are valid. Some you might want to reject because it's too difficult or expensive to implement such a feature, others because the user applied the syntax/notation incorrectly (and there's a better/valid way to express the desired operation), and so on. You decide, e.g. for 5 = a, whether this is a valid assignment (first numeric literal 5, then assignment operator =, then variable name a is OK), or whether it is a conditional expression testing if the value of the variable name a is 5, and whether conditionals are allowed to be statements on their own with no assignment, or are only expected/allowed in certain places (then you check if this is inside an if condition, or preceded by an assignment like b := 5 = a, and so on). That is, if your language supports assignment at all. Likewise, if names like a or b are used, you do some checking of whether they have been declared/defined before, or whether your language allows them to be found/defined later (to then go back and do type checking later, if not dynamic), and so on. The goal, in the end, is to have a checked, validated internal representation of the code in memory, with your internal info/annotations of what these tokens are added to it, because this is what's then fed into the interpreter for execution, into your compiler back end to generate CPU machine code instructions, or into a transpiler to another language (where you start adding the different syntax of the different target language).
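For instance, a semantic pass over the hypothetical Node type sketched above could reject 5 = a because the left-hand side of an assignment is not a name (again, the node kinds are just assumptions for illustration):

```java
class SemanticChecker {
    // Minimal check on the hypothetical Node type from the previous sketch:
    // an "assignment" node must have a name on the left, so `5 = a` is rejected here.
    void checkAssignment(Node node) {
        if (!node.kind.equals("assignment")) return;
        Node left = node.children.get(0);
        if (!left.kind.equals("name")) {
            throw new IllegalStateException(
                "left side of assignment must be a variable name, found: " + left.kind);
        }
    }
}
```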
Often this involves some stages of optimization, which may replace the expressions written by the human programmer with a more performant, equivalent internal alternative, but this complicates the matter quite a bit, so better not to start out with too much of it; as Knuth put it, premature optimization is the root of all evil.
Once you have translated the text in the syntax of your language into the internal model, validated to be syntactically correct and to use only your features/constructs in the grammatically expected order & fashion, it's time to move from the design space to the implementation of the language. If it's only about language design, that can be done theoretically and on paper, just as there is pseudocode and there are hypothetical languages or abstract notations that serve as a notational standard without a corresponding implementation (or with multiple ones, potentially with different syntax). Of course, it makes sense to consider in the design the difficulty, performance and memory cost of implementing constructs, as well as the notational convenience of the human interface (syntactic sugar), etc.
In the implementation, you take the cleaned-up and checked-to-be-valid internal representation of the former source code, and "translate"/map it to your target. If you're building an interpreter for your language, your former syntax for the addition operator might have been the character +, or the command keyword add, or the command keyword PLUS, or whatever. When the interpreter implementation encounters this instruction, a lookup table might recognize the addition operation and dispatch the execution to the addition operation of the host language. As this expects two operands, the internal model must supply two operands to pass along with it (the earlier semantics check made sure that every addition operation comes with two operands), and the implementation will likely need to take the result and store it somewhere, as later code might want to read/compare it. If the operands are variable names the interpreter created in its own memory earlier, and the source code had values stored into them, now it's time to do a lookup and retrieve the values currently stored there, to pass them on to the host language. If internally you have variable name a, operator assignment, literal value 5, the interpreter code should allocate memory for an integer, store the value 5 into it, and add the name "a" as a reference to this memory for later lookup. If there's a print instruction, you may forward the string that has been validated to be the next argument into your host language (so if that were Java, you could load the string from the source code's string literal into a variable internal to the interpreter, say printText, and then pass it into System.out.print(printText);).
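A crude sketch of that dispatch, reusing the hypothetical Node type from above (the node kinds and the map-based variable memory are assumptions for illustration, not the only way to do it):

```java
import java.util.HashMap;
import java.util.Map;

class TinyInterpreter {
    // The interpreter's own memory: variable names of the guest language map to values.
    private final Map<String, Integer> variables = new HashMap<>();

    // Dispatch on the validated internal node and hand the actual work to the host language.
    int eval(Node node) {
        switch (node.kind) {
            case "numberLiteral":
                return Integer.parseInt(node.value);
            case "name":
                return variables.get(node.value);            // look up a previously stored value
            case "addition":                                  // written as +, add, PLUS, ... in the syntax
                return eval(node.children.get(0)) + eval(node.children.get(1));
            case "assignment": {
                int value = eval(node.children.get(1));
                variables.put(node.children.get(0).value, value);  // e.g. store 5 under the name "a"
                return value;
            }
            case "print": {
                String printText = String.valueOf(eval(node.children.get(0)));
                System.out.print(printText);                  // forward to the host language
                return 0;                                     // (a string literal would be forwarded the same way)
            }
            default:
                throw new IllegalStateException("unknown construct: " + node.kind);
        }
    }
}
```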
If the target and corresponding "back end" is assembler code, or directly machine code of a CPU, the instructions of these target languages need to be written into a file. To the extent these go towards lower-level machine details, more of the memory logistics and jump coordination has to be generated by your compiler back-end implementation.
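A very rough sketch of that direction (the mnemonics below are made-up pseudo-assembly, not a real instruction set): the back end walks the same internal nodes, but instead of executing anything it appends instruction text to an output file.

```java
import java.io.FileWriter;
import java.io.IOException;

class TinyBackend {
    // Instead of executing, emit (made-up) pseudo-assembly for an addition node into a file.
    void emitAddition(Node node, FileWriter out) throws IOException {
        out.write("LOAD  R1, " + node.children.get(0).value + "\n"); // fetch first operand into a register
        out.write("LOAD  R2, " + node.children.get(1).value + "\n"); // fetch second operand
        out.write("ADD   R1, R2\n");                                  // the actual addition
        out.write("STORE R1, result\n");                              // keep the result for later code
    }
}
```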
Regarding your questions: parsing is technically done by starting to read source code in your language one character at a time. Say you read the first character. Is it whitespace? Might be irrelevant, so ignore it and continue with the next character. Do you find a special character like #, or a number like 5, or a letter like a? For the design of your syntax/notation, you can dedicate characters to have a special meaning (which then often also requires an escape notation to cancel the special meaning and get the character as an ordinary literal/value). These could be operators, or a special feature, or similar. If, for example, # starts a preprocessor directive, you may continue reading characters and collect them into a token as long as you find more letters, and stop if whitespace, a number or another special character is found. Do you have a constraint that says that the # must immediately be followed by letters that name the preprocessor directive, and that # is not allowed to be followed by a number or whitespace or another special character? This checks the notational/syntactical demands, and also collects/prepares the source into the internal representation, for example into a language item that's marked to be a preprocessor directive with the name "include" or "define". At this stage, you may not yet need to check whether you support/implement the directive name, as the parsing stage often first deals with determining what the input tokens are (their type), collects the values/words, and strips away whitespace etc. The parsed/tokenized result can then be analyzed by going over it in a loop, and each time you find an item marked to be a preprocessor directive, you may check whether it is one of the names you support (if not: error?). If you encounter an operator that was extracted from an input of ! or NOT, you may dispatch further checking to whether the correct number of operands of a compatible type were supplied. Doing this on tokens with whitespace stripped away means you don't need to do the character/text handling at the syntactical stage anymore, as you have all the values/symbols available in a cleaned-up fashion. Parsing is not much different from "deserializing" a CSV or XML or HTML or JSON file into an internal representation in memory, in order to apply operations onto the cleaned internal structures instead of onto the source text. Also, your implementation code can maintain lookup indexes and restructure the items, which would be much harder to do in a linear text stream (which might be read-only and/or uni-directional). Language code just has a few more internal name or address references to check and potentially resolve (at the time needed, which could be at runtime to get a value referenced by a name).
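A sketch of that character-by-character reading, under the made-up rule that # must be followed directly by letters naming the directive:

```java
class DirectiveScanner {
    // Read a hypothetical "#include"/"#define" style directive character by character.
    // Returns the directive name, or throws if the made-up rule "# must be followed
    // directly by letters" is violated.
    String scanDirective(String source, int start) {
        if (source.charAt(start) != '#') throw new IllegalArgumentException("not a directive");
        int i = start + 1;
        StringBuilder name = new StringBuilder();
        while (i < source.length() && Character.isLetter(source.charAt(i))) {
            name.append(source.charAt(i));    // collect characters into one token
            i++;
        }
        if (name.length() == 0) {
            throw new IllegalStateException("# must be followed by a directive name");
        }
        return name.toString();               // e.g. "include" or "define"; support is checked later
    }
}
```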
OK, some time ago I made a few videos (Reading & Processing Data and Text Input, Structuring Data and Text Inline, Tokenization & Lexical Analysis and Parsers, Interpretation & Control Flow) which are not that great, but hopefully illustrate that one can do this rather easily oneself, with regular standard programming as done for many other applications, and especially so by not getting confused by the perceived need to deploy lex/yacc, or to build a formal grammar in some notation (EBNF) that then itself needs to be parsed again, or by the complications of more complex notations or expression trees. For example, with the standard mathematical notation of arithmetic, with the operator in the middle of the two operands, and with some operations being commutative and others not, a little extra effort is needed to first parse the entire expression before evaluating it. Or similarly with many language constructs: if it's not a context-free grammar, or there are no other simplifying factors, it could well be that a long statement ends with a keyword that changes its meaning entirely, at which point one has to backtrack and fix up or mark earlier statements (single pass, or more than one?). This is where a lot of optimization effort is spent, as these scenarios can be performance-expensive, be it at compile time or during interpreter runtime. But for learning language design, you can simplify by "cheating", for example by demanding Polish notation.
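For example, a prefix-notation ("Polish") evaluator never needs precedence rules or backtracking, which is exactly the kind of simplification meant by "cheating" here (a self-contained sketch):

```java
import java.util.Arrays;
import java.util.Iterator;

public class PolishEval {
    // Evaluate prefix notation like "+ 1 * 2 3" (= 1 + 2 * 3 = 7): the operator always
    // comes first, so each token can be handled the moment it is read; no precedence,
    // no backtracking.
    static int eval(Iterator<String> tokens) {
        String token = tokens.next();
        switch (token) {
            case "+": return eval(tokens) + eval(tokens);
            case "*": return eval(tokens) * eval(tokens);
            default:  return Integer.parseInt(token);   // anything else is a number literal
        }
    }

    public static void main(String[] args) {
        System.out.println(eval(Arrays.asList("+ 1 * 2 3".split(" ")).iterator())); // prints 7
    }
}
```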
Separate from language design, I found "Language Implementation Patterns -- Create Your Own Domain-Specific and General Programming Languages" by Terence Parr a great help for the parsing part, as it starts out describing the mechanisms using familiar programming language constructs. There are a few more beginner-friendly/hands-on (less academic theory) books out there, but I haven't found the time yet to read & review them :-(
For language design, beyond just learning, it's more about what style or features to support and encourage, and what to take away by preventing, in order to make an interface for humans to express instructions to machines in a certain way for a particular purpose/application. This can include considerations and trade-offs in the areas of security, safety/reliability, expressive power, correctness, convenience, performance, ease of implementation, interoperability, portability, etc. For many features & constraints there are a lot of conceptual debates and examples/experiments, unfortunately often cluttered and obfuscated by additional optimization techniques and theoretical examination.
Syntax design: it could be you want to play around with it, which affords some insight into why existing notations are the way they are (names not allowed to start with a number just to make tokenization easier, numerical literals not allowed to contain arbitrary text characters so that if the first character is a letter, you can assume it's not a numerical literal, etc.). Of course, you can go against that and invent & implement some other new rules. However, a deviating syntax should only be introduced if it brings real benefits of convenience, conciseness, ease of implementation, compatibility and so on, i.e. if there's a really good reason/justification for it. Otherwise, there are too many languages/notations already that introduced different notation for the same thing, for no other good reason than being different and incompatible with everything else. Just consider "enumerated"/iterator foreach: if there's an equivalent construct with a known notation in some other language programmers are already familiar with, there's no benefit gained from arbitrarily demanding a different notation. This is notwithstanding the real benefits of syntactic sugar, as languages, being textual interfaces or domain-specific languages, can help express something more clearly, more concisely, etc. But in general, for language design, syntax is largely solved, per parsers and grammars for parsers, so notation too could simply become a feature/construct of the language, as every feature creeps into every language eventually anyway; and for cross-compilation/transpiling, Language Servers and whatnot, the implementation's job is anyway to undo the notational interface and work with the constructs according to the grammar/semantics.
For paradigm, semantics and/or constructs design, one can look at the different approaches of current, new and historical languages: what advantages they offer for which purposes, and where their respective limitations are, or, worse, design flaws. This is a big and serious area, because imagine, for production languages, once investments have been made and code has been written in them, maintaining and supporting that can lead to quite some trouble. Or, conversely, maybe something can be done design-wise to mitigate various issues, etc.