What you need is a true parser. Regular expressions handle lexing, not parsing: they identify the tokens within your input stream. Parsing works out how those tokens relate to one another, i.e. which ones go where and in what order.
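To make the lexing/parsing split concrete, here is a minimal hand-written sketch in C. It is not flex/bison output, and the grammar (integers, `+`, `*`, parentheses) and the names `next_token`, `parse_expr` and `evaluate` are illustrative assumptions. `next_token` does the lexer's job, which a regular expression could do; the `parse_*` functions encode precedence and arbitrarily deep nesting, which is the structural part that classic regular expressions cannot track.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Token kinds. The lexer only classifies characters; it knows nothing
   about precedence or nesting. */
typedef enum { TOK_NUM, TOK_PLUS, TOK_STAR, TOK_LPAREN, TOK_RPAREN, TOK_END, TOK_ERROR } TokKind;

typedef struct {
    const char *p;  /* current position in the input */
    TokKind kind;   /* kind of the current lookahead token */
    long value;     /* numeric value when kind == TOK_NUM */
    bool failed;    /* set when the parser meets a token it did not expect */
} Parser;

/* Lexing: read the next token from the input. */
static void next_token(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
    char c = *ps->p;
    if (c == '\0') { ps->kind = TOK_END; return; }
    if (isdigit((unsigned char)c)) {
        char *end;
        ps->value = strtol(ps->p, &end, 10);
        ps->p = end;
        ps->kind = TOK_NUM;
        return;
    }
    ps->p++;
    switch (c) {
    case '+': ps->kind = TOK_PLUS; break;
    case '*': ps->kind = TOK_STAR; break;
    case '(': ps->kind = TOK_LPAREN; break;
    case ')': ps->kind = TOK_RPAREN; break;
    default:  ps->kind = TOK_ERROR; break;
    }
}

/* Parsing: grammar rules decide how the tokens fit together.
     expr   -> term ('+' term)*
     term   -> factor ('*' factor)*
     factor -> NUMBER | '(' expr ')'                                  */
static long parse_expr(Parser *ps);

static long parse_factor(Parser *ps) {
    if (ps->kind == TOK_NUM) {
        long v = ps->value;
        next_token(ps);
        return v;
    }
    if (ps->kind == TOK_LPAREN) {
        next_token(ps);
        long v = parse_expr(ps);
        if (ps->kind != TOK_RPAREN) { ps->failed = true; return 0; }
        next_token(ps);
        return v;
    }
    ps->failed = true;  /* neither a number nor '(' */
    return 0;
}

static long parse_term(Parser *ps) {
    long v = parse_factor(ps);
    while (!ps->failed && ps->kind == TOK_STAR) {
        next_token(ps);
        v *= parse_factor(ps);
    }
    return v;
}

static long parse_expr(Parser *ps) {
    long v = parse_term(ps);
    while (!ps->failed && ps->kind == TOK_PLUS) {
        next_token(ps);
        v += parse_term(ps);
    }
    return v;
}

/* Returns true and stores the result in *out only if the whole input
   is a well-formed expression. */
bool evaluate(const char *src, long *out) {
    Parser ps = { .p = src, .failed = false };
    next_token(&ps);
    long v = parse_expr(&ps);
    if (ps.failed || ps.kind != TOK_END) return false;
    *out = v;
    return true;
}
```

Even in this toy, most of the code is bookkeeping around the three grammar rules in the comment; those rules are roughly what you would write in a bison grammar file, letting bison generate the rest.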

The classic parser generator is yacc/bison, and the classic lexer generator is lex/flex. Since PHP allows for integrating C code, you can use flex and bison to build your parser, have PHP call it on the input file or stream, and then collect the results.
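flex and bison meet at a small interface: the bison-generated `yyparse()` calls `yylex()` each time it needs a token, and `yylex()` returns 0 at end of input. The token's value travels in the global `yylval`. The C below is a hand-written stand-in for the scanner flex would generate, only to show the shape of that contract. The input pointer `cursor` and the token names `NUMBER` and `IDENT` are hypothetical; a real flex scanner reads from `yyin` and the token codes come from the header bison generates.

```c
#include <ctype.h>
#include <stdlib.h>

/* Named tokens get codes above 255 so that single-character tokens such
   as '+' can be returned as their own character code, as in yacc/bison. */
enum { NUMBER = 258, IDENT };

/* Semantic value of the most recent token; bison's default type is int. */
int yylval;

/* Hypothetical input cursor (a flex scanner reads from yyin instead). */
static const char *cursor;

/* Stand-in for a flex-generated yylex(): one token per call, 0 at end. */
int yylex(void) {
    while (isspace((unsigned char)*cursor)) cursor++;
    if (*cursor == '\0') return 0;
    if (isdigit((unsigned char)*cursor)) {
        char *end;
        yylval = (int)strtol(cursor, &end, 10);
        cursor = end;
        return NUMBER;
    }
    if (isalpha((unsigned char)*cursor)) {
        while (isalnum((unsigned char)*cursor)) cursor++;
        return IDENT;
    }
    return (unsigned char)*cursor++;  /* single-character token, e.g. '+' */
}
```

In a real project you would write the token patterns in a `.l` file, let flex generate `yylex()`, and expose a small C wrapper around `yyparse()` for PHP to call.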

It will be very fast, since flex and bison generate compiled C, and far easier to work with once you understand the tools. I suggest reading lex & yacc, 2nd edition, from O'Reilly. For an example, I've set up a flex and bison project on GitHub, with a makefile. It can be cross-compiled for Windows if necessary.

It is complex, but as you found out, what you need done is complex. A properly working parser requires a great deal of bookkeeping, and flex and bison handle the mechanical bits for you. Otherwise, you find yourself in the unenviable position of writing code at the same level of abstraction as assembly.

Spencer Rathbun