Nick Schrock

@schrockn

Founder of Elementl. Working on Dagster. Ex-Facebook Engineer. GraphQL co-creator.

Joined October 2008

Tweets

  1. Pinned Tweet
    Jul 8

    1/ Today we at Elementl are excited to launch an early release of Dagster, an open-source Python library for building data applications. Here's a post about what Dagster is, why I moved to data infra, why data is hard, and why we need a new system.

  2. Jul 8

    24/ We are also looking for additional founding team members! All the way from full stack, dev tools/PL folks to data eng/science. Must have a passion for tools and a belief that abstractions can reshape not just dev workflow, but orgs and industries. DMs open or email (see above)

  3. Jul 8

    23/ We are early with this project and looking for just a few additional design partners/adopters to work with. The idea is to directly work/embed with your team and get into a fast feedback cycle etc to ensure that you are successful. DMs open or email hello at elementl dot com

  4. Jul 8

    22/ For the GraphQL-aware: Structurally this serves a similar role in data as GraphQL in the API domain. A software abstraction backed by arbitrary compute that one can build shared tooling on top of and deploy to any infrastructure. Type system, metadata etc software-defined.

  5. Jul 8

    21/ We believe that these issues are best addressed with a software abstraction. In this case, we believe there should be a layer that can describe and model a data app regardless of programming language, computational runtime, orchestration engine etc.

  6. Jul 8

    20/ We’re not claiming to “solve” testability, but providing a software structure makes it more possible. We’re not claiming to make the impossible easy; we are claiming that we can make the impossible possible.

  7. Jul 8

    19/ High-latency, computationally intensive jobs make for extraordinarily long developer feedback loops. Can be hours when it ideally should be seconds. Changing the system is very high cost. Can easily result in poorly structured systems with low code quality and low productivity.

  8. Jul 8

    18/ Really hard to test. They have dependencies on external, hosted services (e.g. Redshift, Snowflake) or heavyweight runtimes (e.g. Spark). Business logic encoded in these systems. Cannot faithfully mock out or fake. Doing so is too much effort.

  9. Jul 8

    17/ Data apps are multi-tool and -persona. Often you have analysts, eng, data eng/science all collaborating on the same logical app. They use a variety of tools (spark, data warehouse, notebooks, python etc). Massive amount of context lost as data flows across tool boundaries.

  10. Jul 8

    16/ First, data apps don’t control their inputs. A normal app can reject invalid input from users. Not true with data apps. Incoming data changes all the time. Can't update data so you have to update the code. Data apps must account for this unfortunate reality.

  11. Jul 8

    15/ We define data applications as graphs of functional computations that produce and consume data assets. They are increasingly complex and mission-critical to businesses today. They also require unique approaches because they have unique properties.

  12. Jul 8

    14/ We believe that ETL, ELT, ML Pipelines, data integration, etc are a single category of software. ETL produces a file/table; ML pipeline produces a model. Other than that structurally similar/identical: They are data applications.

  13. Jul 8

    13/ We believe the data domain is on the cusp of a similar transition, and we want to help drive that. Data engs/scientists should no longer be stitching together disconnected jobs. They should be building full data applications.

  14. Jul 8

    12/ React also respected the discipline. Devs were not scripting web pages; they were building full apps. React acknowledged the *essential* complexity of this domain and built constructs to match that complexity. JS used to be considered eng backwater. No longer true.

  15. Jul 8

    11/ React got a lot of things right. Defined its domain well, nailed the abstraction for that domain, adapted formal comp sci constructs to frontend and made them approachable, and was both a step function improvement and incrementally adoptable.

  16. Jul 8

    10/ Fast forward 10 years, and no one says that anymore in frontend. Browsers got better but it is the software abstractions that proved decisive, especially but not exclusively React. People still complain, but no one really says they waste 80% of their time.

  17. Jul 8

    9/ Reminded me of the frontend ecosystem circa a decade ago. Back then engineers would say they spend "80% of their time fighting the browser, and 20% of their time building their app". Again they said one thing and meant another. The problem was primarily software abstraction.

  18. Jul 8

    8/ Taking this statement literally one would work exclusively on making data cleaning faster. However that is not what people *mean*. They mean they waste lots of time. Building one-off infra, doing systemically repetitive things, unable to truly build on others' work, etc.

  19. Jul 8

    7/ The most direct expression of this is when people say “I spent 80% of my time cleaning the data, and 20% of my time doing my job.” While they say that, they are actually describing deeper pathologies.

  20. Jul 8

    6/ Origin: I left FB in Feb ‘17 and started looking for my next challenge. I kept on hearing from people that their biggest tech problem was "their data is totally broken". I didn't understand what that meant initially.

  21. Jul 8

    5/ These computational graphs are (a) abstract and (b) queryable and operable over an API. They can be deployed to arbitrary compute targets, e.g. Airflow, Dask, FaaS, k8s-based engines. Dagster tools are shared regardless of physical compute substrates.
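
The idea in 15/ and 5/ (a data app as a graph of functional computations, described separately from whatever runs it) can be sketched in plain Python. This is only an illustration under stated assumptions: it is not Dagster's API, and every name here (`GRAPH`, `execution_order`, `run_in_process`, the sample order data) is hypothetical. It requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical example data and functions; not taken from Dagster.
def load_orders():
    return [
        {"id": 1, "amount": 20.0},
        {"id": 2, "amount": None},  # invalid row the app cannot reject upstream
        {"id": 3, "amount": 5.5},
    ]

def drop_invalid(orders):
    return [o for o in orders if o["amount"] is not None]

def total_revenue(clean_orders):
    return sum(o["amount"] for o in clean_orders)

# Abstract description of the app: asset name -> (function, upstream assets).
# Nothing in this mapping says where or how the functions execute.
GRAPH = {
    "orders": (load_orders, []),
    "clean_orders": (drop_invalid, ["orders"]),
    "revenue": (total_revenue, ["clean_orders"]),
}

def execution_order(graph):
    """Query the graph: return asset names with dependencies first."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in graph.items()})
    return list(sorter.static_order())

def run_in_process(graph):
    """One possible executor: run every node locally, in dependency order."""
    results = {}
    for name in execution_order(graph):
        fn, deps = graph[name]
        results[name] = fn(*(results[d] for d in deps))
    return results

if __name__ == "__main__":
    print(execution_order(GRAPH))
    print(run_in_process(GRAPH)["revenue"])
```

Because `GRAPH` is just data, the same description could in principle be handed to a different executor (a scheduler, a cluster) or inspected by tooling without running anything, which is the separation the thread argues for.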
