Testing, Deployment, and Maintenance Resources

DZone's Featured Testing, Deployment, and Maintenance Resources

Observability for the Invisible: Tracing Message Drops in Kafka Pipelines

By Prakash Wagle

When an event drops silently in a distributed system, it is not a bug, it is an architectural blind spot. In high-scale messaging platforms, particularly those serving real-time APIs like WhatsApp Business or IoT command chains, telemetry failures are often mistaken for application errors. But the root cause lies deeper: observability gaps in event streams. This article explores how backend engineers and DevOps teams can detect, debug, and prevent message loss in Kafka-based streaming pipelines using tools like OpenTelemetry, Fluent Bit, Jaeger, and dead-letter queues. If your distributed messaging system handles millions of events, this guide outlines exactly how to make those events accountable. It also briefly surfaces key production concerns around multi-tenant access control, communications security, and the potential for ML-powered anomaly detection in observability pipelines, areas that are foundational to modern, large-scale infrastructure. When Kafka Metrics Lie: Lag ≠ Delivery Say your Kafka consumer group reports zero lag. The pipeline appears stable. But a downstream service dashboard shows stale or missing data. bash $ kafka-consumer-groups.sh --bootstrap-server kafka:9092 --describe --group user-analytics TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG user-events 0 24567 24567 0 No visible lag. No alert. Still, payloads are missing. This is where many distributed pipelines fail, not with crashes, but with silent non-events. The system appears healthy but is semantically broken. Without deep traceability, your metrics are just performance theater. One senior infra engineer described this gap as the "perfect Kafka illusion", systems that deliver bytes, but not outcomes. This is where architecture, not tooling, must evolve. Add Trace Context to Kafka Consumers for Debug Visibility OpenTelemetry spans must be embedded at the event-handling layer of your consumer logic. This is the only way to establish causal visibility between an incoming payload and its downstream effect (or lack thereof). Java ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100)); for (ConsumerRecord<String, String> record : records) { Span span = tracer.spanBuilder("consume_user_event") .setAttribute("topic", record.topic()) .setAttribute("partition", record.partition()) .setAttribute("offset", record.offset()) .startSpan(); try { processRecord(record); } catch (Exception e) { span.recordException(e); } finally { span.end(); } } By emitting trace data directly within the consumer loop, engineers can detect missing, slow, or corrupt message patterns that Kafka lag metrics alone can never reveal. Stop Schema Drift Before It Silently Fails Your Pipeline One of the most overlooked failure modes in stream processing is schema drift, when producers silently introduce changes that consumers cannot handle. This issue rarely crashes the system. Instead, it causes semantic degradation: fields go missing, types mismatch, events are misclassified. These failures are quiet, cumulative, and corrosive. Python from fastavro.validation import validate from my_schemas import user_event_schema if not validate(event_data, user_event_schema): logger.warning("Schema violation: %s", event_data) send_to_dead_letter_queue(event_data) Inline schema validation like this becomes your first line of defense. It converts structural drift into observable exceptions. Use DLQs as a Debugging Feed, Not a Graveyard, DLQs are too often a compliance checkbox. But in resilient systems, DLQs are an active observability layer. JavaScript function sendToDLQ(event, error) { const payload = { originalEvent: event, reason: error.message, timestamp: new Date().toISOString() }; dlqProducer.send({ topic: 'user-events-dlq', messages: [{ value: JSON.stringify(payload) }], }); } Surface these payloads into dashboards. Create alerts based on DLQ velocity. Feed them into analytics to track error provenance. DLQ monitoring is the canary for Kafka integrity. Fluent Bit + Trace ID = Cross-System Log Correlation Disconnected logs create blind spots. The moment you embed trace IDs into logs, metrics, and spans, they become a causal graph. ini [PARSER]Name trace-jsonFormat jsonTime_Key timeTime_Format%Y-%m-%dT%H:%M:%STrace_Keytrace_id Now your observability tools, Jaeger, Tempo, or Grafana, can visualize the full lifecycle of an event, across microservices and topics. This is not logging. This is distributed forensics. How Message Loss Surfaces in Real Systems Message loss rarely appears as a 500 error. Instead, it manifests as: - Stale dashboards: data pipelines silently stall - Missing audit trails: events never persisted - Downstream inconsistencies: analytics skewed by phantom gaps - SLO violations: alerts never trigger because triggers never arrived These symptoms confuse even experienced SREs. But they almost always trace back to the same root cause: the message was sent, but it was never seen. Design Your Messaging Infrastructure for Message Loss, Not Just Load Most streaming architectures optimize for throughput. Few account for invisible failure conditions like: - Missing timestamps - Unparsable payloads - Clock skew or out-of-order arrival - Regional lag or partition mismatch Key observability controls: - Trace every Kafka consumer event - Validate schemas at ingest - Route and monitor DLQ flows - Correlate logs via trace ID across all layers Security and data integrity also matter: in multi-tenant architectures, each trace must be attributable to a tenant ID or scoped identity token. This is where IAM enforcement and access-aware observability become critical. Trace IDs should be tied to access contexts. Looking ahead, some teams are already experimenting with ML anomaly detection over trace flows and DLQ growth rates, flagging novel failure patterns before they affect downstream SLOs. You cannot eliminate message loss. But you can eliminate its invisibility. For engineers maintaining distributed systems with Kafka or similar queues, observability must be designed into the system, not added after incident response. Instrument deeply. Trace cross-service flow. Monitor tenant-level behavior. Learn from what breaks. More

How to Use ALB as a Firewall in IBM Cloud

By Pari Biswas

Do you have a use case where you want to implement a network firewall in IBM Cloud VPC that filters traffic based on hostname? For example, you may want to allow connections only to www.microsoft.com and www.apple.com, while blocking access to all other destinations. Currently, IBM Cloud does not provide a managed firewall service. However, it does support a bring-your-own-firewall approach with vendors such as Fortinet or Juniper, though customers are responsible for deploying and managing these solutions. This article explains how you can leverage existing IBM Cloud services to address this requirement. By combining IBM Cloud DNS and the Application Load Balancer (ALB), you can implement a practical solution. Let’s take a closer look at how this works. The diagram below illustrates the high-level architecture pattern: Figure 1: Architecture pattern for ALB as Firewall All source hosts reside in subnets that are restricted using Network Access Control Lists (ACLs) and Security Groups. These ACLs and Security Groups are configured so that no resource in the subnet can access the public network directly. Instead, they are allowed to communicate only with the Application Load Balancer (ALB) over the private network (using its private IP). The ALB is placed in a dedicated subnet, and its Security Group allows outbound connections to the public network. The final piece of this architecture is the private DNS service. This DNS is configured to resolve only selected hostnames to the ALB. In our example, the allowed hostnames are apple.com and microsoft.com. Any virtual server in the source subnet attempting to reach other hostnames will not be resolved to the ALB, and with the Security Group restrictions in place, it will also be unable to connect to the public internet. Now, let’s do a step-by-step deep dive into this solution. First step in the process is to create a private DNS Add DNS zones for the hostnames you want - example: apple.com and microsoft.com Make sure to set the permitted network for each of these zones as the VPC you are using for this configuration. Once a permitted network is added, the dns zones get active as shown in the picture above. You will have to create canonical name for the load balancer in these zones to resolve to the load balancer. Hence, you need to create the Application Load Balancer first. Follow the steps outlined here to create a ALB with following configuration: The ALB needs to be PrivateUse a dedicated subnet for the ALB. The reason for being it in a dedicated subnet is that this subnet will need to be allowed to connect to public network. Now that you have the ALB is created, you can add CNAME in each zone as shown in the image below (as an example). This creates an alias for the load balancer. That's all we need from the DNS side. Let's turn our attention to the ALB now. In the ALB, create a back-end pool for each zone you created in the DNS. In our example, we used one pool each for Microsoft and Apple. The pool details should match to the figure below (use protocol as TCP). In each of the back-end pool, you will have to add members for your destination. To do that, you first need to get the public IP for your destination. You can use nslookup to get the IPs. Once you get the IPs, add a member to the appropriate back-end pool with each of the IPs. Select the pool you want to add member toClick on the Members tabClick on Attach Member Select Other Devices tabAdd IP of the destination you want it to reach to and appropriate port (443) Once you add the members, make sure the health statuses are "Passing". If it is not passing, there could be one or both of the two following reasons: IP/port you provided is not correctSecurity Group of the ALB and subnet ACL rules do not allow outbound traffic Next step in the process is to create and configure a front-end listener for the ALB. Go to Front-end listeners tabSelect create listenersProvide a name and select TCP as protocolSelect listener port as 443 The final steps is to configure front-end listener policies. You need to create one policy per back-end pool. A listener policy will have forwarding rules that matches with the host name to send the request to pool you will be specifying here. In our example, we need to specify that when the SNI hostname (Server Name Indication) matches to "www.apple.com", it should forward to the pool we created for "apple.com" and similar process for "Microsoft.com". Here is an example of the listener policy: Make sure you add one policy per the hostname you are trying to reach. In our example one policy for "www.apple.com" and one policy for "www.microsoft.com". The last step you need to make sure that source hosts from where you are trying to connect to external host has the security group setup to reach the load balancer. The final data flow of the traffic is now setup: VSI from you VPC tries to connect to external site ("www.apple.com")The Private DNS resolves this hots name to the ALBALB listener forwards the request to the matching ALB poolALB pool forwards the request to one of the member IPs Tips for testing: From any virtual server in the source subnet, fire a curl commend - for example "curl https://www.apple.com" . You should get a response back from apple.com. If you try the curl command for any web url other than what you configured in the ALB, you will not get any response. Known limitation of this architecture pattern: The external hosts IPs can change. This configuration cannot detect any IP change event. It needs to be manually updated in the back-end pool.This configuration works with DNS CNAME. This inherits a limitation that the name of the hostname has to be prefixed by the CNAME name field. In our example, "www" needs to appear in the hostname. So will work for "www.apple.com", but will not work for "apple.com". More

Developing a Nationwide Real-Time Telemetry Analytics Platform Using Google Cloud Platform and Apache Airflow

By Sruthi Erra Hareram

Stop Leaking Secrets: The Hidden Danger in Test Automation and How Vault Can Fix It

By Pradeepkumar Palanisamy

*You* Can Shape Trend Reports: Join DZone's Database Systems Research

By DZone Editorial

How Healthy Is Your Data in the Age of AI? An In-Depth Checklist to Assess Data Accuracy, Governance, and AI Readiness

Editor's Note: The following is an article written for and published in DZone's 2025 Trend Report, Data Engineering: Scaling Intelligence With the Modern Data Stack. Data has evolved from a byproduct of business processes to a vital asset for innovation and strategic decision making, and even more so as AI's capabilities continue to advance and are integrated further into the fabric of software development. The effectiveness of AI relies heavily on high-quality, reliable data; without it, even the most advanced AI tools can fail. Therefore, organizations must ask: How healthy is our data? Whether initiating a new AI project or refining existing data pipelines, this checklist provides a structured framework that will not only guarantee the success of your AI initiatives but also cultivate a culture of data responsibility and long-term digital resiliency. Ensuring Data Quality Across Architectures, Models, and Monitoring Systems Data quality is the backbone of an AI system's integrity and performance. As AI applications become ubiquitous across diverse industries, the reliability of the data that our AI model learns from and runs on is crucial. Even the most advanced algorithms may fail to deliver appropriate and unbiased results when fed with low-quality data, consequences that can be costly in many ways. Moreover, biased data may extend or strengthen existing societal and economic disparities and, consequently, make unjustified decisions. 1. Assess the Core Dimensions of Data Quality Evaluating the health of your data should cover the core dimensions of data quality: accuracy, completeness, consistency, timeliness, and validity. These dimensions play a critical role in realizing a robust, ethical, and trustworthy AI solution that will be reliable and succeed in meeting its potential: Accuracy Confirm that data values are correct and error freeEnforce validation checks (e.g., dropdowns, input masks) at data entryAutomatically and regularly cross-check data against trusted sources and known standards (e.g., via address validation APIs)Implement mechanisms to tag abnormalities in real time Completeness Ensure all required fields in forms and ingestion pipelines are populatedTrace missing values to specific sources or systemsIdentify recurring gaps in critical data using profiling toolsTrack completeness over time to determine data gaps or failed integration Consistency Implement single naming standards, codified code lists, and standard data types on ETL processesCreate and maintain a data dictionary that each team uses when field mappingReconcile redundant datasets regularly to identify and eliminate discrepancies Uniqueness Detect duplicate records (e.g., customer profiles)Ensure primary keys are unique and enforced strictly Timeliness Identify requirements of your use case (e.g., monthly reports with a batch load)Ensure data is up to date and available when neededMonitor latency between data generation and delivery, and send a warning if SLAs are at riskAlign ingestion frequencies (hourly, daily, real time) with stakeholder requirements Validity Perform schema validation automatically on ingestion against a metadata registry (e.g., data type, structure, and format)Use automated validators to flag, quarantine, or discard outliers and invalid recordsConfirm that deduplication logic is embedded in ETL jobsCheck and monitor validity regulations regularly as business needs change Integrity Enforce database constraints (e.g., primary keys, foreign keys) to maintain referential integrityExecute cross-table validation scripts to detect inconsistencies and reference violations across related tablesTrack data lineage metadata to verify that derived tables accurately map back to their source systemsVerify parent-child relationships between related tables during routine data quality audits 2. Monitor Data Quality Continuously As systems evolve, data should be monitored continuously to maintain reliability. Putting the right checks in place (e.g., automated alerts, performance metrics) makes it easier to catch problems early without relying on manual reviews. When these tools are integrated into daily workflows, teams can respond faster to issues, reduce risk, and build trust in the data that powers their analytics and AI systems across the organization: Implement automated tools to detect anomalies (e.g., nulls, schema drift)Automate profiling and integrate into pipelines before production deploymentProfile datasets regularly and align frequency with data volatility (e.g., daily, weekly)Integrate checks into ETL workflows with alerts and custom rules for batch/streaming dataEliminate manual checks using threshold logic and statistical anomaly detection Create dashboards that display key metrics; use targets and color indicators to highlight issues and track trendsEnable drill-down views to trace problems to their source Assign data quality ownership across teams with defined KPIsPromote shared accountability through visibility and ongoing reporting 3. Strengthen Data Governance and Ownership Strong data governance and clearly assigned data ownership are the foundation of high-quality data. Governance defines how data is accessed, secured, and used across an organization, while ownership ensures accountability for the data's accuracy and proper use. Together, they reduce risk, improve consistency, and turn data into a reliable business asset. With clear roles, well-documented policies, and proactive oversight, organizations can build trust in their data and meet regulatory demands without slowing innovation: Assign data owners to oversee dataset strategy, access, and quality for key datasetsDesignate data stewards to enforce governance standards and monitor data qualityEstablish core policies for access control, retention, sharing, and privacyCreate and maintain a data catalog to centralize metadata and improve data discoverabilityDefine data quality processes for monitoring, cleansing, and enhancing data throughout its lifecycle Document and distribute governance policies covering usage, compliance, and security expectationsIntegrate governance controls into existing workflows and tools for enforcementTrack compliance metrics to measure policy adherence and identify gapsReview and update governance practices regularly to keep pace with organizational and legal changesPromote a culture of responsibility around data through visibility and training 4. Track Data Lineage and Traceability Understanding where data comes from, how it's transformed, and where it flows is crucial for debugging issues, meeting compliance requirements, and building trust. Data lineage provides that visibility, capturing the full history of every dataset across your ecosystem. From initial ingestion to final output, traceability helps ensure accuracy, enable audits, and support reproducibility. Implementing solid lineage practices with change tracking and version control creates transparency across both technical and business users: Map data origins and transformations across pipelines, including API sources, transactional systems, and flat filesCapture lineage metadata to log merges, filters, and transformations for full processing visibilityIntegrate lineage tools with ETL processes to track changes from ingestion to outputLog schema changes and dataset updates with metadata on who changed what, when, and whyMaintain a version history for key datasets to support rollback and auditability Use version control tools to manage schema evolution and prevent conflicting updates in collaborative environmentsRetain historical lineage and transformation records to ensure reproducibility of resultsTrace anomalies to their source with minimal friction to support audits and investigationsLink lineage insights with change logs and data dependencies to facilitate impact analysis 5. Validate Readiness for AI and Machine Learning Preparing data for AI and machine learning requires thoughtful structuring and labeling, plus mitigating bias and ensuring the richness needed for deeper, more accurate predictions. Whether you're building a classification model or a real-time recommendation engine, upfront investment in data quality pays off in model performance, trust, and fairness: Label datasets with clear, granular, and compliant tags that match AI/ML model objectivesOrganize data into feature stores or structured tables with consistent formats, column names, and typesInclude essential metadata (e.g., timestamps, data source origins)Remove duplicates, fill or impute missing values, and standardize formats to reduce training errorsValidate column consistency to prevent schema mismatches during modelingDocument preprocessing steps to support reproducibility and troubleshootingDetect bias in features and outcomes using statistical tests (e.g., disparate impact ratio) Visualize demographic and feature distributions to surface imbalance or overrepresentationApply mitigation techniques (e.g., re-sampling, synthetic data generation)Track audit results and interventions to maintain transparency and meet regulatory standardsInclude fine-grained data (e.g., geolocation, user logs) for deeper modelingAugment with external sources (e.g., demographics, economic indicators) where relevantEnsure datasets are dense enough to support pattern recognition and generalization without noise or sparsity 6. Ensure Data Security and Compliance As industry and global regulations evolve and data volumes grow, ensuring privacy and protecting sensitive information is essential. Compliance frameworks like GDPR, CCPA, and HIPAA set legal expectations, but it's the combination of policy, process, and technical safeguards that keeps data protected and organizations accountable. Meeting these requirements, which can be done through the following steps, builds trust and reduces the risk of costly violations: Map datasets that include personal or regulated information across systemsAudit consent management, user rights (access, correction, deletion), and breach notification proceduresReview data residency requirements and ensure processing aligns with legal boundariesDocument processing activities to support audits and demonstrate accountabilityPartner with legal, privacy, and security teams to track regulation changes Mask sensitive fields when using data in non-production or analytic environmentsEncrypt data at rest and in transit using TLS/SSL and secure encryption standardsApply field-level encryption for high-risk values (e.g., payment data)Enforce RBAC to restrict data access based on job functionImplement key management and rotation policies to protect decryption credentialsCombine masking and encryption to reduce the impact of any potential data breach 7. Invest in Culture and Continuous Improvement Data quality requires sustained effort, clear processes, and a culture that values accuracy. By building structured review cycles and open feedback loops, and investing in data literacy, organizations can improve the reliability of their data while remaining aligned with their evolving AI and analytics needs. A consistent commitment to improvement ensures long-term value and trust in your data assets: Schedule regular data quality reviews (monthly, by delivery cycle)Evaluate core quality dimensions against historical benchmarksDocument issues, trends, and resolutions to create a living archive of quality progressIntegrate assessments into governance workflows to ensure accountabilitySet up clear communication channels between data producers and consumers Troubleshoot collaboratively to resolve issues quickly and define new data needsHighlight how upstream actions affect downstream outcomes to promote shared ownershipInvest in data training programs to improve awareness of quality and responsible AI useEstablish stewardship roles within each department to lead local quality effortsCelebrate quality improvements to reinforce positive behaviors Conclusion The impact of any AI or analytics initiative depends on the quality of the data behind it. Inaccurate, incomplete, or outdated data can erode trust, produce misleading results, waste valuable resources, and cause costly consequences. To avoid these pitfalls, organizations must take a well-rounded and comprehensive approach: assess data quality across the key dimensions, perform ongoing monitoring, adhere to governance and compliance practices, establish continuous feedback loops, and take action where gaps exist. As regulations evolve and data demands grow, building a culture that values quality will set your organization apart. Ultimately, this entails regular reviews, targeted training, and investing in tools that embed data quality into everyday practices. Using this checklist as a guide, you can take practical, proactive steps to strengthen your data and lay the foundation for responsible, high-impact AI. The payoff is clear: better decisions, greater trust, and a durable competitive advantage in a data-driven world. Additional resources and related reading: "Data Governance Essentials: Policies and Procedures (Part 6)" by Sukanya Konatam"AI Governance: Building Ethical and Transparent Systems for the Future" by Sukanya KonatamGetting Started With Data Quality by Miguel Garcia Lorenzo, DZone RefcardData Pipeline Essentials by Sudip Sengupta, DZone RefcardOpen-Source Data Management Practices and Patterns by Abhishek Gupta, DZone RefcardMachine Learning Patterns and Anti-Patterns by Tuhin Chattopadhyay, DZone RefcardAI Automation Essentials by Tuhin Chattopadhyay, DZone RefcardGetting Started With Agentic AI by Lahiru Fernando, DZone RefcardAI Policy Labs This is an excerpt from DZone's 2025 Trend Report, Data Engineering: Scaling Intelligence With the Modern Data Stack.Read the Free Report

By Sukanya Konatam

Implementing Scalable IoT Architectures on Azure

The Internet of Things (IoT) comprises smart devices connected to a network, sending and receiving large amounts of data to and from other devices, which generates a substantial amount of data to be processed and analyzed. Edge computing, a strategy for computing on location where data is collected or used, allows IoT data to be gathered and processed at the edge, rather than sending the data back to a data center or cloud. Together, IoT and edge computing are a powerful way to rapidly analyze data in real-time. In this Tutorial, I am trying to lay out the components and considerations for designing IoT solutions based on Azure IoT and services. Azure IoT offers a robust, flexible cloud platform designed to handle the massive data, device management, and analytics that modern IoT systems demand. Why Choose Azure IoT? Key advantages include: Scalability: Whether a handful of devices or millions, Azure’s cloud infrastructure scales effortlessly.Security: Built-in end-to-end security features protect data and devices from cyber threats.Integration: Seamlessly connects with existing Microsoft tools like Azure AI, Power BI, and Dynamics 365.Global reach: Microsoft’s global data centers ensure low latency and compliance with regional regulations. Core Azure IoT Components Azure IoT Hub: Centralized management of IoT devices with secure, bi-directional communication.Azure Digital Twins: Create comprehensive digital models of physical environments to optimize operations.Azure Sphere: Secure microcontroller units designed to safeguard IoT devices from threats.Azure Stream Analytics: Real-time data processing and analysis to enable immediate decision-making. For businesses aiming for scale, Azure provides tools that simplify device provisioning, firmware updates, and data ingestion — all while maintaining reliability. How to Build Scalable IoT Solutions With Azure IoT With Azure IoT Hub, companies can manage device identities, monitor device health, and securely transmit data. This reduces manual overhead and streamlines operations. Azure IoT’s layered approach includes: Hardware-based security modules (Azure Sphere)Device authentication and access controlData encryption at rest and in transitThreat detection with Azure Security Center This comprehensive security framework protects critical business assets. Successfully leveraging Azure IoT requires deep expertise in cloud architecture, security, and integration. IoT consultants guide businesses through: Solution design aligned with strategic goalsSecure device provisioning and managementCustom analytics and reporting dashboardsCompliance with industry regulations This ensures rapid deployment and maximized ROI. Core Building Blocks of a Scalable IoT Solution There are six foundational components: Modular edge devices: Using devices capable of handling more data types, protocols, or workloads prepares the system for future enhancementsEdge-to-cloud architecture: Real-time processing at the edge with long-term analytics in the cloud—is critical for responsiveness and scaleScalable data pipelines: This includes event streaming, transformation, and storage layers that can dynamically adjust.Centralized management and provisioning: Remote provisioning tools and cloud-based dashboards that support secure lifecycle management.Future-ready analytics layer: Integrating a cloud-agnostic analytics engine — capable of anomaly detection, predictive maintenance, and trend analysis.API-first integration approach: APIs ensure that the IoT system can integrate with existing asset management tools and industry-specific software. Mistakes to Avoid When Scaling IoT Skipping a pilot that includes scale planning: Don’t just prove it works — prove it grows.Building for today’s traffic only: Plan for 10X the number of devices and data volume.Locking into one vendor without flexibility: Use open APIs and portable formats to reduce vendor risk.Treating security as a plug-in: It must be designed from the start and built into every component.Underestimating operational complexity: Especially when support, maintenance, and updates kick in. Key Practical Challenges and Solutions for Scalable IoT 1. Edge Processing and Local Intelligence Devices that only collect data aren’t scalable. They need to filter, compress, or even analyze data at the edge before sending it upstream. This keeps bandwidth manageable and lowers latency for time-sensitive decisions. 2. Cloud-Native Backend (Azure IoT) The backend is where most scale issues live or die. Choose cloud-native platforms that provide: Autoscaling message brokers (MQTT, AMQP)Managed databases (for structured + time-series data)Easy integrations with analytics toolsSecure API gateways 3. Unified Device Management A pilot with 10 sensors is easy. Managing 10,000 across countries is not. Invest early in device lifecycle management tools that: Handle provisioning, updates, and decommissionsTrack firmware versions and configurationsProvide automated alerts and health checks This is where experienced IoT consultants can guide you in picking a platform that matches your hardware and business goals. 4. Scalable Security and Access Controls Security is about ensuring that only the right users, systems, and apps have access to the right data. Key points to consider: Role-based access control (RBAC)Multi-tenant security layers (if you serve multiple customers or sites)End-to-end encryption across every nodeRegular key rotation and patch automation Scalability means being able to onboard 500 new devices without creating 500 new headaches. 5. Data Governance and Normalization Imagine 50 device types all reporting “temperature” — but each one does it differently. That’s why standardized data models and semantic labeling matter. Your architecture should include: Stream processing for cleanupSchema validationData cataloging and taggingIntegration with your BI and ML systems Smart IoT strategy ensures you don’t drown in your own data once scale hits. Scalability in IoT isn’t about planning for massive growth — it’s about removing obstacles to growth when it happens. Whether it’s 10 sensors today or 10,000 tomorrow, your architecture should support the same performance, security, and agility. As IoT continues to evolve, Azure will undoubtedly remain at the forefront of this exciting and transformative field, helping businesses drive innovation and stay competitive in an increasingly connected world. Learn more about Azure IoT here.

By Bhimraj Ghadge

Do You Understand the Debt You Have to Pay?

I was fortunate to start my career with people who truly cared about code quality. Early on, I learned why this matters and how continuous attention to quality positively impacts customer satisfaction. This experience made it natural for me to improve legacy code and constantly seek further enhancements. However, at the beginning of my journey, my perspective was narrow—I saw only the code. So, my efforts focused solely on refactoring. Fast forward to today, I've learned that there are many more ways to improve software. There are also several strategies you can explore to choose the right approach for your situation. Improvements = Change It’s important to remember that any refactoring, redesign, or re-architecture introduces a risk of breaking something that currently works. While in theory these changes shouldn't break functionality, in practice, the risk varies across applications and depends heavily on the development practices surrounding them. But make no mistake: the risk is real. If you decide to address technical debt, accept that risk and communicate it clearly with everyone involved. There are ways to mitigate and manage it effectively, but something may still go wrong. Transparency helps ensure that one failure doesn't undermine the overall improvement effort. What Strategies Do You Have? Leave It As It Is Our inner perfectionist may protest, but sometimes the best approach to dealing with legacy systems is to leave them untouched. No refactoring, no redesign—just minimal maintenance as needed. Focus your energy elsewhere, particularly when under pressure from other priorities. Continuous Refactoring Not every project needs a full modernization or complete re-architecture. In some cases, ongoing refactoring can make a huge difference. Within months, you may observe improvements in both code quality and delivery speed. Introducing Code Architecture Sometimes, refactoring alone isn't enough. Certain applications need a deliberate plan toward an agreed-upon architecture. This architecture must be well-communicated, understood, and supported by the entire team. Everyone should know how to move toward it. Divide and Conquer Some systems are large enough to support multiple subdomains. In such cases, visualizing and implementing boundaries between them helps maintain consistent language, encapsulate implementation details, and reduce cognitive load during development. How to Make the Right Decision? Here are some questions that can help you make a more informed decision. Remember, context matters. The more you learn and data you gather, the better your decision will be. Should I Leave It As It Is? If the application is running smoothly and nobody is actively working on it, you might be able to leave it alone. Still, assess potential upgrades to the language, framework, libraries, or infrastructure. Sometimes it makes sense to invest in reducing technical debt to make future upgrades less risky. Understand how the application aligns with your company's strategy. If it’s scheduled for decommissioning, your efforts might be better spent elsewhere. However, even in this scenario, check what the decommissioning process entails. If migrating data or users is required, some improvements may still be worthwhile to minimize risk and pain later. Can Continuous Refactoring Pay the Debt? Refactoring is a great way to continuously counteract code degradation. But when is it a suitable strategy for legacy code? If the application is less than a year old, continuous refactoring can lead to impressive improvements in a short time. The codebase is likely still small enough to be manageable, and upcoming features can be implemented with better design in mind. For older applications (2–3 years or more), ask yourself: how many teams are involved? If a single team is maintaining it and delivery pace remains steady, continuous refactoring may still work. But at this stage, it may be time to consider other strategies as well. Will a New Code Architecture Be Enough? If the application is over a year old and maintained by one or two teams, introducing a shared architectural vision can help. A well-defined architecture gives engineers clarity on the domain and guides how to add new features. With only one or two teams, it’s easier to make collective decisions. The risk of misalignment is low, and such discussions shouldn't block delivery. But if more than two teams are involved, another strategy may be more appropriate. Divide and Conquer Now we’re talking about complex systems with multiple teams actively working on them. In this case, clearly define boundaries within the platform and assign code ownership to each team. These boundaries may be architectural, infrastructural, or both. Goals include: Eliminating shared code and infrastructure where possibleEnabling team autonomy to improve their respective code areas Reducing inter-team dependencies accelerates delivery. While cross-team collaboration is valuable, autonomy helps teams make long-term improvements with confidence, knowing their decisions won't be undone by others. Summary There are many nuances to the above strategies, but even this level of detail provides a good starting point. I encourage you to challenge these ideas and ask questions in the comments—I’d be happy to explore this topic further. And remember: Not every legacy system needs to be refactoredA new architecture may not always helpA rewrite isn’t always the best choiceSometimes, leaving things as they are can be a smart move

By Sebastian Malaca

Scaling Playwright Test Automation: A Practical Framework Guide

As web applications become increasingly dynamic and feature-rich, the complexity of ensuring their quality rises just as fast. Playwright has emerged as a powerful end-to-end testing tool, supporting modern browsers and offering capabilities like auto-waiting, multi-browser testing, and network interception. But writing isolated test cases is only a small part of successful automation. To support maintainability, collaboration, and long-term scalability, a structured test automation framework is essential. This article walks through how to build a scalable Playwright testing framework from scratch, with clean architecture, modular design, reusable components, and CI/CD readiness. Whether you're starting fresh or refining an existing setup, this guide will help you design your Playwright test suite the right way. Why a Testing Framework Matters It's tempting to dive into Playwright by writing a few test scripts to validate core user flows. While that’s a great starting point, it quickly becomes clear that as your application evolves, raw scripts alone can’t keep up with the growing test complexity. Some common challenges include: Copy-pasting login and setup steps across multiple filesDifficulty switching between staging, production, and local environmentsHardcoded selectors and inconsistent naming conventionsLack of a shared structure as more contributors join the project A well-designed testing framework addresses these issues by: Promoting code reuse through utilities, fixtures, and page objectsEncouraging the separation of concerns, so that configuration, test logic, and UI interaction are decoupledImproving collaboration, especially in teams where multiple people contribute to automation The goal is to create a flexible foundation that can support hundreds — or even thousands — of tests without becoming a tangled mess. Project Architecture: Folder Structure That Scales Before writing any test cases, it’s crucial to define a clear and logical folder structure. A consistent layout not only helps organize your code but also makes onboarding, debugging, and scaling much easier as the test suite grows. Here’s a recommended structure for a scalable Playwright project: Plain Text playwright-framework/ ├── tests/ # All test specs go here (e.g., login.spec.ts, checkout.spec.ts) ├── pages/ # Page Object Models to encapsulate UI logic ├── fixtures/ # Custom test fixtures for shared setup/teardown ├── utils/ # Helper methods, test data generators, custom loggers, etc. ├── config/ # Environment-specific configs (URLs, credentials, etc.) ├── reports/ # Output folder for HTML, Allure, or JSON test reports ├── playwright.config.ts # Global configuration for Playwright └── package.json # Project metadata and NPM dependencies Why This Structure? Each folder plays a specific role in keeping your framework modular: tests/: Your actual test cases. Keeping them separate from logic files ensures they stay focused on validation, not implementation.pages/: This folder holds Page Object Model classes — these abstract UI interactions like login, navigation, or form input, making tests easier to write and maintain.fixtures/: Custom test fixtures can set up data, sessions, or test state before the tests run. More on this later.utils/: Handy for storing shared functions like random data generators, timeouts, file handlers, etc.config/: Allows switching environments (e.g., dev, staging, production) by changing a single file or flag.reports/: Keeps test reports and media assets organized.playwright.config.ts: The central configuration hub, defining how Playwright behaves during test runs. Setting Up playwright.config.ts: The Nerve Center of Your Test Suite One of the first — and most important — steps when building a Playwright framework is configuring the playwright.config.ts file. Think of it as the control panel for your entire test run: it defines what gets executed, how it behaves, and under what conditions. Here's a breakdown of what a well-thought-out configuration looks like: TypeScript import { defineConfig, devices } from '@playwright/test'; export default defineConfig({ testDir: './tests', timeout: 60000, expect: { timeout: 5000, }, retries: 1, reporter: [['html'], ['list']], use: { baseURL: 'https://staging.myapp.com', headless: true, video: 'retain-on-failure', screenshot: 'only-on-failure', }, projects: [ { name: 'Chromium', use: { ...devices['Desktop Chrome'] } }, { name: 'Firefox', use: { ...devices['Desktop Firefox'] } }, { name: 'WebKit', use: { ...devices['Desktop Safari'] } }, ], }); A Few Things to Note testDir: Defines where your test specs are located. Keep this consistent with your folder structure.Timeouts: Use global and expectation-level timeouts to handle slower environments without masking real performance issues.Retries: Enable retries (carefully). One retry can save CI jobs from flakiness without hiding actual bugs.Reporters: HTML is useful locally; a CLI reporter like list is helpful in CI pipelines. You can add Allure, JSON, or custom reporters too.Screenshots and video: Capturing failures can drastically speed up debugging. Use retain-on-failure instead of recording every test.Multi-browser support: This example runs tests on Chrome, Firefox, and Safari via WebKit. Great for catching browser-specific issues early. Page Object Model (POM): Keep Tests Clean and Focused As your test suite grows, UI interactions tend to repeat — filling forms, clicking buttons, logging in. Hardcoding these actions in every test quickly leads to clutter and duplication. Page Object Model (POM) solves this by separating UI logic into reusable classes. Example: LoginPage.ts TypeScript export class LoginPage { constructor(private page: Page) {} async goto() { await this.page.goto('/login'); } async login(user: string, pass: string) { await this.page.fill('#username', user); await this.page.fill('#password', pass); await this.page.click('text=Login'); } } Test: login.spec.ts import { test, expect } from '@playwright/test'; import { LoginPage } from '../pages/LoginPage'; test('should login successfully', async ({ page }) => { const login = new LoginPage(page); await login.goto(); await login.login('user', 'pass'); await expect(page).toHaveURL('/dashboard'); }); Fixtures: Shared Setup Without the Repetition Fixtures in Playwright are a powerful way to share setup and teardown logic across tests, like logging in, creating test data, or bootstrapping a user session. Instead of repeating setup steps in every test, define them once in a custom fixture. Example: fixtures.ts TypeScript import { test as base } from '@playwright/test'; export const test = base.extend({ authenticatedPage: async ({ page }, use) => { await page.goto('/login'); await page.fill('#username', 'user'); await page.fill('#password', 'pass'); await page.click('text=Login'); await use(page); }, }); Usage: dashboard.spec.ts TypeScript import { test, expect } from '../fixtures'; test('dashboard loads for logged-in user', async ({ authenticatedPage }) => { await authenticatedPage.goto('/dashboard'); await expect(authenticatedPage).toHaveText('Welcome'); }); Utilities and Test Data: Keep Logic Out of Your Tests Tests should describe behavior, not manage random data, time formatting, or file operations. That’s where utilities come in — move repetitive logic out of your test files and into a separate utils/ folder. Example: generateUser.ts TypeScript export function generateUser() { return { username: `user_${Date.now()}`, password: 'Test@1234', email: `user_${Date.now()}@test.com`, }; } Example: waitForDownload.ts TypeScript export async function waitForDownload(page) { const [download] = await Promise.all([ page.waitForEvent('download'), page.click('text=Download Report'), ]); return download.path(); } Final Thoughts Getting started with Playwright is simple, but scaling your tests is where the real work begins. A clean folder structure, reusable page objects, shared fixtures, and utility functions go a long way in keeping your framework organized and future-proof. You don’t need to over-engineer things on day one. Start small, stick to good practices, and evolve your framework as your project grows. The goal is to write tests that are easy to understand, easy to maintain, and hard to break. With the right structure in place, Playwright can be much more than just a testing tool — it becomes a solid part of your quality engineering strategy.

By Priti Gaikwad

Deploying a Scalable Golang Application on Kubernetes: A Practical Guide

Golang is the finest programming language for constructing applications that can scale well and at high density due to the concurrency and performance inherent in the language itself. Kubernetes is the best standard for container orchestration, which gives a platform for deploying, managing, and scaling applications. Together, they constitute a formidable pair for creating unobtrusive and bulletproof microservices. This blog will lead readers through the process of deploying a scalable Golang application on Kubernetes, highlighting essential considerations alongside the more practical 'doing it' steps. Why Golang and Kubernetes? Before we go into the how-tos, let's just touch briefly on the whys: With Golang for performance and concurrency: Go's extremely efficient lightweight goroutines and channels enable this concurrency. The fast compilation time and statically linked binaries make deployments easy. Kubernetes for scalability and resilience: A containerized application can be deployed, scaled, and managed with Kubernetes. It also enables self-healing, load balancing, and rolling updates, ensuring that the application remains up and fully functional even under heavy load. Key Considerations for Scalability in Golang Applications The following are certain principles you may wish to consider while designing your application and making the best use of Kubernetes for your Go applications: Statelessness: The Go application should be designed as stateless, meaning that no session data or persistent state should be stored within the application instances in memory. Instead, all states should be handled by leveraging external services such as databases (Postgres, MongoDB), caching services (Redis, Memcached), or message queues (Kafka, RabbitMQ). This will allow Kubernetes to scale your pods freely without any risk of data loss or data consistency.Concurrency: Go channels and goroutines should be employed for concurrent processing. This allows one Go instance to spawn multiple requests and thus can truly take advantage of the CPU resources. Graceful shutdowns: The Go application must have graceful shutdown logic. On receiving a terminate command from Kubernetes, the application should finish processing ongoing requests and free its resources before exiting. This way, no need for cleanup has to be made to protect against data corruption or dropped connections. Listening to a `SIGTERM` signal is only one common way to implement this.Configuration: The application will store its configuration externally through environment variables or config files. The application thus becomes highly portable and configurable within Kubernetes without needing to build a new Docker image.Health checks: Your app should provide HTTP endpoints for liveness and readiness checks. Kubernetes uses these to ascertain whether your application is up and running, ready to handle traffic, and healthy. Step-by-Step Deployment on Kubernetes Let's walk through the process with a hypothetical simple Go API. 1. Containerize Your Golang Application First, you need a `Dockerfile` to package your Go application into a Docker image. Go package main import ( "fmt" "log" "net/http" "os" "os/signal" "syscall" "time" ) func main() { http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) { fmt.Fprintf(w, "Hello from Golang App! Pod: %s\n", os.Getenv("HOSTNAME")) }) port := os.Getenv("PORT") if port == "" { port = "8080" } server := &http.Server{Addr: ":" + port} // Start HTTP server in a goroutine go func() { log.Printf("Server starting on port %s...", port) if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed { log.Fatalf("Could not listen on %s: %v\n", port, err) } }() // Graceful shutdown sigChan := make(chan os.Signal, 1) signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM) <-sigChan // Block until a signal is received log.Println("Shutting down server gracefully...") shutdownCtx, cancel := time.WithTimeout(server.Context(), 5*time.Second) defer cancel() if err := server.Shutdown(shutdownCtx); err != nil { log.Fatalf("Server shutdown failed: %v", err) } log.Println("Server gracefully stopped.") } Dockerfile # Use a minimal base image for the build stage FROM golang:1.22 AS builder WORKDIR /app # Copy go mod and sum files to download dependencies COPY go.mod go.sum ./ RUN go mod download # Copy the source code COPY . . # Build the Go application RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main . # Use a scratch image for the final, very small image FROM alpine:latest WORKDIR /root/ # Install ca-certificates for HTTPS calls if needed (optional but recommended) RUN apk --no-cache add ca-certificates # Copy the compiled binary from the builder stage COPY --from=builder /app/main . # Expose the port your application listens on EXPOSE 8080 # Run the executable CMD ["./main"] Build and push the Docker images to the registry: docker build -t your-docker-repo/golang-app:1.0.0 . docker push your-docker-repo/golang-app:1.0.0 2. Define Kubernetes Manifests Now, let's create the Kubernetes resource definitions. A. deployment.yaml YAML apiVersion: apps/v1 kind: Deployment metadata: name: golang-app-deployment labels: app: golang-app spec: replicas: 3 # Start with 3 replicas for high availability selector: matchLabels: app: golang-app template: metadata: labels: app: golang-app spec: containers: - name: golang-app image: your-docker-repo/golang-app:1.0.0 # Replace with your image ports: - containerPort: 8080 env: - name: PORT value: "8080" resources: # Define resource requests and limits for better scheduling and stability requests: memory: "64Mi" cpu: "100m" # 100 millicores (0.1 CPU core) limits: memory: "128Mi" cpu: "200m" # 200 millicores (0.2 CPU core) livenessProbe: # Checks if the app is still running httpGet: path: / # Or a dedicated /health endpoint port: 8080 initialDelaySeconds: 5 periodSeconds: 10 readinessProbe: # Checks if the app is ready to serve traffic httpGet: path: / # Or a dedicated /ready endpoint port: 8080 initialDelaySeconds: 10 periodSeconds: 15 timeoutSeconds: 5 B. service.yaml YAML apiVersion: v1 kind: Service metadata: name: golang-app-service labels: app: golang-app spec: selector: app: golang-app ports: - protocol: TCP port: 80 # The port the service exposes targetPort: 8080 # The port your application listens on inside the container type: ClusterIP # Exposes the Service on a cluster-internal IP. Default for many apps. C. Horizontal Pod Autoscaler (hpa.yaml) YAML apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: golang-app-hpa spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: golang-app-deployment minReplicas: 3 # Minimum number of pods maxReplicas: 10 # Maximum number of pods metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 70 # Target CPU utilization (percentage) # You can also scale based on memory or custom metrics # - type: Resource # resource: # name: memory # target: # type: Utilization # averageUtilization: 80 3. Deploy to Kubernetes Markdown kubectl apply -f deployment.yaml kubectl apply -f service.yaml kubectl apply -f hpa.yaml Some of the Best tools to monitor your Kubernetes resources are: GrafanaMiddlewarePrometheusFluentD Conclusion Scale up a Golang application in Kubernetes to reap its user-friendliness in building modern, resilient, and performance-oriented systems. Ensure that you adopt best practices for Go application design, such as making the application stateless, concurrent, and ensuring graceful shutdowns. You will allow Kubernetes features such as deployments, services, and horizontal pod autoscaling to balance the workload on your application accordingly. Come on, adopt this combination to perfect your microservices architecture!

By Neel Shah

Getting Started With Android UI Development With Jetpack Compose

I was contacted by my friend Aditya last month with tremendous enthusiasm regarding a new Android feature he had discovered. "Mohit, you won't believe it! I just used Jetpack Compose and it's insane!" At first, I was like, "Whatever dude, another Google framework that will be obsolete next year." But then, Dan showed me his project, and frankly, I was kind of blown away. Why We Started Working With Jetpack Compose Aditya had been struggling with a dating app he was building using XML layouts for roughly six months. The UI would always get broken on different devices, and animations were a nightmare. Within two weeks of using Compose, he rewrote the entire thing, and it worked way better. What persuaded us: Coding required way less code (like, for real, 30-40% less).Changes show up while you type (no longer waiting forever for Gradle).Animations are now finally making sense.Apps are more responsive on our elderly test phones.When we are stuck, there are hundreds of individuals online to assist. Constructing Our First Project Together When Aditya and I decided to work together, the start was very smooth. Getting Our Tools Ready We downloaded Android Studio.Ensured we had lots of coffee and snacks.Created a new project with the "Empty Compose Activity" option. This is what we had in our build files (after messing with it three times to start with). Groovy // In your project's build.gradle ext { compose_version = '1.5.4' } // In your app's build.gradle android { buildFeatures { compose true } composeOptions { kotlinCompilerExtensionVersion compose_version } } dependencies { implementation platform('androidx.compose:compose-bom:2023.10.01') implementation 'androidx.compose.material3:material3' implementation 'androidx.compose.ui:ui' implementation 'androidx.compose.ui:ui-tooling-preview' } Our First Screen Aditya was insisting on starting with something complex, but I convinced him to keep it simple — a welcome card. Here's what we did: Kotlin @Composable fun WelcomeCard(name: String) { Card( modifier = Modifier .padding(16.dp) .fillMaxWidth() ) { Column( modifier = Modifier.padding(16.dp), horizontalAlignment = Alignment.CenterHorizontally ) { Text( text = "Welcome, $name!", style = MaterialTheme.typography.headlineMedium ) Spacer(modifier = Modifier.height(8.dp)) Text( text = "Ready to build something awesome?", style = MaterialTheme.typography.bodyLarge ) } } } It took us about 10 minutes to write, and it was really good! Making Sense of the Building Blocks When I tried to convey this to our new Android buddy, here's how I phrased it: Composable: That only informs Android, 'Hey, this function creates UI things!'Modifier: Sort of like Android's CSS, it makes stuff show up and actMaterial components: Google's ready-to-use UI material that is beautiful with minimal effort Adding Some Interaction Static screens are just so booooring, so we added a simple card that opens up when you click it: Kotlin @Composable fun InteractiveCard() { var isExpanded by remember { mutableStateOf(false) } Card( modifier = Modifier .padding(16.dp) .fillMaxWidth() .clickable { isExpanded = !isExpanded } ) { Column(modifier = Modifier.padding(16.dp)) { Text( text = "Click me to see more!", style = MaterialTheme.typography.titleMedium ) if (isExpanded) { Spacer(modifier = Modifier.height(8.dp)) Text( text = "Hey, you found the hidden content! ", style = MaterialTheme.typography.bodyMedium ) } } } } The Magic of State The first time we encountered remember {mutableStateOf(false)}, it seemed quite challenging, but with time, it actually feels easier. remember is used to save data with respect to the screen being refreshed.mutableStateOf implies a state that can be changed. When the state revalidates, only those parts of the UI that are relevant will get altered. Layouts That Actually Make Sense The biggest moment was when we saw how layouts are done. XML layouts were a nightmare of deep hierarchies of ViewGroups, but Compose removes all of that. Row: Fill Side-by-Side We created this profile header with Row. Flutter widgets basically work the same way with Row: fill side-by-side. Kotlin @Composable fun ProfileHeader(name: String, title: String) { Row( modifier = Modifier .padding(16.dp) .fillMaxWidth(), verticalAlignment = Alignment.CenterVertically, horizontalArrangement = Arrangement.spacedBy(8.dp) ) { // Profile pic (just a gray circle for now) Box( modifier = Modifier .size(48.dp) .background(Color.Gray, CircleShape) ) // Text details Column { Text( text = name, style = MaterialTheme.typography.titleMedium ) Text( text = title, style = MaterialTheme.typography.bodyMedium, color = MaterialTheme.colorScheme.onSurfaceVariant ) } } } Column: Stuff That Should Be Stacked Vertically For this part of the messaging application, we utilized the Column implementation. Kotlin @Composable fun MessageCard(message: String, time: String, isUnread: Boolean) { Column( modifier = Modifier .fillMaxWidth() .padding(16.dp) .background( color = if (isUnread) MaterialTheme.colorScheme.primaryContainer else MaterialTheme.colorScheme.surface, shape = RoundedCornerShape(8.dp) ) .padding(16.dp) ) { // Message content Text( text = message, style = MaterialTheme.typography.bodyLarge ) Spacer(modifier = Modifier.height(8.dp)) // Time and status Row( modifier = Modifier.fillMaxWidth(), horizontalArrangement = Arrangement.SpaceBetween ) { Text( text = time, style = MaterialTheme.typography.bodySmall ) if (isUnread) { Text( text = "NEW", style = MaterialTheme.typography.labelSmall, color = MaterialTheme.colorScheme.primary ) } } } } Things We Did Wrong (So You Don't Have To) We committed several errors on the way. Here are some of the things for you to consider: 1. Modifier Order Matters a Lot Kotlin // Wrong (padding gets applied to full width) Modifier.padding(16.dp).fillMaxWidth() // Right (padding applies after width) Modifier.fillMaxWidth().padding(16.dp) 2. State Management Kotlin // Wrong (resets every time the UI updates) var count = 0 // Right (survives UI updates) var count by remember { mutableStateOf(0) } What We Knew Once Aditya and I had developed our first application using Compose, we wanted to know the following first: Begin small: Don't try to change your whole app at once. Change one screen at a time. Use preview: The Preview thing is actually very handy — use it to preview your changes without having to run the app.Think in pieces: Break your UI into smaller, reusable pieces. Aditya also sometimes plays around with nested modifiers (I'm always having to correct him lol), but we both improve daily. If you are new to all this, don't worry so much about making everything perfect. Just play around with it and have fun! Good luck!

By Mohit Agrawal

AWS Step Functions IDE Extension: A Game Changer, But What’s Next?

If you’ve ever worked with AWS Step Functions, you know the struggle. Debugging workflows locally? A nightmare. Testing small changes? Deploy, wait, check logs, repeat. The experience has been far from smooth — until now. AWS just launched a Step Functions extension for VSCode, and it’s a huge step forward. But as great as this update is, one big question remains: why is this limited to VSCode? What about IntelliJ, Eclipse, and other IDEs? Let’s break down what’s new, why it matters, and what AWS should do next. The Pain Before This Update Before this extension, working with AWS Step Functions was a constant back-and-forth between AWS Console and local development. If you wanted to: Debug a workflow? You’d rely on CloudWatch logs, which meant extra time and effort. Test changes? You had to deploy to AWS every time, even for minor tweaks. Validate the output of an individual state? Practically impossible without running the complete workflow. For developers, this meant longer development cycles, slower debugging, and more friction when building serverless workflows. What’s New in the AWS Step Functions IDE Extension? AWS finally heard our pain and introduced an enhanced local development experience for Step Functions in VS Code. Here’s what’s cool about it: Visualize workflows directly in VS Code: No more switching between AWS Console and your IDE. Run Step Functions locally: Test before deploying, saving time and resources.Live debugging support: Easily catch and fix errors without digging through logs.Seamless AWS integration: Sync changes faster without losing context. State-by-state testing: Validate the output of an individual state by providing custom input, making development quicker and more efficient. This last feature, the state-level testing, is a game-changer. Now you don’t have to execute the entire workflow just to see if a specific state is working. You can provide input, get instant validation, and debug much faster — all from your IDE. Step Functions in Action: VS Code Workflow Example Below is a sample Step Functions workflow I tested using the new VS Code extension. The workflow consists of multiple states, invoking two AWS Lambda functions with error handling and retry logic. AWS Step Functions workflow in VSCode JSON { "Comment": "An example AWS Step Function with multiple Lambda functions and states", "StartAt": "StartState", "States": { "StartState": { "Type": "Pass", "Next": "InvokeFirstLambda" }, "InvokeFirstLambda": { "Type": "Task", "Resource": "arn:aws:lambda:us-east-1:123456789012:function:FirstLambda", "Next": "CheckResult" }, "CheckResult": { "Type": "Choice", "Choices": [ { "Variable": "$.status", "StringEquals": "success", "Next": "InvokeSecondLambda" }, { "Variable": "$.status", "StringEquals": "retry", "Next": "WaitState" } ], "Default": "FailState" }, "WaitState": { "Type": "Wait", "Seconds": 5, "Next": "InvokeFirstLambda" }, "InvokeSecondLambda": { "Type": "Task", "Resource": "arn:aws:lambda:us-east-1:123456789012:function:SecondLambda", "Next": "FinalSuccess" }, "FinalSuccess": { "Type": "Succeed" }, "FailState": { "Type": "Fail", "Error": "ExecutionFailed", "Cause": "The process failed after retry" } } } With the new testing feature, I was able to test individual states by providing sample input and verifying their output, right inside VS Code. This cut down my development time significantly compared to deploying every change to AWS. Why This Matters This update is a huge productivity boost for developers using Step Functions. We no longer have to: Constantly deploy for minor updates Guess what’s happening in our workflowsWaste time switching between the AWS Console and local code Debug an entire workflow to validate a single state With this new VS Code extension, we can now test, visualize, and debug workflows faster than ever. But there’s a catch… The Missing Piece: What About IntelliJ and Eclipse? AWS made a great move by integrating Step Functions with VS Code, but let’s be honest, not everyone uses VS Code. Many enterprise developers rely on IntelliJ, Eclipse, or JetBrains IDEs. So why is this extension limited to just one IDE? AWS already provides toolkits for IntelliJ and Eclipse for services like Lambda and CloudFormation. Expanding Step Functions support to these IDEs would: Reach a wider audience of developers who don’t use VS Code Make AWS Step Functions more accessible in enterprise environments Provide a consistent development experience across multiple IDEs Final Thoughts AWS is heading in the right direction by improving local development for Step Functions, but they shouldn’t stop at VS Code. The state-by-state testing feature is one of the most exciting additions, making Step Function development faster and easier. However, expanding this capability beyond VS Code would truly unlock its full potential. I’d love to see AWS bring this to IntelliJ, Eclipse, and other popular IDEs. What do you think? Do you use AWS Step Functions in VS Code?Would you like to see this extension in IntelliJ or Eclipse?How has state testing improved your workflow? Let’s start the conversation!

By VIGNESHWAR Somasundaram

How to Set Up Selenium Grid 4 With Docker Compose

Selenium WebDriver is a popular web automation tool. It automates browsers and enables software teams to perform web automation testing across multiple popular browsers, including Google Chrome, Mozilla Firefox, Microsoft Edge, and Safari. To scale this testing across different platforms and browser versions, Selenium Grid 4 can be utilized. It works seamlessly with Selenium WebDriver, allowing tests to run in parallel across different browsers, making cross-browser and cross-platform testing faster and efficient. In this tutorial, we will learn about Selenium Grid 4, its components, and when to use it. We will also perform a hands-on activity by starting Selenium Grid 4 with Docker Compose. What Is Selenium Grid 4? Selenium Grid 4 allows running Selenium WebDriver tests on different machines(virtual or real) by forwarding commands from test scripts to browsers running on remote systems. This makes it easier to test across multiple environments and speeds up test execution. The main objective of Selenium Grid is to: Make it simple to run tests in parallel on multiple platforms and browsers.Enable testing across different browser versions.Perform cross-platform testing. Selenium Grid 4 Components There are six main components of Selenium Grid, which are explained below: 1. Router The Router acts as an entry point. It manages all the incoming requests and performs the following actions: Handling New Session Request When the Router receives a new session request, it forwards this request to the “New Session Queue”, which is responsible for managing and scheduling session creation. Handling Existing Sessions When a session request is received by the Router that belongs to an existing session, it performs the following actions: It queries the Session Map, which tracks the existing sessions.The Session Map provides the Router with the Node ID where the session is running.The Router then forwards the request directly to the Node. Load Balancing The Router also acts as a load balancer in the Grid. It ensures that the requests are sent to the Nodes or Queues that can handle them in the best way. It avoids overloading any part of the Grid and helps distribute the test execution evenly. 2. Distributor The Distributor manages the registration and capabilities of Nodes in the grid. The following are its two main responsibilities: Register and Keep Records of All Nodes and Their Capabilities A Node sends the registration event through the Event Bus.The Distributor listens to the Event Bus. It picks up the registration event and begins the verification process.The Distributor sends an HTTP request to the Node to verify if it's online and ready to accept sessions.On successful verification of the HTTP request, the Distributor registers the Node as part of the Grid.The Distributor keeps track of all Nodes' capabilities through the GridModel. Query the New Session Queue and Process Any Pending New Session Requests The Router forwards the New Session Request to the New Session Queue.The request will wait in the queue until the Grid is ready to process it.The Distributor continuously polls the New Session Queue to check if there are any pending session requests.When a pending request is found, the Distributor looks for a suitable Node that matches the required capabilities and available capacity to run the session, and then the session is created.Next, the Distributor records the session ID and the Node handling it in the Session Map. 3. Session Map The Session Map is a storage system that keeps a record of which Session ID is running on which Node. It helps the Router figure out where to send each request by telling it which Node is handling a specific session. 4. New Session Queue The New Session Queue stores all incoming session requests in the order they arrive(FIFO). It has configurable parameters for how long a request can wait before timing out, and how often the system should check for these timeouts. The following are the processes where the New Session Queue is used: The Router Adds Requests to the Queue The Router receives the New Session Request and adds it to the New Session Queue. It then waits for a response while the request stays in the queue. Queue Checks for Timeouts The New Session Queue, based on the timeout setting, checks if any requests have waited too long. If the request has timed out, it is rejected and removed from the queue immediately. The Distributor Checks for the Available Slots, Matches, and Assigns the Node The Distributor continuously checks for available Nodes. On finding an available Node, the Distributor polls the queue for a Session Request that matches the slot’s configuration and attempts to create a new session. If a request matches the available Node’s capabilities, the Distributor tries to assign it to that slot. If no slots are available, the Distributor sends the request back to the queue to retry later. The request is rejected if the request times out while waiting to be retried or added back to the front of the queue. On successful creation of the session: The Distributor sends the session details back to the New Session Queue.The New Session Queue sends it back to the Router.The Router finally returns the session to the client. 5. Node A Selenium Grid can have several Nodes, each running on a separate machine. Each Node handles the browser instances available on the machine it’s running on. The Node connects to the Distributor via the Event Bus and sends its configuration details as part of the registration message. By default, the Node automatically registers all browser drivers found in the system path of the machine on which it is running. It creates one slot per available CPU for Chromium-based browsers and Firefox, while only one slot is created for Safari. With custom configuration, the Node can also run sessions inside Docker containers or forward commands as a relay. A Node simply runs the commands it receives — it doesn’t make decisions or manage anything beyond executing instructions and sending back responses. Also, the machine hosting the Node can use a different operating system from the rest of the Grid components. Event Bus The Event Bus facilitates internal communication between the different parts of the Grids, i.e., the Nodes, Distributor, New Session Queue, and Session Map. Instead of using heavy HTTP calls, the Grid uses simple messages to talk internally, which makes things work faster. The Event Bus should be started first when setting up the Grid in fully distributed mode. 6. Selenium Grid Node and Hub Configuration Flags The list of all related flags that can be used while configuring the Node and Hub for Selenium Grid 4 is listed here. When to Use Selenium Grid? In general, there are two reasons to use Selenium Grid: To execute tests across different browsers, different browser versions, and on browsers running on different operating systems.To reduce the time of test execution and get faster feedback. Getting Started With Selenium Grid 4 Selenium Grid 4 can be started in the following four ways: Standalone modeHub and Node modeDistributed modeDocker-based We will be using the Docker-based approach in this tutorial and using Docker Compose to start the Selenium Grid 4. What Is Docker Compose? Docker Compose is a tool designed to simplify the setup and management of multi-container applications. It lets us define all the services in a single YAML file, so the entire stack can be started or stopped with just one command. One of the biggest benefits of Docker Compose is that it allows us to keep the application’s setup in a single, version-controlled file at the root of the project. This makes it easy for others to contribute. Using the Docker Compose file and running the below command: Plain Text docker compose up Within minutes, everything is up and running. This convenience and speed are what make Docker Compose so powerful. How to Set Up Selenium Grid 4 With Docker Compose To set up Selenium Grid 4 with Docker Compose, we need to create the following YAML file with all the respective components of Selenium Grid 4: YAML version: "3" services: chrome: image: selenium/node-chromium:latest shm_size: 2gb depends_on: - selenium-hub environment: - SE_EVENT_BUS_HOST=selenium-hub - SE_EVENT_BUS_PUBLISH_PORT=4442 - SE_EVENT_BUS_SUBSCRIBE_PORT=4443 - SE_NODE_MAX_INSTANCES=4 - SE_NODE_MAX_SESSIONS=4 - SE_NODE_SESSION_TIMEOUT=180 networks: - selenium-jenkins-network firefox: image: selenium/node-firefox:latest shm_size: 2gb depends_on: - selenium-hub environment: - SE_EVENT_BUS_HOST=selenium-hub - SE_EVENT_BUS_PUBLISH_PORT=4442 - SE_EVENT_BUS_SUBSCRIBE_PORT=4443 - SE_NODE_MAX_INSTANCES=1 - SE_NODE_MAX_SESSIONS=1 - SE_NODE_SESSION_TIMEOUT=180 networks: - selenium-jenkins-network selenium-hub: image: selenium/hub:latest container_name: selenium-hub ports: - "4442:4442" - "4443:4443" - "4444:4444" networks: - selenium-jenkins-network networks: selenium-jenkins-network: external: true This Docker Compose file spins up a distributed Selenium Grid 4 setup with 3 services: Selenium HubChrome NodeFirefox Node All these services are connected to the same Docker Network, so the Grid could be accessed or used by other services such as Jenkins agents. Decoding the Docker Compose File Version YAML version: "3" The version specifies the Docker Compose file format version. It is defined by the Compose Specification for backward compatibility. Services It defines all the services that make up the application. In our case, there are three services defined, “chrome”, “firefox”, and “selenium-hub”. Chrome Service YAML chrome: image: selenium/node-chromium:latest shm_size: 2gb depends_on: - selenium-hub environment: - SE_EVENT_BUS_HOST=selenium-hub - SE_EVENT_BUS_PUBLISH_PORT=4442 - SE_EVENT_BUS_SUBSCRIBE_PORT=4443 - SE_NODE_MAX_INSTANCES=4 - SE_NODE_MAX_SESSIONS=4 - SE_NODE_SESSION_TIMEOUT=180 networks: - selenium-jenkins-network The Chrome service pulls the node-chromium latest image and runs a Selenium node with Chromium browser support. I have a MacBook M2 machine, so I am using images with arm architecture. Check out Multi-arch images via Docker Selenium for more details. Check out Docker Selenium GitHub repository for more Docker files setup. The shm_size sets the 2 GB shared memory size to prevent browser crashes inside the Docker container. The depends_on controls the order in which the service starts up. It ensures that the selenium-hub service starts before the Chrome service. Environment variables are defined using the environment key. YAML environment: - SE_EVENT_BUS_HOST=selenium-hub - SE_EVENT_BUS_PUBLISH_PORT=4442 - SE_EVENT_BUS_SUBSCRIBE_PORT=4443 These specific variables guide the Chrome node on how to connect to the Event Bus provided by the Selenium Hub. YAML - SE_NODE_MAX_INSTANCES=4 - SE_NODE_MAX_SESSIONS=4 - SE_NODE_SESSION_TIMEOUT=180 The SE_NODE_MAX_INSTANCES and SE_NODE_MAX_SESSIONS Environment variables allow Chrome node to handle up to 4 sessions in parallel. The SE_NODE_SESSION_TIMEOUT handles the Chrome node session timeout. If a session remains idle for 180 seconds, it will be cleaned up. YAML networks: - selenium-jenkins-network This tells Docker that the Chrome node should connect to a specific selenium-jenkins-network Docker network. Firefox Service YAML firefox: image: selenium/node-firefox:latest shm_size: 2gb depends_on: - selenium-hub environment: - SE_EVENT_BUS_HOST=selenium-hub - SE_EVENT_BUS_PUBLISH_PORT=4442 - SE_EVENT_BUS_SUBSCRIBE_PORT=4443 - SE_NODE_MAX_INSTANCES=1 - SE_NODE_MAX_SESSIONS=1 - SE_NODE_SESSION_TIMEOUT=180 networks: - selenium-jenkins-network The Firefox service pulls the node-firefox latest image and runs a Selenium node with Firefox browser support. Like Chrome service, this service also has a 2 GB shared memory to prevent browser crashes inside the container and depends on the Selenium-Hub service. Unlike Chrome node, this service can handle only one session as the variables are set to 1; however, it is configurable and can be increased as required. The Firefox node will also be connected to the Docker network selenium-jenkins-network. Selenium Hub YAML selenium-hub: image: selenium/hub:latest container_name: selenium-hub ports: - "4442:4442" - "4443:4443" - "4444:4444" networks: - selenium-jenkins-network The Selenium-Hub service pulls the latest hub image from Docker Hub and creates a Docker container named “selenium-hub”. This name is useful in case we need to check the logs inside the Docker Containers. It runs the hub, which is the main component for Selenium Grid. The following are the details related to the ports: 4442 port is the Event Bus publish port.4443 is the Event Bus subscribe port.4444 is the Web UI / Router endpoint. This is the port used for navigating the Selenium Grid over the browser. The Selenium hub service will also be connected to the Docker network selenium-jenkins-network. Docker Network This network configuration part is not necessarily required in the Selenium Grid Docker Compose. I have planned to run the Selenium automation tests on Selenium Grid with Docker Compose in Jenkins. Hence, this network configuration is set up. YAML networks: selenium-jenkins-network: external: true The Networks key in Docker Compose allows us to define and configure a custom network for our services. This “selenium-jenkins-network” network must already exist. This network, named “selenium-jenkins-network”, was not created by Docker Compose. We can create a Docker network manually by running the following command using the terminal: Plain Text docker network create selenium-jenkins-network Once the network is successfully created, we can move to the next step and start Docker Compose. Starting Selenium Grid 4 With Docker Compose The following command should be run after navigating to the folder where the Docker Compose file is placed. Plain Text docker compose -f docker-compose-seleniumgrid.yml up -d I have named the Docker Compose file “docker-compose-seleniumgrid.yml.” -f options take filename as the parameter. The “-d” in this command stands for “detached mode”. It will run the services in the background, freeing up the terminal for other tasks. In case we need to check the logs, we can run the following command: Plain Text docker compose -f docker-compose-seleniumgrid.yml logs To check the logs for a specific service, we can append the name of the service in the command as given below: Plain Text docker compose -f docker-compose-seleniumgrid.yml logs chrome The Docker Compose can also be started and run without using the “-d” flag. It will start the containers in the foreground and show all the real-time logs in the terminal. Selenium Grid UI After the command is executed successfully, open any browser and navigate to http://localhost:4444 and check out the live Selenium Grid running on your local machine. As defined in Docker Compose, we can see that four instances of the Chrome browser are available, while one instance of Firefox is available. Stopping Selenium Grid 4 With Docker Compose As we learned, two different ways to start Selenium Grid 4 with Docker Compose. The grid can be stopped in the following ways, respectively, how they were started: If the Grid was started in detached mode (using -d in the command), the following command should be used to stop it: Plain Text docker compose -f docker-compose-seleniumgrid.yml down If the Grid was started in foreground mode, press CTRL + C to stop. After the Grid is stopped, open the browser and try navigating to http://localhost:4444. It should not show the Selenium Grid UI. Scaling the Browsers in Selenium Grid 4 With Docker Compose Scaling browsers in Selenium Grid is easy and flexible with Docker Compose. For example, if we want to scale up the Chrome service and run 3 more instances, we can simply stop the current Docker Compose session and start it again using the command below: Plain Text docker compose -f docker-compose-seleniumgrid.yml up -d --scale chrome=4 If we now check the Selenium Grid UI, we should be able to see a total of 4 Chrome service instances up and running. Each Chrome service has four instances of the Chrome browser, so in total, we have 16 Chrome browser instances running. We can use these browsers and execute tests in parallel using Selenium Grid 4 with Docker Compose. Incredible! We can easily scale up browser instances right at our fingertips, whenever needed. Summary Selenium Grid 4 can be of great help to run the regression or end-to-end tests in parallel and reduce the execution time. This can help us get faster feedback on the builds. With Docker Compose, we can spin up the Selenium Grid easily within seconds. It also offers the flexibility to scale the browser instance as required at our fingertips. Happy testing!

By Faisal Khatri

CORE

Development of System Configuration Management: Introduction

Series Overview This article is part 1 of a multi-part series: "Development of system configuration management." The complete series: IntroductionMigration end evolution Working with secrets, IaC, and deserializing data in GoBuilding the CLI and APIHandling exclusive configurations and associated templatesPerformance considerationSummary and reflections Introduction SCM is software that facilitates the widespread deployment of configurations across infrastructure. It is a tool that can orchestrate the parameters of computers to prepare them for the desired environment. The necessity of SCM is recognized across a large number of computer systems. A well-organized SCM can improve the productivity of the SRE team. The larger the number of hosts, the greater the productivity it introduces. Conversely, poorly organized SCM in small infrastructures can lead to decreased productivity. Typically, bare metal and VM-based infrastructure are suitable for deployment via SCM. While the deployment of applications using the SCM API is possible, it is not very convenient. Orchestrators like Kubernetes and Nomad are not designed to work with SCM. Infrastructure as Code (IaC) is more effective for provisioning. As a result, on average, we have at least three different tools for configuration deployment. While this isn't necessarily detrimental, it is common practice. On the other hand, custom infrastructure providers introduce their own challenges. Consequently, any additional tools incur overhead costs related to adjustment, development, and maintenance. My colleagues and I decided to develop our own SCM to address this issue. I authored the initial code, and then my colleagues joined me. Unfortunately, it is not an open-source system, but in this article, we will discuss the challenges we encountered during development, the solutions we found, the approaches we took, and the common principles for developing your own SCM. This information may be useful for those who face the same choice. What I Dislike about Popular SCM The most popular SCM in open source: Ansible, Saltstack, Puppet, Chef, CFEngine. These are good engines to use as SCM for most cases. Our case was the other. First of all, we used Ansible and SaltStack. The primary issue connected with them is the requirement for an up-to-date version of Python and its modules on the server. This necessity can lead to increased maintenance costs. The second issue is that most integration modules do not adapt to our specific use cases. This leads to a situation where the most commonly used features will be a file deployer, service runner, and package installer. Overall, the user will need to describe the integration with services in all cases. If it is a straightforward case, the process will be simple. However, if it is more complex, the difficulty will increase, and users may not notice a significant difference between developing their own SCM and using an open-source SCM. For instance, if we want to bootstrap ACL in Consul, we should perform the following steps: Run query to /v1/acl/bootstrap locally on the server.Store the obtained AccessorID and SecretID in Vault.Make this secret available on all servers in the cluster to enable management of ACLs through SCM. The second example is using internal TLS certificates for mTLS: Generate a new certificate and key.Store them in the vault.Deploy from the vault to a group of servers. The password deployment process is similar. These operations are bidirectional. At the start, we must generate the secret, store it in the secret storage, and then deploy it where necessary. In traditional SCM, such cases lead to the impossibility of deploying the system with just one push to Git and a single SCM call. Motivation to Develop a New SCM There are many pros and cons to changing the SCM to find a better solution for us. However, based on the experience of other colleagues, each SCM has its own advantages and disadvantages. This is acceptable, as even if we develop a new one, it will also have some drawbacks. Nevertheless, we can focus our efforts on increasing the benefits of our development. Ultimately, we identified several reasons why we believe developing a new SCM will lead us to success: Dissatisfaction with the old SCM: It may sound strange, but when many engineers struggle with a particular tool, they are often motivated to participate in developing and pushing for a new tool.Complaints about requirements and conditions: For instance, our new SCM would need to closely integrate with our private cloud and our own CMDB, taking into account the roles and host group semantics (in the future I will refer to it as hostgroup) we use in our live processes, as well as the specific open-source tools we integrate with.Development of new functionality: The new SCM can offer features relevant to our SRE needs that are not available in the current open-source SCM options. However, developing this codebase will require time. For instance, it can include: Automatic restoration of servicesIaC functionalityAutomation of cluster assembly and node joiningIndependence from irrelevant features: A new SCM, developed as ordinary software, will mitigate issues relating to security updates, unnecessary feature overload, and potential backward compatibility breaks. Specifically, the new SCM will: Have updates implemented only when we need them.Include only relevant functions (avoiding unnecessary functionality and bugs).Maintain backward compatibility in many cases where the open-source SCM cannot do so due to its universality and irrelevant features for us.Improved configurations of services: Even if we miss our goals, moving the configuration from one SCM to another allows us to eliminate irrelevant elements, remove unnecessary workarounds in service configuration, and ultimately create a cleaner configuration.Interest from other teams: Numerous teams are keen to learn from our experiences and may be interested in making similar decisions. How We Envisioned an Effective SCM For Us In our opinion, the SCM can prepare the empty host to be production-ready without the participation of engineers. SCM should create directories, manipulate files, run services, initialize and join nodes to the clusters, add users to the software, set permissions, and so on. This approach will improve the productivity of the SRE team and ensure the reproducibility of infrastructure. Earlier, we envisioned a system that could build new services in infrastructure, from creating and pushing a single file to Git. Moreover, we wanted to unite SCM and IaC. In our company, we use a self-developed private cloud to provision VMs. At that time, we did not have Terraform integration, and creating it from scratch was similarly labor-intensive. In our vision, a new SCM must create VMs, connect to them, and provision them to production in just 10 minutes. We wanted to write this in Go to minimize the number of dependencies installed on each machine. The main manifest for host groups is a simple YAML file and can be parsed by yamllint. This opens up opportunities for pre-commit checks that highlight syntax-level issues. The next critical integration is with a persistent database to store dynamically configured parameters. We integrated it with Consul, which allows us to deploy applications dynamically by setting new versions of applications — such as changing Docker images on the fly with an API — rather than hardcoding them into files in the Git repository. Another important aspect for us is integration with Vault, which enables the creation and retrieval of secrets and certificates for deployment on hosts. This would allow for bidirectional schemas, where our developed SCM generates secrets, automatically stores them in the Vault, and deploys them to the hosts. From What We Started, the Development Overview Both IaC and SCM functionalities operate as follows: According to the scheme, the SR Engineer pushes the configuration file to Git that describes the hostgroups, including resources and the number of replicas. This file contains all configuration details for the hostgroups: resources, software, and settings. The API periodically checks the API of the inventory manager for key fields: Do we have such host groups? If no, create thisDo we have enough hosts in this host group? If no, loop to create more hosts Once a host starts, the initial scripts (which can be the old SCM running in automatic mode, kickstarts in RHEL-based environment, or cloud-init) initiate the installation of the SCM agent. After the SCM agent starts, it registers with the SCM API and periodically retrieves its configuration. From that point onward, all hosts will be managed by the SCM agent for new configurations and deployments. There are many sources of data. The SCM API retrieves the data from all sources and merges it. We have a default.yaml file that contains the configuration relevant for all hosts. The target configuration for a hostgroup is stored in '{hostgroup name}.yaml'. Additionally, Consul and Vault provide extra information, including dynamic configuration and secrets. Consul is an important part of SCM as it allows for dynamic configurations to be stored without requiring a push to Git, which is useful for deployments. The SCM API includes a reverse proxy feature to route requests to Consul. This approach provides a unified access model and a single entry point for interactions with the configuration. As a result, SCM provides an interface to store dynamic configurations in Consul while keeping static configurations on the filesystem under Git control. There was a small service developed in Golang, consisting of three parts: the API, the agent, and the CD client. The first functionality that was developed was package installation. On CentOS 7, it uses YUM for this purpose. Our hosts were united in the host group in our resource manager (CMDB). The main idea was that the hosts in the host group must be similar or the same. The API then returned the configuration based on the determined host group of each host. Each host group had its own unique configuration, as follows: YAML packages: lsof: name: lsof-4.87-4 There are two main repositories: The SCM source codeConfiguration files that describe the hostgroup manifests for deployment The first repository contains the source code for the SCM, which operates according to the declarative configuration specified in the second repository. It checks various files, packages, and services for compliance with specified conditions. This code also provides for complex cases developed in Golang. In SaltStack or Ansible terminology, it is referred to as roles or formulas. The second repository contains declarative configurations in YAML. In Saltstack or Ansible, it is referred to as facts or pillars/grains. For convenience, a file containing the default settings for all hosts was introduced. This allows the option to avoid using SaltStack to deploy packages widely in the early stages of development, providing deployment opportunities for a base configuration as extensive as possible. The first modules introduced in the SCM included: Directory managerFile managerCommand run managerService managerPackage managerUser manager Code Explanation Configuration files open up opportunities for us to configure most resources on the system. These common managers had the following components: Declarative Config Handler It operates only with user-defined host groups via YAML files. For example, below is a piece of code that implements the file state checking logic: YAML func FilesDeclarativeHandler(ApiResponse map[string]interface{}, parsed map[string][]resources.File) { for _, file := range parsed[key] { if file.State == "absent" { FileAbsent(file.Path) continue } err := FileMkdir(file.Path, file.DirMode) if err != nil { logger.FilesLog.Println("Cannot create directory", err) } if file.Template == "go" { TemplateFileGo(file.Path, file.Data, file.FileMode, ApiResponse) } else if file.Symlink != "" { CreateSymLink(file.Path, file.Symlink, file.DirMode) } else if file.Data != "" { temp_file := GenTmpFileName(file.Path) ioutil.WriteFile(GenFile, data, file.FileMode) TemplateFileGo(file.Path, file.Data, file.FileMode) if CompareAndMoveFile(file.Path, GenFile, file.FileMode, file.FileUser, file.FileGroup) { CompareAndMoveFile(temporaryPath, file) FileServiceAction(file) } } ... The API part responsible for retrieving files from the filesystem and sharing them with specific hosts is as follows: YAML func FilesMergeLoader() { ... if Fdata.From != "" { filesPath := conf.LConf.FilesDir + "/data/" + Fdata.From loadedBuffer, err := ioutil.ReadFile(filesPath) if err != nil { logger.FilesLog.Println("hg", hostgroup, "err:", err) continue } Fdata.Data = base64.StdEncoding.EncodeToString(loadedBuffer) } ... If the 'from' field is defined, the API loads this file as a base64-encoded string into a new JSON field called data, allowing binary files to be transferred within JSON. An agent with a pull model periodically checks the API, retrieves these fields, and stores them on the hosts' destination filesystems. This is just a small part of the functionality that allows for file configurations with parameters such as: YAML files: /path/to/destination/filesystem: from: /path/from/source/filesystem template: go /etc/localtime: symlink: /usr/share/zoneinfo/UTC /etc/yum.repos.d/os.repo: state: absent These two parts of the code function in a similar manner: The Git repository consists of a set of declarative configuration files and the files that must be transferred to the agents. Two other modules operate similarly, although with slight differences. They contain significantly more logic related to their area. Almost all managers have flags for restarting or reloading services after making changes, which necessitates identifying differences before changes are made. As a result, if the SCM agent wants to create a directory, it must first check for its existence. The "running" state of service indicates that the service must be running and enabled, while the "dead" state signifies that the service must be disabled and stopped. In our infrastructure, such an operation doesn't need to be separated, and we have not implemented functionality to distinguish between the functions that enable and run services. The package handler workflow is illustrated in the following flowchart: On the other hand, the package handler is similar but with its own specific requirements: Macros for Calling Managers at the API Level In many cases, the merger part creates declarative configurations with common elements from certain macros. The API merely enriches the main YAML declaration for each host group. In the future mentions I will refer to them as Mergers. YAML // ApiResponse is a JSON that contains all fields declared by the user in group.yaml func HTTPdMerger(ApiResponse map[string]interface{}) { if ApiResponse == nil { return } // Check for the existence of the 'httpd' field. If not specified, skip since it's not a group for the httpd service. if ApiResponse["httpd"] == nil { return } // Get the statically typed struct from the main YAML var httpd resources.Httpd err := mapstructure.WeakDecode(ApiResponse["httpd"], &httpd) // Define the service name that should be run on the destination hosts httpdService := "httpd.service" if httpd.ServiceName != "" { httpdService = httpd.ServiceName } // Define the expected service state state := httpd.State if state != "" { // Add the service state to the response JSON common.APISvcSetState(ApiResponse, httpdService, state) } else { // Default to "running" if no state is specified common.APISvcSetState(ApiResponse, httpdService, "running") } // Define the httpd package name httpPackage := "httpd" if httpd.PackageName != "" { httpPackage = httpd.PackageName } // Add the package name to the response JSON common.APIPackagesAdd(ApiResponse, httpPackage, "", "", []string{}, []string{httpdService}, []string{}) Envs := map[string]interface{}{ "LANG": "C", } // Add user for the httpd service common.UsersAdd(ApiResponse, "httpd", Envs, "", "", "", "", 0, []string{}, "", false) // Add an empty directory for logs common.DirectoryAdd(ApiResponse, "/var/log/httpd/", "0755", "httpd", "nobody") // Add the Go templated file httpd.conf, which should be obtained from httpd/httpd.conf on the SCM API host from the GIT directory, and passed to /etc/httpd/httpd.conf on the destination server. common.FileAdd(ApiResponse, "/etc/httpd/httpd.conf", "httpd/httpd.conf", "go", "present", "root", "root", "", []string{}, []string{}, []string{httpdService}, []string{}) Url := "http://localhost/server-status" // We utilize the Alligator monitoring agent to collect metrics from HTTPd. Add configuration context with httpd. AlligatorAddAggregate(ApiResponse, "httpd", Url, []string{}) } As a result, the user can work in two ways: Declare the resources themselves.Declare a macro like httpd, and everything relevant to this service is automatically enriched in the resulting response. To create your own macros, you need to write Go code. To support, there are many functions, like common.DirectoryAdd and common.FileAdd only enriches the JSON. For example, here is an example of the FileAdd function: Go func FileAdd(ApiResponse map[string]interface{}, path, from, template, state, file_user, file_group, file_mode string, restart, reload, flags, cmdrun []string) { if ApiResponse == nil { return } if ApiResponse["files"] == nil { ApiResponse["files"] = map[string]interface{}{} } Files := ApiResponse["files"].(map[string]interface{}) NewFile := map[string]interface{}{ "from": from, "state": state, "template": template, "services_restart": restart, "services_reload": reload, "flags": flags, "cmd_run": cmdrun, "file_user": file_user, "file_group": file_group, "file_mode": file_mode, } Files[path] = NewFile } func APIPackagesAdd(ApiResponse map[string]interface{}, pkg string, Name string, Name9 string, Restart []string, Reload []string, CmdRun []string) { if ApiResponse["packages"] == nil { ApiResponse["packages"] = map[string]interface{}{} } Packages := ApiResponse["packages"].(map[string]interface{}) if Packages[pkg] == nil { NewPkg := map[string]interface{}{} if Name != "" { NewPkg["name"] = Name } if Name9 != "" { NewPkg["el9"] = Name9 } if Restart != nil { NewPkg["services_restart"] = Restart } if Reload != nil { NewPkg["services_reload"] = Reload } if CmdRun != nil { NewPkg["cmd_run"] = CmdRun } Packages[pkg] = NewPkg } } func APISvcSetState(ApiResponse map[string]interface{}, svcname string, state string) { if ApiResponse["services"] == nil { ApiResponse["services"] = map[string]interface{}{} } service := ApiResponse["services"].(map[string]interface{}) _, serviceDefined := service[svcname] if !serviceDefined { service[svcname] = map[string]interface{}{"state": state} } service[svcname].(map[string]interface{})["state"] = state } func DirectoryAdd(ApiResponse map[string]interface{}, Path string, Mode string, User string, Group string) { if ApiResponse["directory"] == nil { ApiResponse["directory"] = map[string]interface{}{} } Directory := ApiResponse["directory"].(map[string]interface{}) NewFile := map[string]interface{}{ "dir_mode": Mode, "user": User, "group": Group, } Directory[Path] = NewFile } func UsersAdd(ApiResponse map[string]interface{}, UserName string, Envs map[string]interface{}, Home string, Shell string, Group, Groups string, Uid int, Keys []string, Password string, CreateHomeDir bool) { if ApiResponse == nil { return } if ApiResponse["users"] == nil { ApiResponse["users"] = map[string]interface{}{} } Users := ApiResponse["users"].(map[string]interface{}) NewUser := map[string]interface{}{ "envs": Envs, "home": Home, "shell": Shell, "groups": Groups, "uid": Uid, "keys": Keys, "genpasswd": Password, "group": Group, "create_home_dir": CreateHomeDir, } Users[UserName] = NewUser } func AlligatorAddAggregate(ApiResponse map[string]interface{}, Parser string, Url string, Params []string) { if ApiResponse["alligator"] == nil { return } AlligatorMap := ApiResponse["alligator"].(map[string]interface{}) if AlligatorMap["aggregate"] == nil { var Aggregate []interface{} AlligatorMap["aggregate"] = Aggregate } AggregateMap := AlligatorMap["aggregate"].([]interface{}) AggregateNode := map[string]interface{}{ "parser": Parser, "url": Url, "params": Params, } AggregateMap = append(AggregateMap, AggregateNode) AlligatorMap["aggregate"] = AggregateMap } However, the file manager has additional logic due to the necessity to load the file body into the JSON. It works well by adding the file loader at the end of scanning other mergers. Other cases function similarly but are simpler. For the end user, the definition: YAML httpd: state: running Will be transformed into: YAML httpd: state: running packages: httpd: name: httpd flags: - httpd.service service: httpd.service: state: running files: /etc/httpd/httpd.conf: template: go from: httpd/httpd.conf services_reload: - httpd.service user: root group: root directory: /var/log/httpd/: dir_mode: "0755" user: httpd group: nobody users: httpd: envs: LANG: C alligator: aggregate: - url: http://localhost/server-status parser: httpd This opens up all the opportunities of modern SCM and remains flexible enough to change parameters. The Codebase That Operates at the Agent Level SCM allows for custom resource definitions. This is part of the role that must be performed on destination servers within the general JSON pulled from the SCM API. The SCM agent provides an interface with many functions to synchronize or template files, create symlinks, install packages on the operating system, and start or stop services. These functions act as wrappers that check for differences between the state declared by the SCM and the state at the host level. For example, before changing a file, the agent should check for its existence, identify differences, and synchronize that file from the SCM. This process is necessary to trigger actions related to state changes, such as running commands, restarting or reloading services, or performing other tasks. Many configuration parameters on Linux can be transferred via files, services, packages, and so on, and in most cases, there is no need for additional custom logic. However, sometimes there are cases where certain services cannot be restarted simultaneously across multiple servers. In such instances, we can describe the logic using locks, as shown in the code below: Go func HTTPdParser(ApiResponse map[string]interface{}) { if ApiResponse == nil { return } if ApiResponse["httpd"] == nil { return } var httpd resources.Httpd err := mapstructure.WeakDecode(ApiResponse["httpd"], &httpd) HttpdFlagName := "httpd.service" var Group string if ApiResponse["group"] != nil { Gropu = ApiResponse["group"].(string) } if common.GetFlag(HttpdFlagName) { LockKey := Hostgroup + "/" + HttpdFlagName LockRestartKey := "restart-" + HttpdFlagName if common.SharedLock(LockKey, "0", ApiResponse["IP"].(string)) { if !common.GetFlag(LockRestartKey) { common.SetFlag(LockRestartKey) common.DaemonReload() common.ServiceRestart(HttpdFlagName) } } if common.GetFlag(LockRestartKey) { if WaitHealthcheck(httpd, ApiResponse) { common.SharedUnlock(LockKey) common.DelFlag(LockRestartKey) common.DelFlag(HttpdFlagName) } } } } Visually, it works like this: This is just one example of such a case, but there can be many more. For instance, as I mentioned earlier, bootstrapping Consul ACLs must also be performed on the local node. Parsers only process JSON and perform actions to bring the configuration into compliance. Author Contributions Primary author: Kashintsev Georgii Developed the concept, outlined the structure, and authored diagrams as well as the majority of the content.Co-author: Alexander Agrytskov wrote key sections in Evolution, Unsatisfied Expectations, and Incidents. Also contributed to editing and review across all other sections to improve clarity and technical consistency. Reviewed the final draft.

By Georgii Kashintsev

Testing Approaches for Java Enterprise Applications With Jakarta NoSQL and Jakarta Data

When discussing software development and its history, we often hear quotes emphasizing the importance of testing; however, in practice, we usually prioritize it as the last step, perhaps ahead of documentation. Without proper testing, ensuring the quality of your software is nearly impossible. Tests work as a safety certification, catching issues early and ensuring that your software behaves as expected. Despite their clear advantages — improved code quality, easier refactoring, and better adaptability for future changes — tests are often neglected. The reality is that testing is a long-term investment, and many software engineers and tech leaders tend to underestimate its importance. This article aims to highlight why testing should be an integral part of your development workflow, particularly when working with Jakarta EE. What and Why? The test works as both a code guarantee and as documentation of the behavior. There are many types of tests in software development, each serving a unique purpose, including unit tests, integration tests, and system tests. Those are vital in ensuring that your application delivers the expected behavior under all conditions, while also making it easier to refactor and adapt your code. As systems grow, the complexity increases, and refactoring becomes necessary. Without a robust testing strategy, the risk of introducing bugs while refactoring is high. When discussing metrics on tests, test coverage is often viewed as the first line of defense in this process. It refers to the percentage of code that is executed while running your tests. However, it’s important to note that good test coverage does not necessarily equate to strong tests. Be careful with this metric alone: a test suite with 100% does not guarantee that you are testing the business properly. Therefore, while test coverage is useful in ensuring that a significant portion of the code is exercised, it doesn’t guarantee that your tests are thoroughly validating all aspects of your application. This is where mutation testing comes in; it involves small changes, or "mutations," in the code to check if your tests can detect the changes. If the tests fail when the code is mutated, it means they are effectively validating the functionality. Mutation testing helps identify weaknesses in the tests themselves, ensuring that they not only run the code but also verify its correctness under various scenarios. AspectTest CoverageMutation TestingDefinitionMeasures the percentage of code executed during tests.Introduces small changes (mutations) to the code to check if tests detect these changes.FocusFocuses on how much of the code is being tested.Focuses on the quality of tests by ensuring they catch faults in the code.GoalTo ensure that a significant portion of the code is exercised during testing.To ensure that the tests are robust and capable of detecting errors.Key MetricPercentage of code covered by tests (e.g., 80% coverage).Mutation score: the percentage of mutants (code changes) that are detected by tests.StrengthsProvides a quick indication of testing comprehensiveness.Identifies weaknesses in the tests themselves by ensuring they catch introduced faults.LimitationsHigh coverage doesn’t guarantee effective validation (tests may miss edge cases or logical errors).May require additional resources to run due to the nature of mutation creation and testing.Complementary RoleEnsures that tests run over all parts of the code.Ensures that the tests actually validate the correctness of the code.Best Use CaseFor measuring how much of the code is tested and ensuring minimal gaps.For verifying the effectiveness of your tests and ensuring they are strong enough to catch faults. Combining high test coverage with mutation testing gives you a stronger guarantee of quality. While test coverage ensures that a significant portion of your code is being tested, mutation testing ensures that your tests are strong and actually validate the functionality correctly. By combining both, you can significantly improve the reliability and robustness of your tests, providing a solid foundation for maintaining high-quality software. For Jakarta EE developers, testing becomes even more important as the platform involves multiple layers, ranging from. Tools and Best Practices When we talk about the Jakarta EE platform, we have several test stacks where we can work on: Arquillian: One of the most well-known testing frameworks in the Jakarta EE ecosystem, Arquillian works on integration testing by managing the lifecycle of a container for you. But, while Arquillian is powerful, I’m not a huge fan due to its complexity and the overhead it introduces to write a single test case. It’s often seen as overkill for smaller projects.TestContainers: TestContainers, on the other hand, provides a simpler and more flexible approach to testing by allowing you to run real containers for integration testing. We can check this even in several open-source projects, such as Eclipse JNoSQL, which uses TestContainers to check the integration with NoSQL databases. This solution is fantastic for ensuring that your code interacts correctly with external dependencies, such as databases or messaging queues, without the need to manually configure those services on your local machine.JUnit Jupiter and AssertJ: JUnit Jupiter is the foundation for writing unit tests with JUnit 5, while AssertJ provides powerful assertions that make tests more readable and expressive. These two libraries are a great combination when writing unit and integration tests in Jakarta EE.WeldTest: As for my personal favorite, WeldTest is a powerful testing solution that makes it easy to test Jakarta EE CDI beans in isolation. You can define it using only annotations; furthermore, you can define the classes to be detected by the container, making it lighter than the CDI that goes on production. Furthermore, you can isolate the injection and select the extensions that will work for each test. Additionally, platforms like Quarkus have streamlined testing. With just a few annotations, you can run tests in a native container environment, offering a simpler approach for Jakarta EE developers. Test-Driven Development (TDD): A Deeper Dive Test-driven development (TDD) is a methodology where tests are written before the code itself. The TDD is defined in three steps: Write a failing testMake the test passRefactor TDD offers a clear path to building reliable, maintainable software. It forces developers to think about the behavior of the system before implementation, which leads to more thoughtful and well-designed code. Additionally, since you write tests first, you can easily spot edge cases and requirements before they become problems. An important point of TDD is that the start is hard: it can slow down development, especially in the beginning. The need to write tests for every feature may feel burdensome, and the process can be more time-consuming compared to just jumping straight into coding. Furthermore, while TDD encourages clean code, it can lead to overly simplistic tests or, in some cases, the risk of over-testing. Still, the long-term benefits — such as improved code quality and easier refactoring—make it a worthwhile investment for many teams. Data-Driven Testing (DDT): A Practical Approach Data-driven testing (DDT) focuses on testing the same piece of functionality with a variety of input data. DDT allows you to run the same test multiple times with different data, ensuring that your code works under different conditions. This approach is particularly useful when dealing with applications that need to support a wide range of inputs, such as forms or APIs. For instance, a simple test case might verify the validation of user input. By using DDT, you can execute the same validation logic with different sets of data, ensuring comprehensive test coverage. ApproachTest Driven DevelopmentData Driven TestingFocusCode structure and designInput/output validationTest DesignWrite tests firstTest with multiple datasetsSpeedSlower development due to upfront test writingCan be faster for multiple data sets but might result in more complex test casesFlexibilityHighly structuredFlexible for various inputsCoverageGood for logic-heavy codeGood for input-heavy scenarios Both TDD and DDT have their place in modern software development, depending on the nature of your project. For highly structured code with clear design goals, TDD is ideal. For projects with complex input scenarios, DDT might be more appropriate. Each methodology offers distinct advantages, and when used together, they can provide comprehensive test coverage. Show Me the Code In this section, we will walk through a practical example of integrating Jakarta Data and Jakarta NoSQL with an Oracle NoSQL database, using Eclipse JNoSQL. We’ll build a basic Hotel Management System to manage Room entities and their associated behaviors. The focus will be on setting up the Room entity, defining the repository, and preparing the test environment to verify functionality. We begin by defining a Room entity that represents a room in a hotel. This entity will include various attributes like the room number, type, status, cleanliness, and whether smoking is allowed or not. Java @Entity public class Room { @Id private String id; @Column private int number; @Column private RoomType type; @Column private RoomStatus status; @Column private CleanStatus cleanStatus; @Column private boolean smokingAllowed; @Column private boolean underMaintenance; } RoomType, RoomStatus, and CleanStatus are enumerations that define the type, status, and cleanliness of the room, respectively.The @Id annotation marks the unique identifier for the room entity.The @Column annotations define the database columns that correspond to the attributes. Next, we define a RoomRepository interface, which will handle the data access logic for the Room entity. We use Jakarta Data annotations like @Query and @Save to define specific queries and persistence methods. Java @Query("WHERE type <> 'VIP_SUITE' AND status = 'AVAILABLE' AND cleanStatus = 'CLEAN'") List<Room> findAvailableStandardRooms(); @Query("WHERE cleanStatus <> 'CLEAN' AND status <> 'OUT_OF_SERVICE'") List<Room> findRoomsNeedingCleaning(); @Query("WHERE smokingAllowed = true AND status = 'AVAILABLE'") List<Room> findAvailableSmokingRooms(); @Save void save(List<Room> rooms); @Save Room newRoom(Room room); void deleteBy(); @Query("WHERE type = :type") List<Room> findByType(@Param("type") String type); } To facilitate testing in a Dockerized environment, we define a ManagerSupplier class that provides a DatabaseManager. This ensures that we connect to the Oracle NoSQL instance running in a Docker container during tests. Java @ApplicationScoped @Alternative @Priority(Interceptor.Priority.APPLICATION) public class ManagerSupplier implements Supplier<DatabaseManager> { @Produces @Database(DatabaseType.DOCUMENT) @Default public DatabaseManager get() { return DatabaseContainer.INSTANCE.get("hotel"); } } The ManagerSupplier produces a DatabaseManager that connects to the Oracle NoSQL database, allowing us to run tests on the Room entity and repository. The DatabaseContainer class is responsible for managing the lifecycle of the Oracle NoSQL container. We use TestContainers to spin up the container for our tests, ensuring that the database is available for interaction during the test phase. Java public enum DatabaseContainer { INSTANCE; private final GenericContainer<?> container = new GenericContainer<> (DockerImageName.parse("ghcr.io/oracle/nosql:latest-ce")) .withExposedPorts(8080); { container.start(); } public DatabaseManager get(String database) { DatabaseManagerFactory factory = managerFactory(); return factory.apply(database); } public DatabaseManagerFactory managerFactory() { var configuration = DatabaseConfiguration.getConfiguration(); Settings settings = Settings.builder() .put(OracleNoSQLConfigurations.HOST, host()) .build(); return configuration.apply(settings); } public String host() { return "http://" + container.getHost() + ":" + container.getFirstMappedPort(); } } The container is configured to run the Oracle NoSQL Docker image, exposing port 8080.DatabaseContainer.INSTANCE.get("hotel") retrieves the DatabaseManager configured for the hotel database, ensuring the connection string points to the Docker container. The test setup involves Weld for dependency injection (DI), JUnit 5 for testing, and SoftAssertions from AssertJ for flexible test validation. Let's review the main components. The first one is the Weld Test that enables CDI on a test scenario, where we can check the basic annotations: @EnableAutoWeld: This annotation initializes Weld's DI container for the test class, allowing us to inject dependencies into the test class automatically.@AddPackages: This annotation is used to specify the packages that contain the beans and other components (such as Room, RoomRepository, ManagerSupplier) that need to be injected into the test class.@AddExtensions: The @AddExtensions annotation includes extensions required for Weld to support EntityConverter and DocumentTemplate. This setup ensures that the test class is managed by the Weld container, meaning the dependencies (like RoomRepository) are injected properly without needing manual initialization. Here is how it is used: Java @EnableAutoWeld @AddPackages(value = {Database.class, EntityConverter.class, DocumentTemplate.class}) @AddPackages(Room.class) @AddPackages(ManagerSupplier.class) @AddPackages(Reflections.class) @AddPackages(Converters.class) @AddExtensions({ReflectionEntityMetadataExtension.class, DocumentExtension.class}) class AppTest { } The annotations specify which classes and packages Weld should manage and load during the test lifecycle, ensuring the correct beans are injected. This allows you to run the test in an environment that mimics a real Jakarta EE container. Java @EnableAutoWeld @AddPackages(value = {Database.class, EntityConverter.class, DocumentTemplate.class}) @AddPackages(Room.class) @AddPackages(ManagerSupplier.class) @AddPackages(Reflections.class) @AddPackages(Converters.class) @AddExtensions({ReflectionEntityMetadataExtension.class, DocumentExtension.class}) class AppTest { @Inject private DocumentTemplate template; @Test void shouldTest() { Room room = new RoomBuilder() .id("room-1") .roomNumber(101) .type(RoomType.SUITE) .status(RoomStatus.AVAILABLE) .cleanStatus(CleanStatus.CLEAN) .smokingAllowed(false) .underMaintenance(false) .build(); Room insert = template.insert(room); SoftAssertions.assertSoftly(softly -> { softly.assertThat(room.getId()).isEqualTo(insert.getId()); softly.assertThat(room.getNumber()).isEqualTo(insert.getNumber()); softly.assertThat(room.getType()).isEqualTo(insert.getType()); softly.assertThat(room.getStatus()).isEqualTo(insert.getStatus()); softly.assertThat(room.getCleanStatus()).isEqualTo(insert.getCleanStatus()); softly.assertThat(room.isSmokingAllowed()).isEqualTo(insert.isSmokingAllowed()); softly.assertThat(room.isUnderMaintenance()).isEqualTo(insert.isUnderMaintenance()); softly.assertThat(insert.getId()).isNotNull(); }); } } @EnableAutoWeld initializes Weld for dependency injection in the test.@Inject ensures that the DocumentTemplate is injected, which provides methods for interacting with the database.The shouldTest() method tests inserting a Room object into the database and verifies that the entity was correctly persisted and retrieved using SoftAssertions. The RoomServiceTest class is a more comprehensive test suite that focuses on different room query scenarios using RoomRepository. Java @EnableAutoWeld @AddPackages(value = {Database.class, EntityConverter.class, DocumentTemplate.class}) @AddPackages(Room.class) @AddPackages(ManagerSupplier.class) @AddPackages(Reflections.class) @AddPackages(Converters.class) @AddExtensions({ReflectionEntityMetadataExtension.class, DocumentExtension.class}) class RoomServiceTest { @Inject private RoomRepository repository; private static final Faker FAKER = new Faker(); @BeforeEach void setUP() { // Populate database with various Room entities } @AfterEach void cleanUp() { repository.deleteBy(); // Ensures the database is reset after each test } @ParameterizedTest(name = "should find rooms by type {0}") @EnumSource(RoomType.class) void shouldFindRoomByType(RoomType type) { List<Room> rooms = this.repository.findByType(type.name()); SoftAssertions.assertSoftly(softly -> softly.assertThat(rooms).allMatch(room -> room.getType().equals(type))); } @ParameterizedTest @MethodSource("room") void shouldSaveRoom(Room room) { Room updateRoom = this.repository.newRoom(room); SoftAssertions.assertSoftly(softly -> { softly.assertThat(updateRoom).isNotNull(); softly.assertThat(updateRoom.getId()).isNotNull(); softly.assertThat(updateRoom.getNumber()).isEqualTo(room.getNumber()); softly.assertThat(updateRoom.getType()).isEqualTo(room.getType()); softly.assertThat(updateRoom.getStatus()).isEqualTo(room.getStatus()); softly.assertThat(updateRoom.getCleanStatus()).isEqualTo(room.getCleanStatus()); softly.assertThat(updateRoom.isSmokingAllowed()).isEqualTo(room.isSmokingAllowed()); }); } @Test void shouldFindRoomReadyToGuest() { List<Room> rooms = this.repository.findAvailableStandardRooms(); SoftAssertions.assertSoftly(softly -> { softly.assertThat(rooms).hasSize(3); softly.assertThat(rooms).allMatch(room -> room.getStatus().equals(RoomStatus.AVAILABLE)); softly.assertThat(rooms).allMatch(room -> !room.isUnderMaintenance()); }); } static Stream<Arguments> room() { return Stream.of(Arguments.of(getRoom(), Arguments.of(getRoom(), Arguments.of(getRoom())))); } private static Room getRoom() { return new RoomBuilder() .id(UUID.randomUUID().toString()) .roomNumber(FAKER.number().numberBetween(100, 999)) .type(randomEnum(RoomType.class)) .status(randomEnum(RoomStatus.class)) .cleanStatus(randomEnum(CleanStatus.class)) .smokingAllowed(FAKER.bool().bool()) .build(); } private static <T extends Enum<?>> T randomEnum(Class<T> enumClass) { T[] constants = enumClass.getEnumConstants(); int index = ThreadLocalRandom.current().nextInt(constants.length); return constants[index]; } } RoomServiceTest includes tests for various scenarios like saving rooms, finding rooms by type, and checking room availability and cleanliness.@ParameterizedTest with @EnumSource and @MethodSource helps run the same tests for multiple inputs, ensuring better test coverage and validation.SoftAssertions is used to validate the entity’s properties flexibly, allowing all assertions to run even if one fails. Conclusion The test classes you've set up make use of modern testing techniques such as Weld for dependency injection, JUnit 5 Parameterized Tests for running tests with various inputs, and SoftAssertions for comprehensive error reporting. The combination of these tools, along with data-driven testing, ensures your application logic is well-tested and resilient to a variety of conditions. By using Weld for DI, you are mimicking a real Jakarta EE environment in your tests, making sure that your RoomRepository and RoomService behave correctly in a containerized environment. Furthermore, leveraging data-driven tests with ParameterizedTest and MethodSource enables comprehensive test coverage, while SoftAssertions provides insightful feedback without prematurely terminating the tests.

By Otavio Santana

CORE