Class CassandraDataLayer

    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      protected CassandraDataLayer​(java.lang.String keyspace, java.lang.String table, boolean quoteIdentifiers, java.lang.String snapshotName, java.lang.String datacenter, org.apache.cassandra.clients.Sidecar.ClientConfig sidecarClientConfig, org.apache.cassandra.secrets.SslConfig sslConfig, org.apache.cassandra.spark.data.CqlTable cqlTable, org.apache.cassandra.spark.data.partitioner.TokenPartitioner tokenPartitioner, org.apache.cassandra.bridge.CassandraVersion version, org.apache.cassandra.spark.data.partitioner.ConsistencyLevel consistencyLevel, java.lang.String sidecarInstances, int sidecarPort, java.util.Map<java.lang.String,​PartitionedDataLayer.AvailabilityHint> availabilityHints, java.util.Map<java.lang.String,​org.apache.cassandra.bridge.BigNumberConfigImpl> bigNumberConfigMap, boolean enableStats, boolean readIndexOffset, boolean useIncrementalRepair, java.lang.String lastModifiedTimestampField, java.util.List<org.apache.cassandra.spark.config.SchemaFeature> requestedFeatures, java.util.Map<java.lang.String,​org.apache.cassandra.spark.data.ReplicationFactor> rfMap, org.apache.cassandra.spark.utils.TimeProvider timeProvider, org.apache.cassandra.spark.sparksql.filters.SSTableTimeRangeFilter sstableTimeRangeFilter)  
        CassandraDataLayer​(ClientConfig options, org.apache.cassandra.clients.Sidecar.ClientConfig sidecarClientConfig, org.apache.cassandra.secrets.SslConfig sslConfig)  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected void await​(java.util.concurrent.CountDownLatch latch)  
      org.apache.cassandra.bridge.BigNumberConfig bigNumberConfig​(org.apache.cassandra.spark.data.CqlField field)
      DataLayer can override this method to return the BigInteger/BigDecimal precision/scale values for a given column
      java.util.Map<java.lang.String,​org.apache.cassandra.bridge.BigNumberConfigImpl> bigNumberConfigMap()  
      org.apache.cassandra.bridge.CassandraBridge bridge()  
      protected void clearSnapshot​(java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> clusterConfig, ClientConfig options)  
      org.apache.cassandra.spark.data.CqlTable cqlTable()  
      org.apache.cassandra.spark.data.partitioner.CassandraRing createCassandraRingFromRing​(org.apache.cassandra.spark.data.partitioner.Partitioner partitioner, org.apache.cassandra.spark.data.ReplicationFactor replicationFactor, o.a.c.sidecar.client.shaded.common.response.RingResponse ring)  
      protected void dialHome​(ClientConfig options)  
      boolean equals​(java.lang.Object other)  
      protected java.util.concurrent.ExecutorService executorService()
      DataLayer implementation should provide an ExecutorService for doing blocking I/O when opening SSTable readers.
      protected PartitionedDataLayer.AvailabilityHint getAvailability​(org.apache.cassandra.spark.data.partitioner.CassandraInstance instance)
      Data Layer can override this method to hint the availability of a Cassandra instance so that the Bulk Reader attempts UP instances first and avoids instances known to be down, e.g. if a create-snapshot request has already failed.
      protected java.lang.String getEffectiveCassandraVersionForRead​(java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> clusterConfig, o.a.c.sidecar.client.shaded.common.response.NodeSettings nodeSettings)  
      protected org.apache.cassandra.spark.data.Sizing getSizing​(java.util.concurrent.CompletableFuture<o.a.c.sidecar.client.shaded.common.response.RingResponse> ringFuture, org.apache.cassandra.spark.data.ReplicationFactor replicationFactor, ClientConfig options)
      Returns the Sizing object based on the sizing option provided by the user, or DefaultSizing as the default sizing
      int hashCode()  
      void initialize​(ClientConfig options)  
      protected java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> initializeClusterConfig​(ClientConfig options)  
      protected void initInstanceMap()  
      protected void initSidecarClient()  
      protected boolean isExhausted​(java.lang.Throwable throwable)  
      java.lang.String jobId()  
      java.util.concurrent.CompletableFuture<java.util.stream.Stream<org.apache.cassandra.spark.data.SSTable>> listInstance​(int partitionId, com.google.common.collect.Range<java.math.BigInteger> range, org.apache.cassandra.spark.data.partitioner.CassandraInstance instance)  
      boolean readIndexOffset()
      When true the SSTableReader should attempt to find the offset into the Data.db file for the Spark worker's token range.
      org.apache.cassandra.spark.data.ReplicationFactor replicationFactor​(java.lang.String keyspace)  
      java.util.List<org.apache.cassandra.spark.config.SchemaFeature> requestedFeatures()  
      org.apache.cassandra.spark.data.partitioner.CassandraRing ring()  
      protected void shutdownHook​(ClientConfig options)  
      org.apache.cassandra.spark.sparksql.filters.SSTableTimeRangeFilter sstableTimeRangeFilter()
      Returns SSTableTimeRangeFilter to filter out SSTables based on min and max timestamp.
      void startupValidate()
      Performs startup validation using StartupValidator with the currently registered StartupValidations and throws a RuntimeException if any violations are found. Must be invoked once per execution, before any actual work is started.
      org.apache.cassandra.analytics.stats.Stats stats()
      Override to plug in your own Stats instrumentation for recording internal events
      org.apache.cassandra.spark.utils.TimeProvider timeProvider()  
      org.apache.cassandra.spark.data.partitioner.TokenPartitioner tokenPartitioner()  
      boolean useIncrementalRepair()
      When true, the SSTableReader should only read repaired SSTables from a single 'primary repair' replica and read unrepaired SSTables at the user-set consistency level.
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOGGER

        public static final org.slf4j.Logger LOGGER
      • snapshotName

        protected java.lang.String snapshotName
      • quoteIdentifiers

        protected boolean quoteIdentifiers
      • keyspace

        protected java.lang.String keyspace
      • table

        protected java.lang.String table
      • maybeQuotedKeyspace

        protected java.lang.String maybeQuotedKeyspace
      • maybeQuotedTable

        protected java.lang.String maybeQuotedTable
      • bridge

        protected org.apache.cassandra.bridge.CassandraBridge bridge
      • sidecarInstances

        protected java.lang.String sidecarInstances
      • sidecarPort

        protected int sidecarPort
      • clusterConfig

        protected transient java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> clusterConfig
      • tokenPartitioner

        protected org.apache.cassandra.spark.data.partitioner.TokenPartitioner tokenPartitioner
      • sidecarClientConfig

        protected org.apache.cassandra.clients.Sidecar.ClientConfig sidecarClientConfig
      • bigNumberConfigMap

        protected java.util.Map<java.lang.String,​org.apache.cassandra.bridge.BigNumberConfigImpl> bigNumberConfigMap
      • enableStats

        protected boolean enableStats
      • readIndexOffset

        protected boolean readIndexOffset
      • useIncrementalRepair

        protected boolean useIncrementalRepair
      • requestedFeatures

        protected java.util.List<org.apache.cassandra.spark.config.SchemaFeature> requestedFeatures
      • rfMap

        protected java.util.Map<java.lang.String,​org.apache.cassandra.spark.data.ReplicationFactor> rfMap
      • lastModifiedTimestampField

        @Nullable
        protected java.lang.String lastModifiedTimestampField
      • cqlTable

        protected volatile org.apache.cassandra.spark.data.CqlTable cqlTable
      • timeProvider

        protected transient org.apache.cassandra.spark.utils.TimeProvider timeProvider
      • sidecar

        protected transient o.a.c.sidecar.client.shaded.client.SidecarClient sidecar
    • Constructor Detail

      • CassandraDataLayer

        public CassandraDataLayer​(@NotNull
                                  ClientConfig options,
                                  @NotNull
                                  org.apache.cassandra.clients.Sidecar.ClientConfig sidecarClientConfig,
                                  @Nullable
                                  org.apache.cassandra.secrets.SslConfig sslConfig)
      • CassandraDataLayer

        protected CassandraDataLayer​(@Nullable
                                     java.lang.String keyspace,
                                     @Nullable
                                     java.lang.String table,
                                     boolean quoteIdentifiers,
                                     @NotNull
                                     java.lang.String snapshotName,
                                     @Nullable
                                     java.lang.String datacenter,
                                     @NotNull
                                     org.apache.cassandra.clients.Sidecar.ClientConfig sidecarClientConfig,
                                     @Nullable
                                     org.apache.cassandra.secrets.SslConfig sslConfig,
                                     @NotNull
                                     org.apache.cassandra.spark.data.CqlTable cqlTable,
                                     @NotNull
                                     org.apache.cassandra.spark.data.partitioner.TokenPartitioner tokenPartitioner,
                                     @NotNull
                                     org.apache.cassandra.bridge.CassandraVersion version,
                                     @NotNull
                                     org.apache.cassandra.spark.data.partitioner.ConsistencyLevel consistencyLevel,
                                     @NotNull
                                     java.lang.String sidecarInstances,
                                     @NotNull
                                     int sidecarPort,
                                     @NotNull
                                     java.util.Map<java.lang.String,​PartitionedDataLayer.AvailabilityHint> availabilityHints,
                                     @NotNull
                                     java.util.Map<java.lang.String,​org.apache.cassandra.bridge.BigNumberConfigImpl> bigNumberConfigMap,
                                     boolean enableStats,
                                     boolean readIndexOffset,
                                     boolean useIncrementalRepair,
                                     @Nullable
                                     java.lang.String lastModifiedTimestampField,
                                     java.util.List<org.apache.cassandra.spark.config.SchemaFeature> requestedFeatures,
                                     @NotNull
                                     java.util.Map<java.lang.String,​org.apache.cassandra.spark.data.ReplicationFactor> rfMap,
                                     org.apache.cassandra.spark.utils.TimeProvider timeProvider,
                                     org.apache.cassandra.spark.sparksql.filters.SSTableTimeRangeFilter sstableTimeRangeFilter)
    • Method Detail

      • initialize

        public void initialize​(@NotNull
                               ClientConfig options)
      • shutdownHook

        protected void shutdownHook​(ClientConfig options)
      • isExhausted

        protected boolean isExhausted​(@Nullable
                                      java.lang.Throwable throwable)
      • timeProvider

        public org.apache.cassandra.spark.utils.TimeProvider timeProvider()
        Specified by:
        timeProvider in class DataLayer
        Returns:
        a TimeProvider
      • useIncrementalRepair

        public boolean useIncrementalRepair()
        Description copied from class: DataLayer
        When true, the SSTableReader should only read repaired SSTables from a single 'primary repair' replica and read unrepaired SSTables at the user-set consistency level.
        Overrides:
        useIncrementalRepair in class DataLayer
        Returns:
        true if the SSTableReader should only read repaired SSTables from a single 'primary repair' replica
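        The selection rule described above can be pictured with a minimal sketch. It is not the library's implementation: SSTableInfo, its fields, and the select method are hypothetical names introduced only for illustration, and the consistency-level handling for unrepaired SSTables is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the incremental-repair selection rule; not the library's code.
final class IncrementalRepairSketch
{
    // Hypothetical stand-in for an SSTable seen on some replica.
    static final class SSTableInfo
    {
        final String replica;
        final boolean repaired;

        SSTableInfo(String replica, boolean repaired)
        {
            this.replica = replica;
            this.repaired = repaired;
        }
    }

    // Keeps repaired SSTables only from the primary repair replica,
    // and keeps every unrepaired SSTable (to be read at the user-set consistency level).
    static List<SSTableInfo> select(List<SSTableInfo> all, String primaryRepairReplica)
    {
        List<SSTableInfo> selected = new ArrayList<>();
        for (SSTableInfo sstable : all)
        {
            if (!sstable.repaired || sstable.replica.equals(primaryRepairReplica))
            {
                selected.add(sstable);
            }
        }
        return selected;
    }
}
```

        The idea is that repaired data is already consistent across replicas, so reading it once is enough, while unrepaired data still needs to be reconciled across replicas.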
      • readIndexOffset

        public boolean readIndexOffset()
        Description copied from class: DataLayer
        When true, the SSTableReader should attempt to find the offset into the Data.db file for the Spark worker's token range. It does so by first binary searching the Summary.db file to find an offset into the Index.db file, then reading Index.db from that offset to find the first offset in the Data.db file that overlaps with the Spark worker's token range. The reader can then start at the first in-range partition in the Data.db file and close after reading the last in-range partition, which avoids wastefully reading out-of-range partitions. This improves scalability as more Spark workers shard the token range into smaller subranges.
        Overrides:
        readIndexOffset in class DataLayer
        Returns:
        true if the SSTableReader should attempt to read the Summary.db and Index.db files to find the start index offset into the Data.db file that overlaps with the Spark worker's token range
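        The two-step lookup described above (binary search the summary, then scan the index) can be sketched over in-memory arrays. This is an illustrative model only, not the SSTableReader code: the class, method, and array layout are hypothetical, tokens are simplified to unique sorted long values, and the token range is assumed to be inclusive on both ends.

```java
import java.util.Arrays;

// Hypothetical in-memory model of the Summary.db -> Index.db -> Data.db lookup.
final class IndexOffsetSketch
{
    // summaryTokens[i] is the token of a sampled index entry; summaryIndexPos[i] is its position in the index.
    // indexTokens[i] is the token of partition i; dataOffsets[i] is where that partition starts in the data file.
    // Returns the data offset of the first partition inside [rangeStart, rangeEnd], or -1 if there is none.
    static long findStartOffset(long[] summaryTokens, int[] summaryIndexPos,
                                long[] indexTokens, long[] dataOffsets,
                                long rangeStart, long rangeEnd)
    {
        // Step 1: binary search the sparse summary for the last sample at or before rangeStart.
        int found = Arrays.binarySearch(summaryTokens, rangeStart);
        int sample = found >= 0 ? found : -found - 2; // insertion point minus one
        int start = sample < 0 ? 0 : summaryIndexPos[sample];

        // Step 2: scan the index forward from that position to the first in-range partition.
        for (int i = start; i < indexTokens.length; i++)
        {
            if (indexTokens[i] > rangeEnd)
            {
                return -1L; // walked past the range without finding a partition in it
            }
            if (indexTokens[i] >= rangeStart)
            {
                return dataOffsets[i];
            }
        }
        return -1L;
    }
}
```

        With index tokens 10, 20, ..., 80 at data offsets 0, 100, ..., 700 and summary samples at tokens 10 and 50 (index positions 0 and 4), a lookup for the range 55 to 75 starts the scan at index position 4 and returns offset 500.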
      • initInstanceMap

        protected void initInstanceMap()
      • initSidecarClient

        protected void initSidecarClient()
      • bridge

        public org.apache.cassandra.bridge.CassandraBridge bridge()
        Specified by:
        bridge in class DataLayer
        Returns:
        version-specific CassandraBridge wrapping shaded packages
      • stats

        public org.apache.cassandra.analytics.stats.Stats stats()
        Description copied from class: DataLayer
        Override to plug in your own Stats instrumentation for recording internal events
        Overrides:
        stats in class DataLayer
        Returns:
        Stats implementation to record internal events
      • requestedFeatures

        public java.util.List<org.apache.cassandra.spark.config.SchemaFeature> requestedFeatures()
        Overrides:
        requestedFeatures in class DataLayer
      • ring

        public org.apache.cassandra.spark.data.partitioner.CassandraRing ring()
        Specified by:
        ring in class PartitionedDataLayer
      • executorService

        protected java.util.concurrent.ExecutorService executorService()
        Description copied from class: DataLayer
        DataLayer implementation should provide an ExecutorService for doing blocking I/O when opening SSTable readers. It is the responsibility of the DataLayer implementation to appropriately size and manage this ExecutorService.
        Specified by:
        executorService in class DataLayer
        Returns:
        executor service
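        As a rough illustration of what "appropriately size and manage" might involve, the sketch below builds a fixed-size pool of named daemon threads using only java.util.concurrent. The class name, thread-name prefix, and sizing policy are assumptions made for this example, not what CassandraDataLayer does.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper showing one way to provide an executor for blocking I/O.
final class BlockingIoExecutorSketch
{
    // Creates a bounded pool; daemon threads do not keep the JVM alive after the job ends.
    static ExecutorService create(int threads)
    {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, "sstable-io-" + counter.incrementAndGet());
            thread.setDaemon(true);
            return thread;
        };
        return Executors.newFixedThreadPool(threads, factory);
    }
}
```

        A bounded pool caps the number of concurrent blocking reads. The owner of the pool remains responsible for calling shutdown() when the work is done.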
      • jobId

        public java.lang.String jobId()
        Specified by:
        jobId in class DataLayer
        Returns:
        a string that uniquely identifies this Spark job
      • sstableTimeRangeFilter

        @NotNull
        public org.apache.cassandra.spark.sparksql.filters.SSTableTimeRangeFilter sstableTimeRangeFilter()
        Description copied from class: DataLayer
        Returns SSTableTimeRangeFilter to filter out SSTables based on min and max timestamp.
        Overrides:
        sstableTimeRangeFilter in class DataLayer
        Returns:
        SSTableTimeRangeFilter
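        Filtering on min and max timestamp amounts to an interval-overlap test. The sketch below shows that test in isolation; the class and method names are hypothetical, the bounds are assumed to be inclusive, and the real SSTableTimeRangeFilter may differ in both API and semantics.

```java
// Hypothetical interval-overlap check; not the SSTableTimeRangeFilter API.
final class TimeRangeFilterSketch
{
    private final long rangeStart;
    private final long rangeEnd;

    TimeRangeFilterSketch(long rangeStart, long rangeEnd)
    {
        this.rangeStart = rangeStart;
        this.rangeEnd = rangeEnd;
    }

    // An SSTable is kept only if its [min, max] timestamp interval intersects the requested range.
    boolean overlaps(long sstableMinTimestamp, long sstableMaxTimestamp)
    {
        return sstableMinTimestamp <= rangeEnd && sstableMaxTimestamp >= rangeStart;
    }
}
```

        An SSTable whose newest write is older than the range start, or whose oldest write is newer than the range end, cannot contain matching data and can be skipped without being opened.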
      • cqlTable

        public org.apache.cassandra.spark.data.CqlTable cqlTable()
        Specified by:
        cqlTable in class DataLayer
        Returns:
        CqlTable object for the table being read (batch/bulk read jobs only)
      • replicationFactor

        public org.apache.cassandra.spark.data.ReplicationFactor replicationFactor​(java.lang.String keyspace)
        Specified by:
        replicationFactor in class PartitionedDataLayer
      • getAvailability

        protected PartitionedDataLayer.AvailabilityHint getAvailability​(org.apache.cassandra.spark.data.partitioner.CassandraInstance instance)
        Description copied from class: PartitionedDataLayer
        Data Layer can override this method to hint the availability of a Cassandra instance so that the Bulk Reader attempts UP instances first and avoids instances known to be down, e.g. if a create-snapshot request has already failed.
        Overrides:
        getAvailability in class PartitionedDataLayer
        Parameters:
        instance - a Cassandra instance
        Returns:
        availability hint
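        One way a caller could use such hints is to order replicas so that the most promising ones are tried first. The sketch below is an assumption-laden illustration: the Hint enum is a simplified stand-in for PartitionedDataLayer.AvailabilityHint (whose actual values are not listed here), instances are plain strings, and down instances are moved last rather than skipped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical ordering of instances by availability hint; not the Bulk Reader's code.
final class AvailabilitySketch
{
    // Simplified stand-in enum; declaration order doubles as the attempt order.
    enum Hint { UP, UNKNOWN, DOWN }

    // Returns a copy of the instances sorted so UP comes first and DOWN last.
    // Instances without a hint are treated as UNKNOWN; the sort is stable, so ties keep their order.
    static List<String> orderByHint(List<String> instances, Map<String, Hint> hints)
    {
        List<String> ordered = new ArrayList<>(instances);
        ordered.sort(Comparator.comparing((String instance) -> hints.getOrDefault(instance, Hint.UNKNOWN)));
        return ordered;
    }
}
```

        For instances a, b, c, d with a hinted DOWN, b and d hinted UP, and c unhinted, the resulting order is b, d, c, a.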
      • listInstance

        public java.util.concurrent.CompletableFuture<java.util.stream.Stream<org.apache.cassandra.spark.data.SSTable>> listInstance​(int partitionId,
                                                                                                                                     @NotNull
                                                                                                                                     com.google.common.collect.Range<java.math.BigInteger> range,
                                                                                                                                     @NotNull
                                                                                                                                     org.apache.cassandra.spark.data.partitioner.CassandraInstance instance)
        Specified by:
        listInstance in class PartitionedDataLayer
      • bigNumberConfigMap

        public java.util.Map<java.lang.String,​org.apache.cassandra.bridge.BigNumberConfigImpl> bigNumberConfigMap()
      • bigNumberConfig

        public org.apache.cassandra.bridge.BigNumberConfig bigNumberConfig​(org.apache.cassandra.spark.data.CqlField field)
        Description copied from class: DataLayer
        DataLayer can override this method to return the BigInteger/BigDecimal precision/scale values for a given column
        Overrides:
        bigNumberConfig in class DataLayer
        Parameters:
        field - the CQL field
        Returns:
        a BigNumberConfig object that specifies the desired precision/scale for BigDecimal and BigInteger
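        To make precision and scale concrete, the sketch below applies a fixed scale to a BigDecimal and rejects values that exceed a fixed precision, using only java.math. It is not the BigNumberConfig API: the class, its fields, the HALF_UP rounding, and the overflow behavior are all assumptions chosen for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical per-column precision/scale holder; not the BigNumberConfig interface.
final class BigNumberSketch
{
    final int precision; // maximum total number of digits
    final int scale;     // number of digits after the decimal point

    BigNumberSketch(int precision, int scale)
    {
        this.precision = precision;
        this.scale = scale;
    }

    // Rounds the value to the configured scale and fails if it needs more digits than the precision allows.
    BigDecimal normalize(BigDecimal value)
    {
        BigDecimal scaled = value.setScale(scale, RoundingMode.HALF_UP);
        if (scaled.precision() > precision)
        {
            throw new ArithmeticException("value " + scaled + " exceeds precision " + precision);
        }
        return scaled;
    }
}
```

        With precision 5 and scale 2, the value 123.456 normalizes to 123.46, while 1234.5 becomes 1234.50, which needs six digits and is rejected.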
      • createCassandraRingFromRing

        public org.apache.cassandra.spark.data.partitioner.CassandraRing createCassandraRingFromRing​(org.apache.cassandra.spark.data.partitioner.Partitioner partitioner,
                                                                                                     org.apache.cassandra.spark.data.ReplicationFactor replicationFactor,
                                                                                                     o.a.c.sidecar.client.shaded.common.response.RingResponse ring)
      • startupValidate

        public void startupValidate()
        Description copied from interface: StartupValidatable
        Performs startup validation using StartupValidator with the currently registered StartupValidations and throws a RuntimeException if any violations are found. Must be invoked once per execution, before any actual work is started.
        Specified by:
        startupValidate in interface StartupValidatable
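        The register-then-validate pattern described above can be sketched as follows. This is a generic illustration, not the StartupValidator API: the class, the register and perform methods, and the use of Supplier and Optional to report a violation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical collect-and-fail validator; not the StartupValidator class.
final class StartupValidationSketch
{
    // Each validation returns a violation message, or an empty Optional when it passes.
    private final List<Supplier<Optional<String>>> validations = new ArrayList<>();

    void register(Supplier<Optional<String>> validation)
    {
        validations.add(validation);
    }

    // Runs every registered validation and throws once, listing all violations found.
    void perform()
    {
        List<String> violations = new ArrayList<>();
        for (Supplier<Optional<String>> validation : validations)
        {
            validation.get().ifPresent(violations::add);
        }
        if (!violations.isEmpty())
        {
            throw new RuntimeException("Startup validation failed: " + String.join("; ", violations));
        }
    }
}
```

        Running all checks before throwing reports every problem in one pass, which is useful when validation happens once, before any work starts.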
      • initializeClusterConfig

        protected java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> initializeClusterConfig​(ClientConfig options)
      • getEffectiveCassandraVersionForRead

        protected java.lang.String getEffectiveCassandraVersionForRead​(java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> clusterConfig,
                                                                       o.a.c.sidecar.client.shaded.common.response.NodeSettings nodeSettings)
      • dialHome

        protected void dialHome​(@NotNull
                                ClientConfig options)
      • clearSnapshot

        protected void clearSnapshot​(java.util.Set<o.a.c.sidecar.client.shaded.client.SidecarInstance> clusterConfig,
                                     @NotNull
                                     ClientConfig options)
      • getSizing

        protected org.apache.cassandra.spark.data.Sizing getSizing​(java.util.concurrent.CompletableFuture<o.a.c.sidecar.client.shaded.common.response.RingResponse> ringFuture,
                                                                   org.apache.cassandra.spark.data.ReplicationFactor replicationFactor,
                                                                   ClientConfig options)
        Returns the Sizing object based on the sizing option provided by the user, or DefaultSizing as the default sizing
        Parameters:
        ringFuture - a future with a view of the ring
        replicationFactor - the replication factor
        options - the ClientConfig options
        Returns:
        the Sizing object based on the sizing option provided by the user
      • await

        protected void await​(java.util.concurrent.CountDownLatch latch)