torch_geometric.data
-
class Data(x=None, edge_index=None, edge_attr=None, y=None, pos=None, norm=None, face=None, **kwargs)
A plain old Python object modeling a single graph with various (optional) attributes:
- Parameters
x (Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)
edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)
edge_attr (Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)
y (Tensor, optional) – Graph or node targets with arbitrary shape. (default: None)
pos (Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)
norm (Tensor, optional) – Normal vector matrix with shape [num_nodes, num_dimensions]. (default: None)
face (LongTensor, optional) – Face adjacency matrix with shape [3, num_faces]. (default: None)
The data object is not restricted to these attributes and can be extended by any other additional data.
Example:
data = Data(x=x, edge_index=edge_index)
data.train_idx = torch.tensor([...], dtype=torch.long)
data.test_mask = torch.tensor([...], dtype=torch.bool)
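To make the COO layout of edge_index more concrete, here is a minimal plain-Python sketch. It does not use PyTorch Geometric, and the helper name to_coo is hypothetical. It turns a list of (source, target) pairs into two parallel rows of shape [2, num_edges], storing each undirected edge in both directions.

```python
def to_coo(edges, undirected=True):
    """Convert (source, target) pairs into COO rows [sources, targets].

    Hypothetical helper for illustration only. For an undirected graph,
    every edge is stored once per direction.
    """
    pairs = list(edges)
    if undirected:
        pairs = pairs + [(dst, src) for src, dst in pairs]
    sources = [src for src, _ in pairs]
    targets = [dst for _, dst in pairs]
    return [sources, targets]


# Path graph 0 - 1 - 2: three nodes, two undirected edges.
edge_index = to_coo([(0, 1), (1, 2)])
num_edges = len(edge_index[0])
```

Column i of the result describes one directed edge from edge_index[0][i] to edge_index[1][i].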
-
__call__(*keys)
Iterates over all attributes *keys in the data, yielding their attribute names and content. If *keys is not given, this method will iterate over all present attributes.
-
__cat_dim__(key, value)
Returns the dimension along which the value of attribute key will be concatenated when creating batches.
Note
This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.
-
__inc__(key, value)
Returns the incremental count to cumulatively increase the value of the next attribute of key when creating batches.
Note
This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.
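The roles of __cat_dim__ and __inc__ can be illustrated with a small plain-Python sketch. It assumes that, for an index-valued attribute such as edge_index, the increment equals the number of nodes of the preceding graphs and that concatenation happens along the edge dimension. The function batch_edge_indices is hypothetical and is not the library's implementation.

```python
def batch_edge_indices(graphs):
    """Merge several graphs into one disconnected graph.

    Each graph is a (num_nodes, edge_index) pair with edge_index given as
    [sources, targets]. Node indices are shifted by the number of nodes
    seen so far (the increment) and the shifted edges are appended along
    the edge dimension (the concatenation dimension).
    """
    sources, targets = [], []
    offset = 0
    for num_nodes, (src, dst) in graphs:
        sources.extend(s + offset for s in src)
        targets.extend(t + offset for t in dst)
        offset += num_nodes  # the next graph starts after these nodes
    return [sources, targets], offset


graph_a = (3, [[0, 1], [1, 2]])  # 3 nodes, edges 0->1 and 1->2
graph_b = (2, [[0, 1], [1, 0]])  # 2 nodes, edges 0->1 and 1->0
merged_edge_index, total_nodes = batch_edge_indices([graph_a, graph_b])
```

Without the offset, the edges of graph_b would point at nodes of graph_a, which is why index-like attributes need an increment while plain feature matrices do not.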
-
__iter__()
Iterates over all present attributes in the data, yielding their attribute names and content.
-
apply(func, *keys)
Applies the function func to all tensor attributes *keys. If *keys is not given, func is applied to all present attributes.
-
contiguous(*keys)
Ensures a contiguous memory layout for all attributes *keys. If *keys is not given, all present attributes are ensured to have a contiguous memory layout.
-
is_coalesced()
Returns True if edge indices are ordered and do not contain duplicate entries.
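The condition checked here can be sketched in plain Python. This is an illustration only. It assumes that "ordered" means sorted by source index first and target index second, and the standalone function below is not the library method.

```python
def is_coalesced(edge_index):
    """Return True if edges are strictly increasing in (source, target) order.

    Strictly increasing pairs are both sorted and free of duplicates.
    """
    pairs = list(zip(edge_index[0], edge_index[1]))
    return all(prev < curr for prev, curr in zip(pairs, pairs[1:]))


sorted_unique = is_coalesced([[0, 0, 1], [1, 2, 0]])
has_duplicate = is_coalesced([[0, 0], [1, 1]])
unsorted = is_coalesced([[1, 0], [0, 1]])
```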
-
property keys
Returns all names of graph attributes.
-
property num_edge_features
Returns the number of features per edge in the graph.
-
property num_edges
Returns the number of edges in the graph.
-
property num_faces
Returns the number of faces in the mesh.
-
property num_features
Alias for num_node_features.
-
property num_node_features
Returns the number of features per node in the graph.
-
property num_nodes
Returns or sets the number of nodes in the graph.
Note
The number of nodes in your data object is typically inferred automatically, e.g., when node features x are present. In some cases, however, a graph may only be given by its edge indices edge_index. PyTorch Geometric then guesses the number of nodes as edge_index.max().item() + 1, but if the graph contains isolated nodes, this number need not be correct and can therefore result in unexpected batch-wise behavior. Thus, we recommend explicitly setting the number of nodes via data.num_nodes = ... in your data object. You will be given a warning that requests you to do so.
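The following plain-Python sketch mimics the max-index heuristic described in the note and shows how it undercounts when isolated nodes carry the highest indices. The function guess_num_nodes is hypothetical, and its return value of 0 for an empty edge list is an assumption of this sketch.

```python
def guess_num_nodes(edge_index):
    """Guess the node count as the largest referenced index plus one."""
    if not edge_index[0]:
        return 0  # assumption of this sketch: no edges means no known nodes
    return max(max(edge_index[0]), max(edge_index[1])) + 1


# The graph has 5 nodes, but nodes 3 and 4 are isolated (no incident edges).
true_num_nodes = 5
edge_index = [[0, 1], [1, 2]]
guessed = guess_num_nodes(edge_index)  # only sees nodes 0, 1 and 2
```

Because the guess is smaller than the true count, any offset derived from it during batching would be too small as well, which is the batch-wise problem the note warns about.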
-
class Batch(batch=None, **kwargs)
A plain old Python object modeling a batch of graphs as one big (disconnected) graph. With torch_geometric.data.Data being the base class, all its methods can also be used here. In addition, single graphs can be reconstructed via the assignment vector batch, which maps each node to its respective graph identifier.
-
static from_data_list(data_list, follow_batch=[])
Constructs a batch object from a Python list holding torch_geometric.data.Data objects. The assignment vector batch is created on the fly. Additionally, creates assignment batch vectors for each key in follow_batch.
-
property num_graphs
Returns the number of graphs in the batch.
-
to_data_list()
Reconstructs the list of torch_geometric.data.Data objects from the batch object. The batch object must have been created via from_data_list() in order to be able to reconstruct the initial objects.
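As a rough illustration of the assignment vector batch, the plain-Python sketch below builds such a vector from per-graph node counts and uses it to split per-node values back into one group per graph. Both helper names are hypothetical, and the sketch covers node-level values only.

```python
def make_batch_vector(nodes_per_graph):
    """Map every node to the identifier of the graph it belongs to."""
    batch = []
    for graph_id, num_nodes in enumerate(nodes_per_graph):
        batch.extend([graph_id] * num_nodes)
    return batch


def split_by_graph(node_values, batch):
    """Group per-node values back into one list per graph."""
    num_graphs = max(batch) + 1 if batch else 0
    groups = [[] for _ in range(num_graphs)]
    for value, graph_id in zip(node_values, batch):
        groups[graph_id].append(value)
    return groups


batch = make_batch_vector([3, 2])  # first graph has 3 nodes, second has 2
features = ["a0", "a1", "a2", "b0", "b1"]
per_graph = split_by_graph(features, batch)
```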
-
class Dataset(root=None, transform=None, pre_transform=None, pre_filter=None)
Dataset base class for creating graph datasets. See here for the accompanying tutorial.
- Parameters
root (string, optional) – Root directory where the dataset should be saved. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
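The difference between the three callables is mainly a matter of when they run. The sketch below uses a hypothetical stand-in class, TinyDataset, with integers in place of data objects. It assumes that pre_filter is applied before pre_transform during the one-time processing step. The source does not state this order, so treat it as an assumption.

```python
class TinyDataset:
    """Hypothetical stand-in showing when each callable is applied."""

    def __init__(self, raw_items, transform=None, pre_transform=None,
                 pre_filter=None):
        self.transform = transform
        items = list(raw_items)
        # One-time processing step (before the result would be saved to disk).
        # Assumption of this sketch: filtering runs before pre_transform.
        if pre_filter is not None:
            items = [item for item in items if pre_filter(item)]
        if pre_transform is not None:
            items = [pre_transform(item) for item in items]
        self.processed = items

    def __len__(self):
        return len(self.processed)

    def __getitem__(self, idx):
        # transform is applied lazily, on every access.
        item = self.processed[idx]
        return item if self.transform is None else self.transform(item)


# Integers stand in for data objects to keep the sketch short.
dataset = TinyDataset(
    raw_items=[1, 2, 3, 4],
    pre_filter=lambda item: item % 2 == 0,  # keeps 2 and 4
    pre_transform=lambda item: item * 10,   # stored as 20 and 40
    transform=lambda item: item + 1,        # applied on access only
)
```

The stored items stay at their pre-transformed values, while every access returns a freshly transformed copy.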
-
__getitem__(idx)
Gets the data object at index idx and transforms it (in case a self.transform is given). In case idx is a slicing object, e.g., [2:5], a list, a tuple, a LongTensor or a BoolTensor, returns a subset of the dataset at the specified indices.
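The indexing behaviour described above can be approximated in plain Python. In this sketch, lists of integers and lists of booleans stand in for LongTensor and BoolTensor indices. The helper select is hypothetical and returns plain lists rather than dataset objects.

```python
def select(items, idx):
    """Return one item for an integer index, otherwise a list subset."""
    if isinstance(idx, (int, slice)):
        return items[idx]
    idx = list(idx)
    # A boolean mask keeps the positions marked True.
    if idx and all(isinstance(i, bool) for i in idx):
        if len(idx) != len(items):
            raise IndexError("mask length must match the number of items")
        return [item for item, keep in zip(items, idx) if keep]
    # Otherwise treat idx as a sequence of integer positions.
    return [items[i] for i in idx]


graphs = ["g0", "g1", "g2", "g3", "g4", "g5"]
single = select(graphs, 1)
sliced = select(graphs, slice(2, 5))  # same as graphs[2:5]
picked = select(graphs, [0, 3])
masked = select(graphs, [True, False, False, False, False, True])
```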
-
property num_edge_features
Returns the number of features per edge in the dataset.
-
property num_features
Alias for num_node_features.
-
property num_node_features
Returns the number of features per node in the dataset.
-
property processed_file_names
The names of the files to find in the self.processed_dir folder in order to skip the processing.
-
property processed_paths
The filepaths to find in the self.processed_dir folder in order to skip the processing.
-
property raw_file_names
The names of the files to find in the self.raw_dir folder in order to skip the download.
-
property raw_paths
The filepaths to find in order to skip the download.
-
class InMemoryDataset(root=None, transform=None, pre_transform=None, pre_filter=None)
Dataset base class for creating graph datasets which fit completely into memory. See here for the accompanying tutorial.
- Parameters
root (string, optional) – Root directory where the dataset should be saved. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
-
collate(data_list)
Collates a Python list of data objects to the internal storage format of torch_geometric.data.InMemoryDataset.
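The internal storage format is not described here. One common way to hold many variable-sized items in a single buffer is to concatenate them and remember slice boundaries, and the plain-Python sketch below assumes such a layout purely for illustration. The function names and the layout itself are assumptions, not a description of the library's implementation.

```python
def collate_lists(value_lists):
    """Concatenate variable-length lists and record their boundaries."""
    flat, slices = [], [0]
    for values in value_lists:
        flat.extend(values)
        slices.append(len(flat))  # end offset of this item
    return flat, slices


def get_item(flat, slices, index):
    """Recover the original list at position index from the flat storage."""
    return flat[slices[index]:slices[index + 1]]


flat, slices = collate_lists([[1, 2, 3], [4], [5, 6]])
second = get_item(flat, slices, 1)
```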
-
property num_classes
The number of classes in the dataset.
-
property processed_file_names
The names of the files to find in the self.processed_dir folder in order to skip the processing.
-
property raw_file_names
The names of the files to find in the self.raw_dir folder in order to skip the download.
-
class DataLoader(dataset, batch_size=1, shuffle=False, follow_batch=[], **kwargs)
Data loader which merges data objects from a torch_geometric.data.dataset to a mini-batch.
- Parameters
dataset (Dataset) – The dataset from which to load the data.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
follow_batch (list or tuple, optional) – Creates assignment batch vectors for each key in the list. (default: [])
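The effect of batch_size and shuffle can be sketched without the library. The generator below is a hypothetical illustration that yields lists of dataset indices, one list per mini-batch. The rng parameter is an addition of this sketch to make shuffling reproducible.

```python
import random


def iter_batches(num_items, batch_size=1, shuffle=False, rng=None):
    """Yield lists of item indices, one list per mini-batch."""
    order = list(range(num_items))
    if shuffle:
        # A new permutation is drawn each time the generator is created,
        # which corresponds to reshuffling at every epoch.
        (rng or random).shuffle(order)
    for start in range(0, num_items, batch_size):
        yield order[start:start + batch_size]


batches = list(iter_batches(5, batch_size=2))
shuffled = list(iter_batches(5, batch_size=2, shuffle=True,
                             rng=random.Random(0)))
```

The last batch is smaller whenever the number of items is not divisible by the batch size.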
-
class DataListLoader(dataset, batch_size=1, shuffle=False, **kwargs)
Data loader which merges data objects from a torch_geometric.data.dataset to a Python list.
Note
This data loader should be used for multi-GPU support via torch_geometric.nn.DataParallel.
-
class DenseDataLoader(dataset, batch_size=1, shuffle=False, **kwargs)
Data loader which merges data objects from a torch_geometric.data.dataset to a mini-batch.
Note
To make use of this data loader, all graphs in the dataset need to have the same shape for each of their attributes. Therefore, this data loader should only be used when working with dense adjacency matrices.
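The same-shape requirement follows from stacking: dense matrices can only be placed along a new batch dimension if their shapes agree. The plain-Python sketch below illustrates this with nested lists. The function stack_dense is hypothetical.

```python
def stack_dense(adjacencies):
    """Stack equally shaped dense adjacency matrices into one batch."""
    shapes = {(len(adj), len(adj[0]) if adj else 0) for adj in adjacencies}
    if len(shapes) > 1:
        raise ValueError(f"cannot stack differing shapes: {sorted(shapes)}")
    # Copy rows so the batch does not alias the input matrices.
    return [[list(row) for row in adj] for adj in adjacencies]


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
batch = stack_dense([triangle, path])  # nested list of shape [2, 3, 3]
```

Passing a 2 x 2 matrix alongside the 3 x 3 matrices above raises a ValueError, mirroring the restriction in the note.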
-
class NeighborSampler(data, size, num_hops, batch_size=1, shuffle=False, drop_last=False, bipartite=True, add_self_loops=False, flow='source_to_target')
The neighbor sampler from the “Inductive Representation Learning on Large Graphs” paper which iterates over graph nodes in a mini-batch fashion and constructs sampled subgraphs of size num_hops.
It returns a generator of DataFlow that defines the message passing flow to the root nodes via a list of num_hops bipartite graph objects edge_index and the initial start nodes n_id.
- Parameters
data (torch_geometric.data.Data) – The graph data object.
size (int or float or [int] or [float]) – The number of neighbors to sample (for each layer). The value of this parameter can be either set to be the same for each neighborhood or percentage-based.
num_hops (int) – The number of layers to sample.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
drop_last (bool, optional) – If set to True, will drop the last incomplete batch if the number of nodes is not divisible by the batch size. If set to False and the size of the graph is not divisible by the batch size, the last batch will be smaller. (default: False)
bipartite (bool, optional) – If set to False, will not return a generator of DataFlow to mark the computation flow, but instead will return a torch_geometric.data.Data object holding the subgraph information around each mini-batch. If set to False, the add_self_loops option is ignored. (default: True)
add_self_loops (bool, optional) – If set to True, will add self-loops to each sampled neighborhood. (default: False)
flow (string, optional) – The flow direction of message passing ("source_to_target" or "target_to_source"). (default: "source_to_target")
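The size parameter accepts either an absolute count or a percentage, optionally one value per layer. The plain-Python sketch below illustrates that idea on an adjacency dictionary. The helper names are hypothetical, and the choice to round a fractional count down is an assumption of this sketch, not documented behaviour.

```python
import random


def sample_neighbors(neighbors, size, rng):
    """Sample from a neighbor list by absolute count (int) or fraction (float)."""
    if isinstance(size, float):
        count = int(len(neighbors) * size)  # assumption: round down
    else:
        count = size
    count = min(count, len(neighbors))
    return rng.sample(neighbors, count)


def sample_hops(adjacency, roots, sizes, rng):
    """Sample one neighborhood layer per entry in sizes, starting at roots."""
    layers, frontier = [], list(roots)
    for size in sizes:
        sampled = {node: sample_neighbors(adjacency.get(node, []), size, rng)
                   for node in frontier}
        layers.append(sampled)
        # Nodes reached in this hop become the starting points of the next hop.
        frontier = sorted({n for chosen in sampled.values() for n in chosen})
    return layers


adjacency = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [0]}
rng = random.Random(0)
# Two hops: two neighbors in the first layer, 50% of neighbors in the second.
layers = sample_hops(adjacency, roots=[0], sizes=[2, 0.5], rng=rng)
```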
-
__call__(subset=None)
Returns a generator of DataFlow that iterates over the nodes in subset in a mini-batch fashion.
-
__get_batches__(subset=None)
Returns a list of mini-batches from the initial nodes in subset.
-
class ClusterData(data, num_parts, recursive=False, save_dir=None)
Clusters/partitions a graph data object into multiple subgraphs, as motivated by the “Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks” paper.
- Parameters
data (torch_geometric.data.Data) – The graph data object.
num_parts (int) – The number of partitions.
recursive (bool, optional) – If set to True, will use multilevel recursive bisection instead of multilevel k-way partitioning. (default: False)
save_dir (string, optional) – If set, will save the partitioned data to the save_dir directory for faster re-use.
-
class ClusterLoader(cluster_data, batch_size=1, shuffle=False, **kwargs)
The data loader scheme from the “Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks” paper which merges partitioned subgraphs and their between-cluster links from a large-scale graph data object to form a mini-batch.
- Parameters
cluster_data (torch_geometric.data.ClusterData) – The already partitioned data object.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)